This is a template project following the guidelines from organizing-your-work.
- JHPCE location: `/dcs04/lieber/yourTeam/someProject_LIBDcode/yourRepository`, such as `/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC`. This template is located at `/dcs04/lieber/lcolladotor/_jhpce_org_LIBD001/template_project`.
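
As a small illustration of the path pattern above, the sketch below assembles a JHPCE location from its three parts. The function name `jhpce_project_path` is hypothetical and only exists for this example; the values come from the `spatialDLPFC` example.

```shell
# Hypothetical helper: build the JHPCE project path from its three parts
# (team directory, project directory with its LIBD code, repository name).
jhpce_project_path() {
    local team="$1" project_dir="$2" repo="$3"
    printf '/dcs04/lieber/%s/%s/%s\n' "${team}" "${project_dir}" "${repo}"
}

# Values taken from the spatialDLPFC example above.
example_path="$(jhpce_project_path lcolladotor spatialDLPFC_LIBD4035 spatialDLPFC)"
echo "${example_path}"
# prints /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC
```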
For realistic examples, check:
- LieberInstitute/ranger_metrics
- LieberInstitute/Visium_IF_AD
- LieberInstitute/DLPFC_snRNAseq
- LieberInstitute/spatialDLPFC
- `template_project.Rproj`: an RStudio project file with some non-default settings that we use frequently. They protect us from making some hard to reproduce errors (involving `.RData` files) as well as ensure that we can work from winOS (Microsoft Windows) and macOS (Apple) without issues. These are:
  - Restore .RData into workspace at startup: No
  - Save workspace to .RData on exit: No
  - Always save history (even if not saving .RData): No
    - These 3 options help prevent some irreproducible results/errors.
  - Tab width: 4
    - Default tab width used in Bioconductor projects.
  - Line ending conversion: Posix (LF)
    - Selecting this option makes it easy to work with the same repo on both winOS and linux/macOS.
- `raw-data`: example organization for the `raw-data` files.
  - This is data that is not produced by any code in this repository and that we should back up. Using the same `raw-data` location in every project makes it easy for us to identify directories we need to back up across all projects.
  - Contains example `.gitignore` and `README.md` files.
- `code`: example organization for the `code` files.
  - Contains example files for 2 analysis steps and 3 example R scripts.
  - `code/run_all.sh`: example bash script for re-running all analyses in this project.
  - `code/update_style.R`: common R script we use for automatically styling all code we write, which makes it easier to read across projects. It uses `styler` and `biocthis` to accomplish this.
- `processed-data`: example organization for the `processed-data` files.
  - Any data that can be recreated with the contents from `code` (which are version controlled) and `raw-data` (which ideally should be backed up). It might take some time to recreate this data, but in theory we should have the resources to do this.
  - We version control small output summary files (typically < 25 MB), such as files that end up being supplementary files of a paper.
- `plots`: example organization for the `plots` files.
  - We typically only version control small plot files (under about 25 MB). We can then share the specific plot links on Slack or elsewhere, such that if we later update the plot(s), the links will still work. Because the plots are version controlled, we can also see how they changed over time.
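
The RStudio settings listed above for `template_project.Rproj` are stored as plain text in the project file, so they can be checked from the command line. The sketch below is one way to do that. It assumes the usual `.Rproj` field names (`RestoreWorkspace`, `SaveWorkspace`, `AlwaysSaveHistory`, `NumSpacesForTab`, `LineEndingConversion`); compare them against your own file before relying on the check. The sample file is written to a temporary directory purely for demonstration, and `check_setting` is a hypothetical helper.

```shell
# Demo setup: write a sample .Rproj file to a temporary directory.
# In a real project, point RPROJ at your own yourRepository.Rproj instead.
workdir="$(mktemp -d)"
RPROJ="${workdir}/template_project.Rproj"
cat > "${RPROJ}" <<'EOF'
Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: No

UseSpacesForTab: Yes
NumSpacesForTab: 4
Encoding: UTF-8

LineEndingConversion: Posix
EOF

# check_setting FILE KEY VALUE: succeed only if FILE has the exact line "KEY: VALUE".
check_setting() {
    grep -qx "$2: $3" "$1"
}

rproj_status="ok"
for setting in "RestoreWorkspace:No" "SaveWorkspace:No" "AlwaysSaveHistory:No" \
    "NumSpacesForTab:4" "LineEndingConversion:Posix"; do
    key="${setting%%:*}"     # text before the first colon
    value="${setting#*:}"    # text after the first colon
    if ! check_setting "${RPROJ}" "${key}" "${value}"; then
        echo "Unexpected or missing setting: ${key} (wanted ${value})"
        rproj_status="mismatch"
    fi
done
echo "Rproj settings: ${rproj_status}"
```

A check like this could be run after renaming the `.Rproj` file for a new repository, to confirm that none of the settings were reset by RStudio along the way.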
- Create a new repository (`yourRepository`) following the instructions from GitHub on creating a repository from a template.
- Rename `template_project.Rproj` to match the name of the new repository you created (`yourRepository.Rproj` for `yourRepository`).
- Update the JHPCE location of the clone you made of `yourRepository`.
  - The `yourTeam` portion will likely be `lcolladotor` or `marmaypag` or some other LIBD team.
  - Ask which LIBD code this project belongs to in order to figure out the `someProject_LIBDcode`.
- Update the JHPCE permissions of your user group. For example, by following the instructions on the `#libd_jhu_spatial` Slack private channel or `organizing-your-work.html#dcs04-scripts` (public).
  - Summary: `sh /dcs04/lieber/lcolladotor/_jhpce_org_LIBD001/update_permissions_spatialteam.sh /dcs04/lieber/yourTeam/someProject_LIBDcode/yourRepository`
- Add any `raw-data` related to your project.
  - See `raw-data/FASTQ/README.md` for information about creating soft links (symbolic links) for external raw-data files.
  - See `raw-data/FASTQ/.gitignore` for information about making very specific `.gitignore` files.
  - See `raw-data/sample_info/README.md` about where to document sample sheets (typically Excel files) we use for sequencing orders with the JHU Single Cell & Transcriptomics Core.
- Start your first code directory, like `code/01_something`. Typically this first step reads in data from `raw-data` and imports it into R.
  - See `code/01_read_data_to_r/01_read_data_to_r.R` as an example R script.
- Use `slurmjobs::job_single()` to create a companion shell script for your R script, such that you can use `sbatch` to run it at JHPCE.
  - See `code/01_read_data_to_r/01_read_data_to_r.sh` as an example bash script created with `slurmjobs::job_single()`.
- Edit the `code/run_all.sh` script that specifies how you can re-run all analyses.
  - This file is useful for reproducibility and for cases when we do need to re-run all or part of the analyses. For example, after a bug fix in a package the analysis depends on.
  - This file also acts as a detailed `README.md`.
- Once you have read your data into R, you will start having scripts that depend on the output of previous ones. See `code/02_boxplots` for an example of this case. `code/02_boxplots/01_ggpairs.R` uses the data created in the previous step. Note how `01_ggpairs.R` starts again at `01`, since this is the first script in this second analysis code step (`02_boxplots`).
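
Several of the steps above can be rehearsed in a scratch directory before touching a real project. The sketch below is a dry run under stated assumptions, not the contents of the template's own `README.md` or `run_all.sh` files: it soft-links external FASTQ files into `raw-data/FASTQ`, writes a narrow `.gitignore`, and lists the per-step job scripts in run order. Every path and file name is hypothetical (including `02_boxplots/01_ggpairs.sh`), as is the `SUBMIT` switch, which defaults to printing the `sbatch` commands instead of running them.

```shell
#!/usr/bin/env bash
set -e

# Scratch area so the sketch never touches a real project.
# Every path and file name below is hypothetical.
scratch="$(mktemp -d)"
external="${scratch}/external_storage/run01"   # stands in for where FASTQ files really live
repo="${scratch}/yourRepository"
mkdir -p "${external}" "${repo}/raw-data/FASTQ" \
    "${repo}/code/01_read_data_to_r" "${repo}/code/02_boxplots"
touch "${external}/sample1_R1.fastq.gz" "${external}/sample1_R2.fastq.gz"
touch "${repo}/code/01_read_data_to_r/01_read_data_to_r.sh" \
    "${repo}/code/02_boxplots/01_ggpairs.sh"

# 1. Soft-link external raw data into raw-data/FASTQ instead of copying it.
#    Absolute targets keep working if the repository directory is moved.
for fastq in "${external}"/*.fastq.gz; do
    ln -s "${fastq}" "${repo}/raw-data/FASTQ/$(basename "${fastq}")"
done

# 2. Keep the .gitignore narrow: ignore only the linked FASTQ files here,
#    so README.md and the .gitignore itself stay version controlled.
printf '%s\n' '*.fastq.gz' > "${repo}/raw-data/FASTQ/.gitignore"

# 3. List the job scripts in run order. The numeric prefixes on directories
#    and files make the glob expand in analysis order.
#    SUBMIT is a hypothetical switch; the default only prints the commands.
SUBMIT="${SUBMIT:-no}"
n_jobs=0
for job in "${repo}"/code/[0-9][0-9]_*/[0-9][0-9]_*.sh; do
    step="$(basename "$(dirname "${job}")")"
    script="$(basename "${job}")"
    if [ "${SUBMIT}" = "yes" ]; then
        # Submit from the step directory so relative paths in the job resolve.
        (cd "${repo}/code/${step}" && sbatch "${script}")
    else
        echo "cd code/${step} && sbatch ${script}"
    fi
    n_jobs=$((n_jobs + 1))
done
echo "Listed ${n_jobs} job(s)"
```

Keep in mind that `sbatch` returns as soon as a job is queued, so a loop like this does not wait for step `01` to finish before submitting step `02`. Steps that read the output of earlier ones need to be submitted after those finish, or chained with `sbatch --dependency=afterok:<jobid>`.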
Good luck!! If you have any questions about how to organize files, feel free to schedule a Data Science guidance session (DSgs) with any team member.