Bioinformatics resources and best-practices for plant breeders
This material is maintained by Najla Ksouri, Chesco Montardit, Rubén Sancho, Ernesto Igartua, Ricardo Ramírez González, Bruno Contreras Moreira and the Ensembl outreach team
Summary
Here we review some bioinformatics resources and databases which can be useful in plant breeding and genomics. We will use both standalone and Web-based tools and will also review reproducible analysis practices and software benchmarks. Test data used in sessions 1-5 can be obtained from https://github.com/eead-csic-compbio/bioinformatics.
Docker image
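A Docker image is available with most of the software used in the sessions, excluding R, which we expect to be installed elsewhere. After installing Docker, the image can be run as follows; note that you might require sudo:
```shell
# download the image (prefix with sudo if your user is not in the docker group)
docker pull csicunam/bioinformatics_iamz

# persistent folder for result files, mounted as /data inside the container
mkdir -p "$HOME/vep_data"
chmod a+w "$HOME/vep_data"

# interactive session; the X11 socket and DISPLAY let graphical programs in the container use the host display
sudo docker run -t -i -v "$HOME/vep_data:/data" -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY="$DISPLAY" csicunam/bioinformatics_iamz:latest
```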
Contents
| session | title | required time | URL |
|---|---|---|---|
| 1 | Annotation of coding sequences | 4h | session 1 |
| 2 | Analysis of non-coding sequences | 4h | session 2 |
| 3 | Analysis of transcriptomes | 4h | session 3 |
| 4 | Benchmarks | 2h | session 4 |
| 5 | Mapping, variant calling & effect prediction | 6h | session 5, session 5a |
| 6 | Genotyping | 3h | session 6 |
| 7 | Genome-Wide Association Analysis | 2h | session 7 |
| R | Reproducible analysis practices | 2h | session R |
Exercises and report
Each session contains exercises (Exe1, Exe2, …) that you can solve and document in a report. When we teach this material, we ask students to create a GitHub repository with a dedicated folder per session, explaining how each exercise was solved and adding code, comments, figures and literature references where needed. Any AI resources used in your work, such as ChatGPT, should be properly cited and the relevant queries included in the report. Markdown is quick to learn and a good format for reports, as GitHub renders it natively.
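A minimal sketch of such a report skeleton, with one folder per session following the Contents table, is shown below. The repository name `bioinformatics_report` and the README layout are hypothetical choices for illustration, not a required structure:

```shell
#!/usr/bin/env bash
set -euo pipefail

# hypothetical repository name; choose your own
repo=bioinformatics_report

# one folder per session, following the Contents table (sessions 1-7 and R)
for s in 1 2 3 4 5 6 7 R; do
  mkdir -p "$repo/session$s"
  # each folder gets a Markdown report where exercises (Exe1, Exe2, ...) are documented;
  # existing reports are never overwritten
  if [ ! -f "$repo/session$s/README.md" ]; then
    printf '# Session %s\n\n## Exe1\n\n' "$s" > "$repo/session$s/README.md"
  fi
done

# top-level README linking to each session report
{
  echo "# Bioinformatics report"
  echo
  for s in 1 2 3 4 5 6 7 R; do
    echo "- [Session $s](session$s/README.md)"
  done
} > "$repo/README.md"
```

Relative links such as `session3/README.md` are resolved by GitHub, so the top-level README works as a table of contents for the whole report.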
The idea is that students log their work as they go, rather than uploading a set of solutions on the last day, so that their thought process, progress and challenges are visible. The resulting repository can be evaluated by teachers and also serves as a portfolio of skills and knowledge for potential employers.
Session R provides examples of setting up a GitHub repository and using Git for version control, as well as of the slightly more advanced R Markdown format.
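Logging work as it progresses amounts to committing each solved exercise as a separate step. The sketch below assumes the hypothetical `bioinformatics_report` folder and uses a placeholder identity (`Your Name`, `you@example.com`) stored only in this repository; replace both with your own:

```shell
#!/usr/bin/env bash
set -euo pipefail

# hypothetical repository folder; created if it does not exist yet
repo=bioinformatics_report
mkdir -p "$repo/session1"

# initialise version control (re-running on an existing repository is harmless)
git init -q "$repo"

# placeholder identity, saved in this repository's config only
git -C "$repo" config user.name "Your Name"
git -C "$repo" config user.email "you@example.com"

# document an exercise as soon as it is solved ...
printf '## Exe1\n\nCommands, comments and references go here.\n' >> "$repo/session1/README.md"

# ... and record that step with a descriptive message
git -C "$repo" add session1/README.md
git -C "$repo" commit -q -m "session1: document Exe1"

# the history shows progress over time, one commit per step
git -C "$repo" log --oneline
```

Once an empty repository has been created on GitHub, `git remote add origin <URL>` followed by `git push -u origin HEAD` publishes the local history; later commits only need `git push`.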
More resources
We post regularly about these and related bioinformatics topics at the #!/perl/bioinfo blog, mostly in Spanish.
See also this course to learn how to write scripts in Linux.