bioinformatics

Bioinformatics resources and best-practices for beginners

Bioinformatics resources and best-practices for plant breeders

This material is maintained by Najla Ksouri, Chesco Montardit, Rubén Sancho, Ernesto Igartua, Ricardo Ramírez González, Bruno Contreras Moreira and the Ensembl outreach team.

Summary

Here we review some bioinformatics resources and databases which can be useful in plant breeding and genomics. We will use both standalone and Web-based tools and will also review reproducible analysis practices and software benchmarks. Test data used in sessions 1-5 can be obtained from https://github.com/eead-csic-compbio/bioinformatics.

Docker image

A Docker image is available with most of the software used in the sessions, excluding R, which we expect to be installed elsewhere. After installing Docker, you can run the image as follows; note that you might need sudo:

docker pull csicunam/bioinformatics_iamz

# persistent folder for result files (-p: no error if it already exists)
mkdir -p "$HOME/vep_data"
chmod a+w "$HOME/vep_data"

sudo docker run -t -i -v $HOME/vep_data:/data -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY csicunam/bioinformatics_iamz:latest
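Whether sudo is needed depends on how Docker was installed on your machine. The sketch below is one way to check this before running the commands above. The function name `docker_cmd` is hypothetical and not part of the course material. It assumes that a failing `docker info` means the current user lacks permission to reach the daemon, although a daemon that is not running gives the same result.

```shell
#!/bin/sh
# Hypothetical helper (not part of the course material): suggest how to call docker.
# Prints "missing" if docker is not installed, "docker" if the daemon is
# reachable by the current user, and "sudo docker" otherwise.
docker_cmd() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "missing"
    elif docker info >/dev/null 2>&1; then
        echo "docker"
    else
        # Assumption: the failure is a permission problem; it could also
        # mean that the Docker daemon is not running.
        echo "sudo docker"
    fi
}

cmd=$(docker_cmd)
if [ "$cmd" = "missing" ]; then
    echo "Docker does not seem to be installed" >&2
else
    echo "Suggested invocation: $cmd pull csicunam/bioinformatics_iamz"
fi
```

If the helper prints `sudo docker`, prefix both the `pull` and the `run` commands with sudo.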

Contents

| session | title | required time | URL |
|---|---|---|---|
| 1 | Annotation of coding sequences | 4h | session 1 |
| 2 | Analysis of non-coding sequences | 4h | session 2 |
| 3 | Analysis of transcriptomes | 4h | session 3 |
| 4 | Benchmarks | 2h | session 4 |
| 5 | Mapping, variant calling & effect prediction | 6h | session 5, session 5a |
| 6 | Genotyping | 3h | session 6 |
| 7 | Genome-Wide Association Analysis | 2h | session 7 |
| R | Reproducible analysis practices | 2h | session R |

Exercises and report

Each session contains exercises (Exe1, Exe2, …) which you can solve and document in a report. When we teach this material, we ask students to create a GitHub repository with a dedicated folder per session, explaining how each exercise was solved and adding code, comments, figures and literature references where needed. Any AI resources used in your work, such as ChatGPT, should be properly cited, and the relevant queries should be included in the report. Markdown is quick to learn and is a good format for reports, as it is supported by GitHub.

The idea is that students log their work as they go, rather than uploading a set of solutions on the last day, so that their thought process, progress and challenges are visible. The resulting repo can be evaluated by teachers and also serves as a portfolio of skills and knowledge for potential employers.

Session R provides examples of setting up a GitHub repository and using Git for version control, and also of the slightly more advanced R Markdown format.
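The report layout described in this section can be sketched as a short shell script. This is a minimal example and not a required structure: the repository name `bioinformatics_report`, the `sessionN` folder names and the `README.md` stubs are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical repository name; override it with REPO_DIR if you prefer another.
repo="${REPO_DIR:-bioinformatics_report}"

# One folder per session, following the Contents table (folder names are an assumption).
for s in 1 2 3 4 5 6 7 R; do
    mkdir -p "$repo/session$s"
    # Seed each folder with a Markdown report stub unless one already exists.
    if [ ! -e "$repo/session$s/README.md" ]; then
        printf '# Session %s\n\n## Exe1\n\n' "$s" > "$repo/session$s/README.md"
    fi
done

# Put the folder under version control if git is available and not yet set up.
if command -v git >/dev/null 2>&1 && [ ! -d "$repo/.git" ]; then
    git init -q "$repo"
fi
```

From there, committing after each solved exercise with `git add` and `git commit` keeps the progress visible in the repository history.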

More resources

We post regularly about these and related bioinformatics topics at the #!/perl/bioinfo blog, mostly in Spanish.

See also this course to learn how to write scripts in Linux.