1 Introduction

In this session we will continue annotating sequences, now focusing on transcripts, which can be coding or non-coding. In fact, the same gene can produce both coding and non-coding transcripts (Poliseno, Lanza, and Pandolfi 2024).

TBD: this section will cover short and long reads, assembly with and without a reference, Trinity, StringTie, spatial transcriptomics, expression databases, etc.

2 Transcript-based pangenome analysis

We will follow the introduction and protocol at:

https://eead-csic-compbio.github.io/get_homologues/plant_pangenome/protocol.html

For the exercises you will need

Note that GET_HOMOLOGUES-EST is part of the GET_HOMOLOGUES package, which you installed in session 2.
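To give an intuition of what a pan/core simulation computes, here is a minimal Python sketch. It is not the GET_HOMOLOGUES-EST implementation: the function name `pan_core_curves`, the toy presence/absence data and the genotype labels are all assumptions made for illustration. The idea is to add genotypes one at a time in random order and count, at each step, how many clusters are seen in at least one genotype (pan) and how many are shared by all of them (core).

```python
import random


def pan_core_curves(presence, n_permutations=10, seed=0):
    """Return one (order, pan_sizes, core_sizes) tuple per random genotype order.

    presence: dict mapping a cluster id to the set of genotypes that contain it
              (hypothetical input format, chosen for this sketch).
    """
    genotypes = sorted(set().union(*presence.values()))
    rng = random.Random(seed)
    curves = []
    for _ in range(n_permutations):
        order = genotypes[:]
        rng.shuffle(order)
        pan_sizes, core_sizes = [], []
        for k in range(1, len(order) + 1):
            added = set(order[:k])
            # pan: clusters found in at least one of the genotypes added so far
            pan_sizes.append(sum(1 for g in presence.values() if g & added))
            # core: clusters found in every genotype added so far
            core_sizes.append(sum(1 for g in presence.values() if added <= g))
        curves.append((order, pan_sizes, core_sizes))
    return curves


# Toy data (invented): five clusters across three genotypes A, B and C
toy_presence = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
    "c5": {"B"},
}

curves = pan_core_curves(toy_presence, n_permutations=5)
for order, pan_sizes, core_sizes in curves:
    print(order, pan_sizes, core_sizes)
```

Whatever the sampling order, the pan size can only grow and the core size can only shrink as genotypes are added; averaging over many random orders gives the smooth curves usually shown in pangenome plots.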

3 Coexpression network analysis

The files needed for this session are:

Note that Rmd files should be opened with RStudio.
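As a rough illustration of the idea behind a co-expression network, the following Python sketch links two genes when their expression profiles are strongly correlated across samples. This is an assumption-laden toy, not the method used in the course Rmd files, which may rely on a different approach and different thresholds. The names `pearson`, `coexpression_edges` and `min_abs_r`, as well as the expression values, are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, min_abs_r=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= min_abs_r."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Toy expression matrix (invented): gene -> values in four samples
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}

edges = coexpression_edges(toy_expr)
for edge in edges:
    print(edge)
```

In this toy, geneA, geneB and geneC end up connected (geneC is anti-correlated with the other two), while geneD stays isolated. Real analyses work on normalized expression values and many more samples, and the threshold is a choice that should be justified.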

3.1 Your report

Please create a folder named ‘transcripts/’ in the same GitHub repo you used for session 1, and write a brief report in it. See more recommendations here.

  • If you did the pantranscriptome analysis, the report should include:
    • A simulation of the leaf pantranscriptome obtained after comparing 14 barleys, similar to Fig. 6 in the GET_HOMOLOGUES-EST protocol
    • A table of Pfam domains enriched in core and accessory transcripts
    • An overall recapitulation of the analysis.
  • If you did the co-expression network analysis, the report should include:
    • How you solved the exercises.
    • The results obtained.
    • The main specific inputs and outputs used at each step.
    • The problems you encountered.
    • A brief proposal of an experiment in which these analyses could be applied.
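For the table of enriched Pfam domains, one common way to score enrichment is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below is only an illustration under stated assumptions: the function name `hypergeom_enrichment_p` and all counts are invented, and the protocol may compute enrichment differently.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: total annotated transcripts (e.g. core + accessory)
    K: transcripts carrying the Pfam domain of interest
    n: transcripts in the tested set (e.g. accessory)
    k: transcripts in the tested set that carry the domain
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper domain-carrying transcripts
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented counts: 100 transcripts, 10 carry the domain,
# 20 are accessory and 6 of those carry the domain (2 expected by chance)
p_value = hypergeom_enrichment_p(k=6, n=20, K=10, N=100)
print(f"enrichment p-value: {p_value:.4g}")
```

When many domains are tested at once, the resulting p-values should be corrected for multiple testing (for example with the Benjamini-Hochberg procedure) before reporting a domain as enriched.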

Bibliography

Poliseno, Laura, Martina Lanza, and Pier Paolo Pandolfi. 2024. “Coding, or Non-Coding, That Is the Question.” Cell Research 34 (9): 609–629. https://doi.org/10.1038/s41422-024-00975-8.