1 Introduction

In this session we will continue annotating sequences, but now with a focus on transcripts, which can be coding or non-coding. In fact, the same gene can encode both coding and non-coding transcripts (Poliseno, Lanza, and Pandolfi 2024).

TBD: short and long reads, assembly with and without a reference, Trinity, StringTie, spatial transcriptomics, expression databases, etc.

2 Transcript-based pangenome analysis

We will follow the GET_HOMOLOGUES-EST protocol at:

https://eead-csic-compbio.github.io/get_homologues/plant_pangenome/protocol.html

3 Coexpression networks

3.1 Your report

Your task is to run both protocols and write a brief report with:

  • a summary of the pantranscriptome analysis, including
    • A simulation of the leaf pantranscriptome after comparing 14 barleys, similar to Fig. 6 in the GET_HOMOLOGUES-EST protocol
    • A table of Pfam domains enriched in core and accessory transcripts
    • An overall recapitulation of the analysis.
  • ..
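
To make clear what a pantranscriptome simulation measures, here is a minimal Python sketch of the general idea: samples are added in random order and, after each addition, we count the clusters seen in any sample so far (pan) and those present in all samples so far (core). This is only an illustration under stated assumptions, not the GET_HOMOLOGUES-EST implementation; the function name `pan_core_growth`, the sample names and the cluster identifiers are hypothetical toy values.

```python
import random


def pan_core_growth(presence, n_permutations=10, seed=0):
    """Simulate pan and core growth curves.

    presence: dict mapping sample name -> set of cluster ids found in it
              (hypothetical input format chosen for this sketch).
    Returns one (pan_sizes, core_sizes) pair per random sample order.
    """
    rng = random.Random(seed)
    samples = sorted(presence)
    curves = []
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)  # random order in which samples are added
        pan, core = set(), None
        pan_sizes, core_sizes = [], []
        for name in order:
            clusters = presence[name]
            pan |= clusters  # union: clusters seen in at least one sample
            # intersection: clusters seen in every sample added so far
            core = set(clusters) if core is None else core & clusters
            pan_sizes.append(len(pan))
            core_sizes.append(len(core))
        curves.append((pan_sizes, core_sizes))
    return curves


# Toy data (made up): three samples sharing only cluster c1
toy = {
    "barley_A": {"c1", "c2", "c3"},
    "barley_B": {"c1", "c2", "c4"},
    "barley_C": {"c1", "c3", "c5"},
}

curves = pan_core_growth(toy, n_permutations=5)
for pan_sizes, core_sizes in curves:
    print("pan:", pan_sizes, "core:", core_sizes)
```

Whatever the sampling order, the pan curve can only grow and the core curve can only shrink as samples are added; averaging over many random orders gives smooth curves like those usually shown in pan/core plots.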

Please make a folder named ‘transcripts/’ in the same GitHub repo you used for session 1, and write a report explaining i) how you solved the exercises and what the results mean, and ii) the problems you faced. See more recommendations here.

References

Poliseno, Laura, Martina Lanza, and Pier Paolo Pandolfi. 2024. “Coding, or Non-Coding, That Is the Question.” Cell Research 34 (9): 609–629. https://doi.org/10.1038/s41422-024-00975-8.