Annotating genomes

Table of contents
  1. Slides: Annotating genomes
  2. Activity 1: Review of command-line
  3. Activity 2: Annotating your phage genome
  4. Activity 3: Building a phylogenetic tree

Slides: Annotating genomes

Activity 1: Review of command-line

Activity 1

See Command line basics from yesterday.

Activity 2: Annotating your phage genome

Rationale

Now that we have whole-genome assemblies, we can find the coding sequences in our phage’s genome and annotate what is known about the proteins they encode.

Activity 2.1

  • Generating the annotation
  1. Open the Terminal Preview app
  2. Start a bash shell by writing:
    bash
    
  3. Go to the folder that has your assembly:
    cd <name_of_your_folder>
    cd assembly
    
  4. Write: ls
    • You should see the file assembly.fasta.
  5. To activate the conda environment that contains prokka, write:
    conda activate prokka
    
  6. To look at the options, write:
    prokka -h
    
    • What do the following options do?
      • --prefix
      • --hmms
  7. Finally, to run the annotation, write:
    prokka --prefix <name_of_your_sample> --hmms phrogs/all_phrogs.hmm --compliant assembly.fasta
    
  8. Wait for the software to finish annotating
  9. Once it’s done, a new folder should appear in your assembly folder.
  10. Open the folder in the file browser and look for a file ending in .gbk
    • It will be <name_of_your_sample>.gbk
  11. Right-click on that file and open it with Sublime Text
    • Does it look familiar?
    • Why is the organism data wrong?
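If you prefer the terminal, the same .gbk file can be inspected with standard tools. The sketch below is a minimal example, assuming a bash shell and a GenBank file written by prokka; gbk_organism and gbk_count_cds are hypothetical helper names, not part of prokka. It relies on the GenBank layout, where the ORGANISM line is indented by two spaces and feature keys such as CDS start after exactly five spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a GenBank (.gbk) file produced by prokka.

# Print the distinct ORGANISM lines (one per contig record, de-duplicated).
gbk_organism() {
    grep '^  ORGANISM' "$1" | sort -u
}

# Count CDS features: in GenBank format, feature keys start after exactly 5 spaces.
gbk_count_cds() {
    grep -c '^     CDS ' "$1"
}
```

For example, from inside the new prokka output folder, gbk_count_cds <name_of_your_sample>.gbk prints the number of CDS features, which should match the number UGENE shows next to CDS in Activity 2.2.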

Activity 2.2

  • Visualizing your genome
  1. Now, open the program UGENE
  2. Click on Open File(s) and select your <name_of_your_sample>.gbk file.
    • What do you see? Let’s review the main sections of this program.
  3. In the bottom panel of the window, click on the arrow > next to gene.
  4. Right-click on the first gene and click Disable gene highlighting
  5. Click on the arrow > next to CDS
    • Remember what CDS means?
    • What is the number in parentheses next to CDS? (Take notes!)
  6. Click on the Show circular view button (top of the window)
  7. Click on the Annotations Highlighting button (right side of the window)
    • In the text box under Show value of qualifier, erase the contents and write: product
  8. Click on the Circular View Settings button (right side of the window)
  9. Make the font bigger for the Title and Annotations
  10. Explore your annotation
    • How many proteins does your whole genome assembly have in total?
    • How many are “hypothetical protein”?
    • How many are not?
  11. Save the genome map by clicking on the Camera icon (left side of the window)
    • Disable the two check boxes
    • Select format PNG
    • Make it at least 3000 × 3000 px
    • Save it in the folder that you created on the Desktop
  12. Click on the Statistics button (right side of the window)
    • Record the genome length and GC content.
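The protein counts from step 10 and the values from step 12 can be cross-checked in the terminal. This is a minimal sketch, assuming a bash shell, prokka's protein file <name_of_your_sample>.faa (one header line starting with > per protein, followed by its product name) and your assembly.fasta; the function names are hypothetical. GC content is computed here as (G + C) / (A + C + G + T), ignoring ambiguous bases such as N, so it may differ slightly from UGENE's value if your assembly contains them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarizing prokka output and an assembly.

# Total number of proteins: one FASTA header (">") per protein in the .faa file.
count_proteins() {
    grep -c '^>' "$1"
}

# Number of proteins whose product is "hypothetical protein".
count_hypothetical() {
    grep '^>' "$1" | grep -c 'hypothetical protein'
}

# Total length (bp) and GC content (%) of all sequences in a nucleotide FASTA file.
# Only A, C, G and T enter the GC denominator; N and other letters are ignored.
assembly_stats() {
    awk '
        /^>/ { next }                          # skip header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)            # drop whitespace and carriage returns
            len += length(seq)
            gc  += gsub(/[GC]/, "", seq)       # gsub returns the number of replacements
            at  += gsub(/[AT]/, "", seq)
        }
        END {
            if (gc + at > 0) pct = 100 * gc / (gc + at); else pct = 0
            printf "length=%d gc=%.2f\n", len, pct
        }
    ' "$1"
}
```

The number of proteins that are not hypothetical is the difference between the first two counts.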

Keep UGENE open; we will need it for Activity 3.

Activity 3: Building a phylogenetic tree

Rationale

To learn more about how our phage relates to other phages, we are going to build a small phylogenetic tree. These trees represent evolutionary relationships between organisms. We will use a single protein from our phage, look for related proteins with BLAST, and build a tree with the program seaview.

Activity 3.1

To get a sense of how trees are built, we will have an activity with candy!

Activity 3.2

  1. Go back to UGENE and, with the help of the map, look for the capsid protein (it might also be called major head protein or coat protein).
  2. Click on the arrow representing the protein in the map. It should get highlighted.
  3. Then, right-click on the protein and select Copy/Paste>Copy annotation amino acids.
  4. Open Sublime Text and write:
    >myphage
    
  5. Press enter to make a new line, and paste the protein sequence you had copied.
  6. Save this file in your folder on the Desktop and name it capsid_proteins.fasta
  7. Open the BLAST website and click on Protein BLAST
  8. In the query box, paste the protein sequence you have in Sublime Text. Click BLAST.
  9. Look at your results.
  10. Uncheck the Select all box.
  11. Select 5-10 interesting results by clicking on the selection boxes to the left.
  12. Click on the Alignments tab, and click the Download>FASTA (aligned sequences) option.
  13. This will download a file called seqdump.txt into your Downloads folder. Open this file.
  14. Copy the contents of seqdump.txt and paste them into your capsid_proteins.fasta file, below your own sequence.
  15. Save the file again.
  16. Open the software seaview
  17. Open the capsid_proteins.fasta file by selecting File>Open and choosing your file.
  18. Align your sequences by clicking on Align>Align all. When it’s done, click OK.
    • What happened to your sequences?
  19. Build a tree by clicking on Trees>PhyML and then clicking Run. When it’s done, click OK.
    • Where did your protein (myphage) land?
  20. Save your tree by clicking on File>Save as PDF. Save it in your folder on the Desktop.
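Steps 13 to 15 can also be done in the terminal, with a quick check before opening seaview. The sketch below is a minimal example, assuming a bash shell; add_blast_hits, count_seqs and check_headers are hypothetical helper names. It guards against two easy mistakes: appending to a file whose last line has no newline (which would glue the next header onto your sequence), and duplicated sequence names or a space right after > in a header, which are best avoided before alignment and tree building.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing capsid_proteins.fasta for seaview.

# Append a FASTA file of BLAST hits to a target file,
# first adding a newline if the target does not already end with one.
add_blast_hits() {
    local hits="$1" target="$2"
    if [ -s "$target" ] && [ -n "$(tail -c 1 "$target")" ]; then
        echo >> "$target"
    fi
    cat "$hits" >> "$target"
}

# Count sequences (header lines) in a FASTA file.
count_seqs() {
    grep -c '^>' "$1"
}

# Warn about headers with a space after ">" or duplicated sequence names.
# Returns 0 if no problem was found, 1 otherwise.
check_headers() {
    local ok=0 dups
    if grep -q '^> ' "$1"; then
        echo "warning: header with a space after '>'" >&2
        ok=1
    fi
    # The sequence name is taken as the first word of each header line.
    dups=$(grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "warning: duplicated names: $dups" >&2
        ok=1
    fi
    return "$ok"
}
```

For example, add_blast_hits ~/Downloads/seqdump.txt capsid_proteins.fasta followed by count_seqs capsid_proteins.fasta should print one more than the number of BLAST results you selected.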