Annotating genomes
Slides: Annotating genomes
Activity 1: Review of command-line
Activity 1
See Command line basics from yesterday.
Activity 2: Annotating your phage genome
Rationale
Now that we have whole genome assemblies, we can find coding sequences in our phage’s genome, and annotate what we know about the proteins we find.
Activity 2.1
- Generating the annotation
- Open the Terminal Preview app
- Write:
bash
- Go to the folder that has your assembly:
cd <name_of_your_folder>
cd assembly
- Write:
ls
- You should see the file assembly.fasta.
- To enable prokka, write:
conda activate prokka
- To look at the options, write:
prokka -h
- What do the following options do:
--prefix
--hmms
- Finally, to run the annotation you should write:
prokka --prefix <name_of_your_sample> --hmms phrogs/all_phrogs.hmm --compliant assembly.fasta
- Wait for the software to finish annotating
- Once it’s done, a new folder should appear in your assembly folder.
- Open the folder in the file browser and look for a file ending in .gbk
- It will be <name_of_your_sample>.gbk
- Right-click on that file and open it with Sublime Text
- Does it look familiar?
- Why is the organism data wrong?
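Before annotating, it can help to confirm what is actually inside the assembly file. The sketch below is only an illustration: it builds a tiny made-up FASTA file (toy_assembly.fasta, a hypothetical stand-in for assembly.fasta) in a temporary folder, then counts its sequences and bases with standard command-line tools. The variable names are also just for this example.

```shell
# Work in a throwaway folder so nothing in your real folder is touched
workdir=$(mktemp -d)

# Toy stand-in for assembly.fasta (made-up sequences, for illustration only)
cat > "$workdir/toy_assembly.fasta" <<'EOF'
>contig_1
ATGCGCGTTA
GGCTA
>contig_2
TTGACC
EOF

# Every FASTA record starts with ">", so counting those lines counts sequences
n_contigs=$(grep -c '^>' "$workdir/toy_assembly.fasta")

# Drop the header lines, join the sequence lines, and count the characters
total_bases=$(grep -v '^>' "$workdir/toy_assembly.fasta" | tr -d '\n' | wc -c | tr -d ' ')

echo "contigs: $n_contigs"
echo "bases: $total_bases"
```

To try the same two counts on your own data, run the grep commands from inside your assembly folder and replace the toy path with assembly.fasta.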
Activity 2.2
- Visualizing your genome
- Now, open the program UGENE
- Click on Open File(s), and select your <name_of_your_sample>.gbk file.
- What do you see? Let’s review the main sections of this program.
- Go to the bottom panel of the window, click on the arrow > next to gene
- Right-click on the first gene and click Disable gene highlighting
- Click on the arrow > next to CDS
- Remember what CDS means?
- What is the number in parentheses next to CDS? (Take notes!)
- Click on the Show circular view button (top of the window)
- Click on the Annotations Highlighting button (right side of the window)
- In the text box under Show value of qualifier erase the contents and write: product
- Click on the Circular View Settings button (right side of the window)
- Make the font bigger for the Title and Annotations
- Explore your annotation
- How many proteins does your whole genome assembly have in total?
- How many are “hypothetical protein”?
- How many are not?
- Save the genome map by clicking on the Camera icon (left side of the window)
- Disable the two check boxes
- Select format PNG
- Make it at least 3000px x 3000px
- Save it in the folder that you created on the Desktop
- Click on the Statistics button (right side of the window)
- Record length, GC content
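The numbers you noted in UGENE can be cross-checked from the terminal. The sketch below is a rough illustration, not part of the official workflow: it writes a tiny made-up GenBank fragment and FASTA file (both hypothetical stand-ins for your own files) into a temporary folder, then counts CDS features, counts how many are labelled "hypothetical protein", and computes sequence length and GC content. It assumes each CDS feature line starts with five spaces followed by CDS, as in standard GenBank flat files; GC rounding may differ slightly from what UGENE reports.

```shell
workdir=$(mktemp -d)

# Toy GenBank fragment (made-up features, for illustration only)
cat > "$workdir/toy.gbk" <<'EOF'
LOCUS       toy_phage                 90 bp    DNA     linear
FEATURES             Location/Qualifiers
     gene            1..30
                     /locus_tag="TOY_00001"
     CDS             1..30
                     /locus_tag="TOY_00001"
                     /product="hypothetical protein"
     gene            31..60
                     /locus_tag="TOY_00002"
     CDS             31..60
                     /locus_tag="TOY_00002"
                     /product="major capsid protein"
     gene            complement(61..90)
                     /locus_tag="TOY_00003"
     CDS             complement(61..90)
                     /locus_tag="TOY_00003"
                     /product="hypothetical protein"
ORIGIN
//
EOF

# Toy genome sequence (made-up)
printf '>toy_phage\nATGCA\nTGCAT\n' > "$workdir/toy.fasta"

# Count CDS features and how many of them have no known function
n_cds=$(grep -c '^     CDS ' "$workdir/toy.gbk")
n_hypothetical=$(grep -c '/product="hypothetical protein"' "$workdir/toy.gbk")
n_named=$((n_cds - n_hypothetical))

# Length: all sequence characters; GC: share of G and C among them
seq_length=$(grep -v '^>' "$workdir/toy.fasta" | tr -d '\n' | wc -c | tr -d ' ')
gc_percent=$(grep -v '^>' "$workdir/toy.fasta" | tr -d '\n' |
  awk '{ n = length($0); gc = gsub(/[GCgc]/, ""); printf "%.1f", 100 * gc / n }')

echo "CDS: $n_cds (hypothetical: $n_hypothetical, named: $n_named)"
echo "length: $seq_length bp, GC: $gc_percent %"
```

To use this on your own annotation, point the grep commands at your <name_of_your_sample>.gbk file and at assembly.fasta instead of the toy files.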
Keep UGENE open; we will need it for Activity 3.
Activity 3: Building a phylogenetic tree
Rationale
To learn more about how our phage relates to other phages, we are going to build a small phylogenetic tree. These trees represent relationships between organisms. We are going to take a single protein from our phage, look for related proteins with BLAST, and build a tree using the program seaview.
Activity 3.1
To get a sense of how trees are built, we will have an activity with candy!
Activity 3.2
- Go back to UGENE and, with the help of the map, look for the capsid protein. (It might also be called major head protein or coat protein.)
- Click on the arrow representing the protein in the map. It should get highlighted.
- Then, right-click on the protein, and select Copy/Paste > Copy annotation amino acids
- Open Sublime Text and write:
> myphage
- Press enter to make a new line, and paste the protein sequence you had copied.
- Save this file in your folder on the Desktop and call it capsid_proteins.fasta
- Open the BLAST website and click on Protein BLAST
- In the query box, paste the protein sequence you have in Sublime Text. Click BLAST.
- Look at your results.
- Uncheck the select all box
- Select 5-10 interesting results by clicking on the selection boxes to the left.
- Click on the Alignments tab, and click the Download > FASTA (aligned sequences) option.
- This will download a file called seqdump.txt into your Downloads folder. Open this file.
- Copy the contents of seqdump.txt and paste them into your capsid_proteins.fasta file.
- Save the file again.
- Open the software seaview
- Open the capsid_proteins.fasta file by selecting File > Open and choosing your file.
- Align your sequences by clicking on Align > Align all. When it’s done, click OK.
- What happened to your sequences?
- Build a tree by clicking on Trees > PhyML and then clicking Run. When it’s done, click OK.
- Where did your protein (myphage) land?
- Save your tree by clicking on File > Save as PDF. Save it in your folder on the Desktop.
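The copy-and-paste step that combines your capsid protein with the BLAST hits can also be done, and checked, from the terminal. The sketch below is only an illustration under stated assumptions: the two toy files and their made-up sequences are hypothetical stand-ins for capsid_proteins.fasta and seqdump.txt, and the variable names exist only for this example. It appends one file to the other, then counts the sequences and confirms that myphage is still present before the file goes into seaview.

```shell
workdir=$(mktemp -d)

# Toy stand-ins (made-up sequences) for the two files used in the activity
printf '> myphage\nMSLKTAAG\n' > "$workdir/toy_capsid_proteins.fasta"
printf '>hit_1 capsid protein\nMSLRTAAG\n>hit_2 coat protein\nMTLKSAAG\n' > "$workdir/toy_seqdump.txt"

# Append the BLAST hits to the file that already holds our own protein
cat "$workdir/toy_seqdump.txt" >> "$workdir/toy_capsid_proteins.fasta"

# One ">" line per sequence: our protein plus the selected hits
n_seqs=$(grep -c '^>' "$workdir/toy_capsid_proteins.fasta")

# Check that our own protein is still in the combined file
n_myphage=$(grep -c '^> *myphage' "$workdir/toy_capsid_proteins.fasta")

echo "sequences in file: $n_seqs"
echo "myphage entries: $n_myphage"
```

If you append files this way, make sure the first file ends with a new line; otherwise the first header from seqdump.txt is glued onto the end of your protein sequence.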