Assembling whole phage genomes

Table of contents
  1. Slides: Assembling genomes
  2. Activity 1: Command line basics
  3. Activity 2: Exploring your read files from the command line
  4. Activity 3: Assembling your genome with Unicycler
  5. Activity 4: Looking at your assembled genome

Slides: Assembling genomes

Activity 1: Command line basics

Rationale

We will learn the basics of how to use the terminal to give direct instructions to the computer.

Activity 1

  • Follow along and take notes in your printed cheat sheet:
  1. Open the Terminal Preview app
    • (Make the window half of the size of the screen so that you can see the Desktop)
  2. Write:
    bash
    
  3. Write
    ls
    
    • What shows up in the Terminal?
    • Take notes on your cheat sheet.
  4. Write:
    mkdir myfolder
    
    • What showed up on the Desktop?
    • Open myfolder/ by clicking on it on the Desktop.
    • Take notes on your cheat sheet
  5. Write:
    cd myfolder
    
    • What changed in the terminal?
  6. Write:
    ls
    
    • What happened? Why?
  7. Write
    mkdir myfolder2
    
  8. Write
    ls
    
  9. Write
    touch hello.txt
    
  10. Open hello.txt in the Sublime Text app.
  11. Inside Sublime Text, write: “Hello world!”
  12. Save the file by going to File > Save. You can close Sublime Text now.
  13. Go back to the Terminal Preview
  14. Write
    less hello.txt
    
  15. Exit the less view by typing the letter:
    q
    
  16. Write
    rm hello.txt
    
  17. Write:
    ls
    
    • What happened to hello.txt?
  18. Write:
    cd ..
    
    • Where did we go?
  19. Write:
    ls
    
  20. You can close the Terminal Preview app now.
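
The steps above can also be written as one short script. This is only a sketch for review, not part of the activity. It assumes a throwaway folder created with mktemp so that nothing on your Desktop is touched, and it uses echo and cat as stand-ins for typing in Sublime Text and viewing with less, because a script cannot do those interactively.

```shell
# Work in a throwaway folder (an assumption of this sketch, not a class step)
workdir=$(mktemp -d)
cd "$workdir"

mkdir myfolder                    # make a new folder
cd myfolder                       # move into it
mkdir myfolder2                   # make another folder inside it
touch hello.txt                   # create an empty file
echo "Hello world!" > hello.txt   # stand-in for typing the text in Sublime Text
cat hello.txt                     # print the file (less would open a viewer instead)

before=$(ls)                      # lists hello.txt and myfolder2
rm hello.txt                      # delete the file
after=$(ls)                       # only myfolder2 is left

cd ..                             # go back up one level
echo "Before rm: $before"
echo "After rm:  $after"
```

Note that rm deletes a file immediately. It does not go to the Trash.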

Activity 2: Exploring your read files from the command line

Rationale

We will download your read files and learn how to look at your reads in the terminal.

Activity 2

  1. Download your read files from Google Drive
    • Visit the Google Drive and open the folder with your name
    • Click on the little arrow in the top right of the page
    • Click download. This will get saved in your Downloads/ folder
    • Move the folder from your Downloads/ to your Desktop/
    • If the folder is “zipped”, open it and move the folder inside into the Desktop/.
  2. Open the Terminal Preview app.
  3. Write
    ls
    
  4. Write
    cd <name of your folder>
    
    • Here, replace <name of your folder> with the name of the folder you downloaded!!
    • Instructions between angle brackets < like this > mean that you have to write the name of the files YOU have.
  5. Write
    ls
    
    • Do you see your read files?
    • There should be two, one that ends with _R1.sub.fastq.gz and another that ends with _R2.sub.fastq.gz
  6. To make the first read file human-readable, write
    gunzip -c <name_of_your_reads>_R1.sub.fastq.gz > reads_R1.fastq
    
    • (You don’t need to remember this one.)
  7. To see it, write:
    less reads_R1.fastq
    
    • What do you see?
    • How long is the first read?
    • You can exit this view by writing q
  8. To know how many reads are in the file, write:
    echo $(cat reads_R1.fastq|wc -l)/4|bc
    
    • This is another “special command” to count the number of reads in our file. (You don’t need to remember this one.)
    • How many reads are in one of your read files?
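
To see why the line count is divided by 4, it can help to try the same commands on a tiny made-up file. The sketch below is an illustration under assumptions: the file toy.fastq and its two reads are invented, and it works in a throwaway folder. In a FASTQ file every read takes four lines: a name starting with @, the DNA sequence, a + line, and the quality scores.

```shell
# Work in a throwaway folder (an assumption of this sketch)
workdir=$(mktemp -d)
cd "$workdir"

# Two invented reads, four lines each: name, sequence, "+", quality scores
printf '%s\n' \
    '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
    '@read2' 'TTGACCA'    '+' 'IIIIIII' > toy.fastq

# Compress it, so that it looks like the .fastq.gz files from class
gzip toy.fastq

# Decompress to a new file, as in step 6
gunzip -c toy.fastq.gz > toy_reads.fastq

# Count the lines and divide by 4, as in step 8
lines=$(wc -l < toy_reads.fastq)
reads=$((lines / 4))

# The sequence is on line 2, so its length is the length of the first read
first_len=$(awk 'NR==2 {print length($0); exit}' toy_reads.fastq)

echo "Reads: $reads"
echo "Length of first read: $first_len"
```

Counting the lines that start with @ would be less reliable, because a quality line can also begin with @.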

Activity 3: Assembling your genome with Unicycler

Rationale

We will use a program called Unicycler to assemble our reads into whole genomes.

Activity 3

  1. Go to the class computer
  2. Write
    ls
    
  3. Locate your folder, and enter the folder that contains your reads
    cd <name of your folder>
    
  4. Then write:
    unicycler -h
    
    • This is the software’s “help”. It tells us how to use it.
    • Scroll up and down to see the options.
  5. Now we will run the assembly. Write the following on a single line:
    unicycler -1 <name_of_your_reads>_R1.sub.fastq.gz -2 <name_of_your_reads>_R2.sub.fastq.gz -o assembly
    
    • What are the options -1 and -2?
    • What is the -o option?
  6. Let your assembly run.
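
The sketch below shows how the placeholders in the command from step 5 get filled in. It is an illustration under assumptions: "mysample" is a made-up name, and the script only runs Unicycler if it finds both read files in the current folder. Otherwise it prints the command it would have run, so it is safe to try anywhere.

```shell
# "mysample" is a made-up name; replace it with the start of YOUR read file names
sample="mysample"
r1="${sample}_R1.sub.fastq.gz"   # first reads of each pair, passed with -1
r2="${sample}_R2.sub.fastq.gz"   # second reads of each pair, passed with -2
outdir="assembly"                # folder for the results, passed with -o

# The full command, built on one line (this works as long as the names have no spaces)
cmd="unicycler -1 $r1 -2 $r2 -o $outdir"

if [ -f "$r1" ] && [ -f "$r2" ]; then
    # Both read files are here, so run the assembly
    $cmd
else
    # Read files not found: only show the command that would be run
    echo "Would run: $cmd"
fi
```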

Activity 4: Looking at your assembled genome

Rationale

Once the assembler has run, you should have larger pieces of DNA sequence, which hopefully correspond to a whole genome.

Activity 4

  1. Download your assembly from Google Drive. Put it inside the folder with your reads.
  2. Open the Terminal Preview app
  3. Write
    bash
    
  4. Go to the folder in the Desktop that contains your reads
    cd <name of your folder>
    
  5. Go to the folder that contains the results from the assembly:
    cd assembly
    
  6. Write:
    ls
    
    • There, you will find many files, but the one we care about is assembly.fasta
  7. The assembly might contain more than one large piece of DNA. Let’s check how many it has:
    grep -c "^>" assembly.fasta
    
    • How many pieces of DNA does your assembly contain?
    • (This is another special command you don’t need to remember.)
  8. To extract only the first piece, write:
    awk '/^>/ {n++} n>1 {exit} 1' assembly.fasta > contig1.fasta
    
    • (This is another special command you don’t need to remember.)
  9. Write:
    less contig1.fasta
    
    • How long is the assembly you got?
  10. Exit this by typing
    q
    
  11. Open the contig1.fasta file in the Sublime Text app.
  12. Open the BLAST website
  13. Copy the sequence inside contig1.fasta and paste it into the Query box of the BLAST website.
  14. Search by clicking “BLAST” and wait for the results.
  15. Look at the 5 best hits and take note of:
    • Description
    • Query Cover
    • Percentage ID
    • In your opinion, how similar is your phage to the previously known phages?
  16. Select one of the hits, and go to its genome record. Just like yesterday, try to find:
    • How big is the genome?
    • Is the genome DNA or RNA, linear or circular?
    • What type of phage is it? (Siphoviridae, Podoviridae, Myoviridae, other?)
    • What is the bacterial host?
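
The commands from steps 7 and 8 can be tried on a tiny made-up file to see what they do. The sketch below is an illustration under assumptions: toy.fasta and its two short "contigs" are invented, and it works in a throwaway folder. It also adds one extra command, not part of the activity, that adds up the length of the first contig, which is hard to read off by eye in less.

```shell
# Work in a throwaway folder (an assumption of this sketch)
workdir=$(mktemp -d)
cd "$workdir"

# An invented FASTA file: each piece of DNA starts with a ">" header line
printf '%s\n' '>1' 'ACGTACGTAC' 'GGCCTTAA' '>2' 'TTTT' > toy.fasta

# Step 7: count the header lines, which is the number of pieces
n_contigs=$(grep -c '^>' toy.fasta)

# Step 8: print lines until the second header appears, then stop
awk '/^>/ {n++} n>1 {exit} 1' toy.fasta > toy_contig1.fasta

# Extra: add up the length of every line that is not a header
contig1_len=$(awk '!/^>/ {total += length($0)} END {print total}' toy_contig1.fasta)

echo "Pieces of DNA: $n_contigs"
echo "Length of the first piece: $contig1_len"
```

With your own data, replace toy.fasta with assembly.fasta and toy_contig1.fasta with contig1.fasta.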