Assembling whole phage genomes
Slides: Assembling genomes
Activity 1: Command line basics
Rationale
We will learn the basics of how to use the terminal to give direct instructions to the computer.
Activity 1
- Follow along and take notes in your printed cheat sheet:
- Open the Terminal Preview app
  - (Make the window half the size of the screen so that you can see the Desktop)
- Write: `bash`
- Write: `ls`
  - What shows up in the Terminal?
  - Take notes on your cheat sheet.
- Write: `mkdir myfolder`
  - What showed up on the Desktop?
  - Open `myfolder/` by clicking on it on the Desktop.
  - Take notes on your cheat sheet.
- Write: `cd myfolder`
  - What changed in the terminal?
- Write: `ls`
  - What happened? Why?
- Write: `mkdir myfolder2`
- Write: `ls`
- Write: `touch hello.txt`
- Open `hello.txt` in the Sublime Text app.
  - Inside Sublime Text, write: “Hello world!”
  - Save the file by going to File > Save. You can close Sublime Text now.
- Go back to the Terminal Preview
- Write: `less hello.txt`
- Exit the `less` view by typing the letter `q`
- Write: `rm hello.txt`
- Write: `ls`
  - What happened to `hello.txt`?
- Write: `cd ..`
  - Where did we go?
- Write: `ls`
- You can close the Terminal Preview app now.
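If you want to practise again later, the commands from this activity can be replayed in one go. The sketch below is only an illustration: it works inside a throwaway folder made with `mktemp -d` (an assumption, so that nothing on your Desktop is touched), and it uses `echo` and `cat` as stand-ins for typing in Sublime Text and for viewing with `less`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway folder (assumption: we do not want to touch the Desktop).
scratch=$(mktemp -d)
cd "$scratch"

mkdir myfolder                     # make a new folder
cd myfolder                        # move into it
mkdir myfolder2                    # make another folder inside it
touch hello.txt                    # create an empty file
echo "Hello world!" > hello.txt    # stand-in for typing the text in Sublime Text

ls                                 # shows hello.txt and myfolder2
cat hello.txt                      # prints the file (less would open a viewer instead)

rm hello.txt                       # deletes the file; there is no undo
ls                                 # shows only myfolder2

cd ..                              # go back up one level
ls                                 # shows myfolder
```

Note that `rm` removes the file immediately, without moving it to a trash folder, so it is worth double-checking the file name before pressing Enter.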
 
Activity 2: Exploring your read files from the command line
Rationale
We will download your read files and learn how to look at your reads in the terminal.
Activity 2
- Download your read files from Google Drive
  - Visit the Google Drive and open the folder with your name
  - Click on the little arrow in the top right of the page
  - Click download. This will get saved in your `Downloads/` folder
  - Move the folder from your `Downloads/` to your `Desktop/`
  - If the folder is “zipped”, open it and move the folder inside into the `Desktop/`.
- Open the Terminal Preview app.
- Write: `ls`
- Write: `cd <name of your folder>`
  - Here, replace `<name of your folder>` with the name of the folder you downloaded!
  - Instructions between angle brackets `< like this >` mean that you have to write the name of the files YOU have.
- Write: `ls`
  - Do you see your read files?
  - There should be two, one that ends with `_R1.sub.fastq.gz` and another that ends with `_R2.sub.fastq.gz`
- To make it human readable, write: `gunzip -c <name_of_your_reads>_R1.sub.fastq.gz > reads_R1.fastq`
  - (You don’t need to remember this one.)
- To see it, write: `less reads_R1.fastq`
  - What do you see?
  - How long is the first read?
  - You can exit this view by writing `q`
- To know how many reads are in the file, write: `echo $(cat reads_R1.fastq|wc -l)/4|bc`
  - This is another “special command” to count the number of reads in our file. Each read takes up four lines in a FASTQ file, so the command divides the number of lines by 4. (You don’t need to remember this one.)
  - How many reads are in one of your read files?
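To see why the counting command works, it can help to try it on a file small enough to check by eye. The sketch below builds a made-up FASTQ file with three reads (the file name `toy_R1.fastq` and the sequences are invented for illustration), compresses it, and then repeats the steps from this activity. It uses the shell's own arithmetic `$(( ... ))` instead of `bc`; that is a substitution on our part, and the result is the same.

```shell
#!/usr/bin/env bash
set -euo pipefail

scratch=$(mktemp -d)
cd "$scratch"

# Build a tiny made-up FASTQ file: 3 reads, 4 lines per read
# (name, sequence, "+", quality scores), then compress it.
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'TTGCA'      '+' 'IIIII' \
  '@read3' 'GGGCCCAT'   '+' 'IIIIIIII' \
  > toy_R1.fastq
gzip toy_R1.fastq                  # creates toy_R1.fastq.gz

# Same step as in the activity: write a readable copy and keep the .gz file.
gunzip -c toy_R1.fastq.gz > reads_R1.fastq

# Number of reads = number of lines divided by 4.
n_lines=$(wc -l < reads_R1.fastq)
n_reads=$(( n_lines / 4 ))

# Length of the first read = number of letters on line 2 of the file.
first_read=$(sed -n '2p' reads_R1.fastq)
first_len=${#first_read}

echo "Reads: $n_reads"
echo "Length of first read: $first_len"
```

With this toy file the script reports 3 reads and a first read of 10 letters, which you can confirm by opening `reads_R1.fastq` with `less`.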
 
 
Activity 3: Assembling your genome with Unicycler
Rationale
We will use a program called Unicycler to assemble our reads into whole genomes.
Activity 3
- Go to the class computer
- Write: `ls`
- Locate your folder, and enter the folder that contains your reads: `cd <name of your folder>`
- Then write: `unicycler -h`
  - This is the software’s “help”. It tells us how to use it.
  - Scroll up and down to see the options.
- Now we will run the assembly. Write in a single line: `unicycler -1 <name_of_your_reads>_R1.sub.fastq.gz -2 <name_of_your_reads>_R2.sub.fastq.gz -o assembly`
  - What are the options `-1` and `-2`?
  - What is the `-o` option?
- Let your assembly run.
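Because the assembly command is long, it is easy to mistype a file name. The sketch below shows one possible way to fill in the placeholders and look at the finished command before running it. It is a dry run only: it prints the command and never starts Unicycler. The sample name `my_phage` is hypothetical; you would replace it with the start of your own read file names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample name: replace with the start of YOUR read file names.
sample="my_phage"

r1="${sample}_R1.sub.fastq.gz"     # first reads of each pair (given to -1)
r2="${sample}_R2.sub.fastq.gz"     # second reads of each pair (given to -2)
outdir="assembly"                  # folder where the results are written (given to -o)

cmd="unicycler -1 $r1 -2 $r2 -o $outdir"

# Dry run: check whether both read files are in the current folder,
# then print the command instead of running it.
if [ -f "$r1" ] && [ -f "$r2" ]; then
  echo "Both read files found. The command to run is:"
else
  echo "Read files not found in this folder. With these names the command would be:"
fi
echo "$cmd"
```

If the script says the files were not found, check with `ls` that you are in the right folder and that the names match exactly, including upper and lower case.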
 
Activity 4: Looking at your assembled genome
Rationale
Once the assembler has finished, you should have larger pieces of DNA sequence, which hopefully correspond to a whole genome sequence.
Activity 4
- Download your assembly from Google Drive. Put it inside the folder with your reads.
- Open the Terminal Preview app
- Write: `bash`
- Go to the folder on the Desktop that contains your reads: `cd <name of your folder>`
- Go to the folder that contains the results from the assembly: `cd assembly`
- Write: `ls`
  - There, you will find many files, but the one we care about is `assembly.fasta`
- The assembly might contain more than one large piece of DNA. Let’s check how many it has: `grep -c "^>" assembly.fasta`
  - How many pieces of DNA does your assembly contain?
  - (This is another special command you don’t need to remember.)
- To extract only the first piece, write: `cat assembly.fasta | awk "/^>/ {n++} n>1{exit} 1" > contig1.fasta`
  - (This is another special command you don’t need to remember.)
- Write: `less contig1.fasta`
  - How long is the assembly you got?
- Exit this by typing `q`
- Open the `contig1.fasta` file in the Sublime Text app.
- Open the BLAST website
- Copy the sequence inside `contig1.fasta` and paste it into the Query box of the BLAST website.
- Search by clicking “BLAST” and wait for the results.
- Look at the 5 best hits and make a note of:
  - Description
  - Query Cover
  - Percentage ID
  - In your opinion, how similar is your phage to the previously known phages?
- Select one of the hits, and go to its genome record. Just like yesterday, try to find:
  - How big is the genome?
  - Is the genome DNA or RNA, linear or circular?
  - What type of phage is it? (Siphoviridae, Podoviridae, Myoviridae, other?)
  - What is the bacterial host?
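The two “special commands” used on `assembly.fasta` in this activity are easier to follow on a file small enough to read by eye. The sketch below makes a toy assembly with two pieces (the header text and sequences are invented for illustration), counts the pieces, extracts the first one, and measures its length. The length step at the end is an addition of ours and is not part of the activity.

```shell
#!/usr/bin/env bash
set -euo pipefail

scratch=$(mktemp -d)
cd "$scratch"

# A made-up assembly with two pieces. Real assemblies are much longer.
printf '%s\n' \
  '>1 length=12' 'ACGTAC' 'GTACGT' \
  '>2 length=4'  'TTAA' \
  > assembly.fasta

# Each piece starts with a header line beginning with ">", so count those lines.
n_contigs=$(grep -c '^>' assembly.fasta)

# Print lines until the second header is reached, then stop:
# this keeps only the first piece.
awk '/^>/ {n++} n>1 {exit} 1' assembly.fasta > contig1.fasta

# Length of the first piece: count the letters on all non-header lines.
contig1_len=$(grep -v '^>' contig1.fasta | tr -d '\n' | wc -c | tr -d ' ')

echo "Pieces in assembly: $n_contigs"
echo "Length of first piece: $contig1_len"
```

For the toy file this reports 2 pieces and a first piece of 12 letters. On your real `contig1.fasta`, the same counting step gives the length you are asked about when you look at the file with `less`.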