Assembling whole phage genomes
Slides: Assembling genomes
Activity 1: Command line basics
Rationale
We will learn the basics of how to use the terminal to give direct instructions to the computer.
Activity 1
- Follow along and take notes in your printed cheat sheet:
- Open the Terminal Preview app
- (Make the window half of the size of the screen so that you can see the Desktop)
- Write:
bash
- Write
ls
- What shows up in the Terminal?
- Take notes on your cheat sheet.
- Write:
mkdir myfolder
- What showed up in the Desktop?
- Open myfolder/ by clicking on it on the Desktop.
- Take notes on your cheat sheet.
- Write:
cd myfolder
- What changed in the terminal?
- Write:
ls
- What happened? Why?
- Write
mkdir myfolder2
- Write
ls
- Write
touch hello.txt
- Open hello.txt in the Sublime Text app.
- Inside Sublime Text, write: “Hello world!”
- Save the file by going to File > Save. You can close Sublime Text now.
- Go back to the Terminal Preview
- Write
less hello.txt
- Exit the less view by typing the letter q
- Write
rm hello.txt
- Write:
ls
- What happened to hello.txt?
- Write:
cd ..
- Where did we go?
- Write:
ls
- You can close the Terminal Preview app now.
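For review, the commands from this activity can be run in one go. This is only a sketch: it works in a throwaway folder (the workdir name is an assumption, not part of the activity) instead of the Desktop, and it uses echo as a stand-in for typing the text in Sublime Text.

```shell
# Work in a throwaway folder so nothing on the Desktop is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir myfolder                     # create a folder
cd myfolder                        # move into it
mkdir myfolder2                    # create a second folder inside it
touch hello.txt                    # create an empty file
echo "Hello world!" > hello.txt    # stand-in for typing the text in Sublime Text
before=$(ls)                       # lists hello.txt and myfolder2
rm hello.txt                       # delete the file (there is no undo!)
after=$(ls)                        # only myfolder2 is left
cd ..                              # go back up one level

echo "Before rm: $before"
echo "After rm: $after"
```

Note that rm deletes the file immediately; it does not go to a trash folder.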
Activity 2: Exploring your read files from the command line
Rationale
We will download your read files and learn how to look at your reads in the terminal.
Activity 2
- Download your read files from Google Drive
- Visit the Google Drive and open the folder with your name
- Click on the little arrow in the top right of the page
- Click download. This will get saved in your Downloads/ folder.
- Move the folder from your Downloads/ to your Desktop/
- If the folder is “zipped”, open it and move the folder inside into the Desktop/.
- Open the Terminal Preview app.
- Write
ls
- Write
cd <name of your folder>
- Here, replace <name of your folder> with the name of the folder you downloaded!
- Instructions between angle brackets < like this > mean that you have to write the name of the files YOU have.
- Write
ls
- Do you see your read files?
- There should be two, one that ends with _R1.sub.fastq.gz and another that ends with _R2.sub.fastq.gz
- The read files are compressed. To make the first one human readable, write
gunzip -c <name_of_your_reads>_R1.sub.fastq.gz > reads_R1.fastq
- (You don’t need to remember this one.)
- To see it, write:
less reads_R1.fastq
- What do you see?
- How long is the first read?
- You can exit this view by typing q
- To know how many reads are in the file, write:
echo $(cat reads_R1.fastq|wc -l)/4|bc
- This is another “special command” to count the number of reads in our file. (You don’t need to remember this one.)
- How many reads are in one of your read files?
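Why divide by 4? In a FASTQ file each read takes exactly four lines: a name starting with @, the DNA sequence, a + line, and a line of quality scores. The sketch below shows this on a tiny made-up file (example.fastq and its two reads are invented for illustration). It uses the shell's built-in arithmetic instead of bc, which gives the same result.

```shell
workdir=$(mktemp -d)

# Two made-up reads; each read takes exactly four lines.
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'GGCCTTAA' '+' 'IIIIIIII' \
  > "$workdir/example.fastq"

# Number of reads = number of lines divided by 4.
n_lines=$(wc -l < "$workdir/example.fastq")
n_reads=$((n_lines / 4))
echo "Number of reads: $n_reads"

# Length of the first read = number of letters on line 2.
first_len=$(awk 'NR == 2 { print length($0); exit }' "$workdir/example.fastq")
echo "Length of the first read: $first_len"
```

On your own data, replace the path to example.fastq with reads_R1.fastq.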
Activity 3: Assembling your genome with Unicycler
Rationale
We will use a program called Unicycler to assemble our reads into whole genomes.
Activity 3
- Go to the class computer
- Write
ls
- Locate your folder, and enter the folder that contains your reads
cd <name of your folder>
- Then write:
unicycler -h
- This is the software’s “help”. It tells us how to use it.
- Scroll up and down to see the options.
- Now we will run the assembly. Write in a single line:
unicycler -1 <name_of_your_reads>_R1.sub.fastq.gz -2 <name_of_your_reads>_R2.sub.fastq.gz -o assembly
- What are the options -1 and -2?
- What is the -o option?
- Let your assembly run.
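A misspelled file name is a common reason for the assembly not to start. As an optional sketch, the small helper below (check_reads is a hypothetical name, not part of Unicycler) only succeeds when both read files for a given name exist, so it can be used as a check before launching the run.

```shell
# Hypothetical helper: succeeds only if both read files for a given name exist.
check_reads() {
  prefix=$1
  [ -f "${prefix}_R1.sub.fastq.gz" ] && [ -f "${prefix}_R2.sub.fastq.gz" ]
}

# Example use ("myphage" is a placeholder for the name of your reads):
# check_reads myphage && unicycler -1 myphage_R1.sub.fastq.gz -2 myphage_R2.sub.fastq.gz -o assembly
```

With the && in the example line, Unicycler is started only if the check passes.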
Activity 4: Looking at your assembled genome
Rationale
Once the assembler has run, you should have larger pieces of DNA sequence, which hopefully correspond to a whole genome sequence.
Activity 4
- Download your assembly from Google Drive. Put it inside the folder with your reads.
- Open the Terminal Preview app
- Write
bash
- Go to the folder in the Desktop that contains your reads
cd <name of your folder>
- Go to the folder that contains the results from the assembly:
cd assembly
- Write:
ls
- There, you will find many files, but the one we care about is assembly.fasta
- The assembly might contain more than one large piece of DNA. Let’s check how many it has:
grep -c "^>" assembly.fasta
- How many pieces of DNA does your assembly contain?
- (This is another special command you don’t need to remember.)
- To extract only the first piece, write:
cat assembly.fasta | awk "/^>/ {n++} n>1{exit} 1" > contig1.fasta
- (This is another special command you don’t need to remember.)
- Write:
less contig1.fasta
- How long is the assembly you got?
- Exit this by typing q
- Open the contig1.fasta file in the Sublime Text app.
- Open the BLAST website
- Copy the sequence inside contig1.fasta and paste it into the Query box of the BLAST website.
- Search by clicking “BLAST” and wait for the results.
- Look at the 5 best hits and make a note of:
- Description
- Query Cover
- Percentage ID
- In your opinion, how similar is your phage to the previously known phages?
- Select one of the hits, and go to its genome record. Just like yesterday, try to find:
- How big is the genome?
- Is the genome DNA or RNA, linear or circular?
- What type of phage is it? (Siphoviridae, Podoviridae, Myoviridae, other?)
- What is the bacterial host?
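To see what the grep and awk commands from this activity do, they can be tried on a tiny made-up FASTA file. This is a sketch under stated assumptions: the file contents and the workdir folder are invented for illustration, and a real assembly will have different header lines and much longer sequences. In a FASTA file every piece of DNA starts with a header line beginning with >, and the sequence may be wrapped over several lines.

```shell
workdir=$(mktemp -d)

# Made-up assembly with two pieces of DNA (contigs).
printf '%s\n' \
  '>1 example' 'ACGTACGT' 'ACGT' \
  '>2 example' 'GGGG' \
  > "$workdir/assembly.fasta"

# Count the header lines: one per piece of DNA.
n_contigs=$(grep -c '^>' "$workdir/assembly.fasta")
echo "Pieces of DNA: $n_contigs"

# Print lines until the second header appears, then stop.
awk '/^>/ {n++} n>1 {exit} 1' "$workdir/assembly.fasta" > "$workdir/contig1.fasta"

# Length of the first piece: add up the letters on all non-header lines.
contig1_len=$(awk '!/^>/ { total += length($0) } END { print total }' "$workdir/contig1.fasta")
echo "Length of the first piece: $contig1_len"
```

The last command is one way to answer "How long is the assembly you got?" without counting letters by eye.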