Are you interested in the intersection of computer science and biology?
Are you looking for a 7th-9th grade science project that is intellectually challenging but quick to perform?
My son wanted to learn about bioinformatics, so, for his school science project, I came up with this nifty computer-based experiment. There are many ways for you to creatively adapt the basic idea to suit your interests. All you need is a computer with an internet connection.
Note: A clever student can use the internet to learn what a cDNA is, and to learn about sequence alignments, gene homology, etc. She can probably do this project on her own with some trial and error. However, assistance from a knowledgeable person (a biologist or high school teacher) would help.
If you use this project, please give credit to ScienceThrillers.com.
The Question: How closely related are various organisms to humans (Homo sapiens)? Do similarities we can see with our eyes correlate with similarity at the genetic level?
The Idea: Common sense gives us the general idea that gorillas are closer relatives to us than fruit flies. Use various taxonomic criteria to guess, in rank order, how similar various organisms are to humans. Examples of criteria to use:
- Prokaryote / Eukaryote
- Single-celled / Multicellular
- Plant / Animal
- Invertebrate / Vertebrate
- Gills / Lungs
- Reptile / Bird / Amphibian / Mammal etc.
Once you have guessed the relationships, use free, powerful, public DNA sequence analysis programs to compare similarity at the genetic level. Discover whether various organisms have homologous versions of a human gene (versions that are very similar because they descend from a shared ancestral gene), and if so, what percent of the DNA sequence is identical.
Do the relationships you predicted hold up to molecular scrutiny?
The Ensembl Genome Browser is a free library of genome sequences from dozens of species. Select the organisms you will rank and compare to humans from choices in the drop-down list “All genomes.”
1. Open Ensembl Genome Browser (http://uswest.ensembl.org/index.html)
2. Choose species from the drop-down search menu or “All genomes” button.
The hardest part of this project is finding the right DNA sequences to compare. You might think, well, I’ll just compare the entire hedgehog genome to the entire human genome.
Bad idea. Comparing entire genomes is extremely complex. Genomes can be very big (the human genome has about 3 billion nucleotides). If two genomes are very, very similar (such as human and chimpanzee), you can get a pretty clear idea of the percent identity (percent identical DNA sequence). But the more dissimilar they are, the harder it becomes to line the sequences up at all. So let’s forget about that approach and go for something manageable: comparing individual genes.
(Make sure you understand what a gene is. You should also know something about the basic structure of DNA: the four nucleotide bases and the base-pairing rules. To understand cDNAs, you’ll also need to know what an intron is. It’s not absolutely necessary, but it wouldn’t hurt to know the central dogma of molecular biology: DNA to RNA to protein. If you find some helpful online resources explaining these terms, share them in the comments.)
Do this project with cDNA sequences instead of whole genes. A cDNA is “reverse transcribed” from a messenger RNA, so it contains only the “exons,” the parts of the gene that remain after splicing and that include the protein-coding sequence. The “introns,” noncoding stretches that interrupt the gene, are gone. You can also attempt the project with whole genes, but it will be trickier.
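If the difference between a gene and a cDNA is hard to picture, here is a tiny Python sketch. The sequence is made up for illustration (it is not a real gene), and marking exons in uppercase and introns in lowercase is just a convention chosen for this toy example.

```python
# Toy gene: UPPERCASE = exons, lowercase = introns.
# The sequence is invented for illustration; it is not a real gene.
gene = "ATGGCC" + "gtaagtttcag" + "GAATTC" + "gtgagtccag" + "TGGTAA"


def splice_exons(gene_seq):
    """Keep only the uppercase (exon) letters, mimicking what splicing removes."""
    return "".join(base for base in gene_seq if base.isupper())


cdna_like = splice_exons(gene)
print(cdna_like)
print(len(gene), "bases in the toy gene;", len(cdna_like), "bases once the introns are gone")
```

Real introns are usually far longer than the exons around them, which is one reason comparing cDNAs is more manageable than comparing whole genes.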
What genes to compare:
I think the easiest way to do this project is to pick a gene (usually given the same name as the protein it codes for), find its sequence in the genome of the organism you’re looking at, and then find a similar sequence in the human genome. You can search for any gene that you can name. (This is where it helps to have a biology person to advise you.)
If you don’t know the names of any proteins or genes, here are three genes/proteins to try. All three perform functions essential to life in most cells. Because they’re so important, versions of them are found in almost all organisms, allowing you to compare the percent identity all the way from humans to yeast (and beyond).
A) DNA polymerase: a family of molecular machines that copy DNA, allowing cells to reproduce and to repair their DNA. One member of this family is called DNA polymerase beta (or DNApol beta or polB). You can try some kind of RNA polymerase, too, if you wish.
B) ATP synthase: this molecular machine performs the last step in cellular respiration (making ATP from food in the presence of air). Search for the ATP synthase gamma subunit.
C) Glucokinase: this protein performs the first step in glycolysis. Search for glucokinase (also known as hexokinase 4) or one of its cousins, the other hexokinases.
3. Search for your gene of interest in the appropriate genome on Ensembl. (Select the species from the drop-down search menu, and type in the name of the gene you’re looking for.)
4. Select the best matching transcript from the list of results. To get the cDNA sequence, click “cDNA seq.” under the transcript you chose. In a moment, you will see all the nucleotides in the sequence, translated into amino acids for protein (those random-seeming letters under each set of three nucleotides).
5. From the bottom of the left menu, click “BLAST this sequence”
6. Under “Enter the query sequence” find the sequence; highlight and copy it.
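The amino acid letters that Ensembl shows under each set of three nucleotides in step 4 come from the standard genetic code. As a rough sketch of how that translation works, the Python below builds the standard codon table and reads a sequence three bases at a time. The example sequence is made up, and the function assumes the sequence starts at the first base of a codon, which is not always true of a full cDNA.

```python
# Standard genetic code, written in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA to one-letter amino acids; '*' is a stop codon, 'X' is unknown."""
    dna = dna.upper()
    protein = []
    # Step through the sequence one codon (three bases) at a time.
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCGAATTCTGGTAA"))  # made-up sequence
```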
You could do a comparison to the human genome here in Ensembl, but I think the way the data are presented is confusing. Instead, use the National Institutes of Health’s BLAST program.
7. In another window, open BLAST program (http://blast.ncbi.nlm.nih.gov/Blast.cgi)
8. Under Basic Blast choose “nucleotide blast”. This means you are going to compare a nucleotide sequence to a nucleotide sequence. (It is also possible to compare protein sequences to predicted DNA sequences, and vice-versa.)
9. Paste the cDNA sequence into the box “Enter query sequence / Enter accession number(s) etc.” For the record, you are entering in FASTA format.
10. Under “Choose Search Set” select “Human genomic + transcript”. That means you’ll be looking in the human genome for DNA sequences that are similar to the cDNA you entered from your test organism.
11. Under “Program Selection,” try “Highly similar sequences” first. If you don’t get a match, repeat with “Somewhat similar sequences (blastn)”
12. Click “BLAST” and wait.
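Step 9 mentions FASTA format. A FASTA record is simply a header line that starts with ">" followed by the sequence on one or more lines. If you save your cDNA sequences to keep track of them, a small helper like the sketch below can write and read that format. The function names and the record name "toy_cdna" are invented for this example.

```python
def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text: a '>' header line, then wrapped sequence lines."""
    lines = [">" + name]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines)


def read_fasta(text):
    """Return (header, sequence) from FASTA text holding a single record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if lines[0].startswith(">"):
        return lines[0][1:], "".join(lines[1:])
    # No header line: treat everything as bare sequence.
    return "", "".join(lines)


record = to_fasta("toy_cdna", "ACGT" * 20)  # made-up 80-base sequence
print(record)
print(read_fasta(record))
```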
Ta-da! A whole bunch of data will appear. You can ignore most of it, including the colorful graphic at the top. Here’s what you should look for:
- Under “Descriptions” / “Sequences producing significant alignments”, look at the first item in the list. It will probably say Homo sapiens and the name of the cDNA you searched, mRNA. This is your match, the sequence that the BLAST program pulled out of the entire ~3 billion nucleotides of the human genome. The two numbers of interest are Query Cover % and Ident %. Query cover is a measure of how much of the sequence you tested overlaps with human sequence. In some cases, only a tiny bit of the query sequence is found in humans. Ident % is the percent of nucleotides in the overlapping area that are absolutely identical between humans and the query species. Generally, both query cover and percent identity go up or down together.
- Under “Alignments”, look at the first one. It shows the actual DNA sequence you entered (Query) lined up with a human gene (Sbjct). Here you can see where the percentages for cover and identity were calculated.
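To see roughly where those two percentages come from, here is a simplified Python sketch that computes them for a single aligned stretch. The two short sequences and the query length of 20 are made up, and "-" marks a gap. BLAST may combine several aligned stretches when it reports Query Cover, so treat this as an illustration of the idea rather than BLAST's exact calculation.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must be the same length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(aligned_query, query_length):
    """Percent of the full query sequence that takes part in the alignment."""
    aligned_bases = sum(1 for q in aligned_query if q != "-")
    return 100.0 * aligned_bases / query_length


# Made-up alignment: the query has one gap and one mismatch against the subject.
query_row = "ATGGCC-AATTC"
sbjct_row = "ATGGCCGAATAC"

print(round(percent_identity(query_row, sbjct_row), 1), "percent identity")
print(query_cover(query_row, query_length=20), "percent query cover")
```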
Print out the Descriptions and the first alignment for your data.
Now do it all over again for more species, or more cDNAs. Graph your results. If you rank the percent identities from highest to lowest, do they match your predictions for relatedness to humans? What does this suggest about evolution?
Sometimes the Ident % is strangely high because the Query Cover is very low. This means that the sequences were pretty similar, but only across a very short region. To account for this, you might want to rank your species by the product of Query Cover and Ident % (for example, 50% cover times 90% identity gives a combined score of 45%).
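If you would rather let the computer do the ranking, a short Python sketch like this one multiplies Query Cover by Ident % and sorts the species by the result. The species names and numbers here are placeholders, not real BLAST results; replace them with your own data.

```python
# (species, query cover %, ident %) -- placeholder values, NOT real BLAST results.
results = [
    ("species A", 95, 88.0),
    ("species B", 12, 97.0),  # high identity, but only over a short region
    ("species C", 70, 75.0),
]


def combined_score(cover, ident):
    """Product of the two percentages, expressed again as a percentage."""
    return cover * ident / 100.0


ranked = sorted(results, key=lambda r: combined_score(r[1], r[2]), reverse=True)

for species, cover, ident in ranked:
    print(species, cover, ident, round(combined_score(cover, ident), 2))
```

Notice that species B has the highest Ident % in this made-up data but drops to last place once its low Query Cover is taken into account.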
Bioinformatics Science Project courtesy of ScienceThrillers.com ©Dr. Amy Rogers, 2014