Monday, May 18, 2015

Reflections

I experienced a lot of challenges in my STEM internship this year. My mentor and I had pretty different schedules, and in the first semester, we only met in person every other week. That slowed down our progress a little bit, but by the second semester, we'd figured out a good plan for meeting, which made the research a lot easier. We also ran into some challenges with the research itself. Even though we were relatively sure that the model was a good fit for the data, we still couldn't get conclusive results. It took a lot of trial and error and a lot of experimenting with parameters. However, on the very last day of my internship, we finally got a good distribution!

Dr. Miller was a wonderful teacher, and even though the concepts were hard, I learned a lot about evolutionary biology and the process of research itself. My proudest moment was when we finally got a distribution that had convergence around the mean, even though I'd accepted by that point that we might not get the results we were looking for. Overall, I had a wonderful experience, and I know that I'll go into college prepared to do high-level scientific research.

For next year's Signature participants, especially for the STEM interns, I would advise them to be prepared for lots of trial and error in the research. We often didn't know what was causing our problems, so it just took a lot of experimentation and a little bit of luck to get the results we were looking for.

I'd like to thank Dr. Miller for a wonderful experience this year. Good luck to next year's Signature participants!

Friday, March 6, 2015

Phylogenetic Trees


Here are two of the trees I made from the data Dr. Miller gave me. I constructed both trees in the program Mesquite. One tree was made with the PAUPRat algorithm, and the other was made with the RAxML algorithm.

PAUPRat tree

RAxML tree

As you can see, the two trees differ quite a bit. Different algorithms interpret the same data in different ways, which is why it is necessary to construct and compare multiple trees from the same data. At this point, we haven't done much analysis, and in the coming weeks, I'm going to construct and look at some more trees before Dr. Miller and I draw any conclusions.
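One simple way to put a number on how much two trees disagree is to count the groupings (clades) that appear in one tree but not the other, which is the idea behind the Robinson-Foulds distance. The sketch below is only an illustration of that idea for rooted trees. The nested-tuple tree format and the taxon names A to E are made up for the example, and this is not how Mesquite compares trees.

```python
def clades(tree):
    """Collect every multi-leaf clade in a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        # A leaf is a plain string; an internal node is a tuple of children.
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical trees over the same five taxa; they disagree on where B and C go.
tree_one = ((("A", "B"), "C"), ("D", "E"))
tree_two = ((("A", "C"), "B"), ("D", "E"))

print(clade_distance(tree_one, tree_two))
```

A distance of zero means the two trees contain exactly the same groupings; larger values mean more disagreement.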

Thursday, March 5, 2015

Avian Mycobacteriosis

Dr. Miller and I are using phylogenetic analysis to determine whether "avian mycobacteriosis" (which is basically an infection in birds) comes from the environment or is transmitted from bird to bird.

Avian mycobacteriosis affects all bird species and has a relatively low prevalence in captive species (1.2%). The prevailing belief is that the bacteria are transmitted from bird to bird via fecal contamination. If one bird in a population is affected by avian mycobacteriosis, zoos generally euthanize the whole population. However, this practice has had several negative consequences, including a decrease in genetic diversity and a decrease in the effectiveness of breeding and conservation programs.

Dr. Miller and I are working with a team in San Diego to determine whether the belief that the disease is transmitted from bird to bird is even correct. If the source of the infection is the environment, then there's actually no point in killing all of the possibly exposed birds. We're going to look at the similarities between the different strains of the infection observed in birds: if the strains are nearly identical, then we can confirm bird-to-bird transmission. Dr. Miller's work with Peggy last year produced pretty convincing evidence that the disease wasn't transmitted from bird to bird, and now Dr. Miller and I are working with additional genetic data to either confirm or fail to confirm last year's results.

In the past, genetic data came from fingerprinting methods, including the RAPD method (gel banding pattern analysis, which is less reproducible and harder to compare) and the MLST method (which captures 0.1% of a bacterial genome). We're working with whole genome sequencing, which is much more reliable and reproducible, although it takes more time to complete.

The data we've looked at so far supports an environmental source in most of the observed enclosures, but the results aren't completely conclusive. In the time we have left, we're going to construct additional phylogenetic trees with the new data to more accurately determine the source of the infection.

Friday, January 30, 2015

Annotated Bibliography part 2

Here's part 2 of my annotated bibliography:

Jarvis, Erich, and Siavash Mirarab. "Whole-genome Analyses Resolve Early Branches in the Tree of Life of Modern Birds." Science 346, no. 6215 (2014): 1320.

I discussed this article earlier in my blog. The paper summarizes the findings of a research project in which scientists performed a genome-scale phylogenetic analysis of 48 bird species. It outlines the various programs the scientists used to perform the analysis, and explains the research methods quite thoroughly. This article will serve as a great resource for explaining some of the current phylogenetic research, and the paper's explanation of this team's analysis will likely help my mentor and me as we perform our own phylogenetic analysis on the San Diego Zoo data.

Huelsenbeck, John, Bruce Rannala, and John Masly. "An Introduction to Bayesian Inference of Phylogeny." Science.

My mentor shared this article with me. It's a very in-depth explanation of Bayesian Inference, which is a statistical method used to analyze evolutionary trees. The paper gives several examples of how to apply BI to evolutionary problems, and I think it will help me as I use BI to analyze the San Diego Zoo data.

Drummond, Alexei J, and Andrew Rambaut. "BEAST: Bayesian Evolutionary Analysis by Sampling Trees." BMC Evolutionary Biology: 214.

My mentor and I will be performing our analysis with BEAST, a program that uses Bayesian Inference to create phylogenetic trees (as opposed to Maximum Likelihood or Parsimony). This article uses statistical and evolutionary principles to explain how the program works, and compares it to other existing programs that use Bayesian Inference. It will help me understand how to use BEAST, and will also help me understand the principles and assumptions that the program is built on.

"Analysing BEAST Output." Analysing BEAST Output. Accessed January 30, 2015. http://beast.bio.ed.ac.uk/analysing-beast-output.

This is a simple tutorial for BEAST. It actually takes you step-by-step through the BEAST program, providing helpful screenshots and including lots of detail about how the program works. This particular tutorial teaches you how to analyze BEAST output. 

Wednesday, January 21, 2015

Week of January 14th

This week, Dr. Miller and I discussed an article that I read over break, "Whole-genome analyses resolve early branches in the tree of life of modern birds" (published in Science, December 2014). We could only meet for 45 minutes because Dr. Miller had to catch his plane, but I still got a pretty thorough understanding of recent evolutionary biology research.

This paper discusses the findings of a mammoth research project in which scientists performed a genome-scale phylogenetic analysis of 48 bird species. These 48 genomes were computationally aligned and evaluated to create the most reliable avian evolutionary tree yet produced. Previous research efforts only analyzed selected genes, but this project analyzed whole genomes, which is why it is so accurate.

Most bird species became extinct soon after dinosaurs underwent their large-scale extinction, and as a result, the branches of the avian evolutionary tree were extremely muddled. By choosing species that represented the broadest possible diversity of birds, and by using whole-genome data, these scientists created an accurate picture of avian "family history." The paper also reveals why avian genomes tend to be small compared to those of other vertebrates: they have lost a lot of genes and have far fewer repeat sequences.

Here is an example of one of the several figures in the paper:


Reading and discussing this paper gave me a good understanding of current research in evolutionary biology. The researchers who led this project applied the same techniques that I've been learning this year, and I think learning about this research and the methods they used will help me when I start the project with the San Diego Zoo this semester.

Friday, December 12, 2014

Annotated Bibliography

Here is my annotated bibliography for my STEM internship. I will use these resources to gain a better understanding of evolutionary and computational biology, and the methods scientists use in doing phylogenetic analysis.

Farris, James. The Logical Basis of Phylogenetic Analysis. 1983.

This book gives an overview of the principles of phylogenetic analysis. It explains the relative merits and drawbacks of each method of phylogenetic analysis, as well as the assumptions needed in constructing an evolutionary tree. It will help me gain an in-depth understanding of phylogenetic analysis, which is what my project is based on, and will equip me to construct the tree most suitable for my data.

William, J., and O. Ballard. "Combining Data in Phylogenetic Analysis." Trends in Ecology & Evolution, 1999, 334.


In my project, I’ll have to learn how to incorporate different types of data (i.e., nucleotide sequences vs. amino acid sequences) in order to construct the most suitable evolutionary tree. In constructing evolutionary trees, one often has access to more data than she can use, and this article will help me understand the various approaches to combining/partitioning data, and which approach will be the most suitable for my analysis.

Wednesday, November 19, 2014

Ch. 4: Neighbor Joining Trees

Today, I continued learning about the major methods for estimating phylogenetic trees. The Neighbor Joining Method is one of the most popular distance-based algorithmic methods. It produces a "single, strictly bifurcating tree," which means that each internal node has exactly two branches descending from it. I downloaded another data file so I could practice constructing the tree.
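To show roughly what a distance-based method like this does, here is a minimal sketch of the standard neighbor joining procedure: repeatedly pick the pair of nodes with the lowest Q score, join them under a new internal node, and recompute distances to that node. It is not MEGA's implementation. It only returns the branching pattern (no branch lengths), and the taxa a to e and their distances are invented for the example.

```python
from itertools import combinations


def symmetric(pairs):
    """Turn {(a, b): distance} into a lookup table that works in both directions."""
    table = {}
    for (a, b), value in pairs.items():
        table.setdefault(a, {})[b] = value
        table.setdefault(b, {})[a] = value
    return table


def neighbor_joining(dist):
    """Return the tree topology as nested tuples; every join has exactly two children."""
    d = {name: dict(row) for name, row in dist.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        totals = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        best = None
        for i, j in combinations(nodes, 2):
            # Q score: favors pairs that are close to each other and far from the rest.
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
        _, i, j = best
        joined = (i, j)  # the new internal node, labeled by its two children
        d[joined] = {}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new node to each remaining node.
                d[joined][k] = d[k][joined] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [joined]
    return tuple(nodes)


# Hypothetical pairwise distances between five taxa.
example = symmetric({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
tree = neighbor_joining(example)
print(tree)
```

Because each join combines exactly two nodes, the result is always strictly bifurcating, which matches the description above.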

First, I opened the file LargeData.meg in MEGA. The window shows a DNA sequence alignment.



Next, I had to determine whether the data was even suitable for estimating a Neighbor Joining Tree. The book said that if the average pairwise Jukes-Cantor distance is more than 1.0, the data isn't suitable for making NJ trees and another phylogenetic method should be used. To check this, I computed the overall mean of the pairwise distances and found the average distance to be 0.534, which means the data is suitable for making NJ trees.
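For reference, the Jukes-Cantor distance between two aligned sequences is d = -(3/4) ln(1 - (4/3)p), where p is the fraction of sites that differ. The sketch below computes the mean pairwise distance for a tiny made-up alignment and applies the 1.0 cutoff described above. The sequences are hypothetical, and the way gaps are skipped is an assumption for the example, not necessarily how MEGA handles them.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites that differ, ignoring gaps and ambiguous bases."""
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined once the sequences are saturated
    return -0.75 * math.log(1 - 4 * p / 3)


def mean_pairwise_jc(sequences):
    """Average Jukes-Cantor distance over every pair of sequences."""
    distances = [jukes_cantor(a, b) for a, b in combinations(sequences, 2)]
    return sum(distances) / len(distances)


# Hypothetical toy alignment (three sequences, 20 sites each).
toy_alignment = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGAACGTACGT",
    "ACGTTCGTACGAACGTACGA",
]
mean_distance = mean_pairwise_jc(toy_alignment)
verdict = "suitable for NJ" if mean_distance <= 1.0 else "try another method"
print(round(mean_distance, 3), verdict)
```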

I constructed the tree by choosing Construct/Test Neighbor Joining Tree from the Phylogeny menu. After selecting the "bootstrapping" method to test the reliability of the tree, and making sure that the tree was being constructed using the Neighbor Joining Method, I produced the following tree using the program:


There are many different ways to represent the same data in a tree. I played around with some of the various options and also got this circular tree: