Friday, December 12, 2014

Annotated Bibliography

Here is my annotated bibliography for my STEM internship. I will use these resources to gain a better understanding of evolutionary and computational biology and of the methods scientists use in phylogenetic analysis.

Farris, James. The Logical Basis of Phylogenetic Analysis. 1983.

            This book gives an overview of the principles of phylogenetic analysis. It explains the relative merits and drawbacks of each method of phylogenetic analysis, as well as the assumptions needed to construct an evolutionary tree. It will help me gain an in-depth understanding of phylogenetic analysis, which is what my project is based on, and will equip me to construct the tree most suitable for my data.

Ballard, J. William O. "Combining Data in Phylogenetic Analysis." Trends in Ecology & Evolution, 1999, 334.


            In my project, I’ll have to learn how to incorporate different types of data (i.e., nucleotide sequences vs. amino acid sequences) in order to construct the most suitable evolutionary tree. In constructing evolutionary trees, one often has access to more data than she can use, and this article will help me understand the various approaches to combining/partitioning data, and which approach will be the most suitable for my analysis.

Wednesday, November 19, 2014

Ch. 4: Neighbor Joining Trees

Today, I continued learning about the major methods for estimating phylogenetic trees. The Neighbor Joining Method is one of the most popular distance-based algorithmic methods. It produces a "single, strictly bifurcating tree," which means that each internal node has exactly two branches descending from it. I downloaded another practice data file to construct a tree from.

First, I opened up the file LargeData.meg from MEGA. The window shows the DNA sequence alignment.



Next, I had to determine whether the data was even suitable for estimating a Neighbor Joining tree. According to the book, if the average pairwise Jukes-Cantor distance is more than 1.0, the data isn't suitable for making NJ trees and another phylogenetic method should be used. So I computed the overall mean of the pairwise distances and got 0.534, which means the data is suitable for making NJ trees.
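For the record, here is a minimal sketch of that suitability check in Python, assuming the aligned sequences are available as plain strings. The toy sequences below are made up; the real check was done inside MEGA on LargeData.meg.

import math
from itertools import combinations

def jukes_cantor(seq1, seq2):
    # Jukes-Cantor distance between two aligned, equal-length sequences
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    p = sum(a != b for a, b in sites) / len(sites)   # proportion of differing sites
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

def mean_pairwise_distance(sequences):
    distances = [jukes_cantor(s1, s2) for s1, s2 in combinations(sequences, 2)]
    return sum(distances) / len(distances)

# Hypothetical toy alignment, just to show the calculation
seqs = ["ATGCGTACGT", "ATGCGTTCGT", "ATGAGTACGA"]
average = mean_pairwise_distance(seqs)
print(average, "suitable for NJ" if average <= 1.0 else "use another method")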

I constructed the tree by choosing Construct/Test Neighbor Joining Tree from the Phylogeny menu, used the "bootstrapping" method to test the reliability of the tree, and made sure the tree was being constructed using the Neighbor Joining Method.
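As a side note, everything I clicked through in MEGA's menus can also be scripted. Here is a rough Biopython sketch of the same steps (not what I actually used); it assumes the alignment has been exported to a hypothetical file called LargeData.fasta, and it skips the bootstrap test.

from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

alignment = AlignIO.read("LargeData.fasta", "fasta")   # the aligned sequences
calculator = DistanceCalculator("identity")            # simple pairwise distance model
distance_matrix = calculator.get_distance(alignment)

constructor = DistanceTreeConstructor()
nj_tree = constructor.nj(distance_matrix)              # Neighbor Joining

Phylo.draw_ascii(nj_tree)                              # quick text rendering of the tree

Here is the tree the program produced for me: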


There are many different ways to represent the same data in a tree. I played around with some of the various options and also got this circular tree:




Wednesday, November 12, 2014

Major Methods for Estimating Phylogenetic Trees

Today, I worked through Chapter 5 in my book, which gives a good overview of the major methods for estimating phylogenetic trees.

There are two primary approaches to tree estimation: algorithmic and tree-searching. The algorithmic approach uses an algorithm to estimate a tree from the data. The tree-searching method estimates many trees, then uses some criterion to decide which is the best tree.

The algorithmic approach has two advantages. It is fast, and it yields only a single tree from any given data set. The Neighbor Joining method is the most common algorithmic method, and I'll be learning how to use it next week.

All the other currently used approaches are tree-searching methods. They are generally slower, and some will produce several equally good trees. Methods such as Parsimony, Maximum Likelihood, and Bayesian analysis search for the tree that best meets the criteria by evaluating individual trees. Maximum Likelihood looks for the tree that, under some model of evolution, maximizes the likelihood of observing the data. Bayesian Inference is a recent variant of Maximum Likelihood. Instead of seeking the tree that maximizes the likelihood of observing the data, it seeks those trees with the greatest likelihoods given the data, and produces a set of trees with roughly equal likelihoods. Parsimony is the simplest method, and it looks for the tree or trees with the minimum number of changes.
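To make the parsimony idea concrete, here is a tiny sketch of my own (not from the book) that scores one alignment column on one fixed, hypothetical four-taxon tree using the Fitch algorithm. A real parsimony search would score every column on every candidate tree and keep the tree or trees with the lowest total.

def fitch_score(tree, states):
    # tree: nested tuples of taxon names; states: taxon -> base at one site
    changes = 0

    def post_order(node):
        nonlocal changes
        if isinstance(node, str):              # leaf: its state set is just its base
            return {states[node]}
        left, right = (post_order(child) for child in node)
        shared = left & right
        if shared:                             # subtrees agree, no change needed
            return shared
        changes += 1                           # disagreement implies one change
        return left | right

    post_order(tree)
    return changes

# Hypothetical tree ((A,B),(C,D)) and one alignment column
tree = (("A", "B"), ("C", "D"))
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
print(fitch_score(tree, site))   # 1 change is enough on this tree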

It's almost impossible to evaluate each possible tree because of the sheer number of possibilities (even with only 10 taxa, there are more than 34 million rooted trees). Therefore, something called a "branch-addition algorithm" is used to generate candidate trees, which I won't go into in detail here.
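That 34 million figure is easy to check: the number of possible rooted, strictly bifurcating trees for n taxa is the double factorial (2n - 3)!!. A quick sketch:

def rooted_tree_count(n):
    # (2n - 3)!! = 3 * 5 * 7 * ... * (2n - 3)
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

print(rooted_tree_count(10))   # 34,459,425 rooted trees for just 10 taxa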

It's important to realize that since we don't know what happened in the past, we can never be entirely sure how accurate the tree is. In addition, there is no "right" tree-- we can only hope to find the tree that most closely approximates what happened in the past.

Friday, November 7, 2014

Aligning Sequences

Once you acquire sequences (which I covered in my previous blog post), you must align them before constructing the phylogenetic tree. MEGA provides two alignment methods: ClustalW and MUSCLE. Either can be used, but in general MUSCLE is preferable. In MEGA's Alignment menu, I chose MUSCLE and then clicked the "Align Codons" button. There were two choices, Align DNA and Align Codons, and since my sequence was a DNA coding sequence I chose Align Codons. Aligning by codons is much more realistic than aligning the DNA sequences directly, because it avoids introducing gaps at positions that would cause frame shifts in the real sequences. Once I started the alignment in the program, it took about two minutes. Finally, I exported the file in the correct format so I could use it to estimate my phylogenetic tree later.
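MEGA runs MUSCLE for you behind the scenes, but for reference, roughly the same alignment step could be scripted like this, assuming the stand-alone MUSCLE v3 command-line tool is installed (the file names here are hypothetical). Note that this aligns the DNA directly; the codon-aware alignment I used is a MEGA feature.

import subprocess

# Run MUSCLE on an unaligned FASTA file and write the aligned result
subprocess.run(
    ["muscle", "-in", "coding_seqs.fasta", "-out", "aligned_seqs.fasta"],
    check=True,   # raise an error if the alignment fails
)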

Here is my final aligned file:


Once I completed the alignment, I noticed that gaps were introduced into the sequences. Those gaps represent historical insertions or deletions, and their purpose is to bring homologous sites into alignment in the same column. Just as a phylogenetic tree is an “estimate” of relationships among sequences, an alignment is just an estimate of the positions of historical insertions and deletions.

Wednesday, October 22, 2014

Acquiring Sequences

My mentor was in San Diego this week, so I worked through Chapter 4: Acquiring Sequences of the book Phylogenetic Trees Made Easy on my own.

The most time-consuming part of estimating a phylogenetic tree is acquiring the sequences that will be the tips of the tree. Today, I learned how to find related sequences, and what criteria I should consider as I decide which sequences to include. The BLAST search in the MEGA program is the primary tool for identifying sequences that are homologous to your sequence of interest (the query sequence), and I used it today to acquire sequences related to the bacterium E. coli.

Here is a screenshot of all the DNA sequences I acquired in BLAST:


It took me some time to figure out how to correctly acquire sequences. When you enter the query sequence, BLAST pulls up all the related DNA sequences, but it's up to you to determine which results will actually be useful. Then, you have to click on each sequence of interest and go through a process in which you add the sequence to your list in the correct format. Since the book was written for an earlier version of MEGA than the one I have, it took a lot of trial and error to correctly acquire all the sequences.
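For comparison, here is a minimal sketch of the same kind of search done in a script with Biopython's BLAST interface (not what I did, since I worked through MEGA's built-in BLAST). The query file name and the E-value cutoff are hypothetical.

from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

query = SeqIO.read("query.fasta", "fasta")
result_handle = NCBIWWW.qblast("blastn", "nt", query.seq)   # nucleotide BLAST over the web

record = NCBIXML.read(result_handle)
for alignment in record.alignments[:10]:                    # look at the top ten hits
    best_hsp = alignment.hsps[0]
    if best_hsp.expect < 1e-20:                             # keep only strong matches
        print(alignment.title, best_hsp.expect)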

Wednesday, October 15, 2014

Phylogenetic Trees

This week, I started learning phylogenetic analysis, which biologists use to understand evolutionary and molecular relationships. Dr. Miller gave me a book called "Phylogenetic Trees Made Easy," and I completed the first tutorial in it, which outlined the steps for creating a simple phylogenetic tree.



To make the tree, I first had to download a program called MEGA 5, which analyzes and aligns sequences from related organisms. Specifically, I looked at a sequence from the bacterium Thermotoga petrophila. I used a "search engine" called BLAST to help me find sequences that produced similar alignments to the original sequence. I chose the 6 most closely related sequences, and then used the program to produce a DNA alignment:


The final step, which was to construct a phylogenetic tree based on the alignment, was actually pretty simple. All I did was click on a button labelled "Neighbor Joining Tree," and the program created a phylogenetic tree for me! Getting the alignments was the hard part, because it's so easy to make simple mistakes, which can mess up the whole alignment. It took Dr. Miller and me several tries to get it right. 
Phylogenetic tree!


Saturday, October 4, 2014

Internship Revealed!

My STEM internship for 2014-2015 is (drumroll, please)... evolutionary biology with Dr. Mark Miller! Dr. Miller is currently working on a research project called Next Generation Tools for Biology at the San Diego Supercomputer Center (he commutes to Albany each week). He develops software tools and infrastructure for biomedical applications. This internship is perfect for me, because I can build on the programming skills I learned last year but also explore the field of evolutionary biology.

I don't know yet what research problem we'll be working on-- right now, I'll just be learning the basics of creating phylogenetic trees with him. We'll start on a research project next semester. Last year, Peggy and Dr. Miller worked on a project with the San Diego Zoo to trace the origins of a disease affecting a species of birds there. It was really cool and I look forward to what I'll be doing!

Friday, September 12, 2014

New year, new STEM internship!

I'm excited to continue doing a STEM internship this year! Unfortunately, I won't get to continue working on my particle physics research with Dr. Bellis, because he isn't a mentor this year, but I'm looking forward to a new experience.

I'm interested in a STEM internship because I love questions that don't have answers. Science research is all about exploring the unknown, and that's something that has always fascinated me. I had a wonderfully challenging experience last year, and I definitely learned a lot about working through and finding solutions to unexpected obstacles. This year, I want to both build on these skills and develop new ones. I'm going to work towards improving my experimental technique, and I want to get more practice doing basic lab work. I also want to improve my reasoning skills, which are important not only in scientific research but in pretty much any field a person can go into.

It's going to be a great year! I can't wait to find out which mentor I get placed with.

Tuesday, May 20, 2014

Looking back

I am so grateful that I got the chance to participate in the STEM program this year and work with Dr. Bellis. Before this year, I had never even taken a physics class before. It was definitely challenging to start my internship without a lot of background knowledge, especially when I had to do independent work at school. I’d also always thought of scientific research as being in the lab, but my work with Dr. Bellis taught me otherwise. I spent a lot of time on my computer, reading published papers and writing code. It was frustrating at times because I couldn’t really see where all of my work was taking me, but by the end all the pieces started to come together and I really understood what I was studying. I also learned to not let minor obstacles deter me from reaching my research goal. Overall, this year has been a very rewarding experience. I’m looking forward to continuing to do scientific research.

From my poster:

“Particle physicists study the smallest possible things in the universe, and ask the biggest questions. There might be no immediate practical applications to discovering the hybrid meson, but it will take us one step closer to understanding what we’re made of. My internship has given me both practical research skills and a deeper understanding of the unknown. I am no longer scared of the biggest questions. Instead, they push me to keep learning.”


Wednesday, April 2, 2014

Week of April 2nd

Unfortunately, I couldn't travel to Siena today because Dr. Bellis is busy preparing for the American Physical Society conference in Georgia. He's presenting his research on dark matter at the conference, and I'm excited to hear how it goes!

Wednesday, February 26, 2014

Week of February 26th


I dove into my dark matter research today, and started looking up all of the various dark matter experiments. For each experiment, I read all the major papers it had published, as well as general news articles. It was really challenging to get through the publications, because there was so much technical jargon, as well as concepts that I hadn't encountered yet. By focusing on the titles of the papers and the abstracts, I was able to get a general understanding of what each paper was about. I also learned a lot just by looking at the graphs and tables each paper included, since I tend to understand concepts better when I see visual representations of them.

There are six major dark matter experiments going on right now: the LUX (Large Underground Xenon) experiment in South Dakota, the Xenon10 and Xenon100 experiments led by Columbia, the DAMA/LIBRA experiment in Italy, the CRESST (Cryogenic Rare Event Search with Superconducting Thermometers) experiment, and SuperCDMS (the Cryogenic Dark Matter Search) in California.

Today, I researched the LUX experiment and the Xenon10 and Xenon100 experiments, since they both use liquid xenon in their detectors. I put together a page-long summary for each experiment for Dr. Bellis, but I will just briefly recap what I learned here.

LUX experiment:

Detector: LUX is trying to directly detect dark matter (specifically WIMP particles, which most scientists believe constitute dark matter) with a liquid xenon time-projection chamber. A time-projection chamber is a type of particle detector that places electric and magnetic fields parallel to each other in the detector, which means that electrons travel in a straight line (instead of a curved path, which would occur if the electric and magnetic fields were perpendicular to each other). This allows scientists to create a three-dimensional picture of collisions in the detector, and determine the energy and momentum of particles. Xenon is used because it's very pure and because it fluoresces when struck by a charged particle. From the official LUX experiment page:

"Interactions inside the xenon will create an amount of light proportional to the amount of energy deposited. That light can be collected on arrays of light detectors sensitive to a single photon, lending the LUX detector a low enough energy threshold to stand a good chance of detecting the tiny bump of a dark matter particle with an atom of xenon."

Results: In February 2014, scientists at LUX concluded that the first 90 days of data showed no statistically significant evidence of WIMPs. This is surprising, since LUX is considered to be the most sensitive detector to date, and other, less sensitive detectors have found hints of the particles.

Xenon100:

The Xenon100 detector is very similar to the LUX detector. It's a time-projection chamber and it uses liquid xenon. The setups of these dark matter detectors are similar; the difference is in the constraints set on the WIMP interactions (basically, where the scientists are looking for the dark matter interactions to happen).

Results: The Xenon100 experiment found no evidence for dark matter interactions.


Wednesday, February 19, 2014

Week of February 19th

Today, I finally returned to Siena! First, Dr. Bellis and I went over all the work I did independently for the past two weeks. He explained the concept of cross sections more in detail to me, and talked about some of its applications in dark matter theory. We also reviewed WIMP (Weakly Interacting Massive Particles) Theory.

Dr. Bellis also took me on a tour of Siena's revamped particle physics lab. Siena just got some new hardware for their computers, including a processor called (I think) the Tesla K40. This enables researchers to perform complicated calculations in a fraction of the time that it used to take them. It's the same kind of hardware that animators at Disney and Pixar use to create movies, except in this case researchers are using it for modeling and programming. Dr. Bellis demonstrated some of the different functions the new computers could perform. It was really cool to see all of the new equipment and learn more about how scientists do their research.

Since I haven't worked on my Python skills in a while, I also practiced some more programming. Dr. Bellis introduced me to Rosalind and Project Euler, which are two sites that have Python coding problems. Project Euler has mathematics-based problems, and Rosalind deals with bioinformatics, which is a field that develops methods for analyzing biological data. I'm interested in biology, so I tackled the first problem on the site:



The website also provides helpful reviews of molecular biology, which was great because I've forgotten a lot since AP Bio! My Python skills were a little bit rusty, so it took a while to figure out how to solve this problem. Eventually, I figured out that I had to create counters for each base and then use an "if/then" statement to define how the computer should return the four integers.
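A minimal version of that approach looks like this (the sample string here is made up; the real Rosalind input is much longer): a counter for each base, plus an "if" check so anything unexpected is skipped.

dna = "AGCTTTTCATTCTGACTGCA"   # made-up sample DNA string

counts = {"A": 0, "C": 0, "G": 0, "T": 0}
for base in dna:
    if base in counts:        # only count the four expected bases
        counts[base] += 1

# Rosalind wants the four counts printed in the order A C G T
print(counts["A"], counts["C"], counts["G"], counts["T"])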

Wednesday, February 12, 2014

Week of February 12th

Unfortunately, Dr. Bellis' son was still under the weather, so I did some more independent research from school. My first task was to read and understand the Wikipedia page for a cross section in particle physics. There was a lot of scary calculus and formulas, but eventually I understood the basic concept.

In particle physics, a cross section is essentially a measure of the probability that two particles will interact, expressed as an effective area. Say that you're randomly throwing darts at a target on the wall, with the hope that some of them will hit the target. The cross section plays the role of the target's area, and the chance that a dart interacts with the target instead of the wall is basically the ratio of the area of the target to the area of the wall.
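Here's the dartboard analogy as a quick back-of-the-envelope calculation (all numbers made up):

import math

target_radius = 0.2              # meters
wall_area = 2.0 * 3.0            # a 2 m x 3 m wall

cross_section = math.pi * target_radius ** 2   # the target's area plays the role of the cross section
hit_probability = cross_section / wall_area

darts_thrown = 1000
expected_hits = darts_thrown * hit_probability
print(round(hit_probability, 4), round(expected_hits, 1))   # about 0.0209 and 20.9 hits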

The cross section of a particle determines how readily that particle will annihilate (which means to convert into radiant energy). Cross sections are used in the WIMP (Weakly Interacting Massive Particles) theory of dark matter, so I'm sure that taking the time to understand this concept will serve me well when I go back to my internship.

In addition to learning about cross sections, I also started researching the dark matter experiments going on right now. There are about five different major experiments, each of which uses different elements to detect WIMPs. In my next meetings with Dr. Bellis, I'll start reading the papers that each of these experiments has published, but right now I just wanted to get an overview of the current research. Here's a great video I found about one of the dark matter labs:


I'm looking forward to meeting with Dr. Bellis again!

Wednesday, February 5, 2014

Week of February 5th

Now that the first part of our hybrid meson project is complete, Dr. Bellis is having me help him with the dark matter research he's working on. Unfortunately, his son was sick today, so I wasn't able to come to Siena to work. However, I still worked from school, and answered a few questions about dark matter that he prepared for me. (The source for all of the answers is Wikipedia.)

Why do we believe dark matter exists?

Scientists first hypothesized the existence of dark matter to account for the discrepancy between the mass of large astronomical objects determined from their gravitational effects and the mass calculated from the "luminous matter" (stars, gas, etc.) they contain. We believe that the reason for this difference is that there is another type of matter, dark matter, which doesn't interact with light. There is no direct evidence that it exists, so we can't be entirely sure that it is the reason for the discrepancy, but we can infer its existence from its gravitational effects on visible matter and radiation.
Who was Vera Rubin?
Vera Rubin is an American astronomer who, in the 1970s, discovered convincing evidence for the existence of dark matter. She observed that stars in spinning galaxies were all rotating at roughly the same velocity, with distance from the galactic center having no visible effect on their speed. This contradicted the Keplerian expectation that stars farther from the center should orbit more slowly, the way the outer planets orbit the Sun more slowly than the inner ones.
Rubin's observations of stellar velocities in spinning galaxies provided convincing evidence for dark matter, because the flat velocity curves she observed could only occur if huge amounts of invisible matter were providing additional gravitational attraction.
What does the "weak" in Weakly Interacting Massive Particle (WIMP) refer to?
The "weak" in WIMP refers to the weak nuclear force, one of the four fundamental interactions of nature. It's responsible for the radioactive decay of subatomic particles and helps drive nuclear fusion in stars. It is mediated by the emission and absorption of W and Z bosons. The weak interaction is also capable of changing the flavor of quarks.
How "massive" are these WIMPs believed to be? How big is that relative to the mass of a proton?

WIMP mass: roughly 10-100 GeV/c^2

Proton mass: ~1 GeV/c^2

So WIMPs are predicted to be roughly 10 to 100 times the mass of a proton.


Wednesday, January 29, 2014

Week of January 29

Yayyy!!! I finished our calculations!!! I'm one step closer to becoming an actual scientist :) Basically, our calculations predict the number of D mesons produced from one billion B mesons. They are so simple that anyone who knows the laws of probability can do them. The key is just knowing the decays for each particle and what numbers to actually multiply together.

Here are the decays each B meson goes through to get to a D meson:


Number of B mesons: 1 billion
B => B0 Bbar0: 500 million
B => B+ B-: 500 million
B+- => K+- psi/g: 1%
B0 => K0 psi/g: 1%
psi/g => D2(2460) D
psi/g => D2(2460) D*
psi/g => D1(2420) D
psi/g => D1(2420) D*

So, as you can see, we start off with one billion B mesons. 500 million mesons decay to B neutral Bbar neutral mesons, and 500 million decay to B+ and B- mesons. The charged B mesons then decay to charged kaons and psi/g particles, and the neutral B mesons decay to neutral kaons and psi/g particles. The psi/g particles then decay to our four different types of particles: the D2(2460), D1(2420), D, and D* mesons. (The number in parentheses just tells the mass of the meson.)

The BaBar detector isn't 100% efficient, and its efficiency varies depending on the particle. So, while the efficiency for the charged kaon was a solid 95%, the efficiency for a neutral kaon was only 40%. We had to take all of this into account when making our predictions. Here are all of our efficiencies:


B+- => K+- psi/g: 1%     efficiency for K+-: 95%
B0 => K0 psi/g: 1%       efficiency for K0: 40%

However, just because the detector will be able to detect a certain decay doesn't mean that that decay will actually happen in the first place. We also have to take the branching ratios of the decays into account. As I explained in an earlier blog post, the branching ratio is the probability that a certain decay will actually take place. Here are the branching ratios we came up with:


psi/g => D2(2460) D      BR: 25%
psi/g => D2(2460) D*     BR: 20%
psi/g => D1(2420) D      BR: 25%
psi/g => D1(2420) D*     BR: 20%

Once I had all the assumptions, I could finally get started on the actual calculations! This was probably the easiest part of the whole process; all I had to do was multiply the right percentages together to get the final number of D mesons produced. Here are the calculations:


Calculations:
Charged B mesons:
(5.0 x 10^8 charged B mesons)(1%) = 5.0 x 10^6 K+- psi/g
(.95)(5.0 x 10^6) = 4.75 x 10^6 K+- psi/g detected

Neutral B mesons:
(5 x 10^8)(1% BR)(40% efficiency) = 2.0 x 10^6 K0 psi/g detected

psi/g => D2(2460) D:
(.25)(.01)(.05)(4.75 x 10^6) = 593.75        (.25)(.01)(.05)(2.0 x 10^6) = 250

psi/g => D2(2460) D*:
(.2)(.01)(.03)(4.75 x 10^6) = 285            (.2)(.01)(.03)(2.0 x 10^6) = 120

psi/g => D1(2420) D:
(.25)(.05)(.01)(4.75 x 10^6) = 593.75        (.25)(.05)(.01)(2.0 x 10^6) = 250

psi/g => D1(2420) D*:
(.2)(.03)(.01)(4.75 x 10^6) = 285            (.2)(.03)(.01)(2.0 x 10^6) = 120

Our numbers were actually a lot better than we were expecting. From one billion B mesons, about 2500 D mesons will be produced (and actually detected). This is a good enough number to continue on with our work.
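As a sanity check, here's a short script that redoes the multiplication above with the same assumed branching ratios and efficiencies, so the total is easy to recompute if any assumption changes. The grouping of the factors just follows the calculation above.

B_MESONS = 1.0e9
charged = 0.5 * B_MESONS * 0.01 * 0.95   # B+- -> K+- psi/g (1% BR), then 95% K+- efficiency
neutral = 0.5 * B_MESONS * 0.01 * 0.40   # B0 -> K0 psi/g (1% BR), then 40% K0 efficiency

# For each psi/g decay: (branching ratio, remaining two factors from the assumption list)
channels = {
    "D2(2460) D":  (0.25, 0.01, 0.05),
    "D2(2460) D*": (0.20, 0.01, 0.03),
    "D1(2420) D":  (0.25, 0.05, 0.01),
    "D1(2420) D*": (0.20, 0.03, 0.01),
}

total = 0.0
for name, (br, f1, f2) in channels.items():
    n = (charged + neutral) * br * f1 * f2
    total += n
    print(name, round(n, 1))

print("total D mesons:", round(total))   # about 2500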

Of course, this all depends on if the assumptions we made were correct. To get a second opinion, Dr. Bellis is going to send the assumptions and calculations we made to the BaBar researchers at Stanford. If they agree with what we have come up with, then they will send us the rest of the BaBar data, which is when the real fun will begin. That's when all my Python and data analysis skills will come in handy. Fingers crossed!

Wednesday, January 22, 2014

Week of January 22nd

Q: What did Donald Duck say in his graduate physics class?
A: Quark, quark, quark!

Happy Lame Physics Jokes Day! Today, I was only at my internship for about an hour because of MLK Service Day and because my mentor had to leave a little bit early. I was also recovering from a cold, so I wasn't completely mentally functional. I still got some work done though!

To do our calculations for the hybrid meson, we have to come up with a list of assumed efficiencies for the particles involved in the decay. I started working on our list of assumptions today. The Particle Data Review was, as always, a really helpful source, but it was sometimes difficult to decipher all of the different numbers. I also used the paper Dr. Bellis gave me last week to come up with efficiencies for the D(2420) and D(2460) mesons. Next week, I'll be using these assumptions to actually finish the calculations, which I'm excited about!

Something cool that happened this week was that my particle physics booklet from the Particle Data Group came in the mail. This tiny booklet contains almost everything you need to know about the universe. It has the decays, branching ratios, masses, and pretty much any information you could ever want about a particle. Dr. Bellis has the full-sized version of this booklet in his office, but I ordered this free booklet so I can look up information when I'm working from school. Here's a picture of what the booklet looks like:

Wednesday, January 15, 2014

Week of January 15

Happy New Year! At the end of last semester, I was at a pretty good place with my internship. I'd spent a good amount of time learning technical skills, and by the last few meetings, I had started writing code for data analysis. This semester,  my first goal is to finish the preliminary calculations for the hybrid meson. Here's a quick recap to remind you:

From 1999 to 2008, the SLAC National Accelerator Laboratory at Stanford conducted the BaBar experiment, which involved hundreds of researchers using the BaBar detector, a multilayer particle detector, to study the disparity between the matter and antimatter content of the universe. The experiment is no longer running, but there are years of data that have yet to be analyzed. My job is to run code in Python that analyzes a very specific section of the data to determine whether or not it is worth further analysis. In particular, I am searching for evidence of a new type of particle called an exotic meson, which has been predicted to exist.

The calculations are actually much simpler than the data analysis on Python I've been doing. All I have to do is figure out how many of these mesons we can expect to see in the data (if they do in fact exist). But, before I do my calculations, Dr. Bellis wants me to understand a little bit more about the BaBar experiment itself. 

So, to give me an idea of the sort of work that scientists at BaBar did, he gave me a paper that he co-authored at Stanford to read. It's called (get ready for it, it's a mouthful) "Observation of new resonances decaying to D pi and D* pi in inclusive e+e- collisions near √s = 10.58 GeV." The paper basically describes a certain type of decay that the BaBar detector measured. It was really cool to read the paper and see that it included mass distribution graphs for the particles, which are the same graphs that I've been making in Python!

The detector measures the momentum of charged particles using a huge magnet, which contains an SVT or silicon vertex tracker and a DCH or drift chamber. The SVT consists of five layers of double-sided silicon detectors, which transmit the position measurements of the particles to an integrated circuit. The DCH is a gas-filled chamber that provides the momentum measurements for charged particles. Here's some more information about the components of the detector: http://www.slac.stanford.edu/BFROOT/www/doc/workbook/detector/detector.html

The link also explains why the magnet is important: "Without a magnetic field, a tracking device could not measure charge or momentum, but only position. But when a magnetic field is present, the charged tracks curve, and the charge and momentum of the particle can be determined from the direction and curvature of the track."

Although I won't be directly using this information, it was still fascinating to learn, and it gave me a good sense of how intricate this experiment really was.

The paper also provided a great table listing the resonance, efficiency, mass, and width (among other things) of certain D mesons. I'll be using some of these numbers in my calculations, so it was really helpful to see where these numbers actually came from and how they were calculated.

I spent most of my time trying to make sense of the paper. It was pretty hard to get through, especially since I didn't understand every other word used. At first, I was intimidated by all the technical language and crazy-looking graphs, but then I realized that a lot of the calculations the physicists did just came from the relation between energy, momentum, and mass. Most of particle physics is intimidating at first, but, like the universe, it is remarkably simple at its core.
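That relation is E^2 = (pc)^2 + (mc^2)^2, and it's the key step in turning measured energies and momenta into mass distributions like the ones in the paper. A tiny sketch, with made-up numbers in GeV and c = 1:

import math

def invariant_mass(energy, px, py, pz):
    # m^2 = E^2 - |p|^2 in natural units (c = 1)
    p_squared = px**2 + py**2 + pz**2
    return math.sqrt(energy**2 - p_squared)

# Hypothetical measured energy and momentum of one candidate particle
print(round(invariant_mass(2.1, 0.5, 0.8, 0.3), 3))   # mass in GeV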