Because parents transmit chromosomes directly to their children, we should be able to determine relationships between individuals based on their genotypes. By looking at the similarities between the alleles of two individuals, I want to determine whether they are siblings.
Hey, I'm Samir Uppaluru, a sophomore Computer Science student. My interest in genetics and human evolution led me to take this course. I'm a huge music buff and I play the drums in my spare time.
To construct a method for determining whether two individuals are siblings based on their genotypes.
Week 1: Research background, familiarize with problem
Week 2: Check out HapMap, figure out how to use data
Week 3: Plan out an algorithm to code a sibling checker
Week 4: Implement code and test
Did preliminary research on project. Read up on the genetics and biology behind the problem to give me a better understanding. Familiarized myself with the HapMap website. Sketched out some very basic ideas on how I would theoretically go about solving this problem.
A- for the week, did what I set out to do but started later than I should have.
Found scientific articles on relatedness through Google Scholar. Tried to follow them through to develop a similar method for finding sibling relationships, but found the articles extremely complicated and hard to follow. Their methods are far more complex than the general ideas I came up with last week.
Tried to get HapMap data to work with, but it's much harder to work with than I assumed. There are very few sibling pairs, if any; I'd have to scour through all the relationship data to hopefully find some. Doesn't seem like enough test cases to fully test any method I might build.
B for the week, didn't do much real work as any method I come up with seems inefficient for the problem and I don't really know how I'm going to work with the HapMap data for the project.
Went to Professor Eskin's office hours to ask for help with the problem. Turns out I was thinking about it in the wrong way. I don't even have to use HapMap data; I'll create my own hypothetical data using my system. He explained a method for solving the problem that is within my coding capabilities.
Planned out the classes for my system: UnrelatedMatrix, RelatedMatrix, PairData, and the SiblingEstimator function. Decided Python should be a suitable language to code in, if only because I could use the practice.
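This log does not record the method itself, so the following is only a rough sketch of one way the planned pieces could fit together, not the project's actual code. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, genotypes coded as the number of minor alleles (0, 1, or 2), independent SNPs, and a log-likelihood-ratio decision rule. The function names loosely mirror the planned components, but their signatures and internals are hypothetical, as is the simulate_pair helper standing in for the "hypothetical data".

```python
import itertools
import math
import random


def unrelated_matrix(maf):
    """3x3 joint genotype probabilities for two unrelated people (assumes HWE)."""
    q = 1.0 - maf
    single = [q * q, 2 * maf * q, maf * maf]
    return [[single[g1] * single[g2] for g2 in range(3)] for g1 in range(3)]


def related_matrix(maf):
    """3x3 joint genotype probabilities for full siblings.

    Built by enumerating the four parental alleles and every Mendelian
    transmission to each of the two children.
    """
    matrix = [[0.0] * 3 for _ in range(3)]
    for alleles in itertools.product((0, 1), repeat=4):
        weight = 1.0
        for allele in alleles:
            weight *= maf if allele else 1.0 - maf
        father, mother = alleles[:2], alleles[2:]
        # Each child picks one allele from each parent: 4 * 4 = 16 equally likely cases.
        for f1, m1, f2, m2 in itertools.product((0, 1), repeat=4):
            g1 = father[f1] + mother[m1]
            g2 = father[f2] + mother[m2]
            matrix[g1][g2] += weight / 16.0
    return matrix


def sibling_estimator(pair_data, mafs):
    """Return True if the sibling hypothesis is more likely than the unrelated one.

    pair_data is a list of (g1, g2) genotype pairs, one per SNP;
    mafs holds the matching minor allele frequencies (each strictly between 0 and 1).
    """
    llr = 0.0
    for (g1, g2), maf in zip(pair_data, mafs):
        llr += math.log(related_matrix(maf)[g1][g2])
        llr -= math.log(unrelated_matrix(maf)[g1][g2])
    return llr > 0.0


def simulate_pair(mafs, siblings, rng):
    """Hypothetical data generator: one (g1, g2) genotype pair per SNP."""
    def draw(maf):
        return int(rng.random() < maf)

    data = []
    for maf in mafs:
        if siblings:
            father = [draw(maf), draw(maf)]
            mother = [draw(maf), draw(maf)]
            g1 = rng.choice(father) + rng.choice(mother)
            g2 = rng.choice(father) + rng.choice(mother)
        else:
            g1 = draw(maf) + draw(maf)
            g2 = draw(maf) + draw(maf)
        data.append((g1, g2))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    mafs = [0.3] * 200  # 200 SNPs, all with MAF 0.3 (arbitrary example values)
    print(sibling_estimator(simulate_pair(mafs, True, rng), mafs))
    print(sibling_estimator(simulate_pair(mafs, False, rng), mafs))
```

Under these assumptions, the number of SNPs per pair and the minor allele frequency are the two knobs in simulate_pair, so the same sketch could be rerun with different values of each to see how often the estimator classifies a pair correctly.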
Set up my development environment using Eclipse with the PyDev extension. Coded and tested the UnrelatedMatrix and RelatedMatrix classes.
I'd give myself an A for the week, lots of progress made.
Implemented the PairData class and the SiblingEstimator function. While testing, found bugs in the RelatedMatrix class; had to go back, rethink, and reimplement the code for that class. Finally, had a working system.
Started running tests using the system, changing variables and recording results in a spreadsheet. Used the data to make graphs showing how the results vary with the number of SNPs used per pair and with the minor allele frequency.
Created a PowerPoint presentation documenting the problem, the process, and the results.
The code, data, and presentation are attached.
A for the week.