Project Description
Each parent transmits one chromosome of every homologous pair to each child. This results in approximately 50% resemblance of DNA between full siblings. Relatedness between two or more people can be estimated using this fact.
About us
KangWon Lee: Second-year M.S. student in Computer Science.
Alfred Heu: Second-year M.S. student in Computer Science.
Project Goals
1. Given the genotypes of two individuals, how can we tell if they are siblings or not?
2. SNPs with which minor allele frequencies (MAFs) are informative?
3. How many SNPs should be taken into consideration?
- Simulation on randomly generated data based on the probability matrix
Probability Matrix
1. Unrelated
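Consider a SNP with a minor allele frequency (MAF) of 0.1, where A is the major allele and G is the minor allele.
P(AA) = (0.9)^2 = 0.81
P(AG) = 2 * (0.9) * (0.1) = 0.18
P(GG) = (0.1)^2 = 0.01
Each value is the probability of the genotype formed by the alleles on an individual's two chromosomes; P(GG), for example, is the probability of carrying the minor allele on both. The factor of 2 in P(AG) counts both orders in which A and G can be inherited.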
Let ind1 have GG and ind2 have GG. Assume that they are not related.
Pu(GGGG) = P(GG) * P(GG) = (0.1)^2 * (0.1)^2 = 0.0001
2. Related (Full siblings)
Let ind1 have GG and ind2 have GG. Assume that they are full siblings.
Consider the parents of these two siblings.
There are 4 ordered parental genotype combinations that can produce two GG children: AG x AG, AG x GG, GG x AG, and GG x GG.
Pr(GGGG) = Pu(AGAG) * 0.25 * 0.25 + Pu(AGGG) * 0.5 * 0.5 * 2 + Pu(GGGG) * 1 * 1 = 0.002025 + 0.0009 + 0.0001 = 0.003025
3. Example
MAF = 0.1 unrelated
MAF = 0.1 related
MAF = 0.4 unrelated
MAF = 0.4 related
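The four matrices listed above can be recomputed from the rules in parts 1 and 2. The following is a minimal sketch, not the project's original code: it assumes Hardy-Weinberg genotype frequencies and Mendelian transmission, and the function names (genotype_freqs, child_probs, unrelated_matrix, sibling_matrix) are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "AG", "GG")  # A = major allele, G = minor allele


def genotype_freqs(maf):
    """Hardy-Weinberg genotype frequencies for a given minor allele frequency."""
    p, q = 1.0 - maf, maf
    return {"AA": p * p, "AG": 2 * p * q, "GG": q * q}


def child_probs(mother, father):
    """P(child genotype | parental genotypes) under Mendelian transmission."""
    probs = {g: 0.0 for g in GENOTYPES}
    for a in mother:          # each parent passes either allele with prob 1/2
        for b in father:
            probs["".join(sorted(a + b))] += 0.25
    return probs


def unrelated_matrix(maf):
    """Joint genotype probabilities for two unrelated individuals."""
    f = genotype_freqs(maf)
    return {(g1, g2): f[g1] * f[g2] for g1, g2 in product(GENOTYPES, repeat=2)}


def sibling_matrix(maf):
    """Joint genotype probabilities for two full siblings, summed over parents."""
    f = genotype_freqs(maf)
    joint = {pair: 0.0 for pair in product(GENOTYPES, repeat=2)}
    for mother, father in product(GENOTYPES, repeat=2):
        c = child_probs(mother, father)
        for g1, g2 in joint:
            joint[(g1, g2)] += f[mother] * f[father] * c[g1] * c[g2]
    return joint


if __name__ == "__main__":
    for maf in (0.1, 0.4):
        for label, matrix in (("unrelated", unrelated_matrix(maf)),
                              ("related", sibling_matrix(maf))):
            print(f"MAF = {maf} {label}")
            for g1 in GENOTYPES:
                print("  ", g1, [round(matrix[(g1, g2)], 6) for g2 in GENOTYPES])
```

For MAF = 0.1 the GG/GG cells reproduce the values worked out by hand above: 0.0001 for unrelated individuals and 0.003025 for full siblings.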
Simulations & Results
We randomly generated 10,000 pairs of unrelated individuals and 10,000 pairs of siblings as genotype samples, based on the probability matrix.
Example: MAF = 0.1, unrelated, 5 SNPs
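One way to generate such samples is sketched below. It is an assumption-laden illustration rather than the code used in the project: instead of sampling cells of the probability matrix directly, it draws alleles for each individual (or for the parents of a sibling pair), which yields the same distribution under Hardy-Weinberg proportions and Mendelian transmission. Genotypes are coded as the number of minor alleles (0 = AA, 1 = AG, 2 = GG), SNPs are treated as independent, and all function names are hypothetical.

```python
import random


def draw_genotype(maf, rng):
    """Minor-allele count (0, 1 or 2) for one individual at one SNP."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def transmit(parent, rng):
    """Return 1 if the parent passes on a minor allele, else 0."""
    if parent == 1:                 # heterozygote: either allele with prob 1/2
        return int(rng.random() < 0.5)
    return parent // 2              # homozygote: 0 -> 0, 2 -> 1


def draw_unrelated_pair(maf, rng):
    """Genotypes of two unrelated individuals at one SNP."""
    return draw_genotype(maf, rng), draw_genotype(maf, rng)


def draw_sibling_pair(maf, rng):
    """Genotypes of two full siblings at one SNP (shared parents)."""
    mother, father = draw_genotype(maf, rng), draw_genotype(maf, rng)
    return tuple(transmit(mother, rng) + transmit(father, rng) for _ in range(2))


def simulate(maf, n_snps, n_pairs, related, seed=0):
    """Return n_pairs samples, each a list of n_snps (ind1, ind2) genotype pairs."""
    rng = random.Random(seed)
    draw = draw_sibling_pair if related else draw_unrelated_pair
    return [[draw(maf, rng) for _ in range(n_snps)] for _ in range(n_pairs)]


if __name__ == "__main__":
    sample = simulate(maf=0.1, n_snps=5, n_pairs=10_000, related=False)
    print(sample[0])  # one unrelated pair typed at 5 SNPs
```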
Estimation Method
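A natural way to use the two probability matrices is to compare how likely the observed genotypes are under each hypothesis. The sketch below assumes a log-likelihood ratio summed over independent SNPs with a decision threshold of 0; this is an illustrative assumption, not necessarily the exact rule used in the project. Genotypes are coded as minor-allele counts (0 = AA, 1 = AG, 2 = GG), and all names are hypothetical.

```python
import math
from itertools import product

# P(a parent with k minor alleles transmits the minor allele), k = 0, 1, 2
TRANSMIT = (0.0, 0.5, 1.0)


def hwe(maf):
    """Hardy-Weinberg genotype frequencies indexed by minor-allele count."""
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def child_dist(mother, father):
    """Distribution of a child's minor-allele count given both parents."""
    tm, tf = TRANSMIT[mother], TRANSMIT[father]
    return [(1 - tm) * (1 - tf), tm * (1 - tf) + (1 - tm) * tf, tm * tf]


def pair_probs(maf):
    """Return (unrelated, sibling) 3x3 joint genotype probability matrices."""
    f = hwe(maf)
    unrel = [[f[i] * f[j] for j in range(3)] for i in range(3)]
    sib = [[0.0] * 3 for _ in range(3)]
    for m, d in product(range(3), repeat=2):
        c = child_dist(m, d)
        for i, j in product(range(3), repeat=2):
            sib[i][j] += f[m] * f[d] * c[i] * c[j]
    return unrel, sib


def log_likelihood_ratio(pairs, maf):
    """Sum over SNPs of log(P_related / P_unrelated); assumes independent SNPs."""
    unrel, sib = pair_probs(maf)
    return sum(math.log(sib[i][j] / unrel[i][j]) for i, j in pairs)


def classify(pairs, maf):
    """Assumed decision rule: call siblings when the ratio favours relatedness."""
    return "siblings" if log_likelihood_ratio(pairs, maf) > 0 else "unrelated"


if __name__ == "__main__":
    observed = [(2, 2), (0, 0), (1, 1), (0, 1), (0, 0)]  # 5 SNPs, MAF = 0.1
    print(classify(observed, maf=0.1))
```

The error rate for a given MAF and number of SNPs would then be the fraction of simulated pairs that such a rule assigns to the wrong class.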
Result
- 10,000 pairs of siblings and 10,000 pairs of unrelated individuals
- Numbers of SNPs tested
1, 2, 5, 10, 20, 30, 50, 100
- MAFs tested
0.05, 0.1, 0.2, 0.3, 0.4
MAF = 0.4 gave the best result.
Conclusion
With 40 SNPs (MAF = 0.2 to 0.4), the error rate is below 0.05.
MAF = 0.4 gave the best result.
Notable point
Why do lower MAFs give a higher error rate?
If we know that a sample carries the minor allele, a lower MAF definitely helps, because sharing a rare allele is strong evidence of relatedness.
But whether a pair is related or not, most pairs will be AA AA at a low-MAF SNP, which makes the result more ambiguous.
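The effect can be made concrete by computing the probability of the AA AA cell under both hypotheses. This is a small sketch under the same assumptions as the probability matrix (Hardy-Weinberg proportions, full siblings); the closed form for siblings, p^2 * ((1 + p) / 2)^2 with p = 1 - MAF, follows from summing over the parental genotypes, and the function name is hypothetical.

```python
def p_both_major_homozygous(maf):
    """Return (P_unrelated, P_siblings) that both individuals are AA."""
    p = 1.0 - maf                              # major allele frequency
    unrelated = p ** 4                         # P(AA) * P(AA)
    siblings = p ** 2 * ((1 + p) / 2) ** 2     # summed over parental genotypes
    return unrelated, siblings


if __name__ == "__main__":
    for maf in (0.05, 0.1, 0.2, 0.3, 0.4):
        u, s = p_both_major_homozygous(maf)
        print(f"MAF={maf:.2f}  Pu(AAAA)={u:.4f}  Pr(AAAA)={s:.4f}  ratio={s / u:.3f}")
```

At MAF = 0.05 the AA AA cell has probability of about 0.81 for unrelated pairs and 0.86 for siblings, so most SNPs land in a cell whose likelihood ratio is close to 1. At MAF = 0.4 the same cell has probability 0.13 versus 0.23, and the remaining probability is spread over more informative cells.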
Week 10
Progress
Completed coding.
Completed analysis.
Completed presentation preparation.
Plans
Presentation this week.
Grade: A
Week 9
Progress
Completed coding, completed simulation.
Plans
Analyse and prepare for the presentation.
Grade: A
Problems that arose this week
Presentation data.
Problems that were solved this week
Analysis of the output data.
Week 8
Progress
Finished coding for the calculation of the relatedness between Full-siblings and non-related individuals.
Discussed the way to create data for the Siblings and Non-related individuals.
Plans
Complete coding and run simulation to analyse.
Grade: A
Problems that arose this week
How many SNPs should we take into account?
Simulation specific questions arose.
Problems that were solved this week
Decided how to create simulation data.
Week 7
Progress
Asked the professor about the method and worked out a simple method to determine relatedness.
Started coding.
Plans
Code for probability matrix.
Grade: A
Problems that arose this week
How should we generate the random simulation data?
Problems that were solved this week
Finally decided on the method for determining relatedness.
Week 6
Progress
Read published papers
Plans
Make specific plans for coding and simulations.
Grade: A
Problems that arose this week
Published papers were hard to read. Although the papers described many ways to apply the method, most of them were too hard to apply within the project term.
Problems that were solved this week
Probability matrix.
Week 5
Progress
Solved the basic problems on our slides. Understood how differences in MAF relate to relationship.
We understood how MAF can help to find relatedness between two individuals.
Plans
Read published papers
(Estimation of Pairwise Relatedness With Molecular Markers, etc)
Grade: A
Problems that arose this week
Method to solve relatedness with multiple SNPs.
Problems that were solved this week
Basic method to infer relatedness from a SNP.
Week 4
Progress
We decided on the project topic and outlined our plans.
Made the project page on the wiki.
Plans
Background research.
Grade: A
Problems that arose this week
N/A.
Problems that were solved this week
N/A.