Relatedness Estimator - KangWon Lee, Alfred Heu

# Project Description

Each parent transmits one chromosome from each pair to each child. As a result, full siblings share approximately 50% of their DNA. This fact can be used to estimate the relatedness between two or more people.

KangWon Lee: Second-year M.S. student in Computer Science.
Alfred Heu: Second-year M.S. student in Computer Science.

# Project Goals

1. Given the genotypes of two individuals, how can we tell if they are siblings or not?
2. At which MAFs (minor allele frequencies) are SNPs informative?
3. How many SNPs should be taken into consideration?
- Simulation on randomly generated data based on the probability matrix

# Probability Matrix

1. Unrelated

Consider a SNP with a minor allele frequency of 0.1 (major allele A, minor allele G):
P(AA) = (0.9)^2 = 0.81
P(AG) = 2 * (0.9) * (0.1) = 0.18
P(GG) = (0.1)^2 = 0.01
P(GG) is the probability of having the minor allele on both chromosomes.
Let ind1 have GG and ind2 have GG. Assume that they are not related.
Pu(GGGG) = P(GG) * P(GG) = (0.1)^2 * (0.1)^2 = 0.0001
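
The unrelated case can be sketched in Python as follows. This is a minimal sketch that assumes Hardy-Weinberg proportions for the genotype frequencies; the function names `genotype_probs` and `unrelated_pair_prob` are illustrative and are not taken from the project code.

```python
def genotype_probs(maf):
    """Genotype probabilities at one SNP, assuming Hardy-Weinberg proportions.

    A is the major allele and G is the minor allele with frequency `maf`.
    """
    p, q = 1.0 - maf, maf
    return {"AA": p * p, "AG": 2 * p * q, "GG": q * q}


def unrelated_pair_prob(g1, g2, maf):
    """Joint probability that two unrelated individuals carry g1 and g2."""
    probs = genotype_probs(maf)
    return probs[g1] * probs[g2]


# Genotype probabilities and the GG/GG example for MAF = 0.1
for genotype, prob in genotype_probs(0.1).items():
    print(f"P({genotype}) = {prob:.4f}")
print(f"Pu(GGGG) = {unrelated_pair_prob('GG', 'GG', 0.1):.6f}")
```

Because the two individuals are assumed independent, the joint probability is simply the product of the two genotype probabilities.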

2. Related (Full siblings)
Let ind1 have GG and ind2 have GG. Assume that they are related.
Consider the parents of these two siblings.
There are 4 ordered parental genotype combinations that can produce the children GG and GG: AG x AG, AG x GG, GG x AG, and GG x GG.

Pr(GGGG) = Pu(AGAG) * 0.25 * 0.25 + Pu(AGGG) * 0.5 * 0.5 * 2 + Pu(GGGG) * 1 * 1 = 0.0324 * 0.0625 + 0.0018 * 0.25 * 2 + 0.0001 = 0.003025
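
The sum over parental genotypes generalizes to any pair of sibling genotypes. The sketch below enumerates all ordered parental combinations instead of listing the contributing cases by hand. It assumes Hardy-Weinberg proportions for the parents and Mendelian transmission; all function names are hypothetical and not taken from the project code.

```python
from itertools import product

GENOTYPES = ("AA", "AG", "GG")


def genotype_probs(maf):
    """Parental genotype probabilities, assuming Hardy-Weinberg proportions."""
    p, q = 1.0 - maf, maf
    return {"AA": p * p, "AG": 2 * p * q, "GG": q * q}


def transmit_prob(parent, allele):
    """Probability that a parent with this genotype passes on `allele`."""
    return parent.count(allele) / 2.0


def child_prob(mother, father, child):
    """Probability that these parents produce a child with genotype `child`."""
    a, b = child
    prob = transmit_prob(mother, a) * transmit_prob(father, b)
    if a != b:  # heterozygous child: the alleles may come from either parent
        prob += transmit_prob(mother, b) * transmit_prob(father, a)
    return prob


def sibling_pair_prob(g1, g2, maf):
    """Joint probability that two full siblings carry g1 and g2."""
    probs = genotype_probs(maf)
    total = 0.0
    for mother, father in product(GENOTYPES, repeat=2):
        parents = probs[mother] * probs[father]
        total += parents * child_prob(mother, father, g1) * child_prob(mother, father, g2)
    return total


print(f"{sibling_pair_prob('GG', 'GG', 0.1):.6f}")  # prints 0.003025
```

Parental combinations that cannot produce a given child contribute zero, so only the four cases listed above remain for GG and GG.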

3. Example
MAF = 0.1 unrelated

MAF = 0.1 related

MAF = 0.4 unrelated

MAF = 0.4 related

# Simulations & Results

We randomly generated 10,000 pairs of unrelated genotype samples and 10,000 pairs of sibling genotype samples based on the probability matrix.
Example: MAF = 0.1, unrelated, 5 SNPs

Estimation Method

Result
- 10,000 pairs of siblings and 10,000 pairs of unrelated individuals
- Number of SNPs: 1, 2, 5, 10, 20, 30, 50, 100
- MAFs tested: 0.05, 0.1, 0.2, 0.3, 0.4
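
A simulation of this kind can be sketched as follows. This is only a sketch under stated assumptions: the page does not spell out the estimation method, so the classifier here (call a pair siblings when the summed log-likelihood ratio across SNPs is positive) is an assumption, as are independent SNPs sharing a single MAF and every function name. The demo uses 500 pairs per class rather than 10,000 to keep the run short.

```python
import math
import random
from itertools import product

GENOTYPES = ("AA", "AG", "GG")


def genotype_probs(maf):
    """Genotype probabilities, assuming Hardy-Weinberg proportions."""
    p, q = 1.0 - maf, maf
    return {"AA": p * p, "AG": 2 * p * q, "GG": q * q}


def child_dist(mother, father):
    """Distribution of a child's genotype given both parents' genotypes."""
    dist = {g: 0.0 for g in GENOTYPES}
    for a, b in product(mother, father):  # each allele pair has probability 1/4
        dist["".join(sorted(a + b))] += 0.25
    return dist


def pair_matrix(maf, related):
    """Joint genotype probabilities for a sibling pair or an unrelated pair."""
    probs = genotype_probs(maf)
    matrix = {pair: 0.0 for pair in product(GENOTYPES, repeat=2)}
    if not related:
        for g1, g2 in matrix:
            matrix[(g1, g2)] = probs[g1] * probs[g2]
        return matrix
    for mother, father in product(GENOTYPES, repeat=2):
        dist = child_dist(mother, father)
        for g1, g2 in matrix:
            matrix[(g1, g2)] += probs[mother] * probs[father] * dist[g1] * dist[g2]
    return matrix


def simulate_pairs(matrix, n_snps, n_pairs, rng):
    """Draw n_pairs pairs, each with n_snps independent genotype pairs."""
    keys = list(matrix)
    weights = [matrix[k] for k in keys]
    return [rng.choices(keys, weights=weights, k=n_snps) for _ in range(n_pairs)]


def error_rate(maf, n_snps, n_pairs, seed=0):
    """Fraction of pairs misclassified by the assumed log-likelihood-ratio rule."""
    rng = random.Random(seed)
    rel, unrel = pair_matrix(maf, True), pair_matrix(maf, False)

    def llr(pair):
        return sum(math.log(rel[obs] / unrel[obs]) for obs in pair)

    errors = sum(llr(pair) <= 0 for pair in simulate_pairs(rel, n_snps, n_pairs, rng))
    errors += sum(llr(pair) > 0 for pair in simulate_pairs(unrel, n_snps, n_pairs, rng))
    return errors / (2 * n_pairs)


for n_snps in (1, 5, 20, 50):
    print(n_snps, error_rate(0.3, n_snps, n_pairs=500))
```

Fixing the seed makes a run reproducible; the error rates it prints depend on the assumed decision rule and need not match the project's reported figures.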

# Conclusion

With 40 SNPs (MAF = 0.2 to 0.4), the error rate is below 0.05.

Notable point
Why do lower MAFs give a higher error rate?
If a sample is known to carry the minor allele, a lower MAF definitely helps, because sharing a rare allele is strong evidence of relatedness.
But whether related or not, most pairs fall under AA AA at low MAF, which makes the result more ambiguous.
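
The AA AA effect can be quantified with a short calculation. The sketch below assumes Hardy-Weinberg proportions and uses the closed form for full siblings, which follows from siblings sharing 0, 1, or 2 alleles identical by descent with probabilities 1/4, 1/2, and 1/4. The function name `aa_aa_probs` is illustrative.

```python
def aa_aa_probs(maf):
    """Return P(AA, AA) for an unrelated pair and for a full-sibling pair."""
    p = 1.0 - maf  # major allele frequency
    unrelated = p ** 4
    # k0 * p^4 + k1 * p^3 + k2 * p^2 with (k0, k1, k2) = (1/4, 1/2, 1/4)
    # simplifies to p^2 * ((1 + p) / 2)^2.
    siblings = p ** 2 * ((1.0 + p) / 2.0) ** 2
    return unrelated, siblings


for maf in (0.05, 0.1, 0.2, 0.3, 0.4):
    unrelated, siblings = aa_aa_probs(maf)
    print(f"MAF={maf:.2f}  unrelated={unrelated:.4f}  siblings={siblings:.4f}")
```

At MAF = 0.05, roughly 81% of unrelated pairs and 86% of sibling pairs are AA AA, so most SNPs carry very little information for telling the two groups apart.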

Week 10
Progress

Completed coding.
Completed analysis.
Completed presentation preparation.

Plans
Presentation this week.

Week 9
Progress

Completed coding, completed simulation.

Plans
Analyse and prepare for the presentation.

Problems that arose this week
Presentation data.

Problems that were solved this week
Analysis of the output data.

Week 8
Progress

Finished coding the calculation of relatedness for full siblings and unrelated individuals.
Discussed how to create data for the sibling and unrelated pairs.

Plans
Complete coding and run the simulation for analysis.

Problems that arose this week
How many SNPs should we take into account?
Simulation specific questions arose.

Problems that were solved this week
Decided how to create simulation data.

Week 7
Progress

Asked the professor about the method and worked out a simple method to solve relatedness.
Started coding.

Plans
Code for probability matrix.

Problems that arose this week
How should we generate the random simulation data?

Problems that were solved this week
Finally decided the method to solve relatedness.

Week 6
Progress

Plans
Make specific plans for coding and simulations.

Problems that arose this week
Published papers were harder to read than expected. Although the papers offered many ways to apply the method, most of them were too hard to apply within the project term.

Problems that were solved this week
Probability matrix.

Week 5
Progress

Solved the basic problems on our slides. Understood how differences in MAF relate to relationship.
We understood how MAF could help to find relatedness between two individuals.

Plans
Read published papers (Estimation of Pairwise Relatedness With Molecular Markers, etc.)

Problems that arose this week
Method to solve relatedness with multiple SNPs.

Problems that were solved this week
Basic method to relate relatedness with SNP.

Week 4
Progress

We decided on the project topic and made outline plans.
Made the project page on the wiki.

Plans
Background research.