# About me

My name is Frank Chen, and I am a fourth-year graduating senior in Computer Science. If everything stays on track, I plan to graduate and take a nice break from school by working and playing. Then it's graduate school. :)

# End of Quarter Goals

I hope to learn more about Computational Genetics. In particular, I hope to use my Computer Science background and the statistical methods I learned in CS 112 and CS 170 to aid my project. These two classes taught me how to analyze different plots of data statistically and how to write papers on statistical methods and analysis.

# Project Description

I plan on implementing a system that will find the association between diseases for any individual.

The tasks I am currently looking at:

- Easy: Compute risks for multiple disease mutations assuming independence.
- Medium: Estimate the variance of the risk. Figure out when it is worth measuring.
- Hard: Handle different models of interactions between mutations and different disease prevalence levels. Figure out ultimate potential by estimating effect size spectrum of disease.

# Weekly Schedule

## «Week ending 5/1»

I forgot to create a wiki, though I did review some of my notes on genetics and statistical methods. I still need to figure out how to implement the system, but I believe I have a good idea of which methods to use to implement at least the easy project.

For this week, I grade myself: A

## «Week ending 5/8»

I began drafting the analysis and design of the paper, which will help me clarify my own methodologies and priorities on the project.

The questions I began to answer in the Introduction of the paper were:

- How do we do this?

- What assumptions do we need to make?

- When will this be useful?

I need to figure out which language to use. I reviewed the R homework and implemented a Bayesian calculator based on the sample description in the project slides.
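The description from the project slides is not reproduced on this page, so the following is only a generic sketch of what a Bayesian risk calculator might look like in R, not the calculator I actually wrote. The function name `posterior_risk`, its parameters, and the input numbers are all hypothetical.

```r
# Hypothetical helper: posterior probability of disease given that a person
# carries a risk allele, computed with Bayes' rule.
posterior_risk <- function(prevalence, p_carrier_given_disease,
                           p_carrier_given_healthy) {
  # Total probability of being a carrier (law of total probability).
  p_carrier <- p_carrier_given_disease * prevalence +
    p_carrier_given_healthy * (1 - prevalence)
  # Bayes' rule: P(disease | carrier).
  p_carrier_given_disease * prevalence / p_carrier
}

# Made-up inputs: 1% prevalence, allele carried by 30% of cases
# and 20% of healthy individuals.
post <- posterior_risk(prevalence = 0.01,
                       p_carrier_given_disease = 0.30,
                       p_carrier_given_healthy = 0.20)
print(post)
```

With these made-up inputs the posterior risk comes out higher than the 1% prevalence, because the allele is more common among cases than among healthy individuals.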

For this week, I grade myself: A

## «Week ending 5/15»

I talked to the professor during office hours about how to proceed with the project.

I received feedback on what to use for my sample data and how to create a proper report for our class.

For this week, I grade myself: A

## «Week ending 5/22»

I ported what I had begun in the paper into my PowerPoint. However, I am still somewhat unsure how exactly to measure different SNPs. I am currently having problems getting valid power and NCP (non-centrality parameter) values. Any large N (population size) I try seems to skew my results into values that do not make sense.
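For reference, here is a minimal sketch of how power can be computed from an NCP in R. It is not the code I was debugging. It assumes a balanced case-control design with `n_alleles` allele observations in each group and a two-sided z-test, and the helper names and allele frequencies are hypothetical.

```r
# Hypothetical helper: NCP of the allele-frequency difference statistic,
# assuming n_alleles observations in each of the case and control groups.
ncp_case_control <- function(p_case, p_control, n_alleles) {
  p_bar <- (p_case + p_control) / 2
  (p_case - p_control) / sqrt(2 * p_bar * (1 - p_bar) / n_alleles)
}

# Power of a two-sided z-test when the statistic is distributed N(ncp, 1).
power_from_ncp <- function(ncp, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  pnorm(-z + ncp) + pnorm(-z - ncp)
}

# Made-up allele frequencies: 0.25 in cases, 0.20 in controls.
n_values <- c(100, 1000, 10000, 100000)
ncp_values <- ncp_case_control(0.25, 0.20, n_values)
power_values <- power_from_ncp(ncp_values)
print(data.frame(N = n_values, ncp = ncp_values, power = power_values))
```

Under these assumptions the NCP grows with the square root of N, so the power saturates at 1 once N is large.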

For this week, I grade myself: A

## «Week ending 5/29»

Unfortunately, because two papers were due, I still need to visit the professor's office hours. From the first presentations I saw, I have a general idea of how to structure my slideshow and how long my presentation should be. The real progress this week was restructuring some of what I had previously into presentable material.

For this week, I grade myself: A-

## «Week ending 6/6»

I finally gave my presentation and cleared up some of the finer points of Personalized Medicine that I had been unsure about. I believe my presentation was clear and to the point. The final details I need to put on this page are my actual code and results.

For this week, I grade myself: A

# Results

## Code

```
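# pA values to evaluate; each becomes one column of the result matrices.
pas <- c(0.05, 0.1, 0.2, 0.3, 0.4)

# Binomial probability that exactly `index` of N independent SNPs are
# activated when each one is activated with probability `pa`.
# choose(N, index) counts the ways to pick which SNPs are activated.
risk <- function(index, pa, N) {
  choose(N, index) * pa^index * (1 - pa)^(N - index)
}

# Rows: number of activated SNPs (1..N); columns: the pA values in `pas`.
# 3 case
results3 <- outer(1:3, pas, risk, N = 3)
# 4 case
results4 <- outer(1:4, pas, risk, N = 4)
# 5 case
results5 <- outer(1:5, pas, risk, N = 5)
# 6 case
results6 <- outer(1:6, pas, risk, N = 6)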
# Results for 3 SNP case
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.135375 0.243 0.384 0.441 0.432
#[2,] 0.007125 0.027 0.096 0.189 0.288
#[3,] 0.000125 0.001 0.008 0.027 0.064
# Results for 4 SNP case
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.17147500 0.2916 0.4096 0.4116 0.3456
#[2,] 0.01353750 0.0486 0.1536 0.2646 0.3456
#[3,] 0.00047500 0.0036 0.0256 0.0756 0.1536
#[4,] 0.00000625 0.0001 0.0016 0.0081 0.0256
# Results for 5 SNP case
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.2036265625 0.32805 0.40960 0.36015 0.25920
#[2,] 0.0214343750 0.07290 0.20480 0.30870 0.34560
#[3,] 0.0011281250 0.00810 0.05120 0.13230 0.23040
#[4,] 0.0000296875 0.00045 0.00640 0.02835 0.07680
#[5,] 0.0000003125 0.00001 0.00032 0.00243 0.01024
# Results for 6 SNP case
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2.321343e-01 0.354294 0.393216 0.302526 0.186624
#[2,] 3.054398e-02 0.098415 0.245760 0.324135 0.311040
#[3,] 2.143437e-03 0.014580 0.081920 0.185220 0.276480
#[4,] 8.460938e-05 0.001215 0.015360 0.059535 0.138240
#[5,] 1.781250e-06 0.000054 0.001536 0.010206 0.036864
#[6,] 1.562500e-08 0.000001 0.000064 0.000729 0.004096
```

## Conclusion

Each column corresponds to one pA value (0.05, 0.1, 0.2, 0.3, 0.4), and each row to the number of activated SNPs. The `choose` factor in the code corrects for the different ways to choose k of N SNPs.

Each matrix value is the probability that exactly this many SNPs are activated at this pA.
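Since these entries are binomial probabilities, they can be cross-checked against R's built-in `dbinom`. A minimal sketch, which redefines `pas` and `risk` from the Code section so that it runs on its own:

```r
pas <- c(0.05, 0.1, 0.2, 0.3, 0.4)
risk <- function(index, pa, N) {
  choose(N, index) * pa^index * (1 - pa)^(N - index)
}

# Hand-written formula versus the built-in binomial density, 3 SNP case.
results3 <- outer(1:3, pas, risk, N = 3)
builtin3 <- outer(1:3, pas, function(k, p) dbinom(k, size = 3, prob = p))
matches <- isTRUE(all.equal(results3, builtin3))
print(matches)

# The matrices omit the "0 SNPs activated" row, so a column only sums to 1
# once that missing probability, (1 - pA)^N, is added back.
column_totals <- colSums(results3) + (1 - pas)^3
print(column_totals)
```

The second check is a reminder that each column by itself sums to less than 1, because the case where no SNP is activated is left out of the matrices.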

Now the only remaining step is to determine the threshold risk at each stage.

For example, in the 3 SNP case (see the results for the 3 SNP case above):

- Suppose each SNP carries a risk of 2 Gamma.

- Suppose the threshold is 4 Gamma.

Then we would include the rows for 2 activated SNPs (4 Gamma) and 3 activated SNPs (6 Gamma), since both meet the threshold.

- For pA = 0.3 (column 4), we would sum 0.189 + 0.027 = 0.216, a 21.6% disease risk.
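The threshold step can be sketched in R as follows. The helper `disease_risk` and its parameters `per_snp_effect` and `threshold` (both in Gamma units) are hypothetical names introduced for illustration. The sketch assumes every SNP contributes the same effect and that SNPs are independent, as in the example above.

```r
# Hypothetical helper: probability that the combined effect of the activated
# SNPs reaches the threshold, assuming equal per-SNP effects and independence.
disease_risk <- function(N, pa, per_snp_effect, threshold) {
  k <- 0:N                                      # possible activated-SNP counts
  at_risk <- k * per_snp_effect >= threshold    # counts that meet the threshold
  sum(dbinom(k[at_risk], size = N, prob = pa))  # add up their probabilities
}

# 3 SNP case from the example: 2 Gamma per SNP, threshold of 4 Gamma, pA = 0.3.
risk3 <- disease_risk(N = 3, pa = 0.3, per_snp_effect = 2, threshold = 4)
print(risk3)  # 0.216, i.e. a 21.6% disease risk
```

Changing `threshold` or `per_snp_effect` changes which rows of the matrix are summed, which is one way to explore the threshold risk for the 4, 5, and 6 SNP cases.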

Slideshow: FChen Personalized Medicine PowerPoint