# Project Description

This project computes an individual's risk of various diseases from their genetic makeup, assuming the genetic risk factors act independently. The goal is to give individuals insight into which diseases they are more prone to, and to give them and their physicians more specific information that can be used to improve the quality of their care. The approach is to take known information about disease-associated genetic variants and examine how applying it to a particular individual's genetic code lets us extract this risk information.

### About Me

My name is William Schwarz. I am a senior computer science undergraduate at UCLA. I think this project will be quite interesting, especially in determining whether or not this type of analysis proves useful. I am hoping to find a useful, efficient, and reusable methodology for completing these types of tasks in a general manner.

Final Goal: To be able to make verifiable statements about how risk can be identified and if the information is useful.

## Weekly Goals

Week 6 - Identify diseases and known information about the genetic codes that increase risk factors for these diseases.

Week 7 - Gather data to analyze

Week 8 - Research different methods of analyzing risk.

Week 9 - Generate a method of calculating risk that can be used for this type of analysis.

## Weekly Progress

### Week 6

1) I have planned out the project and begun to gather general knowledge about the methods and practices in completing this kind of task.

2) Set up the wiki.

3) Grade: A

### Week 7

1) Researched the various companies that do personalized medicine based on genetics and found some useful information about the methods they use. Interestingly, information gleaned from association studies is not given as relative risk (the value we used for calculations in class); association studies present risk information as odds ratios. I did find a method for converting odds ratios into relative risk. While this doesn't specifically apply to this project, I'm going to include it because I found it interesting and useful. (See slides)

2) Last week I formed a general outline for how I was going to approach this project.

3) I deviated pretty significantly from my plan because I found out that I don't need specific data to complete this project, since it is more an analysis of the process than an analysis of the data. Instead, I researched the methodologies companies use to perform these tasks to get a better picture of how I can approach the problem.

4) Grade: A
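The odds-ratio conversion mentioned in item 1 can be illustrated with a minimal sketch. It uses one commonly used formula, RR = OR / (1 - P0 + P0 * OR), where P0 is the disease risk among non-carriers. This is not necessarily the method shown in the slides; the function name and the example odds ratio of 1.5 are hypothetical, and the baseline of 0.05 simply reuses the prevalence assumed elsewhere on this page.

```python
def odds_ratio_to_relative_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a relative risk.

    baseline_risk (P0) is the disease risk in the non-carrier group.
    Uses RR = OR / (1 - P0 + P0 * OR).
    """
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in [0, 1)")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


# Hypothetical example: an odds ratio of 1.5 with a 5% baseline risk.
print(round(odds_ratio_to_relative_risk(1.5, 0.05), 4))
```

When the baseline risk is small, the odds ratio and the relative risk are close, so the correction matters most for common diseases.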

### Week 8

1) Worked out the general algorithm and methodology for how multiple SNPs combine and how they affect the overall risk of developing a disease. I analyzed multiple SNPs using the multiplicative model: joint conditional probabilities are simplified into products, yet still give a very good approximation of the value we are looking for (risk). For example, if we have 2 SNPs, each with an RR of 1.2, then the overall relative risk for an individual is RR(x1)RR(x2) = 1.44 (assuming the individual has both of the risk alleles). This method relies on a very important assumption of independence: if the SNPs are correlated in some way, this representation would not be accurate.

2) Last week I researched different companies that do genetic testing on individuals and report their risk factors.

3) This week I completed what I had planned.

4) Grade: A

### Week 9

1) Prepared the presentation and wrote out the slides. This required some number crunching, which I tried to do in R, but I found the overhead of using R unnecessary for the calculations I needed to perform. Instead I used Excel because the comparison lent itself well to a table format.

2) Last week I came up with a methodology to attack this problem.

3) I sort of did this week's plan last week so I went ahead and prepared the presentation instead.

4) Grade: A

### Week 10

1) Will present the presentation on Wednesday. Still need to finish up the wiki as well.

2) Last week I prepared the presentation.

3) Did not plan a tenth week out.

4) Grade: Final Grade: ?

## Final Notes and Information

### Starting Assumptions

- 2 disease mutations each increasing an individual's risk by 20%
- Assume a 5% frequency for the disease in the population
- Assume MAF of 25%

### Qualities for Worthiness

Chosen Arbitrarily

- >5% prevalence in the population
- RR > 2.0

### Notations

- RR = relative risk
- x1,x2 = SNPs 1 and 2
- F = disease prevalence
- p = allele frequency

Risk Calculation

| Individual has | RR(x1) | RR(x2) | Total RR | F | Total Risk |
| --- | --- | --- | --- | --- | --- |
| (-x1, -x2) | 1 | 1 | 1 | 0.05 | 0.05 |
| (x1, -x2) | 1.2 | 1 | 1.2 | 0.05 | 0.06 |
| (-x1, x2) | 1 | 1.2 | 1.2 | 0.05 | 0.06 |
| (x1, x2) | 1.2 | 1.2 | 1.44 | 0.05 | 0.072 |

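Allele Frequency

The per-SNP columns give the frequency of the state shown in the first column: 0.25 (the MAF) when the risk allele is present and 0.75 when it is absent.

| Individual has | Frequency x1 | Frequency x2 | Joint frequency |
| --- | --- | --- | --- |
| (-x1, -x2) | 0.75 | 0.75 | 0.5625 |
| (x1, -x2) | 0.25 | 0.75 | 0.1875 |
| (-x1, x2) | 0.75 | 0.25 | 0.1875 |
| (x1, x2) | 0.25 | 0.25 | 0.0625 |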
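The two tables above can be reproduced with a short script. This is a minimal sketch under the starting assumptions (RR of 1.2 per SNP, F = 0.05, MAF = 0.25, independent SNPs); the function name `carrier_table` and the tuple layout of its rows are illustrative choices, not something taken from the slides.

```python
from itertools import product

RR_PER_SNP = 1.2   # each mutation raises risk by 20%
PREVALENCE = 0.05  # F, disease prevalence
MAF = 0.25         # frequency used for "has the risk allele"


def carrier_table(n_snps=2):
    """Return (carrier flags, total RR, total risk, joint frequency) per combination."""
    rows = []
    for carries in product([False, True], repeat=n_snps):
        total_rr = 1.0
        frequency = 1.0
        for has_risk in carries:
            # Multiplicative model: independent SNPs multiply both RR and frequency.
            total_rr *= RR_PER_SNP if has_risk else 1.0
            frequency *= MAF if has_risk else 1.0 - MAF
        rows.append((carries, total_rr, PREVALENCE * total_rr, frequency))
    return rows


for carries, total_rr, risk, frequency in carrier_table():
    print(carries, round(total_rr, 4), round(risk, 4), round(frequency, 4))
```

Passing a larger `n_snps` extends the same table to more SNPs with identical RR and MAF.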

Larger Example

- SNP A: MAF = 0.2, RR = 2.0
- SNP B: MAF = 0.4, RR = 1.5

(see slides)

The frequency calculation uses p, the probability of the risk allele, and the three possible genotypes at each SNP: two risk alleles, one risk and one non-risk allele, or two non-risk alleles.

The frequencies are then p^2 for having both risk alleles, 2(p)(1-p) for having one of each, and (1-p)^2 for having both non-risk alleles.

For more than one SNP, consider all combinations, e.g. [both risk alleles at SNP A and one of each at SNP B], and multiply the corresponding values above, e.g. (p(A)^2)\*(2\*p(B)\*(1-p(B))). This scales to any number of SNPs.
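As a minimal sketch of the frequency calculation just described, the code below enumerates every genotype combination for the two SNPs of the larger example (risk allele frequencies 0.2 and 0.4). It assumes the SNPs are independent, and the function names are illustrative.

```python
from itertools import product


def genotype_frequencies(p):
    """Frequencies of carrying 0, 1, or 2 risk alleles at one SNP."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def joint_genotype_frequencies(risk_allele_freqs):
    """Map each tuple of risk-allele counts (one per SNP) to its joint frequency."""
    per_snp = [genotype_frequencies(p) for p in risk_allele_freqs]
    joint = {}
    for counts in product((0, 1, 2), repeat=len(per_snp)):
        frequency = 1.0
        for snp_freqs, count in zip(per_snp, counts):
            frequency *= snp_freqs[count]  # independence: multiply per-SNP values
        joint[counts] = frequency
    return joint


joint = joint_genotype_frequencies([0.2, 0.4])  # SNP A, SNP B
# Both risk alleles at SNP A and one of each at SNP B:
print(round(joint[(2, 1)], 4))
```

The frequencies over all combinations sum to 1, which is a quick sanity check when adding more SNPs.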

To calculate the risk, take the RR value for each SNP in the combination in question (e.g. having both risk alleles means RR 1.2, having neither means RR 1.0). Multiply the RR values together across all SNPs in the combination, then multiply by F (disease prevalence).

This gives the overall risk of having the disease for each SNP combination.
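The risk calculation can be sketched for the larger example as follows. Two things here are assumptions rather than something stated above: the RR is applied once per copy of the risk allele (so two copies contribute RR squared), and F = 0.05 is carried over from the starting assumptions. The names `PER_ALLELE_RR`, `total_relative_risk` and `absolute_risk` are illustrative.

```python
from itertools import product

PREVALENCE = 0.05                     # F, assumed same as the starting assumptions
PER_ALLELE_RR = {"A": 2.0, "B": 1.5}  # RR values from the larger example


def total_relative_risk(allele_counts):
    """Multiply the RR of every risk allele carried (multiplicative model)."""
    total = 1.0
    for snp, count in allele_counts.items():
        # Assumption: each copy of the risk allele multiplies the risk by RR.
        total *= PER_ALLELE_RR[snp] ** count
    return total


def absolute_risk(allele_counts, prevalence=PREVALENCE):
    """Total RR for the combination, scaled by disease prevalence F."""
    return prevalence * total_relative_risk(allele_counts)


for a, b in product((0, 1, 2), repeat=2):
    counts = {"A": a, "B": b}
    print(counts, total_relative_risk(counts), round(absolute_risk(counts), 4))
```

If the RR should instead apply per genotype rather than per allele, only the exponent line in `total_relative_risk` needs to change.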

As we can see, multiple mutations have a multiplicative effect on the total risk. This means the more SNPs we can analyze, the clearer the risk will be.

It is important to note that an RR of 2.0 per risk allele is really high, and we will probably not find a risk allele with an RR this high. Most RRs for risk alleles are ~1.2 or lower. This means that to meet our threshold of a total RR of 2.0, the individual would need to have 4 risk alleles of RR 1.2 (1.2^4 ≈ 2.07). This is generally very rare: even with a MAF of 25% (which is really too high as well), the chance of an individual having all 4 would be 0.25^4 = 0.00390625. This is well below our prevalence threshold of 5%. So in the average case an individual is not going to meet the risk threshold that would make testing them worthwhile.
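The arithmetic in this argument can be checked with a minimal sketch. It assumes independent risk alleles that each carry the same RR and the same 25% frequency; the function names are illustrative.

```python
def risk_alleles_needed(per_allele_rr, target_rr):
    """Smallest number of risk alleles whose combined RR reaches target_rr."""
    if per_allele_rr <= 1.0:
        raise ValueError("per_allele_rr must be greater than 1")
    count = 0
    total = 1.0
    while total < target_rr:
        total *= per_allele_rr  # multiplicative model
        count += 1
    return count


def probability_of_carrying_all(frequency, n_alleles):
    """Chance of carrying all n risk alleles, assuming independence."""
    return frequency ** n_alleles


needed = risk_alleles_needed(1.2, 2.0)
print(needed)                                     # 4
print(round(1.2 ** needed, 4))                    # 2.0736
print(probability_of_carrying_all(0.25, needed))  # 0.00390625
```

With a more realistic (lower) allele frequency the final probability shrinks further, which strengthens the conclusion.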

This doesn't mean no one should be tested and in some cases it will give an individual valuable information about their disease risk which may save them from suffering later in life. However, for the average case this kind of testing is not statistically worthwhile.

Mini Bibliography:

Navigenics

23andMe

Part of a series of articles about Navigenics

article about deCODEme (not sure some of its criticisms are valid)

Association study of coronary heart disease (I wanted to see how the association study claimed results)

Link to the specific SNP I used as an example in the slides

Information about one of the companies' policies

Google Book about genetic testing

Risk Calculation History (shows how much things are changing)

Presentation:

Presentation Slides