# Introduction

**Project Member**

My name is Sharon. I'm a 4th-year undergrad in Computer Science and Engineering. If nothing goes wrong, I'll be graduating after this quarter, so this is one of my last classes, yay! After graduation, I'm going to start working full-time at, coincidentally, the UCLA School of Medicine's Department of Human Genetics. So I'll be staying around here for a while longer... My personal hobbies and interests don't really overlap with the things I do here at school, although I do think CS is fun. Hopefully, this project will be fun as well.

**Project Description**

I am going to do project 13, which is personalized medicine. Since I'm an undergrad, I'll probably start off with the easy part and then see if I have time to tackle anything more complex. I really have no idea how to start, actually... so I may be a little screwed. In fact, I'm still trying to decide if I'll stick with this topic. But since I've made the page, I don't think I can go back anymore.

**Goal for the Quarter**

Hopefully my results will be pretty good and I'll have time to try the more difficult parts of the project as well. Since I work at Human Genetics, it would also be cool to learn more about the subject itself.

# Project Log

**Progress**

WEEK 4 - 04/23/09

I have R on my computer, although I still don't quite know how to use it well. As I said before, I don't really have a clue how to start. However, I plan to go over the lecture slides to see if anything in there will help me out. I'll also be doing some research on this topic. It seems like personalized medicine is a pretty well-established topic in this field, so I don't think it'll be too difficult to find information about it. I only worry that it'll all be very dense papers that I won't be able to comprehend. Anyway, since the assignment this week was to create the wiki page and I have done so, I'll give myself an A-. The minus is for the fact that I haven't started the research yet. I think I'll blame it on the fact that the midterm is on Monday and I'm not so prepared.

WEEK 5 - 04/30/09

I have done some research on the topic and tried to look for more information online. Although I'm still a bit hazy on how to start the project, I have looked at some tutorials for R and the documentation for some of the Bioconductor packages to see if any of it will be helpful to me. For this next week, I will continue to do background research on the topic and see if I can figure out what methods I'll use to make the calculations. I'll also look into the different kinds of diseases that I can work on for the project. For this week, I give myself an A.

WEEK 6 - 5/07/09

I tried to download data from the HapMap website but was confused about what exactly I should be downloading and in what format. I toyed around with it on my own and emailed the TA but unfortunately didn't get a reply =(. After looking at the slides for lecture 6, I was able to figure out what data to download in order to continue. However, I've been having some trouble getting BioMart to work in my browser, so the data still isn't in my hands yet. I will keep trying on different machines until I can get the data. I will also talk to the TA about what I should do once I obtain the data. In addition, I have started drafting an outline of the paper/report for this project. This has helped me clearly understand all the things I will have to accomplish and the methodologies I will need to employ to accomplish them. For this week, I give myself a B+.

WEEK 7 - 5/14/09

So... I forgot to update this wiki on Thursday =( ... hope Saturday isn't too late. After receiving help from the Professor regarding how to go about this project, I am much more confident and certain of the direction I should be taking. I regret that I spent two weeks or more without asking for help, since I would have saved a lot of the time I spent trying to get the HapMap website to work, trying to figure out what data to download, etc. Oh well, it's not too late. Since I am much clearer on my direction, I believe I will be able to complete this project in no time. I had originally started drafting a report, but the Professor recently said that no report is necessary. Thus, I will probably move some of the content I already have into the PowerPoint slides that I am preparing for the presentation. Of course, I will continue writing up the equations and math-related things that are needed for the project. After getting my results, I will organize them neatly into the PowerPoint for easy and clear presenting. For this week, I give myself a B+, since I updated late =(.

WEEK 8 - 5/21/09

So…I was originally planning to finish and present tomorrow in discussion. However, since I wasn't completely sure about my work and if I had done enough, I decided to wait another week. Progress is going well, I'm almost done with my presentation slides. I have some calculations to do and also I have to check in with the professor to make sure that my work so far isn't completely wrong. For this week, I'll give myself an A.

WEEK 9 - 5/28/09

So…again, I was planning to finish by this week, but after talking with the professor, I realized I had to change a few things in my work. Now I have a good idea of what exactly needs to be done and what to present. I just have to finish preparing my slides and calculations. Hopefully I will be done by early next week and get the presentation over with. For this week, I give myself an A.

WEEK 10 - 6/4/09

Yay! I presented yesterday! I am pretty much finished with the project and the slides. Luckily, I was able to complete the work that the Professor expected for the easy project. I don't think there's anything else I need to add. I'm very happy that I was able to fully understand the project and what I did, instead of just doing the work without full understanding. I will be adding my work/slides onto this wiki page shortly. For this week, I give myself an A.

# Project Work

**Personalized Medicine
Sharon Tang
Com Sci M124: Computational Genetics
UCLA Spring 2009**

**Introduction**

- With the technological advances of today, humans can now easily (although not necessarily cheaply) submit their DNA samples and get their genes examined.

- How has this newfound power affected us as humans? Has it helped us?

- What are its benefits from the personal health standpoint? How about from the reproductive and hereditary standpoints?

**Project Focus**

- For this specific project, we will be talking about one aspect in which the technology of today can benefit us.

- Using the devices and methods we have, we can examine genetic variants called SNPs (single-nucleotide polymorphisms) in order to predict the risk of certain individuals having some particular disease or disorder.

**Project Specification**

- Consider 2 disease-causing SNPs, each increasing the disease risk by 20%.

- Assume the disease has prevalence of 5% in the human population.

- What assumptions do we need to make regarding how they interact to measure the risk of an individual that has both? That has only one? That has neither?

- Does an individual with both mutations have enough risk to make testing worthwhile?

- Companies based on this idea: 23andMe, Navigenics.

- My focus: compute risks for multiple disease mutations assuming independence (a.k.a. the Easy project)

**Relative Risk Refresher**

- Disease prevalence (F) is the probability that a random person has some disease

- Minor allele frequency (p) is treated here as the probability that a random person has some SNP

- Relative risk (γ) is the number of times more likely a random person is to have some disease given that they have some SNP, versus not having it

**Formulas**

- F0 = P(+|S=0) = risk of disease without the SNP

- F1 = P(+|S=1) = risk of disease with the SNP

- (1-p) of the population have F0 risk, p of the population have F1 risk

- γ= F1/F0 = P(+|S=1) / P(+|S=0)

and thus

F1 = F0 * γ
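Since (1-p) of the population has risk F0 and p has risk F1 = γ·F0, the prevalence is their weighted average, F = (1-p)·F0 + p·γ·F0, so F0 = F / (1 - p + p·γ). The Python sketch below works through that arithmetic. The function name is made up for illustration, and the MAF of 0.2 in the example call is a hypothetical value; the 5% prevalence and the relative risk of 1.2 (a 20% increase) come from the project specification.

```python
def single_snp_risks(prevalence: float, p: float, gamma: float) -> tuple[float, float]:
    """Return (F0, F1) for one SNP carried by a fraction p of the population.

    Solves F = (1 - p) * F0 + p * F1 with F1 = gamma * F0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    f0 = prevalence / (1.0 - p + p * gamma)  # risk without the SNP
    f1 = gamma * f0                          # risk with the SNP
    return f0, f1


# Hypothetical MAF of 0.2; prevalence and relative risk from the project spec.
f0, f1 = single_snp_risks(prevalence=0.05, p=0.2, gamma=1.2)
print(f"F0 = {f0:.4f}, F1 = {f1:.4f}")
```

Because F is fixed, F0 always ends up below F whenever γ > 1 (and 0 < p < 1), and F1 above it.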

**Conditions**

- Not all SNPs are easy to examine or even useful to know about. Where should we draw the line? How do we determine whether a test is worth the money?

Some example conditions:

- When relative risk is significantly high (e.g. γ > 2.0), meaning that having the SNP more than doubles one’s risk for the disease

- When the SNP is relatively common (e.g. p > 0.20), meaning that more than 20% of the population has this SNP

**The Single SNP Case**

This is simple and straightforward.

- If you set up a standard for when something is worth testing (say, γ > 3.0 and p > 0.2), then in the 1-SNP case the SNP itself simply has to meet that standard: γ > 3.0 and p > 0.2

- There is no extra math involved because there are no SNPs working together to change the values of risk and frequency.

- Obviously, this changes in cases where you are studying multiple SNPs.

**The 2-SNP Case**

Now we consider the case where we have 2 SNPs affecting the same disease

- Now there are 4 possibilities and 2 values of γ

| Case # | SNP 1 | SNP 2 | Risk | Proportion |
| --- | --- | --- | --- | --- |
| 1 | N | N | F0 | (1-p1)(1-p2) |
| 2 | N | Y | γ2F0 | (1-p1)(p2) |
| 3 | Y | N | γ1F0 | (p1)(1-p2) |
| 4 | Y | Y | γ1γ2F0 | (p1)(p2) |
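Under the same independence assumption, the table also ties back to the disease prevalence F: weighting each case's risk by its proportion and summing gives F, and the sum factors into one term per SNP. A short derivation (standard algebra under the independence assumption):

$$
\begin{aligned}
F &= F_0\big[(1-p_1)(1-p_2) + \gamma_2 (1-p_1) p_2 + \gamma_1 p_1 (1-p_2) + \gamma_1 \gamma_2 p_1 p_2\big] \\
  &= F_0\,(1 - p_1 + \gamma_1 p_1)(1 - p_2 + \gamma_2 p_2)
\end{aligned}
$$

So with the 5% prevalence from the project specification, F0 = 0.05 / [(1 - p1 + γ1p1)(1 - p2 + γ2p2)].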

**The Multi-SNP Case**

When we consider cases with more than 1 SNP, the individual values of γ and p do not need to be as high in order for disease risk to be significant

- There should be general formulas that make it simpler to calculate the important values for multi-SNP cases

- What are these formulas then?

**More Formulas**

When looking at a case with SNP 1 through SNP n, the general formulas are as follows:

- Proportion of the population falling into each category = $\prod_{i=1}^{n} p_i^{S_i}\,(1-p_i)^{1-S_i}$

- New disease risk of each unique combination of SNPs = $F_0 \prod_{i=1}^{n} \gamma_i^{S_i}$

- In both formulas, $S_i$ is equal to 1 if the person has SNP $i$ and 0 if the person does not have it
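These two formulas translate directly into code. A minimal Python sketch, assuming independent SNPs whose relative risks multiply (as in the formulas above); `risk_table` is a hypothetical helper name, and the example values γ = 4.0 and p = 0.06 are the ones used in the first 2-SNP example under "Applying Our Conditions":

```python
from itertools import product
from math import prod


def risk_table(gammas: list[float], ps: list[float]) -> list:
    """Enumerate all 2**n SNP patterns for n independent SNPs.

    Each row is (S, risk multiplier, proportion), where S[i] = 1 if the
    person has SNP i+1 and 0 otherwise; the risk itself is multiplier * F0.
    """
    if len(gammas) != len(ps):
        raise ValueError("need one gamma and one p per SNP")
    rows = []
    for s in product((0, 1), repeat=len(gammas)):
        # product of gamma_i ** S_i
        multiplier = prod(g ** si for g, si in zip(gammas, s))
        # product of p_i ** S_i * (1 - p_i) ** (1 - S_i)
        proportion = prod(p ** si * (1 - p) ** (1 - si) for p, si in zip(ps, s))
        rows.append((s, multiplier, proportion))
    return rows


for s, mult, prop in risk_table([4.0, 4.0], [0.06, 0.06]):
    print(s, f"{mult:g}*F0", f"{prop:.2%}")
```

Each printed row corresponds to one line of the case tables (in a different order), and the proportions always sum to 1, which is a quick sanity check on a hand-built table.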

**The 3-SNP Case**

| Case # | SNP 1 | SNP 2 | SNP 3 | Risk | Proportion |
| --- | --- | --- | --- | --- | --- |
| 1 | N | N | N | F0 | (1-p1)(1-p2)(1-p3) |
| 2 | Y | N | N | γ1F0 | (p1)(1-p2)(1-p3) |
| 3 | Y | Y | N | γ1γ2F0 | (p1)(p2)(1-p3) |
| 4 | N | Y | N | γ2F0 | (1-p1)(p2)(1-p3) |
| 5 | N | Y | Y | γ2γ3F0 | (1-p1)(p2)(p3) |
| 6 | N | N | Y | γ3F0 | (1-p1)(1-p2)(p3) |
| 7 | Y | N | Y | γ1γ3F0 | (p1)(1-p2)(p3) |
| 8 | Y | Y | Y | γ1γ2γ3F0 | (p1)(p2)(p3) |

**The 4-SNP Case**

| Case # | SNP 1 | SNP 2 | SNP 3 | SNP 4 | Risk | Proportion |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | N | N | N | N | F0 | (1-p1)(1-p2)(1-p3)(1-p4) |
| 2 | Y | N | N | N | γ1F0 | (p1)(1-p2)(1-p3)(1-p4) |
| 3 | Y | Y | N | N | γ1γ2F0 | (p1)(p2)(1-p3)(1-p4) |
| 4 | Y | N | Y | N | γ1γ3F0 | (p1)(1-p2)(p3)(1-p4) |
| 5 | Y | N | N | Y | γ1γ4F0 | (p1)(1-p2)(1-p3)(p4) |
| 6 | Y | Y | Y | N | γ1γ2γ3F0 | (p1)(p2)(p3)(1-p4) |
| 7 | Y | Y | N | Y | γ1γ2γ4F0 | (p1)(p2)(1-p3)(p4) |
| 8 | Y | N | Y | Y | γ1γ3γ4F0 | (p1)(1-p2)(p3)(p4) |
| 9 | N | Y | N | N | γ2F0 | (1-p1)(p2)(1-p3)(1-p4) |
| 10 | N | Y | Y | N | γ2γ3F0 | (1-p1)(p2)(p3)(1-p4) |
| 11 | N | Y | N | Y | γ2γ4F0 | (1-p1)(p2)(1-p3)(p4) |
| 12 | N | Y | Y | Y | γ2γ3γ4F0 | (1-p1)(p2)(p3)(p4) |
| 13 | N | N | Y | N | γ3F0 | (1-p1)(1-p2)(p3)(1-p4) |
| 14 | N | N | Y | Y | γ3γ4F0 | (1-p1)(1-p2)(p3)(p4) |
| 15 | N | N | N | Y | γ4F0 | (1-p1)(1-p2)(1-p3)(p4) |
| 16 | Y | Y | Y | Y | γ1γ2γ3γ4F0 | (p1)(p2)(p3)(p4) |

**Defining Our Conditions**

As we said before, we have to set some standard conditions and define what is useful to us.

- For our specific case, we set γ ≥ 4.0, meaning that having the SNP must at least quadruple one’s risk for the disease

- We also set p > 0.10, meaning that more than 10% of the population must have the SNP

**Applying Our Conditions**

For the single-SNP case, the conditions are simply γ ≥ 4.0 and p > 0.10.

For multiple SNPs, we apply the same thresholds to the combined risk: testing is worthwhile if more than 10% of the population has a risk of at least 4F0.

For the 2-SNP case, here are some possible values that would make it worth testing:

- γ1 = 4.0 and γ2 = 4.0 and p1 = 0.06 and p2 = 0.06

| Case | Risk | Frequency |
| --- | --- | --- |
| Neither | F0 | 88.36% |
| SNP 1 | 4F0 | 5.64% |
| SNP 2 | 4F0 | 5.64% |
| Both | 16F0 | 0.36% |

- This satisfies our criteria because 5.64% + 5.64% + 0.36% = 11.64% of the population has a risk of at least 4F0, which is greater than 10%

- In this case, the individual MAF values of the SNPs don't need to reach 0.10 for the criteria to be met.
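The check applied in this example can be sketched in Python as follows, assuming independent SNPs with multiplicative risks. The function names and default thresholds (4F0 and 10%) are illustrative choices mirroring the conditions above, not an established tool:

```python
from itertools import product
from math import prod


def fraction_at_risk(gammas, ps, min_multiplier=4.0):
    """Fraction of the population whose risk is at least min_multiplier * F0.

    Assumes independent SNPs whose relative risks multiply.
    """
    total = 0.0
    for s in product((0, 1), repeat=len(gammas)):  # every SNP pattern S
        multiplier = prod(g ** si for g, si in zip(gammas, s))
        if multiplier >= min_multiplier:
            # add the proportion of people with exactly this pattern
            total += prod(p ** si * (1 - p) ** (1 - si) for p, si in zip(ps, s))
    return total


def worth_testing(gammas, ps, min_multiplier=4.0, min_fraction=0.10):
    """True if more than min_fraction of people reach min_multiplier * F0."""
    return fraction_at_risk(gammas, ps, min_multiplier) > min_fraction


print(f"{fraction_at_risk([4.0, 4.0], [0.06, 0.06]):.2%}")  # 11.64%
print(worth_testing([4.0, 4.0], [0.06, 0.06]))  # True
```

Because it loops over every SNP pattern, the same function covers the 3- and 4-SNP examples unchanged; with 2^n patterns it is only practical for a small number of SNPs.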

**Applying Our Conditions**

How about another example?

- γ1 = 2.0 and γ2 = 2.0 and p1 = 0.32 and p2 = 0.32

| Case | Risk | Frequency |
| --- | --- | --- |
| Neither | F0 | 46.24% |
| SNP 1 | 2F0 | 21.76% |
| SNP 2 | 2F0 | 21.76% |
| Both | 4F0 | 10.24% |

- This satisfies our criteria because only the "Both" group reaches a risk of at least 4F0, and it makes up 10.24% of the population, which is greater than 10%

- In this case, the individual risk values of the SNPs don’t need to reach 4.0 for the criteria to be met.

**Applying Our Conditions**

The 3-SNP case gets even more interesting.

- γ1 = 3.0 and γ2 = 3.0 and γ3 = 3.0

- p1 = 0.2 and p2 = 0.2 and p3 = 0.2

| Case | Risk | Frequency |
| --- | --- | --- |
| None | F0 | 51.2% |
| One out of the three | 3F0 | 3*(0.2)^1*(0.8)^2 = 38.4% |
| Two out of the three | 9F0 | 3*(0.2)^2*(0.8)^1 = 9.6% |
| All three | 27F0 | 0.8% |

- This satisfies our criteria because 9.6% + 0.8% = 10.4% of the population has a risk of at least 4F0 (9F0 or 27F0), which is higher than 10%
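When every SNP has the same γ and p, all cases with the same number k of SNPs share the risk γ^k·F0, and their combined frequency is the binomial term C(n, k)·p^k·(1-p)^(n-k); that is where the leading 3 in the frequency column comes from. A short Python sketch of this grouping (the function name is hypothetical; it assumes independent SNPs as before):

```python
from math import comb


def grouped_frequencies(n, gamma, p):
    """For n identical independent SNPs, map k -> (risk multiplier, frequency)."""
    return {
        k: (gamma ** k, comb(n, k) * p ** k * (1 - p) ** (n - k))
        for k in range(n + 1)
    }


for k, (mult, freq) in grouped_frequencies(3, 3.0, 0.2).items():
    print(f"{k} SNP(s): {mult:g}*F0, {freq:.1%}")
```

The same call with `(4, 2.0, 0.15)` reproduces the grouped rows of the 4-SNP example.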

**Applying Our Conditions**

How about another example?

- γ1 = 4.0 and γ2 = 4.0 and γ3 = 4.0

- p1 = 0.05 and p2 = 0.05 and p3 = 0.05

| Case | Risk | Frequency |
| --- | --- | --- |
| None | F0 | 85.74% |
| One out of the three | 4F0 | 3*(0.05)^1*(0.95)^2 = 13.54% |
| Two out of the three | 16F0 | 3*(0.05)^2*(0.95)^1 = 0.71% |
| All three | 64F0 | 0.01% |

- This satisfies our criteria because 13.54% + 0.71% + 0.01% = 14.26% of the population has a risk of at least 4F0, which is higher than 10%

**Applying Our Conditions**

The 4-SNP case gets very interesting and exciting.

- γ1 = 2.0 and γ2 = 2.0 and γ3 = 2.0 and γ4 = 2.0

- p1 = 0.15 and p2 = 0.15 and p3 = 0.15 and p4 = 0.15

| Case | Risk | Frequency |
| --- | --- | --- |
| None | F0 | 52.20% |
| One out of the four | 2F0 | 4*(0.15)^1*(0.85)^3 = 36.85% |
| Two out of the four | 4F0 | 6*(0.15)^2*(0.85)^2 = 9.75% |
| Three out of the four | 8F0 | 4*(0.15)^3*(0.85)^1 = 1.15% |
| All four | 16F0 | 0.05% |

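- 9.75% + 1.15% + 0.05% = 10.95% of the population has a risk of at least 4F0, so this also satisfies our criteria

- The individual values needed are lower than before: γ = 2.0 here, as in the second 2-SNP example, but each SNP now only needs p = 0.15 instead of p = 0.32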
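To make that comparison concrete, one can scan p for n identical SNPs with the same γ and find the smallest MAF that meets the criterion. For γ = 2.0, this kind of scan gives values a little below the 0.32 and 0.15 used in the examples. A rough grid-search sketch in Python; the 0.001 grid step and the function names are arbitrary choices for illustration, and independence is assumed as before:

```python
from math import comb


def fraction_at_risk_identical(n, gamma, p, min_multiplier=4.0):
    """Fraction of people whose risk reaches min_multiplier * F0,
    for n independent SNPs that all share the same gamma and p."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if gamma ** k >= min_multiplier  # k SNPs multiply the risk by gamma**k
    )


def smallest_maf(n, gamma, min_fraction=0.10, step=0.001):
    """Smallest p on a grid of width `step` for which more than
    min_fraction of the population reaches 4 * F0; None if no p works."""
    for i in range(1, round(1 / step) + 1):
        p = i * step
        if fraction_at_risk_identical(n, gamma, p) > min_fraction:
            return p
    return None


for n in (2, 4):
    print(f"{n} SNPs with gamma = 2.0: smallest p is about {smallest_maf(n, 2.0):.3f}")
```

Running the same scan over more values of n and γ is one way to probe the hypothesis in the conclusions, though it only covers the identical-SNP setting.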

**Conclusions**

- As the number of SNPs increases, the individual values of γ and p needed to make testing worthwhile change.

- Hypothesis: as the number of SNPs increases, the values that are required will decrease

- It requires pretty high numbers for risk and MAF in order to be “worth testing.” Unfortunately, in reality, many SNPs don’t have such high values.

- Thus, while personalized medicine and the study of SNPs to determine disease risk are useful, they are not as effective as they are made to seem.

**Closing Comments**

When interested in testing your genes for potential risk of disease and disorders, there are a few things to keep in mind:

- Not all diseases are necessarily worth checking for (the disease may be very rare, i.e., its prevalence may be very low)

- Not all SNPs are necessarily worth checking on (relative risk may be low, SNP may be rare)

Of course, these are just generalizations. If you have a good reason to check for something and have the financial ability to do so, by all means, GO FOR IT! Especially if it’ll make you feel better about your health.