**Project Member**

Lin Song, first-year graduate student in the ACCESS program.

**Project Description**

Multiple phenotypes are often collected to study a given disease, and some of them are intermediate phenotypes. The goal of the project is to develop an association technique for intermediate phenotypes.

**Goal for the quarter**

Complete the medium project, which is to develop an association technique for intermediate phenotypes and to test the technique on simulated data.

**First week schedule (project started on Apr 22nd)**

- Background reading: Pleiotropy and Principal Components of Heritability Combine to Increase Power for Association Analysis.

**May 5, 2009**

- I read the background paper over the last 10 days and found it really hard, especially the math. So I talked to Prof. Eskin about the project, and he gave me an overview of it, which was very helpful.
- Derive the association statistic for a quantitative trait.
- Derive the association statistic for one intermediate phenotype of a quantitative trait. The result may look like tag SNP association.
- Derive the association statistic for one intermediate phenotype corresponding to several quantitative traits.

- Grade: A

**May 7, 2009**

**Quantitative trait locus association analysis**

Assume N individuals are genotyped to study a particular disease, and the trait value t of every sample is measured. Since each individual has 2 chromosomes, we have 2N chromosomes in total to test. A quantitative trait is usually determined by both genetic and environmental factors, and many genes may affect the trait, so the trait is approximately normally distributed.

Assume SNP A (with alleles A and a) is the SNP to be analyzed and the frequency of allele A is p. Among the 2N chromosomes, set as the number of A alleles, as the number of a alleles.

The trait value distribution for chromosomes containing A is . Here, V is the component of phenotypic variance excluding the contribution of SNP A. Similarly, we have .

Because we don’t know the value of V, we use its estimate in place of V. In this way, when , T=

The null hypothesis is that there is no association, namely . Under the null hypothesis, compute the test statistic T. If or , there is an association at significance level α.
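Since the equations are not displayed on this page, the following is only a minimal sketch of a test of this kind, not necessarily the exact statistic derived in the files. It assumes the standard pooled-variance two-sample statistic: the difference between the mean trait value on A-carrying and a-carrying chromosomes, divided by its estimated standard error, with the shared variance V replaced by a pooled estimate. The function name and argument names are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(traits_A, traits_a):
    """Two-sample statistic comparing mean trait values of the two allele groups.

    traits_A: trait values attached to chromosomes carrying allele A
    traits_a: trait values attached to chromosomes carrying allele a
    Assumption: both groups share one variance V, estimated by pooling.
    """
    n_A, n_a = len(traits_A), len(traits_a)
    # Pooled estimate standing in for the unknown variance V
    v_hat = ((n_A - 1) * variance(traits_A)
             + (n_a - 1) * variance(traits_a)) / (n_A + n_a - 2)
    # Mean difference scaled by its estimated standard error
    return (mean(traits_A) - mean(traits_a)) / math.sqrt(v_hat * (1 / n_A + 1 / n_a))
```

Under the null hypothesis of no association the statistic is centered at zero, so association is declared when it falls far enough into either tail for the chosen significance level α.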

**Grade:A**

**May 25, 2009**

**Intermediate phenotype association**

Now we consider an association analysis of the genotype with an intermediate phenotype, and compare it with the QTL association analysis above. Assume the intermediate phenotype is I, the phenotype variance is .

We construct the following disease model. A is the risk allele if SNP A has any effect on the disease trait. Since the genotype influences the intermediate phenotype more than the disease trait, we assume that the intermediate phenotype depends on the genotype, and the trait value depends on the intermediate phenotype.

Here, I is the intermediate phenotype value; t is the trait value; ε represents the residual effect and is normally distributed; A is the allele A count; is the A allele effect; is the effect of the intermediate phenotype on the trait.

Using the above model, we have,

Since both and are normally distributed, is also normally distributed.

As indicated in the first part, is normally distributed and the non-centrality parameter

Similarly,

Power is determined by .

If we want the two analyses to have the same power, .

Alternatively, we could calculate the correlation coefficient between I and t.

So more samples are needed to obtain the same power when measuring the quantitative trait rather than the corresponding intermediate phenotype.
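The sample size comparison can be checked numerically. The sketch below is an illustration under stated assumptions rather than the derivation from the files: it assumes a linear chain model I = beta*A + eps_I and t = gamma*I + eps_t with independent residuals, treats A as a 0/1 allele indicator per chromosome, and uses the large-sample approximation that the non-centrality parameter grows like the correlation times the square root of the sample size. All symbol and function names are hypothetical.

```python
import math

def chain_correlations(p, beta, gamma, var_eps_I=1.0, var_eps_t=1.0):
    """Correlations implied by the assumed chain model A -> I -> t."""
    var_A = p * (1 - p)                      # variance of a 0/1 allele indicator
    var_I = beta ** 2 * var_A + var_eps_I    # variance of the intermediate phenotype
    var_t = gamma ** 2 * var_I + var_eps_t   # variance of the trait
    r_AI = beta * math.sqrt(var_A / var_I)           # allele vs intermediate phenotype
    r_It = gamma * math.sqrt(var_I / var_t)          # intermediate phenotype vs trait
    r_At = beta * gamma * math.sqrt(var_A / var_t)   # allele vs trait
    return r_AI, r_It, r_At

r_AI, r_It, r_At = chain_correlations(p=0.3, beta=0.5, gamma=0.8)

# Equal power needs n_t * r_At**2 == n_I * r_AI**2 under the approximation above,
# so the trait analysis needs n_t / n_I = (r_AI / r_At)**2 times as many samples.
sample_size_ratio = (r_AI / r_At) ** 2
```

In this model the allele-trait correlation is the product of the allele-intermediate and intermediate-trait correlations, so the ratio equals 1 / r_It**2 and can never fall below 1.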

**Grade:A**

**May 27, 2009**

**Intermediate phenotype association**

I made some improvements to the intermediate phenotype association part, based on the discussion with Prof. Eskin.

**Grade:A**

**May 28, 2009**

Changed all annotations to a more formal format.

**Grade:A**

**June 1, 2009**

I gave the final presentation. See the files for the PowerPoint slides.

**Grade:A**

**June 5, 2009**

**Multiple phenotypes association analysis**

From the discussion above, we know that intermediate phenotype association analyses have more power than common disease phenotype analyses. However, in many cases we only have access to multiple disease phenotypes, not to intermediate phenotypes. An interesting question in this kind of analysis is whether we can still increase the power by using information from multiple phenotypes.

In order to answer this question, we construct the following disease model. A genotype determines an intermediate phenotype, and the intermediate phenotype then determines 2 different, but related disease phenotypes (Fig 1).

Fig 1. Multiple phenotypes disease model

Based on the discussion in sections 1 and 2, we have

Here, and are the effects of the intermediate phenotype on the 2 disease phenotypes, but we don’t know whether the effects are positive or negative. In addition, and are the correlation coefficients between the intermediate phenotype and disease phenotype M or N, and is the correlation coefficient between M and N. Assume .

As a first step, we could test whether power is increased when the average of phenotypes M and N is used. We ignore the multiple testing problem here.

(i)

and have the same sign. We could assume .

Based on the disease model, the average of the disease phenotypes is

∴

If we only use disease phenotype M or N,

variance is always greater than 0, and have the same sign.

So, we could gain power or at least get the same power when using in this case.

(ii)

and have different signs. We could assume .

If we only use disease phenotype M or N,

So, we could gain power or at least get the same power when using in this case.

In sum, we could gain power or at least get the same power when combining the information of 2 disease phenotypes.
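The two cases can be illustrated with a small calculation. This is a sketch under assumptions that are not taken from the files: M = gamma_M*I + e_M and N = gamma_N*I + e_N, both residuals have the same variance var_e and correlation rho_e, and the two effects have equal magnitude. The opposite-sign case is illustrated with the half-difference of M and N, which is also an assumption of this sketch. Because the genotype acts only through I in this model, a combination that correlates more strongly with I also correlates more strongly with the genotype. All names are hypothetical.

```python
import math

def corr_with_intermediate(w_M, w_N, gamma_M, gamma_N,
                           var_I=1.0, var_e=1.0, rho_e=0.0):
    """Correlation between I and the combined phenotype w_M*M + w_N*N."""
    signal = w_M * gamma_M + w_N * gamma_N   # coefficient of I in the combination
    # Residual variance of the combination (shared variance, correlation rho_e)
    noise_var = var_e * (w_M ** 2 + w_N ** 2 + 2 * w_M * w_N * rho_e)
    return signal * var_I / math.sqrt(var_I * (signal ** 2 * var_I + noise_var))

# Case (i): effects with the same sign, average versus M alone
single_same = corr_with_intermediate(1, 0, 0.8, 0.8)
average_same = corr_with_intermediate(0.5, 0.5, 0.8, 0.8)

# Case (ii): effects with opposite signs, half-difference versus M alone
single_opp = corr_with_intermediate(1, 0, 0.8, -0.8)
half_diff_opp = corr_with_intermediate(0.5, -0.5, 0.8, -0.8)
```

With these settings the combined phenotype is at least as strongly correlated with I as M alone in both cases, because combining leaves the coefficient of I unchanged while the residual variance shrinks or stays the same.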

**Grade:A**

**June 16, 2009**

I checked all parts and corrected some mistakes.

**Multiple phenotypes association analysis**

Consider 2 special cases as examples.

First, consider , . ( need not be 1.) That means M and N always have the same value, just like the same phenotype. So using both of them won’t add any information. Thus, the power is predicted to be the same. According to the above equations, when , which is consistent with our prediction.

Second, consider , which indicates that phenotype M and N are not related to the intermediate phenotype, and thus not related to the disease gene. So the power should be 0. According to the above equations, when , , . This is, again, consistent with our prediction.
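Both special cases can also be checked on simulated data. The sketch below is a hypothetical simulation, not the one used for the project: it assumes a genotype coded as an allele count of 0, 1 or 2, a linear chain from genotype to I to the two phenotypes, and standard normal residuals. The function names, parameter values and seed are illustrative choices.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, p, beta, gamma_M, gamma_N, identical_residuals, seed):
    """Simulate allele counts and two disease phenotypes driven by I."""
    rng = random.Random(seed)
    A, M, N = [], [], []
    for _ in range(n):
        a = (rng.random() < p) + (rng.random() < p)  # allele count: 0, 1 or 2
        i = beta * a + rng.gauss(0, 1)               # intermediate phenotype
        e_M = rng.gauss(0, 1)
        e_N = e_M if identical_residuals else rng.gauss(0, 1)
        A.append(a)
        M.append(gamma_M * i + e_M)
        N.append(gamma_N * i + e_N)
    return A, M, N

def average(M, N):
    return [(m + n) / 2 for m, n in zip(M, N)]

# Case 1: M and N are the same phenotype, so averaging adds no information.
A1, M1, N1 = simulate(20000, 0.3, 0.5, 0.8, 0.8, identical_residuals=True, seed=1)
r_same_M = pearson(A1, M1)
r_same_avg = pearson(A1, average(M1, N1))

# Case 2: neither phenotype depends on I, so there is no signal to detect.
A2, M2, N2 = simulate(20000, 0.3, 0.5, 0.0, 0.0, identical_residuals=False, seed=1)
r_null_M = pearson(A2, M2)
r_null_avg = pearson(A2, average(M2, N2))
```

In the first case the averaged phenotype is identical to M, so the two correlations coincide. In the second case both correlations should be close to zero, which is what the two predictions in the text say.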

**Grade:A**

**I still don't know how to display the equations written in MathType here, and I apologize for that. See the files for the complete version.**