My name is Kha Chu, and I'm a fourth-year Computer Science & Engineering major.

I plan for my final project to be about imputation. Because of budget limitations, an association study does not genotype every SNP in an individual, so data can be missing when searching for the causes of diseases. However, some missing SNPs may be closely correlated with SNPs that were genotyped, and those in turn may be correlated with others. Using these relationships, we may be able to infer the entire genome, with uncertainty increasing as more inferences are made.

I will take data from the HapMap, obtaining all of the SNPs for one individual. I will run whatever imputation software I can find on a random subset of those SNPs and compare the inferred genome to the one I received from the HapMap.
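The comparison described above can be sketched as follows. This is a minimal illustration, not the actual evaluation code: the function names `mask_random_snps` and `concordance`, the `?` placeholder for a missing SNP, and the representation of a genotype as a list of single-character alleles are all assumptions made for the example.

```python
import random


def mask_random_snps(genotype, n_missing, seed=0):
    """Hide n_missing randomly chosen SNPs, replacing each with '?'.

    Returns the masked genotype and the set of hidden positions.
    (Hypothetical helper; '?' as the missing marker is an assumption.)
    """
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(genotype)), n_missing))
    masked = ["?" if i in hidden else allele
              for i, allele in enumerate(genotype)]
    return masked, hidden


def concordance(truth, imputed, hidden):
    """Fraction of hidden SNPs whose imputed allele matches the truth."""
    if not hidden:
        return 1.0
    correct = sum(1 for i in hidden if truth[i] == imputed[i])
    return correct / len(hidden)


if __name__ == "__main__":
    truth = list("ACGTACGT")  # toy data, not real HapMap alleles
    masked, hidden = mask_random_snps(truth, n_missing=3, seed=1)
    # Scoring the masked genotype itself: no hidden SNP can match.
    print(concordance(truth, masked, hidden))
```

Scoring only the hidden positions keeps the SNPs that were given to the software from inflating the accuracy figure.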

My goal is to see how imputation software reacts to different data sets, in order to get a more accurate guess at the entire set of SNPs.

1. Get data from the HapMap, then find and run imputation software on a random set of SNPs to get a feel for it.
2. Create a control set of SNPs for n = 10, 20, 30 individuals. (Counts subject to change.) Run imputation software on them.
3. Create a set of SNPs whose correlation coefficient r exceeds a chosen threshold, for n = 10, 20, 30 individuals. Run imputation software on it.
4. Create sets of SNPs with a minimum distance between SNPs. Run imputation software on the sets.
5. Compare the data and analyze the results.
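The two SNP selections in steps 3 and 4 could be built roughly as shown below. This is a sketch under stated assumptions: alleles are coded as 0/1 across the same individuals, r is taken to be the Pearson correlation between two SNPs, positions are sorted in ascending order, and the names `pearson_r`, `correlated_pairs`, and `thin_by_distance` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two SNPs coded as 0/1 per individual."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a SNP with no variation carries no correlation signal
    return cov / sqrt(vx * vy)


def correlated_pairs(snps, threshold):
    """Index pairs (i, j), i < j, whose |r| is at least the threshold."""
    return [(i, j)
            for i in range(len(snps))
            for j in range(i + 1, len(snps))
            if abs(pearson_r(snps[i], snps[j])) >= threshold]


def thin_by_distance(positions, min_gap):
    """Indices of SNPs kept so that neighbours are at least min_gap apart.

    Assumes positions are sorted in ascending order.
    """
    kept = []
    for i, pos in enumerate(positions):
        if not kept or pos - positions[kept[-1]] >= min_gap:
            kept.append(i)
    return kept


if __name__ == "__main__":
    snps = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]  # toy data
    print(correlated_pairs(snps, threshold=0.8))
    print(thin_by_distance([100, 150, 400, 420, 900], min_gap=200))
```

The absolute value of r is used so that strongly anti-correlated SNPs also count as related; whether that is the right choice for the experiment is a judgment call.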

1. Create the imputation software, which takes data from the HapMap and does a character comparison.
2. Test on inputs, varying the window size of the imputation region, number of input chromosomes, and number of SNPs taken.
3. Record and compare data.
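One possible reading of "character comparison" with a window is sketched here. It is not the software written for this project. It assumes haplotypes are strings of allele characters, `?` marks a missing SNP, and a missing SNP is filled in from the reference chromosome that matches the target at the most flanking positions inside the window. The name `impute_by_window` and its parameters are hypothetical.

```python
def impute_by_window(target, reference, window=2, missing="?"):
    """Fill each missing SNP from the best-matching reference haplotype.

    For a missing position i, every reference haplotype is scored by how
    many known target alleles it matches, character by character, within
    `window` SNPs on either side of i. Ties go to the earliest reference.
    """
    imputed = list(target)
    for i, allele in enumerate(target):
        if allele != missing:
            continue
        lo = max(0, i - window)
        hi = min(len(target), i + window + 1)
        best_score, best_allele = -1, missing
        for hap in reference:
            score = sum(
                1
                for j in range(lo, hi)
                if j != i and target[j] != missing and target[j] == hap[j]
            )
            if score > best_score:
                best_score, best_allele = score, hap[i]
        imputed[i] = best_allele
    return "".join(imputed)


if __name__ == "__main__":
    reference = ["ACGTA", "TCGAA", "ACCTG"]  # toy chromosomes
    print(impute_by_window("AC?TA", reference, window=2))
    # The window size can change the answer for the same input:
    print(impute_by_window("?CGAA", reference, window=1))
    print(impute_by_window("?CGAA", reference, window=4))
```

In the last two calls, a window of 1 sees only one flanking SNP that every reference shares, so the first reference wins the tie, while a window of 4 sees enough of the target to prefer the second reference. This is the kind of effect that varying the window size in step 2 would measure.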

04-30-2009 10:25 PM - No progress; midterms week. Will do work by next Thursday. My grade: C
05-07-2009 - Changed the plan to creating the algorithm. My grade: C
05-14-2009 - No progress. My grade: D
05-21-2009 - Completed creating imputation software. My grade: B
05-31-2009 - Completed testing. My grade: A

