Project member
Dat Bach Duong
About me
I am a 3rd-year undergraduate studying molecular, cell, and developmental biology (MCDB) and applied math. I have a good background in biology and statistics/probability theory, but little knowledge of programming.
Description of the project
Sometimes genotyping fails to read a nucleotide, which lowers the observed frequency of a SNP. This project tries to estimate the bias due to this missing data and to correct for it.
Goal for end of quarter
Goals:
1. Find the new association statistic, and fix association studies (easy).
2. Find the new power, and fix power studies (medium, optional).
Weekly schedule
Work on this project on Tuesdays and Thursdays.
April 23, 2009
::Come up with simple models and attempt to solve them::
Machine has failure probability 'e', so it will not read a SNP with probability e.
1st model: let's say we use this machine only for "cases"; what are the new parameters?
2nd model: use this machine only for "controls"; what are the new parameters?
3rd model: use this machine for both "cases" and "controls"; what are the new parameters?
In each model, how does power change?
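The three models above can be made concrete with a small sketch. Everything in it is an assumption for illustration, not the derivation in the linked documents: a failed read is assumed to remove a minor-allele observation, so an affected group's frequency p becomes p(1 - e), and the association statistic is assumed to be the usual two-proportion z statistic. The function names and the example numbers are hypothetical.

```python
import math

def observed_frequencies(p_case, p_control, e, model):
    """Return the (case, control) minor-allele frequencies the machine reports.

    Assumption (not taken from the notes): a failed read removes a
    minor-allele observation, so an affected frequency p becomes p * (1 - e).
    model 1: machine used on cases only
    model 2: machine used on controls only
    model 3: machine used on both groups
    """
    if model not in (1, 2, 3):
        raise ValueError("model must be 1, 2 or 3")
    case = p_case * (1 - e) if model in (1, 3) else p_case
    control = p_control * (1 - e) if model in (2, 3) else p_control
    return case, control

def expected_statistic(p_case, p_control, n):
    """Approximate mean of the two-proportion z statistic.

    n is the number of chromosomes sampled in each group (assumed equal).
    """
    p_bar = (p_case + p_control) / 2
    return (p_case - p_control) / math.sqrt(2 * p_bar * (1 - p_bar) / n)

if __name__ == "__main__":
    p_case, p_control, e, n = 0.25, 0.20, 0.10, 2000  # made-up example values
    print("no error:", expected_statistic(p_case, p_control, n))
    for model in (1, 2, 3):
        case, control = observed_frequencies(p_case, p_control, e, model)
        print("model", model, ":", expected_statistic(case, control, n))
```

Under these assumptions, model 1 pulls the case frequency toward (or past) the control frequency, model 2 does the same to the control frequency, and model 3 scales both frequencies by the same factor.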
Still working on solving these …
Evaluation of week: Fair.
April 28, 2009
Machine has failure probability 'e', so it will not read a SNP with probability e.
:: Found new statistics for 1st model ::
1st model: let's say we use this machine only for "cases", what are new parameters? power?
new distribution
http://docs.google.com/Doc?id=df2h9psr_616cprdpd6
new association
http://docs.google.com/Doc?id=df2h9psr_59gvdv7nft
new power
http://docs.google.com/Doc?id=df2h9psr_63f4m7spfh
Still working on solving the next 2 models …
Evaluation of week: Good.
May 3, 2009
Machine has failure probability 'e', so it will not read a SNP with probability e.
:: Found new statistics for 2nd model ::
2nd model: let's say we use this machine only for "controls", what are new parameters? power?
new distribution
http://docs.google.com/Doc?id=df2h9psr_67ffqj4mdp
new association
http://docs.google.com/Doc?id=df2h9psr_65ch7nbrhh
new power
http://docs.google.com/Doc?id=df2h9psr_69dd5ftvdj
:: Found new statistics for 3rd model ::
3rd model: use this machine for both "cases" and "controls"; what are the new parameters? power?
new distribution
http://docs.google.com/Doc?id=df2h9psr_616cprdpd6
http://docs.google.com/Doc?id=df2h9psr_67ffqj4mdp
new association
http://docs.google.com/Doc?id=df2h9psr_53gkcv9jcw
new power
http://docs.google.com/Doc?id=df2h9psr_55hhrhxndp
Evaluation of week: Excellent.
… need to interpret answers
May 10, 2009
:: Edited association and power for model 3 ::
:: Was able to interpret association in each model ::
:: Finished plugging some numbers into each association/power study to see what happens in each model ::
Discovered the following:
(1) In each model, 'non-error model' variance and 'error model' variance are strikingly similar.
link to R code: http://docs.google.com/Doc?id=df2h9psr_454rbqv2dk
note: the calculation ignores N because N is the same in both situations ('non-error' or 'error').
(2) Bias due to sequencing errors: Association study
In models #1 and #2, the association study changes:
model 1: http://docs.google.com/View?id=df2h9psr_72gx26243m
model 2: http://docs.google.com/View?id=df2h9psr_73hdz2s9dz
In model #3, there is no change to association study: (no need for correction)
model 3: http://docs.google.com/View?id=df2h9psr_71fc4v3dtr
(3) Bias due to sequencing errors: Power study
In models #1 and #2, power fluctuates depending on the situation:
model 1: http://docs.google.com/View?id=df2h9psr_76cgznz6dk
model 2: http://docs.google.com/View?id=df2h9psr_77hqht3xhj
In model #3, power always decreases:
link to R code: http://docs.google.com/View?id=df2h9psr_74cdk82vcx
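Observation (3) can be explored numerically with a rough sketch. It is not the linked R code and it rests on assumptions that are not in the notes: a failed read removes a minor-allele observation (an affected frequency p becomes p(1 - e)), the test is a two-sided two-proportion z test with n chromosomes per group, and power is taken from the normal approximation. The function names and the numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def power(p_case, p_control, n, alpha=0.05):
    """Approximate two-sided power of a two-proportion z test.

    n is the number of chromosomes per group (assumed equal in both groups).
    """
    p_bar = (p_case + p_control) / 2
    ncp = (p_case - p_control) / (2 * p_bar * (1 - p_bar) / n) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)

def power_with_errors(p_case, p_control, n, e, model, alpha=0.05):
    """Power when the machine misses reads with probability e.

    Assumption: an affected frequency p is observed as p * (1 - e).
    model 1 affects cases, model 2 affects controls, model 3 affects both.
    """
    case = p_case * (1 - e) if model in (1, 3) else p_case
    control = p_control * (1 - e) if model in (2, 3) else p_control
    return power(case, control, n, alpha)

if __name__ == "__main__":
    n, e = 2000, 0.10  # made-up example values
    for label, p_case, p_control in [("risk allele", 0.25, 0.20),
                                     ("protective allele", 0.20, 0.25)]:
        print(label, "no error:", round(power(p_case, p_control, n), 4))
        for model in (1, 2, 3):
            value = power_with_errors(p_case, p_control, n, e, model)
            print(label, "model", model, ":", round(value, 4))
```

With these example values, models 1 and 2 move power in opposite directions depending on whether the allele is more common in cases or in controls, while model 3 lowers power in both situations.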
… need to find how to fix these associations and power.
… need to interpret power studies.
Evaluation of week: Excellent.
May 11, 2009
From the 3 observations made above, I came up with simplified versions of the association/power studies for each model:
model 1: http://docs.google.com/Doc?id=df2h9psr_47czgcvxcj
model 2: http://docs.google.com/Doc?id=df2h9psr_49c5fxkwwc
model 3: http://docs.google.com/Doc?id=df2h9psr_51rbmshvfq
… need to find how to fix these associations and power.
Evaluation of week: Excellent.
May 19, 2009
:: Found how to fix these statistics::
Model 1: (multiply by C, where C < 1)
association - http://docs.google.com/View?id=df2h9psr_178g9g5wbdt
Model 2: (multiply by C, where C < 1)
association - http://docs.google.com/View?id=df2h9psr_180dtmvkvg2
Model 3:
association - no need for correction: http://docs.google.com/View?id=df2h9psr_184dg3q4jf4
power - http://docs.google.com/View?id=df2h9psr_182f3j256g9
These are also available in the presentation slides (linked below).
Sum up all data.
Make conclusion.
Make slides.
Evaluation of week: Excellent.
May 25, 2009
:: presentation :: http://docs.google.com/Presentation?id=df2h9psr_186hmktxqc4
(for unknown reasons, gmail.com changes the arrangement of some slides, so this version is a bit messy)
Note about my slides:
(1) if you just want to know the answers, scroll down to the 'Conclusions' section.
(2) if you really want to understand the whole thing, you have to click on the links that explain the math behind each model.
** End of project: Finished easy question and a large portion of medium question. **
Evaluation of week: Excellent.