Handling Errors in Association - datduong

Project member

Dat Bach Duong

About me

I am a third-year undergraduate studying molecular, cell, and developmental biology (MCDB) and applied math. I have a good background in biology and in statistics/probability theory, but little programming experience.

Description of the project

Sometimes genotyping fails to recognize a nucleotide, which reduces the observed frequency of a SNP below its actual frequency. This project tries to estimate the bias due to this missing data and to correct for it.

Goal for end of quarter

Goals:
1. Find the new association statistic under errors, and fix association studies (easy).
2. Find the new power under errors, and fix power studies (medium, optional).

Weekly schedule

Work on this project on Tuesdays and Thursdays.


April 23, 2009

::Come up with simple models and attempt to solve them::

The machine has failure probability 'e', i.e., it fails to read a SNP with probability e.

1st model: let's say we use this machine only for "cases"; what are the new parameters?
2nd model: use this machine only for "controls"; what are the new parameters?
3rd model: use this machine for both "cases" and "controls"; what are the new parameters?

In each model, how does power change?
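The three models can be sketched in a few lines of Python. This is only an illustration, and it rests on an assumption that is not stated in this log: a failed read means the allele goes uncounted, so a true frequency p shows up as p(1 - e) in any group read by the faulty machine. The function name `observed_freqs` and the example numbers are hypothetical.

```python
def observed_freqs(p_case, p_ctrl, e, model):
    """Return (observed case freq, observed control freq) under one model.

    Assumption (not from the project notes): a failed read drops the
    allele, so a group read by the faulty machine shows p * (1 - e).
    model 1: machine used for cases only
    model 2: machine used for controls only
    model 3: machine used for both
    """
    if model not in (1, 2, 3):
        raise ValueError("model must be 1, 2, or 3")
    case_hit = model in (1, 3)
    ctrl_hit = model in (2, 3)
    obs_case = p_case * (1 - e) if case_hit else p_case
    obs_ctrl = p_ctrl * (1 - e) if ctrl_hit else p_ctrl
    return obs_case, obs_ctrl


# Hypothetical numbers: equal true frequencies (no association), e = 0.1.
for m in (1, 2, 3):
    case, ctrl = observed_freqs(0.30, 0.30, 0.10, m)
    print(m, round(case, 3), round(ctrl, 3), round(case - ctrl, 3))
```

Under this assumption, models 1 and 2 create an observed difference even when the true frequencies are equal, while model 3 shrinks both groups by the same factor.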

Still working on solving these …

Evaluation of week: Fair.


April 28, 2009

The machine has failure probability 'e', i.e., it fails to read a SNP with probability e.

:: Found new statistics for the 1st model ::

1st model: let's say we use this machine only for "cases"; what are the new parameters? What is the new power?

new distribution
http://docs.google.com/Doc?id=df2h9psr_616cprdpd6

new association
http://docs.google.com/Doc?id=df2h9psr_59gvdv7nft

new power
http://docs.google.com/Doc?id=df2h9psr_63f4m7spfh

Still working on solving the next 2 models …

Evaluation of week: Good.


May 3, 2009

The machine has failure probability 'e', i.e., it fails to read a SNP with probability e.

:: Found new statistics for the 2nd model ::

2nd model: let's say we use this machine only for "controls"; what are the new parameters? What is the new power?

new distribution
http://docs.google.com/Doc?id=df2h9psr_67ffqj4mdp

new association
http://docs.google.com/Doc?id=df2h9psr_65ch7nbrhh

new power
http://docs.google.com/Doc?id=df2h9psr_69dd5ftvdj

:: Found new statistics for the 3rd model ::

3rd model: use this machine for both "cases" and "controls"; what are the new parameters? What is the new power?

new distribution
http://docs.google.com/Doc?id=df2h9psr_616cprdpd6
http://docs.google.com/Doc?id=df2h9psr_67ffqj4mdp

new association
http://docs.google.com/Doc?id=df2h9psr_53gkcv9jcw

new power
http://docs.google.com/Doc?id=df2h9psr_55hhrhxndp

Evaluation of week: Excellent.

… need to interpret answers


May 10, 2009

::Edited association and power for model 3::
::Was able to interpret association in each model::
::Finished plugging some numbers into each association/power study to see what happens in each model::

Discovered the following:

(1) In each model, the 'no-error' model variance and the 'error' model variance are strikingly similar.
link to R code: http://docs.google.com/Doc?id=df2h9psr_454rbqv2dk
note: the calculation ignores N because N is the same in both situations ('no-error' or 'error').

(2) Bias due to sequencing errors: Association study

In models #1 and #2, the association study changes:
model 1: http://docs.google.com/View?id=df2h9psr_72gx26243m
model 2: http://docs.google.com/View?id=df2h9psr_73hdz2s9dz

In model #3, the association study does not change (no need for correction):
model 3: http://docs.google.com/View?id=df2h9psr_71fc4v3dtr

(3) Bias due to sequencing errors: Power study

In models #1 and #2, power fluctuates depending on the situation:
model 1: http://docs.google.com/View?id=df2h9psr_76cgznz6dk
model 2: http://docs.google.com/View?id=df2h9psr_77hqht3xhj

In model #3, power always decreases:
link to R code: http://docs.google.com/View?id=df2h9psr_74cdk82vcx
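Observations (1) and (3) can be explored numerically with a short Python sketch. This is not the linked R code. It uses the textbook normal approximation for a two-sided two-proportion test, and it assumes (my assumption, not something stated in this log) that a group read by the faulty machine shows frequency p(1 - e). All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_var(p):
    """Per-allele variance p * (1 - p), without the 1/N factor."""
    return p * (1 - p)


def power_two_proportions(p_case, p_ctrl, n_alleles, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    n_alleles is the number of alleles per group (equal groups assumed).
    Unpooled normal approximation; not necessarily the formula derived
    in the linked documents.
    """
    se = sqrt((bernoulli_var(p_case) + bernoulli_var(p_ctrl)) / n_alleles)
    ncp = abs(p_case - p_ctrl) / se
    z = NormalDist().inv_cdf(1 - alpha / 2)
    cdf = NormalDist().cdf
    return cdf(ncp - z) + cdf(-ncp - z)


def power_under_error(p_case, p_ctrl, e, model, n_alleles, alpha=0.05):
    """Power when groups read by the faulty machine show p * (1 - e).

    Assumption: model 1 affects cases, model 2 controls, model 3 both.
    """
    obs_case = p_case * (1 - e) if model in (1, 3) else p_case
    obs_ctrl = p_ctrl * (1 - e) if model in (2, 3) else p_ctrl
    return power_two_proportions(obs_case, obs_ctrl, n_alleles, alpha)


# Observation (1): variance with and without error, p = 0.30, e = 0.05.
print(round(bernoulli_var(0.30), 4), round(bernoulli_var(0.30 * 0.95), 4))

# Observation (3): p_case = 0.35, p_ctrl = 0.30, 2000 alleles per group.
base = power_two_proportions(0.35, 0.30, 2000)
for m in (1, 2, 3):
    print(m, round(power_under_error(0.35, 0.30, 0.05, m, 2000), 3), round(base, 3))
```

With these hypothetical numbers the case frequency is the larger one, so model 1 shrinks the observed difference and model 2 widens it. Swapping the two frequencies reverses that, which is one way power can move in either direction in models 1 and 2.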

… need to find how to fix these associations and power.
… need to interpret power studies.

Evaluation of week: Excellent.


May 11, 2009

From the 3 observations made above, I came up with simplified versions of the association/power studies for each model:

model 1: http://docs.google.com/Doc?id=df2h9psr_47czgcvxcj
model 2: http://docs.google.com/Doc?id=df2h9psr_49c5fxkwwc
model 3: http://docs.google.com/Doc?id=df2h9psr_51rbmshvfq

… need to find how to fix these associations and power.

Evaluation of week: Excellent.


May 19, 2009

:: Found how to fix these statistics::

Model 1: (multiply by a constant C, C < 1)
association - http://docs.google.com/View?id=df2h9psr_178g9g5wbdt

Model 2: (multiply by a constant C, C < 1)
association - http://docs.google.com/View?id=df2h9psr_180dtmvkvg2

Model 3:
association - no need for correction: http://docs.google.com/View?id=df2h9psr_184dg3q4jf4
power - http://docs.google.com/View?id=df2h9psr_182f3j256g9

These are also available from the link to the presentation slides (below).
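As a rough illustration of what a "multiply by C, C < 1" correction could look like for models 1 and 2, here is a minimal Python sketch. It is one possible reading, not the derivation in the linked documents. It assumes that affected groups show frequency p(1 - e), that e is known, and that C = 1 - e is applied to the group that was not read by the faulty machine. The function name `rescaled_difference` is hypothetical.

```python
def rescaled_difference(obs_case, obs_ctrl, e, model):
    """Difference in frequencies after putting both groups on one scale.

    Assumptions (not from the project notes): affected groups show
    p * (1 - e), and the correction constant is C = 1 - e, applied to
    the group that was NOT read by the faulty machine.
    """
    c = 1 - e
    if model == 1:      # cases affected, so scale the controls
        return obs_case - c * obs_ctrl
    if model == 2:      # controls affected, so scale the cases
        return c * obs_case - obs_ctrl
    if model == 3:      # both affected equally, nothing to rescale
        return obs_case - obs_ctrl
    raise ValueError("model must be 1, 2, or 3")


# Hypothetical check: equal true frequency 0.30, e = 0.10, model 1.
true_p, e = 0.30, 0.10
print(rescaled_difference(true_p * (1 - e), true_p, e, 1))
```

In this sketch the rescaled difference is zero whenever the true frequencies are equal, so the error alone no longer produces an apparent association.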

Sum up all data.
Make conclusion.
Make slides.

Evaluation of week: Excellent.


May 25, 2009

:: presentation :: http://docs.google.com/Presentation?id=df2h9psr_186hmktxqc4
(for unknown reasons, gmail.com changes the arrangement of some slides, so this copy is a bit messy)

Note about my slides:
(1) If you just want to know the answers, scroll down to the 'Conclusions' section.
(2) If you really want to understand the whole thing, you have to click on the links that explain the math behind each model.

** End of project: Finished easy question and a large portion of medium question. **

Evaluation of week: Excellent.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License