MultiArmed Bandit
by Paul Servino
2nd year CS&E Major
Project Description
Choosing and Testing one SNP from one individual at a time to maximize effectiveness.
Table of Contents

Motivation
A Machine Learning Problem
 Optimization of decisions.
 Traditionally:
 Slot machine with a number of levers.
 Each lever has a different probability of winning associated with it.
 Without any prior knowledge, the gambler must find the best lever(s) to pull to maximize his profit.
 Presents three choices after each pull:
 Pull the same lever again (Exploit).
 Pull one of the levers already pulled, again (Exploit).
 Try a new lever (Explore).
A Biological Problem
 Now, we collect all SNPs from one invididual at a time.
 The multiarmed bandit problem is choosing one SNP from an inidividual at a time.
 A theoretical biological problem.
 The technology and resources to put a solution to the MultiArmed Bandit problem to use do not exist.
Multi Armed SNP Bandit
 Genome with a number of SNPs.
 Each SNP has actual frequencies: p+A, and pA, associated with it.
 Without any prior knowledge a researcher must find the best SNPs to test to maximize his gain.
 Presents three choices after each test of an individual’s SNP:
 Test the same SNP again on a different individual (Exploit).
 Test one of the SNPs already tested, again (Exploit).
 Test a new SNP (Explore).
Goals
Defining a Solution
 Fewer(est) number of tests possible to reach the same conclusion as from a normal association study.
 Identifying the same SNPs as associated.
 Higher(est) power attainable in the same number of individual tests as a normal association study.
 Test different algorithms for doing this and in general, explore the problem.
Algorithm
Weighs Exploration vs. Exploitation
 Explore
 When to give up on SNPs that are believed to not be associated.
 When to be convinced enough that the SNP tested is associated to move on.
 Exploit
 Not enough tests to make a decision.
 Gains from testing SNPs again.
 Attempts to move through (a subset of the) genome as quickly as possible, identifying all associated SNPs.
Simple
 Instead of a complicated multipass algorithm, I chose a simple one and optimized its values.
 This was a good starting point and can lead to a better multipass, tiered design in the future.
How?
 First, I ensure that there is a firm base to be making an Explore / Exploit decision by having the algorithm test a certain number (baseDivisor) of inidividuals for that SNP.
 Then, I calculate the statistic for the values.
 I then see if it is over or under a certain value (compStat), which if:
 Under (probably not associated) = Explore!
 Over (probably associated) = Exploit!
Simplified Code
if(v_SNPs.at(m_currentSNP)>getNumTests() > N_LIMIT / baseDivisor)
{
if( v_SNPs.at(m_currentSNP)>getStat() > compStat)
Exploit();
else
Explore();
}
Simulator Solution
Framework in which to test my bandit algorithm(s).
 ~600 lines of C++.
 Very little memory usage as data is randomly generated within set bounds and tabulated.
 Generates a number of SNPs which are either:
 Unassociated (pA+ = pA).
 Associated (pA+ ≠ pA).
 Simulates Association Studies.
 Calculates Association statistics with every individual SNP test.
 Allows for easy comparison between normal association studies and individual algorithmic (bandit) runs.
 Calculates Association statistics with every individual SNP test.
 Simulates bandit runs with ability to explore set of SNPs or to exploit the currently tested one.
Testing
Similar Bandit Algorithms
 100 Tests, one of which:
 Generated 100 SNPs.
 99 Unassociated (pA+ = pA)
 1 Associated (pA+ ≠ pA)
 Ran an algorithmically (Bandit) determined number of individual tests on those SNPs (MAX = 300 per SNP).
 Attempted to find the Associated SNP.
 Generated 100 SNPs.
Results
Spreadsheet
Further
More Complicated Algorithm.
 MultiPass Approach.
 Tiered Approach.
More Extenisve Testing.
 Maximize power per test.
SNP Correlation
 Take into account and eploit correlation between SNPs.
Files
Presentation and Source Code
Weekly Schedule
Week ending 5/1
Research into choosing the project.
Grade: B
Week ending 5/8
More research of MultiArmed Bandit Problem.
Wiki
Grade: B+
Week ending 5/15
Began coding a simulator.
Grade: B
Week ending 5/22
Met with professor. Discussed angles of approach.
Email correspondence. Continued work on simulator with more refined requiremnts
Grade: A
Week ending 5/29
Coding Simulator and researching algorithms.
Grade: B
Week ending 6/6
Finished simulator, algorithm.
Started and finished my testing.
Started and finshed my presentation.
Grade: A+
page revision: 16, last edited: 13 Jun 2009 07:20