Multi-Armed Bandit

by Paul Servino
2nd year CS&E Major

Project Description

Choosing and Testing one SNP from one individual at a time to maximize effectiveness.


A Machine Learning Problem

  • Optimization of decisions.
  • Traditionally:
    • Slot machine with a number of levers.
    • Each lever has a different probability of winning associated with it.
    • Without any prior knowledge, the gambler must find the best lever(s) to pull to maximize his profit.
    • Presents three choices after each pull:
      • Pull the same lever again (Exploit).
      • Pull one of the levers already pulled, again (Exploit).
      • Try a new lever (Explore).

A Biological Problem

  • Now, we collect all SNPs from one invididual at a time.
  • The multi-armed bandit problem is choosing one SNP from an inidividual at a time.
  • A theoretical biological problem.
    • The technology and resources to put a solution to the Multi-Armed Bandit problem to use do not exist.

Multi Armed SNP Bandit

  • Genome with a number of SNPs.
  • Each SNP has actual frequencies: p+A, and p-A, associated with it.
  • Without any prior knowledge a researcher must find the best SNPs to test to maximize his gain.
  • Presents three choices after each test of an individual’s SNP:
    • Test the same SNP again on a different individual (Exploit).
    • Test one of the SNPs already tested, again (Exploit).
    • Test a new SNP (Explore).


Defining a Solution

  • Fewer(est) number of tests possible to reach the same conclusion as from a normal association study.
    • Identifying the same SNPs as associated.
  • Higher(est) power attainable in the same number of individual tests as a normal association study.
  • Test different algorithms for doing this and in general, explore the problem.


Weighs Exploration vs. Exploitation

  • Explore
    • When to give up on SNPs that are believed to not be associated.
    • When to be convinced enough that the SNP tested is associated to move on.
  • Exploit
    • Not enough tests to make a decision.
    • Gains from testing SNPs again.
  • Attempts to move through (a subset of the) genome as quickly as possible, identifying all associated SNPs.


  • Instead of a complicated multipass algorithm, I chose a simple one and optimized its values.
  • This was a good starting point and can lead to a better multipass, tiered design in the future.


  • First, I ensure that there is a firm base to be making an Explore / Exploit decision by having the algorithm test a certain number (baseDivisor) of inidividuals for that SNP.
  • Then, I calculate the statistic for the values.
  • I then see if it is over or under a certain value (compStat), which if:
    • Under (probably not associated) = Explore!
    • Over (probably associated) = Exploit!

Simplified Code

if(>getNumTests() > N_LIMIT / baseDivisor)     
    if(>getStat() > compStat)

Simulator Solution

Framework in which to test my bandit algorithm(s).

  • ~600 lines of C++.
  • Very little memory usage as data is randomly generated within set bounds and tabulated.
  • Generates a number of SNPs which are either:
    • Unassociated (pA+ = pA-).
    • Associated (pA+ ≠ pA-).
  • Simulates Association Studies.
    • Calculates Association statistics with every individual SNP test.
      • Allows for easy comparison between normal association studies and individual algorithmic (bandit) runs.
  • Simulates bandit runs with ability to explore set of SNPs or to exploit the currently tested one.


Similar Bandit Algorithms

  • 100 Tests, one of which:
    • Generated 100 SNPs.
      • 99 Unassociated (pA+ = pA-)
      • 1 Associated (pA+ ≠ pA-)
    • Ran an algorithmically (Bandit) determined number of individual tests on those SNPs (MAX = 300 per SNP).
    • Attempted to find the Associated SNP.



Raw Data, Results, & Graphs


More Complicated Algorithm.

  • Multi-Pass Approach.
  • Tiered Approach.

More Extenisve Testing.

  • Maximize power per test.

SNP Correlation

  • Take into account and eploit correlation between SNPs.


Presentation and Source Code


Weekly Schedule

Week ending 5/1

Research into choosing the project.

Grade: B

Week ending 5/8

More research of Multi-Armed Bandit Problem.

Grade: B+

Week ending 5/15

Began coding a simulator.

Grade: B

Week ending 5/22

Met with professor. Discussed angles of approach.
Email correspondence. Continued work on simulator with more refined requiremnts

Grade: A

Week ending 5/29

Coding Simulator and researching algorithms.

Grade: B

Week ending 6/6

Finished simulator, algorithm.
Started and finished my testing.
Started and finshed my presentation.

Grade: A+

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License