Traditional mismatch-detecting algorithms cannot handle insertions and deletions (indels). An indel shifts every downstream base of a read relative to the reference genome, so comparing the read to the reference position by position usually produces a large number of mismatches. A mismatch-detecting algorithm usually tolerates only one to three mismatches. (More specific information to follow.)
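To make the shift concrete, here is a minimal Python sketch comparing a read with one substitution against a read with one deleted base. The function name count_mismatches and the toy reference are assumptions made for illustration; the reference is deliberately periodic so that every shifted base disagrees, whereas on a random sequence roughly three quarters of the shifted bases would mismatch.

```python
def count_mismatches(read, reference, start):
    """Count position-by-position differences between a read and the
    reference window beginning at `start` (no gaps allowed)."""
    window = reference[start:start + len(read)]
    return sum(1 for r, w in zip(read, window) if r != w)

# Toy reference (hypothetical): 40 bases, periodic so neighbors always differ.
reference = "ACGT" * 10

# Read with a single substitution: reference[10] is 'G', replaced by 'T'.
read_sub = reference[:10] + "T" + reference[11:30]

# Read with a single deletion: reference[10] is dropped, so every later
# base slides one position to the left relative to the reference.
read_del = reference[:10] + reference[11:31]

sub_mismatches = count_mismatches(read_sub, reference, 0)
del_mismatches = count_mismatches(read_del, reference, 0)
print(sub_mismatches, del_mismatches)
```

Both reads are 30 bases long and differ from the reference by a single edit, but only the substitution stays within a budget of one to three mismatches; the deletion makes every base after position 10 disagree.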

Week Four (4/19-4/25)


To Do Next

  • Better describe the (first?) goal of the project
  • Plan out first goal implementation/theory before any coding begins

First Goal: Develop a mapper that takes many short reads and a reference genome of length L and discovers the target sequence the reads came from. The mapper should be able to identify indels of length one in the target sequence. To test it, we will produce a random reference sequence, create single-base indels in it, feed this new target sequence to a read simulator, and then pass the resulting reads and the original reference sequence into the mapper.
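The test setup described above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the function names, the parameter values, and especially simulate_reads, which is only a stand-in for a real read simulator (it draws error-free reads from uniformly random start positions). The mapper itself is not shown, since it has not been designed yet.

```python
import random

BASES = "ACGT"

def random_reference(length, rng):
    """Produce a random reference sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def add_single_base_indels(reference, n_indels, rng):
    """Create a target sequence by applying single-base indels at distinct
    reference positions. Returns (target, events), where each event is
    (reference_position, "ins" or "del", base)."""
    positions = sorted(rng.sample(range(len(reference)), n_indels))
    pieces, events, prev = [], [], 0
    for pos in positions:
        pieces.append(reference[prev:pos])
        if rng.random() < 0.5:
            base = rng.choice(BASES)
            pieces.append(base)           # insert a base before reference[pos]
            events.append((pos, "ins", base))
            prev = pos
        else:
            events.append((pos, "del", reference[pos]))
            prev = pos + 1                # skip reference[pos]
    pieces.append(reference[prev:])
    return "".join(pieces), events

def simulate_reads(target, read_length, n_reads, rng):
    """Stand-in read simulator: error-free reads at uniform random starts."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(target) - read_length + 1)
        reads.append(target[start:start + read_length])
    return reads

rng = random.Random(1)                    # fixed seed so runs are repeatable
reference = random_reference(200, rng)
target, events = add_single_base_indels(reference, 5, rng)
reads = simulate_reads(target, 30, 50, rng)
# The reads and the original reference would then be passed to the mapper,
# and the indels it reports compared against `events`.
```

Keeping the list of events alongside the target gives a ground truth to score the mapper against once it exists.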

Grade: A-/B+. Finally started doing things, but not nearly as much as I would have liked.

