Recent advances in genetic sequencing have produced short-read sequencers. These sequencers produce random reads of a short, fixed length from the genome. This creates the problem of reconstructing an individual's genome from the set of short reads that the sequencer produces. One solution is a technique known as resequencing, which aligns the reads to a reference genome from the same species. This is a difficult problem at the scale of a reference genome of 3*10^9 base pairs and the roughly 10^8 reads needed to cover it. This project attempts to address the problem by leveraging current trends in high-performance computer architectures.
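To make the alignment step concrete, here is a minimal CPU-only sketch of mapping short reads onto a reference. It is not the project's implementation: the seed-and-verify approach, the function names (`build_seed_index`, `map_read`), the seed length, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map every seed_len-long substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, reference, index, seed_len, max_mismatches=1):
    """Return the sorted reference positions where the read aligns with few mismatches."""
    hits = set()
    # Use non-overlapping seeds so that a mismatch inside one seed
    # can still be recovered through another seed of the same read.
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for seed_pos in index.get(seed, ()):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # Verify the candidate by counting mismatches over the whole read.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy data (hypothetical): a 22-base reference and three 8-base reads.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_seed_index(reference, seed_len=4)
reads = ["TAGCCGAT", "CGATAGGC", "TAGCGGAT"]  # the last read has one mismatch
positions = {read: map_read(read, reference, index, seed_len=4) for read in reads}
```

Each read is mapped independently of the others, which is the property a massively parallel device can exploit. At the scale quoted above, the index and the read set no longer fit comfortably in one memory, which is where the scheduling work in this project comes in.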
The goal of this quarter
The goal of this project is to leverage the power of current GPU programming models to accelerate short read mapping. The hope is that the massively parallel nature of GPUs will lead to order-of-magnitude improvements over conventional CPU-based approaches. A secondary goal is to achieve good performance for sequences of length 10^9 and longer with over 10^8 reads.
The schedule for the quarter
Because of the implementation complexity that comes with GPU programming, I plan to have a complete prototype by the end of 9th week. Once the programming is complete, 10th week will be devoted to performance evaluation.
1) Research current techniques in sequence alignment.
2) Perform high level analysis of potential complications.
3) Build off the virus scanner to get a running start on the resequencer.
I should have started this work a bit sooner.
1) Continue implementation of CPU based management structures.
2) Complete GPU Scheduler to exploit read locality.
3) Run unit tests on the code to make sure it actually runs.
Completed CPU structures. Unit testing will come over the weekend. I think there is a bug in the scheduler.
Should have set aside more time for development this week. :(
1) Complete implementation of the scanner.
2) Begin performance analysis.
3) Prepare final presentation.
Completed implementation of the GPU scan kernel, and associated management structures.
Had to track down a bug in the GPU scheduler which was causing reads in use on the GPU to be removed before the comparison completed.
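The log does not say how the scheduler bug was fixed. One common guard against this class of bug is to count the comparisons still using each read and refuse eviction until that count drops to zero. The sketch below is a host-side illustration under that assumption; the `ReadBuffer` class and its method names are hypothetical and are not taken from the project source.

```python
class ReadBuffer:
    """Tracks resident reads and how many comparisons still use each one."""

    def __init__(self):
        self._in_flight = {}  # read id -> number of unfinished comparisons

    def load(self, read_id):
        # Mark the read as resident with no comparisons running yet.
        self._in_flight.setdefault(read_id, 0)

    def begin_compare(self, read_id):
        # Called when a comparison using this read is launched.
        self._in_flight[read_id] += 1

    def end_compare(self, read_id):
        # Called once the comparison has finished.
        self._in_flight[read_id] -= 1

    def evict(self, read_id):
        """Remove the read only if nothing is using it; return True if removed."""
        if self._in_flight.get(read_id, 0) > 0:
            return False
        self._in_flight.pop(read_id, None)
        return True

    def resident(self, read_id):
        return read_id in self._in_flight


buf = ReadBuffer()
buf.load("read_0")
buf.begin_compare("read_0")
evicted_while_busy = buf.evict("read_0")   # refused: comparison still running
buf.end_compare("read_0")
evicted_when_idle = buf.evict("read_0")    # allowed: no comparison in flight
```

In a real GPU setting the `end_compare` call would have to be tied to kernel completion (for example, after synchronizing on the stream that ran the comparison), since kernel launches return before the work finishes.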
1) Run performance test for presentation.
2) Finish PowerPoint.
Completed performance benchmarks and incorporated them into the report. All performance metrics can be found in the M.S. report.
Upload source files and finish up wiki.
Completed all tasks.
The performance analysis has been completed and added to the report found above. More performance evaluation is needed to find the bottlenecks that limit performance. A closer look at the cmatch system put out by the MUMmer project may also lead to valuable insight into further acceleration techniques.