GPU ReSequencing

Project Member

Stephen Oakley

Project Description

Recent advances in genetic sequencing have led to short-read sequencers. These sequencers produce reads of a short, fixed length from random positions in the genome. This creates the problem of reconstructing a genome from the set of short reads that the sequencer produces. One solution is a technique known as resequencing, which attempts to align the reads to a reference genome from the same species. This is a difficult problem at scale: the reference genome is 3*10^9 base pairs long and roughly 10^8 reads are needed to cover it. This project attempts to address this problem by leveraging current trends in high-performance computer architectures.
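To make the alignment step concrete, here is a minimal CPU-side sketch of one common approach, seed-and-verify mapping. It is not the GPU implementation from this project: the function names, the seed length k, and the mismatch limit are all assumptions chosen for illustration, and only substitutions (no insertions or deletions) are considered.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference positions where the read aligns with at most
    max_mismatches substitutions (hypothetical helper, illustration only)."""
    hits = set()
    # Take non-overlapping seeds from the read and look each one up in the index.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # candidate start of the whole read
            if start < 0 or start + len(read) > len(reference):
                continue
            # Verify the candidate by counting mismatching bases.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, 4)
print(map_read("TAGCCGAT", reference, index, 4))  # exact copy of a reference region
print(map_read("TAGCCGTT", reference, index, 4))  # same region with one substitution
```

The index is what keeps the work manageable: trying every read at every reference position would mean on the order of 3*10^9 x 10^8 = 3*10^17 placements at the sizes quoted above, while a seed lookup only verifies the few positions that share a k-mer with the read.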

The goal of this quarter

The goal of this project is to leverage the power of current GPU programming models to accelerate short-read mapping. The hope is that the massively parallel nature of GPUs will lead to order-of-magnitude improvements over conventional CPU-based approaches. A secondary goal is to achieve good performance for sequences of length 10^9 and longer with over 10^8 reads.

The Schedule for the quarter

Because of the implementation complexity that comes with using GPUs, I plan to have a complete prototype by the end of the 9th week. Once the programming is complete, the 10th week will be devoted to performance evaluation.

Project Files


Week Seven

Plan

1) Research current techniques in sequence alignment.
2) Perform a high-level analysis of potential complications.
3) Build off the virus scanner to get a running start on the resequencer.

Progress

I should have started this work a bit sooner.

Problems

None.

Grade

A.


Week Eight

Plan

1) Continue implementation of CPU-based management structures.
2) Complete the GPU scheduler to exploit read locality.
3) Run unit tests on the code to make sure it actually runs.

Progress

Completed CPU structures. Unit testing will come over the weekend. I think there is a bug in the scheduler.

Problems

Should have set aside more time for development this week. :(

Grade

B+.


Week Nine

Plan

1) Complete implementation of the scanner.
2) Begin performance analysis.
3) Prepare final presentation.

Progress

Completed implementation of the GPU scan kernel, and associated management structures.

Problems

Had to track down a bug in the GPU scheduler which was causing reads in use on the GPU to be removed before the comparison completed.
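One common guard against this class of bug is to count outstanding uses of each read batch and refuse eviction while the count is nonzero. The sketch below is a host-side illustration only, not the fix actually applied in this project; the class name ReadBatchPool and its methods are hypothetical.

```python
class ReadBatchPool:
    """Host-side bookkeeping for read batches resident on the GPU
    (hypothetical structure, for illustration only)."""

    def __init__(self):
        self._reads = {}   # batch_id -> list of reads
        self._in_use = {}  # batch_id -> number of comparisons still running

    def add(self, batch_id, reads):
        """Register a new batch with no comparisons running on it."""
        self._reads[batch_id] = list(reads)
        self._in_use[batch_id] = 0

    def acquire(self, batch_id):
        """Mark the batch as used by one more comparison and return its reads."""
        self._in_use[batch_id] += 1
        return self._reads[batch_id]

    def release(self, batch_id):
        """Signal that one comparison on this batch has completed."""
        if self._in_use[batch_id] == 0:
            raise RuntimeError("release without matching acquire")
        self._in_use[batch_id] -= 1

    def evict(self, batch_id):
        """Remove the batch only if no comparison is using it; report success."""
        if self._in_use[batch_id] > 0:
            return False
        del self._reads[batch_id]
        del self._in_use[batch_id]
        return True


pool = ReadBatchPool()
pool.add(0, ["ACGT", "TTGA"])
pool.acquire(0)        # a comparison starts using batch 0
print(pool.evict(0))   # eviction is refused while the comparison is running
pool.release(0)        # the comparison completes
print(pool.evict(0))   # eviction now succeeds
```

In a real GPU scheduler the release would be tied to kernel completion (for example, after synchronizing on the stream that ran the comparison), so the scheduler never frees memory a kernel is still reading.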

Grade

A.


Week Ten

Plan

1) Run performance tests for the presentation.
2) Finish the PowerPoint slides.

Progress

Completed performance benchmarks and incorporated them into the report. All performance metrics can be found in the M.S. report.

Problems

None.

Grade

A.


Final Week

Plan

Upload source files and finish up wiki.

Progress

Completed all tasks.

Problems

None.

Grade

A.


Concluding remarks

The performance analysis has been completed and added to the report found above. More performance evaluation is needed to find the bottlenecks that limit performance. A closer look at the cmatch system put out by the MUMmer project may also lead to valuable insight into further acceleration techniques.
