Big Data needs Big Processing

 

http://www.medgadget.com/2014/02/beagle-sniffs-out-240-genomes-in-two-days.html

Beagle Sniffs Out 240 Genomes in Two Days

by GAVIN CORLEY on Feb 20, 2014 • 5:53 pm

Whole genome sequencing holds great potential for enriching diagnoses and understanding hereditary risk factors for specific diseases. However, the sheer volume of data involved poses major technical challenges, which limits the utility of this approach. For this reason, many clinical geneticists have turned to exome sequencing, which looks only at the small portion of the genome that codes for proteins.

A team from the University of Chicago have managed to turn the spotlight back on whole genome sequencing by analyzing 240 full genomes in two days, recruiting the computational muscle of Beagle, one of the world’s fastest supercomputers. Beagle is a Cray XE6 supercomputer at the Argonne National Laboratory outside Chicago, and is used for computation, simulation, and data analysis for the biomedical research community.

Beagle's architecture allows highly efficient and rapid processing of parallel data streams. To give you some idea of just how powerful Beagle is, the researchers estimate that the equivalent task carried out by a single 2.1 GHz CPU would take approximately 47.2 years to complete.
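To put those two figures side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above (47.2 years on one CPU, two days on Beagle, 240 genomes); the year length of 365.25 days and the variable names are assumptions made for illustration. The result is a rough ratio on the order of 8,600 times, not a benchmark reported by the researchers.

```python
# Rough speedup implied by the article's figures (illustrative only).
SINGLE_CPU_YEARS = 47.2   # from the article: estimate for one 2.1 GHz CPU
BEAGLE_DAYS = 2.0         # from the article: 240 genomes in two days
GENOMES = 240             # from the article
DAYS_PER_YEAR = 365.25    # assumption: average calendar year length

# Convert the single-CPU estimate to days so both figures share a unit.
single_cpu_days = SINGLE_CPU_YEARS * DAYS_PER_YEAR

# Ratio of single-CPU time to Beagle wall-clock time.
speedup = single_cpu_days / BEAGLE_DAYS

# Average single-CPU time that one genome would take under that estimate.
cpu_days_per_genome = single_cpu_days / GENOMES

print(f"Implied speedup: about {speedup:,.0f}x")
print(f"Single-CPU time per genome: about {cpu_days_per_genome:.0f} days")
```

The ratio compares wall-clock time only; it says nothing about how many cores Beagle used or how efficiently they were used.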

According to one of the lead investigators, Professor Elizabeth McNally:

Improving analysis through both speed and accuracy reduces the price per genome. With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range.

The team have published their results, in great technical depth, in the journal Bioinformatics, and while we won’t see this kind of technology in clinics anytime soon, it should certainly enhance the pace and clinical utility of whole genome sequencing.

Journal of Bioinformatics: Supercomputing for the parallelization of whole genome analysis

Press release: Whole Genome Analysis, STAT