Google Wants Your DNA: Are You Willing to be a Project in the Cloud?
For the past 18 months, Google has quietly been approaching hospitals and universities to acquire genome data in an effort to roll out a cloud computing service for DNA, according to Technology Review.
Google Genomics is the search giant’s first product for the DNA age, providing an API to store, process, explore and share DNA sequence reads, reference-based alignments, and variant calls, using Google’s cloud infrastructure.
For $25 a year, Google will host a copy of a genome sequence in the cloud.
While genetic databases already exist online, Google Genomics is the latest and most ambitious iteration. Genealogy databases for finding ancestors and public genetic databases run by national research centers, while impressive and useful, cannot match the ambition of Google's DNA storage service.
Connecting and comparing genomes by the thousands, and soon by the millions, will propel medical discoveries for the next decade. Among Google, IBM, Microsoft and Amazon, the question of who will store the data is already a point of growing competition.
“We saw biologists moving from studying one genome at a time to studying millions,” David Glazer, the software engineer who led the effort, told Technology Review. “The opportunity is how to apply breakthroughs in data technology to help with this transition.”
Why Google Genomics is Important
The volume of data being collected is growing rapidly in labs all over the world as faster equipment for decoding DNA becomes more accessible. The Broad Institute in Cambridge, Massachusetts, reported that during the month of October it decoded the equivalent of one human genome every 32 minutes, which added up to roughly 200 terabytes of raw data.
This flow of data exceeds anything biologists have previously handled, prompting the effort to store and access data at a central point. For perspective, over two months the Broad Institute will produce about as much data as is uploaded to YouTube in a single day.
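The Broad Institute figures can be sanity-checked with a little arithmetic. The sketch below assumes that the 200 terabytes covers the whole month of October (31 days) and that a terabyte is 1,000 gigabytes. The per-genome figure it prints is a back-of-envelope estimate, not a number reported by the Broad Institute.

```python
# Back-of-envelope check of the Broad Institute figures quoted in the article.
MINUTES_PER_GENOME = 32   # from the article
RAW_DATA_TB = 200         # from the article; assumed to be the total for October
DAYS_IN_OCTOBER = 31

minutes_in_month = DAYS_IN_OCTOBER * 24 * 60
genomes_decoded = minutes_in_month / MINUTES_PER_GENOME

# Assumes decimal units: 1 TB = 1000 GB.
gb_per_genome = RAW_DATA_TB * 1000 / genomes_decoded

print(f"Genome equivalents decoded in October: {genomes_decoded:.0f}")
print(f"Raw data per genome equivalent: {gb_per_genome:.0f} GB")
```

Under those assumptions, the month works out to about 1,395 genome equivalents at roughly 143 gigabytes of raw data each.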
The National Cancer Institute said in October that it would pay $19 million to move copies of the 2.6 petabyte Cancer Genome Atlas into the cloud. Copies of the data will reside at both Google Genomics and in Amazon’s data centers.
The Future of Medical Discoveries
Without comparing genome sequences, it is tough for researchers to determine what is and is not a mutation within DNA. With a database that houses thousands of genomes, the chances of pinpointing inconsistencies become much higher.
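To make the idea concrete, here is a minimal sketch of how a collection of genomes helps separate common variation from unusual changes. It is a toy illustration, not how Google Genomics works: the function name `rare_variants`, the variant notation and the sample data are all hypothetical.

```python
from collections import Counter


def rare_variants(patient, cohort, max_frequency=0.01):
    """Return the patient's variants found in at most max_frequency of the cohort."""
    if not cohort:
        raise ValueError("cohort must contain at least one genome")
    # Count how many genomes in the cohort carry each variant.
    counts = Counter(variant for genome in cohort for variant in set(genome))
    total = len(cohort)
    return sorted(v for v in set(patient) if counts[v] / total <= max_frequency)


# Hypothetical toy data: each variant is written as "chromosome:position:ref>alt".
cohort = [
    {"chr1:1000:A>G", "chr2:500:C>T"},
    {"chr1:1000:A>G"},
    {"chr1:1000:A>G", "chr3:42:G>A"},
    {"chr1:1000:A>G"},
]
patient = {"chr1:1000:A>G", "chr7:777:T>C"}

# The variant shared by everyone is filtered out; the unseen one is flagged.
print(rare_variants(patient, cohort, max_frequency=0.25))
```

With only four genomes the frequency threshold is crude. The more genomes the comparison set holds, the more confidently a variant can be called common or rare, which is the case for storing genomes by the thousands.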
A database such as Google Genomics can serve as a search catalogue for doctors to determine the best treatment options for a patient.
“Our bird’s eye view is that if I were to get lung cancer in the future, doctors are going to sequence my genome and my tumor’s genome, and then query them against a database of 50 million other genomes,” said Deniz Kural, CEO of Seven Bridges, which stores genome data on behalf of 1,600 researchers in Amazon’s cloud. “The result will be ‘Hey, here’s the drug that will work best for you.’”
Solving the Privacy Issues
With big data come big privacy issues. Genome databases have to carefully calibrate how much information they provide alongside DNA sequences. The more information a database includes, such as age, sex, location and diet habits, the more useful it is to researchers, but the easier it becomes to identify whom a genome belongs to.
A study in Science last year identified several men in the publicly available 1000 Genomes Project using their Y chromosomes together with age, location and family tree data. While Google Genomics’ data is geared towards researchers rather than the general public, the wide accessibility of this information leaves the privacy matter open.
Additionally, what if researchers who are studying a patient’s genome for cancer come across information that reveals a newly discovered rare disease, or that the patient has an unknown sibling? Do they tell the patient?
While these privacy worries aren’t unique to Google Genomics, the sheer magnitude of the project magnifies the potential problems. According to Gizmodo, researchers have advocated for central genomic data centers to standardize privacy policies. Once these privacy concerns are reckoned with, Google Genomics has the capability to succeed where others haven’t.
According to Technology Review, at least 3,500 genomes from public projects are already stored on Google’s servers.