The floodgates are open. More specifically, the repositories of big data are open, and with proper use and management, the vast amount of information available could change the face of modern medicine by making personalized healthcare possible for all.
Overwhelming quantities of open data, including molecular, clinical and epidemiologic data, have been collected in recent years at sites like the NIH National Center for Biotechnology Information or the clinical data repositories at hospital systems around the world. These data sets are at the disposal of the medical and scientific communities to use at their discretion for scientific advancement. Researchers today can access this trove of information and use it to develop potential advancements in diagnostics, therapeutics, and insights into diseases across the medical spectrum.
According to one expert, one of the main challenges going forward is to be able to convert the genome-era discoveries and translate the hundreds of trillions of points of molecular, clinical and epidemiologic data into clinical usefulness.
“The age of using invaluable molecular, genetic, and epidemiologic data to drive scientific and medical discoveries is upon us. The collection of data, even publicly-available data, has taken on a new meaning in terms of providing state-of-the-art healthcare, one that can be personalized and tailored to the individual patient,” says Atul Butte, M.D., Ph.D., Director, Institute for Computational Health Sciences, and Professor of Pediatrics, University of California, San Francisco.
The end of the scientific method?
We are in the middle of a data revolution today, and it is estimated that humans now generate more than two zettabytes (2 x 10^21 bytes) of information annually, a number that is projected to double every two to three years going forward. According to Dr. Butte, this data “deluge” could make the scientific method obsolete.
“With the scientific method, we ask a question and then we go gather the data to answer that question. Classically, in science, we ask the scientific question first, and then we go gather the data to answer that question. With the increasing troves of data becoming readily available for analysis, the new magic is to figure out what’s the question that we want to ask given all the data at our disposal,” Dr. Butte says.
The data revolution in life science and biomedicine is made possible in part by novel devices such as DNA sequencing technologies or gene expression microarrays, useful research tools that can read and quantitate every single gene in the human genome. For example, the gene expression microarray allows researchers to measure numerous different molecules simultaneously, and the garnered information can then be shared among other researchers at various dedicated databases and websites. Scientists and researchers can now simply access all of this genetic data and perform any number of experiments and research, such as exploring the effects of different established drugs on different diseases, or discovering a blood test that could detect and diagnose a particular cancer ahead of time. According to Dr. Butte, this repositioning of existing drugs could be very fruitful across all of the medical specialties.
“There are many other good uses of the drugs that are out there with the medicines we already have. The trick is to try to find a way to see which drugs could be repositioned and used in other diseases or conditions. Finding new uses for drugs, especially connecting a rare disease with a commonly treated one makes me feel good about the future and how we can suggest potential therapies in the future for our patients,” Dr. Butte says.
Two well-known examples of the repurposing of drugs include the cardiac drug sildenafil (Viagra, Pfizer), now commonly used for erectile dysfunction, and minoxidil, which is now readily used for androgenic alopecia. The hope is that, instead of finding these new uses for drugs by accident, researchers can discover new uses on purpose with the help of vast public datasets. Big data can be used to help determine the impact of drugs on people with and without diseases in dedicated clinical trials in a much faster and cheaper way than ever before.
Searching the genome
Databases allow researchers to explore different aspects of diseases, such as inflammation or cancer genes, and let them easily and quickly compare the data of a given patient or disease with existing datasets. If there is no specific genetic data available for a given patient, the datasets can still help clinicians and researchers learn more about the disease in question.
Gene databases can also help clinicians arrive at an accurate diagnosis of clinically ambiguous skin lesions by comparing the gene expression signatures of the biopsy tissue in question with already publicly available data on the suspected potential differential diagnoses.
According to Dr. Butte, one of the future roles of physicians could be to help patients understand what they can do to compensate for their genome, should a disease and/or disease association be identified in their individual genome. Compensation could include a healthier lifestyle, such as changes in diet and daily exercise, or much more detailed changes according to what’s found in the patient’s individual genome.
“We regularly use the various warehouses of big data for our research. Although there is a multitude of dedicated websites, I think that one of the most clinically friendly databases is probably the GEO datasets, because the NIH has done an excellent job curating the data and making it accessible,” says Kavita Yang Sarin, M.D., Ph.D., Clinical Assistant Professor in Dermatology, Department of Dermatology, Stanford University Medical Center, Stanford, Calif.
When a user of the GEO datasets specifies the name of a gene, GEO pulls up the expression levels of that gene across a dataset or a specific disease. The process works very much like a Google search for looking up the expression levels of various genes.
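For readers who want to see the idea in concrete terms, the lookup can be pictured as a simple search over a table of expression values. The short Python sketch below is only an illustration under stated assumptions: the gene names, sample labels, and numbers are made up, the functions `lookup_gene` and `mean_by_group` are hypothetical helpers, and nothing here calls GEO or reflects how the NIH implements its search.

```python
from statistics import mean

# Hypothetical toy expression table: gene symbol -> {sample label: expression value}.
# All names and values are invented for illustration; none of this is GEO data.
EXPRESSION = {
    "GENE_A": {"tumor_1": 8.0, "tumor_2": 9.0, "normal_1": 3.0, "normal_2": 4.0},
    "GENE_B": {"tumor_1": 7.0, "tumor_2": 6.0, "normal_1": 5.0, "normal_2": 5.0},
}


def lookup_gene(table, gene):
    """Return the per-sample expression levels for one gene, or None if absent."""
    return table.get(gene.upper())


def mean_by_group(levels, prefix):
    """Average expression over the samples whose label starts with `prefix`."""
    values = [value for sample, value in levels.items() if sample.startswith(prefix)]
    return mean(values)


# Look up one gene by name (case-insensitive) and compare the two sample groups.
levels = lookup_gene(EXPRESSION, "gene_a")
tumor_mean = mean_by_group(levels, "tumor")
normal_mean = mean_by_group(levels, "normal")
print(f"tumor mean: {tumor_mean}, normal mean: {normal_mean}")
```

In this toy table, the looked-up gene is expressed at a higher average level in the tumor samples than in the normal samples, which is the kind of comparison a gene search is meant to surface.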
One way that Dr. Sarin and her research team use big data websites is for the repurposing of drugs.
“We look at genes that are altered in various diseases using GEO datasets, and then we identify existing drugs that might actually target those expression patterns in order to explore new and potential therapies of older standing drugs as well as for agents that we already have available,” Dr. Sarin says.
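One common way to picture this kind of matching is to score each drug by how strongly its effect on gene expression runs opposite to the disease pattern. The Python sketch below is a simplified illustration, not a description of Dr. Sarin's actual method: the gene names, drug names, and signatures are hypothetical, and the scoring rule (`reversal_score`) is an assumption chosen for clarity.

```python
# Hypothetical disease signature: gene -> direction of change (+1 up, -1 down).
disease = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}

# Hypothetical drug signatures: how each drug shifts expression of the same genes.
drugs = {
    "drug_x": {"GENE_A": -1, "GENE_B": -1, "GENE_C": 1},
    "drug_y": {"GENE_A": 1, "GENE_B": -1, "GENE_C": -1},
    "drug_z": {"GENE_A": 1, "GENE_B": 1},
}


def reversal_score(disease_sig, drug_sig):
    """Score how much a drug opposes the disease pattern on their shared genes.

    Each shared gene adds +1 when the drug moves it opposite to the disease
    and -1 when it moves it in the same direction.
    """
    shared = disease_sig.keys() & drug_sig.keys()
    return -sum(disease_sig[gene] * drug_sig[gene] for gene in shared)


def rank_drugs(disease_sig, drug_sigs):
    """Return (drug, score) pairs sorted from most to least reversing."""
    scores = {name: reversal_score(disease_sig, sig) for name, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_drugs(disease, drugs)
for name, score in ranking:
    print(name, score)
```

A drug that ranks highly in a sketch like this is only a candidate worth a closer look; any real repurposing lead would still need laboratory and clinical validation.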
BCC and big data
In her research with basal cell carcinoma (BCC), Dr. Sarin and her colleagues in the Stanford Translational Basal Cell Cancer group identified mutations that cause drug resistance to vismodegib. After curating a list of known and published mutations that cause drug resistance to smoothened inhibitors, it was found that those skin cancer patients who have the smoothened mutation would likely be resistant to vismodegib and similar agents. Using patients’ specific genome data together with the big datasets helped elucidate which patients with advanced BCC would respond well to smoothened inhibitor treatment before actually starting therapy.
“Analyzing the patient’s genome data among known data sets for specific diseases is extremely helpful because it allows you to practice precision medicine in your patients. You can use genetic information from your patient to determine their clinical care, and to potentially implement effective therapies for their given disease,” Dr. Sarin says.
Dealing with the sheer volume and complexity of genetic data at these big data websites may at first appear to be a daunting task. Fortunately, however, many of the currently available databases already have computational tools in place that process the data, allowing the user to perform a simple search using the site's dedicated search engine.
“These database websites can offer you very valuable information, but you have to know what you are looking for and what to ask for. Our research is usually guided by a clinical question or set of questions. If you are looking for an expression of a certain gene, you can find it. If you are looking for the top 500 genes that are different between psoriasis and lupus, you can find it as well. You just have to learn and know how and what to ask for via the search engine,” Dr. Sarin says.
A precision medicine approach with big data can also be used to help determine the risk of skin cancer in patients. In her ongoing skin cancer research, Dr. Sarin genotypes patients and assesses their likelihood of developing skin cancer based on their individual genetic risk markers. With the help of big datasets, she can inform patients of their risk by drawing on published studies that show increased risk with certain alleles and gene markers.
Where does it lead?
Big data is going to be playing an increasingly important role in precision medicine, Dr. Sarin says, and a more intricate role in the optimal treatment and management of patients. Although physicians and researchers from any medical specialty can access and use big data, Dr. Sarin says that dermatologists could and should take more advantage of the wealth of data currently available at the many databases.
Dr. Sarin concludes by saying, “I think that dermatologists should pay more attention to the resources available and the diagnostic and therapeutic possibilities at our disposal when using big data. We need to realize now already that there is a wealth of medical information that we have at our disposal, just sitting there, and we need to use this data appropriately to learn about how we can better treat and manage a host of diseases and conditions in our patients.”
Disclosures: Dr. Sarin has no relevant disclosures. Dr. Butte is founder and consultant to NuMedii, Personalis, Carmenta.