College of Letters and Science

Statistical Challenges in Genomics

Sandrine Dudoit is Associate Professor of Biostatistics and Statistics at UC Berkeley and is a faculty affiliate of the Center for Computational Biology and the California Institute for Quantitative Biosciences (QB3). Image credit: Alain Dudoit

At first glance, it bears an uncanny resemblance to a piece of modern art. A grid of red, yellow, and green spots glows against a glassy black backdrop in an abstract composition no larger than a microscope slide. But in addition to its visual appeal, this stylish object possesses tremendous scientific value. Called a DNA microarray, it is a miniature laboratory on a chip. In a single experiment it can deliver a detailed snapshot of the activity of thousands of genes in an organism, whether bacterium or human.

For biologists, DNA microarrays have been boon and curse alike. Researchers routinely use these assays to monitor gene expression patterns in cells from cancer patients, with the aim of deriving better diagnosis and treatment strategies for the disease. They can now obtain unprecedented insights into the activities of genes and cells with a minimum of experimental effort. At the same time, they are struggling to make sense of the tidal wave of data that ensues.

"Each microarray experiment yields thousands and thousands of measurements for just one person," says Sandrine Dudoit, a Berkeley professor of Biostatistics and Statistics. "Microarrays and other high-throughput biological assays are raising challenging statistical design and analysis questions and are a driving force for our discipline. The scale and complexity of the data are unprecedented and far greater than traditional methods allow you to handle."

DNA microarrays allow biologists to monitor gene expression levels for entire genomes. Image credit: Wikipedia

Dudoit specializes in developing statistical and computational methods to analyze and comprehend the mind-bogglingly large and intricate datasets generated by high-throughput biotechnologies such as DNA microarrays. She has pioneered approaches for combining and synthesizing information from multiple, diverse data sources that capture different aspects of gene expression. She also develops statistical methods to uncover relationships among three kinds of information: genome-wide measurements for a patient; demographic and environmental variables such as age, sex, ethnicity, and diet; and medical outcomes such as survival prognosis and response to treatment.

Dudoit regularly fields requests from colleagues on campus and around the world seeking assistance with their unwieldy data collections. "Biologists are contacting statisticians with fascinating questions and new tools to measure biological variables we'd never thought we'd be able to investigate just a few years ago. They are wondering how to gain biological insight from their experimental data and the wealth of information publicly available on the Internet," she remarks. "It's a very exciting time to be a statistician."

This stained glass window at Cambridge University commemorates Sir Ronald Aylmer Fisher (1890-1962), one of the founding fathers of statistics and a pioneer in the application of statistics to genetics. It represents a Latin square, an arrangement that can be used to design DNA microarray experiments. Image credit: Wikipedia

Her collaborators find a sympathetic ear. As a postdoctoral researcher, Dudoit had firsthand experience with the formidable data analysis challenges faced by biologists. On her first day in the lab, she was presented with "raw" image data from a microarray experiment and wondered how anyone could possibly extract biological knowledge from these thousand-dimensional numeric matrices. She recalls, "It was facing the real world in a biology lab that made me realize what applied statistics was all about. It highlighted the critical importance of effective communication between biologists and statisticians. It also emphasized the crucial role of statistical computing as a link between the development of statistical methodology and its timely impact on biology."

Dudoit and her students rely on computing to make the results of their research available to the scientific community. They implement their novel approaches using the R language for statistical computing and graphics and release their software as part of the open-source Bioconductor Project.

"Being a biostatistician today allows me to combine my interests in both mathematics and biology. There's a real need for statistical principles and methodology and our contributions are being used right away to address relevant biological questions. With a next generation of DNA sequencing machines entering the scene, we are facing new and even greater statistical and computational challenges," Dudoit says. "You feel like your work really matters; it's being applied immediately, with the goal of elucidating fundamental scientific questions and improving public health."
