Empirical Bayes in Genomics: when dimensionality is a blessing

last modified Jun 18, 2015 12:18 PM
Gwenael G.R. Leday, MRC Biostatistics Unit

Gwenael G.R. Leday, MRC Biostatistics Unit, Cambridge

Abstract

Technological advances continue to expand the scale and scope of molecular assays in biomedicine. Increasingly, multiple assays, interrogating different molecular variables, are being brought to bear on large-scale projects spanning large numbers of patient samples. For example, The Cancer Genome Atlas (TCGA) project is generating molecular profiles of more than 30 cancer types across thousands of patients with data types including DNA copy number, methylation, transcription and protein (see http://tcga-data.nci.nih.gov/tcga). The statistical analysis of these Big Data has emerged as a major challenge in current research.

Over the past two decades, a surge of statistical models and inference procedures has emerged to address the various biological questions arising from the analysis of molecular data. What we have learned is that (1) statistical regularization is essential, (2) Bayesian methods are promising for complex high-dimensional data, and (3) empirical Bayes procedures, which combine frequentist and Bayesian arguments for the borrowing of strength between seemingly unrelated problems, can be crucial. Notably, empirical Bayes is facilitated in high dimensions, and hence the method's gain is particularly large when the data are big. Consequently, for this approach high dimensionality is a blessing rather than a curse.
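
To make the borrowing of strength concrete, here is a minimal sketch, not taken from the talk, of empirical Bayes shrinkage in a toy normal-means model. It assumes each feature (for example, a gene) has an observed effect x_i ~ N(theta_i, sigma2) with known sigma2 and a shared prior theta_i ~ N(0, tau2). The names eb_shrink and simulate, the moment estimator of tau2, and all parameter values are illustrative assumptions. The prior variance is estimated from all features jointly, so the estimate should stabilise as the number of features p grows.

```python
import random


def eb_shrink(x, sigma2=1.0):
    """Empirical Bayes posterior means for x_i ~ N(theta_i, sigma2), theta_i ~ N(0, tau2).

    Hypothetical helper for illustration; sigma2 is assumed known.
    """
    # Marginally x_i ~ N(0, tau2 + sigma2), so mean(x_i^2) estimates tau2 + sigma2.
    marginal_var = sum(v * v for v in x) / len(x)
    # Method-of-moments estimate of the prior variance, truncated at zero.
    tau2_hat = max(0.0, marginal_var - sigma2)
    # Posterior mean of theta_i is weight * x_i, with weight in [0, 1).
    weight = tau2_hat / (tau2_hat + sigma2)
    return [weight * v for v in x], tau2_hat


def simulate(p, tau2=0.5, sigma2=1.0, seed=0):
    """Simulate p features and compare raw and shrunken estimates (toy values)."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, tau2 ** 0.5) for _ in range(p)]
    x = [t + rng.gauss(0.0, sigma2 ** 0.5) for t in theta]
    shrunk, tau2_hat = eb_shrink(x, sigma2)
    mse_raw = sum((a - t) ** 2 for a, t in zip(x, theta)) / p
    mse_eb = sum((a - t) ** 2 for a, t in zip(shrunk, theta)) / p
    return mse_raw, mse_eb, tau2_hat


for p in (10, 100, 10000):
    mse_raw, mse_eb, tau2_hat = simulate(p)
    print(f"p={p:>6}  tau2_hat={tau2_hat:.3f}  MSE raw={mse_raw:.3f}  MSE EB={mse_eb:.3f}")
```

With few features the estimated prior variance is noisy and the shrinkage can be poorly calibrated. With many features it is estimated accurately from the ensemble, which is the sense in which high dimensionality helps under these assumptions.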

In this presentation, I will exemplify the beneficial effects of empirical Bayes procedures on various data analysis problems met in genomics, including differential expression analysis, large-scale multiple testing, gene network reconstruction, data integration, and the incorporation of prior biological knowledge. Through these exemplars, we shall see that empirical Bayes is a very versatile principle that combines well with complex high-dimensional data and, more generally, promises to offer good opportunities in Big Data science.