Statistical tools for single cell gene expression analysis
Daphne Ezer, Bertie Gottgens, Boris Adryan
Department of Genetics
Cambridge Systems Biology Centre
Single cell gene expression assays now provide vast amounts of data that can be used to probe cell-to-cell variability in tissues. These methods have been applied in a number of biomedical contexts, from the study of tumor heterogeneity to the study of neurodegenerative disorders, so statistically sound tools for analyzing this large quantity of data are becoming increasingly important. We have developed two such tools. First, we developed a clustering algorithm that takes into account the family of distributions expected in single cell gene expression data. This algorithm can distinguish gene expression heterogeneity caused by bursty gene expression from that caused by mixtures of different cell types. Second, we developed a statistical method for evaluating the probability that the burst frequency or the transcription rate has been differentially regulated between two populations of cells. This allows us to distinguish between the regulatory mechanisms used to control gene expression. After validating both approaches with simulated data, we applied them to understand how key transcription factors are regulated during hematopoiesis.
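As a rough illustration of what separating burst frequency from burst size could look like on simulated data, the sketch below makes several assumptions that are not stated in the abstract. It assumes that bursty expression in a homogeneous population follows a negative binomial (Gamma-Poisson) distribution with mean equal to burst frequency times mean burst size, and it recovers the two parameters with a simple method-of-moments estimate. This is a stand-in for illustration, not the probabilistic method described above, and all function and parameter names are hypothetical.

```python
import random
import statistics


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_bursty(n_cells, burst_freq, burst_size, rng):
    """Simulate mRNA counts per cell under an assumed Gamma-Poisson model.

    Each cell gets a Gamma(shape=burst_freq, scale=burst_size) rate and then
    a Poisson count, so the mean count is burst_freq * burst_size.
    """
    return [
        sample_poisson(rng.gammavariate(burst_freq, burst_size), rng)
        for _ in range(n_cells)
    ]


def estimate_burst_params(counts):
    """Method-of-moments estimate of (burst_freq, burst_size).

    Under the assumed model, variance / mean = 1 + burst_size.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    burst_size = var / mean - 1.0
    if burst_size <= 0:
        raise ValueError("counts are not overdispersed; model does not apply")
    return mean / burst_size, burst_size


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical populations with the same mean expression (40 counts):
    # one reaches it through more frequent bursts, the other through larger bursts.
    more_frequent = simulate_bursty(10000, 4.0, 10.0, rng)
    larger_bursts = simulate_bursty(10000, 2.0, 20.0, rng)
    for name, counts in [("more frequent", more_frequent),
                         ("larger bursts", larger_bursts)]:
        freq, size = estimate_burst_params(counts)
        print(f"{name}: burst_freq ~ {freq:.2f}, burst_size ~ {size:.2f}")
```

Both simulated populations have the same mean expression level, so a comparison of means alone would not separate them. The estimated burst frequency and burst size differ, which is the kind of distinction between regulatory mechanisms the abstract refers to. A mixture of cell types would violate the single-distribution assumption used here and would need to be handled separately, for example by the clustering step.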