Ilya Korsunsky

Computational Biology Program
Courant Institute of Mathematical Sciences
New York University

Survival Analysis Using Probabilistic Graphical Models and Philosophical Causation with Applications to Cancer Genomics

November 3, 2015

Survival analysis in cancer lies at the center of many clinical applications, including prognosis, diagnosis, personalized therapy and drug trials. The advent of genomics promises extremely accurate models for survival analysis, but that promise has been mired in challenges. The biological, statistical and computational complexities inherent to genomic datasets are manifold: disease heterogeneity, low-resolution samples, nonlinear interactions, small sample sizes and high dimensionality. Survival analysis is itself computationally demanding. Current machine learning approaches tackle these challenges with limited success because they rely on a static formulation of a dynamic, evolving process: cancer is a progressive disease of the genome, driven by somatic evolution.

We hypothesize that a successful survival analysis model must take progression into account. We develop a general survival analysis framework that uses the Fisher kernel and Cox's proportional hazards model to incorporate information about genomic progression. We derive model-specific kernels to show that the framework supports all the main probabilistic models of genomic progression from the literature. We also develop the Suppes-Bayes Causal Network (SBCN), a novel progression model that encodes Suppes's theory of probabilistic causation into a fully Bayesian generative model. The SBCN generalizes existing models to allow for arbitrarily complex Boolean interactions among genes and provides a flexible and robust formalization of genomic progression. On synthetic data, we reconstruct SBCNs faithfully and use the model to significantly improve performance in the survival analysis framework. On real data, we demonstrate that the survival framework with the SBCN significantly "lifts" survival prediction accuracy on several datasets from The Cancer Genome Atlas (TCGA).
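
As a rough illustration of the Fisher kernel idea mentioned above, the sketch below computes a Fisher kernel between binary mutation profiles. It is not the framework from the thesis: it assumes a deliberately simple stand-in generative model (independent Bernoulli variables, one per gene) rather than a progression model such as the SBCN, and the function names (fit_bernoulli, fisher_score, fisher_kernel, kernel_matrix) and the toy data are hypothetical. For this stand-in model the Fisher information is diagonal and available in closed form, so the kernel reduces to a weighted inner product of score vectors.

```python
# Minimal sketch of a Fisher kernel, assuming an independent-Bernoulli
# generative model over genes. This is a stand-in for illustration only;
# it is not one of the progression models discussed in the abstract.

def fit_bernoulli(samples, pseudo=1.0):
    """Estimate per-gene mutation probabilities with additive smoothing."""
    n = len(samples)
    d = len(samples[0])
    return [(sum(s[j] for s in samples) + pseudo) / (n + 2.0 * pseudo)
            for j in range(d)]

def fisher_score(x, p):
    """Gradient of log P(x | p) with respect to each p_j."""
    return [(xj - pj) / (pj * (1.0 - pj)) for xj, pj in zip(x, p)]

def fisher_kernel(x, y, p):
    """K(x, y) = U_x^T I^{-1} U_y, where I is the Fisher information.

    For independent Bernoulli variables, I is diagonal with entries
    1 / (p_j * (1 - p_j)), so its inverse has entries p_j * (1 - p_j).
    """
    ux, uy = fisher_score(x, p), fisher_score(y, p)
    return sum(a * pj * (1.0 - pj) * b for a, b, pj in zip(ux, uy, p))

def kernel_matrix(samples, p):
    """Pairwise Fisher kernel values for a list of samples."""
    return [[fisher_kernel(x, y, p) for y in samples] for x in samples]

# Toy data: rows are tumor samples, columns are genes (1 = mutated).
toy_samples = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
toy_p = fit_bernoulli(toy_samples)
toy_K = kernel_matrix(toy_samples, toy_p)
```

In a pipeline of the kind the abstract describes, a kernel matrix like toy_K would then be passed to a kernelized Cox proportional hazards model; that step is omitted here because it depends on the choice of survival library.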