Rachel Hodos

Computational Biology PhD Student
New York University

November 26, 2013

Encoding evidence of independence into Bayesian network structure learning

Bayesian networks are an increasingly popular tool for extracting structure from multivariate data and can be very useful for discovering causal relationships in large-scale biological datasets such as gene expression data. However, learning a Bayesian network structure is NP-hard, so research has traditionally focused on heuristic search strategies. Recent work by Brenner and Sontag (2013) takes a different approach, introducing a new objective function called SparsityBoost. SparsityBoost adds a data-driven complexity penalty: it looks for strong evidence in the data that an edge should not be present in the optimal graph and boosts the score of graphs that are consistent with this evidence. The original presentation covered only binary variables. Here, I show how the score can be extended to networks over discrete variables with more than two states, and present results on synthetic data as well as gene expression data.
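
To make the idea of "evidence of independence" concrete for variables with more than two states, here is a minimal sketch of a plug-in estimate of the mutual information between two discrete variables. This is not the SparsityBoost score itself, which is not specified in this abstract. The function name empirical_mutual_information and the toy samples are assumptions made for illustration only. A value near zero is consistent with independence, which is the kind of signal that would argue against an edge between the two variables.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Hypothetical helper for illustration; works for any number of states.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    count_x = Counter(xs)         # marginal counts for X
    count_y = Counter(ys)         # marginal counts for Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy three-state samples (made up for illustration, not real data).
x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_independent = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # every (x, y) pair occurs once
y_dependent = list(x)                        # Y is a copy of X

mi_independent = empirical_mutual_information(x, y_independent)
mi_dependent = empirical_mutual_information(x, y_dependent)
```

In this toy case the first estimate is zero and the second equals log 3, the entropy of a uniform three-state variable. On real data the plug-in estimate is biased upward for small samples, so a raw threshold on it would be a crude stand-in for a proper statistical test.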