Latino Studies at New York University

David Sontag

Computer Science
Courant Institute of Mathematical Sciences
New York University

February 19, 2013

Extracting structure from data

This talk considers the problem of automatically extracting structure from data. Given observations of hundreds or thousands of variables, is it possible to infer the relationships between the variables? Mathematically, this problem can be formulated as finding the Bayesian network structure that maximizes the likelihood of the observed data subject to a complexity penalty. However, maximum likelihood structure learning is well known to be NP-hard, and heuristics are often far from optimal. In this talk, I will present a new approach to structure learning based on linear programming relaxations. Key to our approach is the introduction of a novel objective function that combines the theoretical advantages of maximum likelihood with the computational benefits of conditional independence tests. Based on joint work with Eliot Brenner.
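To make the "likelihood subject to a complexity penalty" formulation concrete, here is a minimal sketch that scores a candidate structure over discrete variables. It assumes the standard BIC penalty, which is only one common choice and is not the novel objective the talk describes. The function name `bic_score`, the data layout, and the toy dataset are all hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """Penalized log-likelihood (BIC) of a discrete Bayesian network structure.

    data:    list of dicts mapping variable name -> observed value.
    parents: dict mapping each variable -> tuple of its parent variables
             (assumed to describe a DAG; this sketch does not check acyclicity).
    """
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, child value) and of parent configurations.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood of this family under maximum-likelihood parameters.
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (observed child states - 1) per parent configuration.
        states = len({row[var] for row in data})
        pa_configs = 1
        for p in pa:
            pa_configs *= len({row[p] for row in data})
        score += loglik - 0.5 * math.log(n) * (states - 1) * pa_configs
    return score


# Toy data (assumed): B always copies A, so the edge A -> B should score higher.
data = [{"A": 0, "B": 0}] * 10 + [{"A": 1, "B": 1}] * 10
with_edge = bic_score(data, {"A": (), "B": ("A",)})
no_edge = bic_score(data, {"A": (), "B": ()})
print(with_edge > no_edge)
```

Scoring a single structure is cheap. The hardness mentioned above comes from searching over the space of possible structures, whose size grows super-exponentially with the number of variables.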

Bio: David Sontag is an Assistant Professor of Computer Science at New York University's Courant Institute of Mathematical Sciences. His research interests include theoretical and practical aspects of machine learning and probabilistic inference. David's recent work has focused on unsupervised learning of probabilistic models (e.g., for medical diagnosis) directly from clinical data found in electronic medical records. Prior to joining Courant, he was a postdoctoral researcher at Microsoft Research New England from 2010 to 2011. David's Ph.D. thesis won the award for the best doctoral thesis in Computer Science at MIT in 2010. His research has received recognition including a Best Paper Award at the conference on Empirical Methods in Natural Language Processing in 2010, a Best Paper Award at the conference on Uncertainty in Artificial Intelligence in 2008, and an Outstanding Student Paper Award at the conference on Neural Information Processing Systems in 2007.