Latino Studies at New York University

Constantin Aliferis

Center for Health Informatics and Bioinformatics
New York University

April 29, 2014

Frontier problems in feature selection for Big Data analytics

Big Data Science (BDS) studies methods of data collection, linking, analysis, and modeling that exceed in scope and complexity the typical capabilities of traditional “small data” methods. Critical to the success of Big Data Science analytics are methods for managing high-dimensional data. Such methods include regularization, dimensionality reduction, model selection parsimony criteria, feature construction, and feature selection. Arguably, feature selection plays the most important role among them, on both practical and theoretical grounds.

The purpose of this talk is twofold. (a) I will present major recent advances in the theory and practice of feature selection that provide transformative analytic capabilities relative to what was possible a decade ago. (b) I will argue that the landscape of practically relevant feature selection problems is much larger and vastly more complicated than previously thought.

There are classes of feature selection problems that, despite their immense practical applications, have traditionally escaped the attention of the BDS research community and should provide fertile ground for the next generation of advances in the field.