Paper abstract

A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

Sergio Rodrigues de Morais - INSA-Lyon, France
Alex Aussem - Universite de Lyon 1, France

Session: Feature Selection
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_20

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel Markov boundary learning algorithm that is scalable, data efficient, and correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
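
To make the notion of a Markov boundary concrete: in a Bayesian network that is faithful to the data distribution, the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (its spouses). The sketch below only illustrates that definition on a graph that is already known; it is not the learning algorithm of the paper, which has to recover the boundary from data. The function name markov_boundary and the toy graph are hypothetical, chosen purely for illustration.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target`.

    `parents` maps every node of a DAG to the set of its parents.
    """
    pa = set(parents.get(target, set()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return pa | children | spouses


# Hypothetical toy DAG (an assumption, not from the paper):
# D -> A -> T -> C <- B, with E isolated.
toy_dag = {
    "T": {"A"},
    "C": {"T", "B"},
    "A": {"D"},
    "B": set(),
    "D": set(),
    "E": set(),
}

print(sorted(markov_boundary(toy_dag, "T")))  # ['A', 'B', 'C']
```

Here D and E fall outside the boundary of T: given A, B, and C, they carry no further information about T, which is why the Markov boundary is the minimal feature subset sought for classification.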