DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain

Weihong Wang, Jie Xu, Yang Wang, Chen Cai, and Fang Chen

International Conference on Information and Knowledge Management

Missing values are a common issue in real-world datasets. Handling them is one of the key aspects of data mining, as they can seriously degrade the performance of predictive models. In this paper, we propose a unified Boosting framework that consolidates model construction and missing value handling. At each Boosting iteration, weights are assigned to both the samples and the features. The sample weights make difficult samples the focus of learning, while the feature weights allow critical features to be compensated for by less critical ones when they are unavailable. A weak classifier that abstains (i.e., produces no prediction when a required feature value is missing) is learned on a data subset determined by the feature weights. Experimental results demonstrate the efficacy and robustness of the proposed method compared with existing Boosting algorithms.
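The abstract does not spell out DualBoost's update rules, so the following is only a minimal sketch of the abstaining weak-classifier idea. It substitutes the confidence-rated AdaBoost rule of Schapire and Singer for weak hypotheses with outputs in {-1, 0, +1}. It assumes labels in {-1, +1}, missing values encoded as `None`, and decision stumps as weak learners. All function names are hypothetical. The feature-weighting scheme, which in DualBoost selects the training subset, is not modeled here.

```python
import math
from typing import List, Optional, Tuple

Sample = List[Optional[float]]  # None marks a missing feature value (assumption)


def stump_predict(x: Sample, feature: int, threshold: float, polarity: int) -> int:
    """Return +1/-1, or 0 (abstain) when the required feature value is missing."""
    value = x[feature]
    if value is None:
        return 0
    return polarity if value > threshold else -polarity


def best_abstaining_stump(X: List[Sample], y: List[int], w: List[float]):
    """Pick the stump minimising Z = W0 + 2*sqrt(W+ * W-) (Schapire & Singer rule)."""
    best = None
    for j in range(len(X[0])):
        thresholds = sorted({x[j] for x in X if x[j] is not None})
        for t in thresholds:
            for s in (1, -1):
                w_pos = w_neg = w_zero = 0.0
                for xi, yi, wi in zip(X, y, w):
                    margin = yi * stump_predict(xi, j, t, s)
                    if margin > 0:
                        w_pos += wi   # correctly classified weight
                    elif margin < 0:
                        w_neg += wi   # misclassified weight
                    else:
                        w_zero += wi  # weight of samples the stump abstains on
                z = w_zero + 2.0 * math.sqrt(w_pos * w_neg)
                if best is None or z < best[0]:
                    best = (z, j, t, s, w_pos, w_neg)
    return best


def boost(X: List[Sample], y: List[int], rounds: int = 10,
          eps: float = 1e-6) -> List[Tuple[float, int, float, int]]:
    n = len(X)
    w = [1.0 / n] * n  # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        best = best_abstaining_stump(X, y, w)
        if best is None:  # no feature has any observed value
            break
        _, j, t, s, w_pos, w_neg = best
        # eps smooths alpha when the stump makes no mistakes (w_neg == 0).
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        ensemble.append((alpha, j, t, s))
        # Abstained samples have y*h = 0, so exp(0) = 1 leaves their weight
        # unchanged; after renormalisation they gain relative importance.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, s))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, int, float, int]], x: Sample) -> int:
    # Abstaining stumps contribute 0 to the weighted vote.
    score = sum(alpha * stump_predict(x, j, t, s) for alpha, j, t, s in ensemble)
    return 1 if score >= 0 else -1
```

Because a sample on which the current stump abstains keeps its weight while correctly classified samples are down-weighted, the next round tends to pick a stump built on a different feature that is observed for those samples. This loosely mirrors the abstract's idea of less critical features compensating for unavailable ones, though the paper's actual feature-weight mechanism may differ.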

Publication Type

Conference

Publication Date