Faculty Host: Tina Eliassi-Rad
Abstract: In this talk, we shall look at classification techniques that induce models that are both accurate and comprehensible. The need for humanly understandable and intuitive models is often overlooked, yet it is of crucial importance in any domain where the model needs to be validated before it can be implemented, such as the medical diagnosis and credit scoring domains. For example, support vector machines (SVMs) are currently state-of-the-art for the classification task and, generally speaking, exhibit good predictive performance due to their ability to model nonlinearities. However, their strength is also their main weakness, as the generated nonlinear models are typically regarded as incomprehensible black-box models. In a first approach, we use rule extraction from SVMs to obtain rules that are accurate, comprehensible, and mimic the SVM model as closely as possible. The developed ALBA technique extracts rules from the trained SVM model by explicitly making use of key concepts of the SVM: the support vectors, and the observation that these are typically close to the decision boundary. We'll show that the extracted rules do indeed provide insight into the black-box model and even outperform traditional rule induction techniques under certain conditions. Next, the use of swarm intelligence (ant colony optimization, particle swarm optimization) for classification is studied, focusing on sequential covering algorithms that attempt to induce accurate and comprehensible classification models from data. The results show the potential of such metaheuristics for classification and demonstrate the importance of heuristic rule evaluation functions in sequential covering algorithms, which allow a trade-off between accuracy and comprehensibility.
Joint work with Bart Minnaert, Enric Junqué de Fortuny, Bart Baesens.
Bio: David Martens is an assistant professor at the Faculty of Applied Economics at the University of Antwerp and a visiting researcher at NYU Stern. His work focuses on learning from social network data and the development of comprehensible data mining techniques, and has been published in high-impact journals. In 2008, David was a finalist for the ACM SIGKDD Doctoral Dissertation Award. Applications of his work can be found in the banking, telecommunications, and online advertising industries.