Heterogeneity Meets Rarity: Mining Multi-Faceted Diamond

Dr. Jingrui He, IBM Watson Research Center

Wednesday, February 15, 2012 at 1:30 PM in CBIM 22 (Multipurpose Room)

Faculty Host: Tina Eliassi-Rad

Abstract: Many real-world machine learning and data mining problems exhibit both heterogeneity and rarity. Take anomaly detection (e.g., insider threat detection) in various social contexts as an example. While the target abnormal persons may be only a very small portion of the entire population (i.e., rarity), each person can be characterized by rich features, such as hyperlinks, texts, social friendships, etc. (i.e., feature heterogeneity). Moreover, different types of anomalies, though correlated, may exhibit different statistical characteristics (i.e., task heterogeneity). How can we identify at least one example of a new type of rare category? How can we leverage both feature heterogeneity and task heterogeneity to maximally boost learning performance? In this talk, I will present our recent work on addressing these two challenges. For the challenge of heterogeneity, I will introduce a graph-based approach that handles both feature heterogeneity and task heterogeneity. For the challenge of rarity, I will talk about rare category analysis, e.g., how to discover the rare examples with the help of a labeling oracle, how to simultaneously identify both the rare examples and the relevant subspace, etc.

Bio: Dr. Jingrui He is currently a research staff member at IBM T.J. Watson Research Center. She received her M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2010, respectively, both in Machine Learning. Her research interests include developing scalable algorithms for heterogeneous learning, rare category analysis, and semi-supervised learning, with an emphasis on applications in social media analysis. She was the recipient of an IBM Fellowship from 2008 to 2010. She also won second place in the ICDM 2010 data mining contest on traffic prediction (both Task 2 and Task 3). She has published over 30 refereed articles and served as an organizing committee member for ICML, KDD, etc.

Suggested readings:

  • Jingrui He, Hanghang Tong, and Jaime Carbonell. Rare Category Characterization. IEEE Int. Conf. on Data Mining (ICDM), 2010. [Invited to FCS SI on 'Best of ICDM2010']
  • Jingrui He and Jaime Carbonell. Co-Selection of Features and Instances for Unsupervised Rare Category Analysis. SIAM Int. Conf. on Data Mining (SDM), 2010. [Invited to SAM SI on 'Best of SDM2010']