[Fall 2014 -- 16:198:598]

Topics in Artificial Intelligence (16:198:598) -- Machine Learning with Large-scale Data

 

Lecture #1: September 8, 2014

Topic: Introduction & Overview

Readings:

·      Ch 1 of Scaling Up Machine Learning

·      Ch 2 of MMDS on MapReduce and the New Software Stack

·      New Templates for Scalable Data Analysis (A. Ahmed, A. Smola, and M. Weimer, WWW 2012 Tutorial)

 

Lecture #2: September 15, 2014

Topic: Statistical Queries & their Uses on Distributed Platforms

Readings:

·      [Classic Paper] Efficient Noise-Tolerant Learning From Statistical Queries (M. Kearns, JACM 1998)

·      Map-Reduce for Machine Learning on Multicore (C.T. Chu et al., NIPS 2006)

·      Stochastic Gradient Boosted Distributed Decision Trees (J. Ye et al., CIKM 2009)

·      Optional: Modeling with Hadoop (V. Narayanan & M. Bhandarkar, KDD'11 Tutorial)

 

Lecture #3: September 22, 2014

Topic: Frameworks for Scaling Up Machine Learning, Part I

Readings:

·      Ch 2 of Scaling Up Machine Learning

·      Ch 3 of Scaling Up Machine Learning

·      Ch 4 of Scaling Up Machine Learning

 

Lecture #4: September 29, 2014

Topic: Frameworks for Scaling Up Machine Learning, Part II

Readings:

·      Pregel: A System for Large-scale Graph Processing (G. Malewicz et al., SIGMOD 2010)

·      GraphLab: A New Framework For Parallel Machine Learning (Y. Low et al., UAI 2010)

·      Parameter Server for Distributed Machine Learning (M. Li et al., NIPS 2013 Big Learning Workshop)

 

Lecture #5: October 6, 2014

Topic: Computing Reputation Scores for Nodes in Big Social Networks

Guest Lecturer: Vinayak Javaly (Lenddo) on PageRank-type Algorithm to Determine Creditworthiness

Readings:

·      [Classic Paper] The Anatomy of a Large-Scale Hypertextual Web Search Engine (Sergey Brin and Lawrence Page, Stanford Technical Report 1998)

·      [Classic Paper] The PageRank Citation Ranking: Bringing Order to the Web (Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, Stanford Technical Report 1999)

·      [Classic Paper] MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat, OSDI 2004)

·      PageRank beyond the Web (D. Gleich, arXiv:1407.5107 [cs.SI], July 2014)

·      Local Graph Partitioning using PageRank Vectors (R. Andersen, F. Chung, and K. Lang, FOCS 2006)

·      Fast Matrix Computations for Pairwise and Columnwise Commute Times and Katz Scores (F. Bonchi et al., Internet Mathematics 2012)

 

Lecture #6: October 13, 2014

Topic: Parallelizing SVMs & Learning to Rank

Readings:

·      Ch 6 of Scaling Up Machine Learning

·      Ch 7 of Scaling Up Machine Learning

·      Ch 8 of Scaling Up Machine Learning

 

Lecture #7: October 20, 2014

Topic: Applications

Guest Lecturer #1: Kara Greenfield (MIT Lincoln Laboratory) on Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs

Guest Lecturer #2: James Fan (IBM Research) on Watson Beyond Jeopardy!: Challenges and Approaches

Readings:

·      Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs (Greenfield and Campbell, ICASSP 2013)

·      VizLinc: Integrating Information Extraction, Search, Graph Analysis, and Geo-location for the Visual Exploration of Large Data Sets (J.C. Acevedo-Aviles et al., KDD 2014: IDEA Workshop)

·      Building Watson: An Overview of the DeepQA Project (D. Ferrucci et al., AI Magazine 2010)

·      This is Watson (IBM Journal of Research and Development, Issues 3-4, 2012)

·      Medical Relation Extraction with Manifold Models (C. Wang and J. Fan, ACL 2014)

 

Lecture #8: October 27, 2014

Topic: Graphical Models

Readings:

·      Ch 10 of Scaling Up Machine Learning

·      Ch 11 of Scaling Up Machine Learning

·      Online Learning for Latent Dirichlet Allocation (Matthew Hoffman, David Blei, and Francis Bach, NIPS 2010)

o   Supplemental material

o   Code

 

Lecture #9: November 3, 2014

Topic: Graphical Models & Clustering

Readings:

·      Reducing the Sampling Complexity of Topic Models (Aaron Li et al., KDD 2014 best research paper)

·      Ch 12 of Scaling Up Machine Learning

·      Ch 13 of Scaling Up Machine Learning

 

Lecture #10: November 10, 2014

Topic: Online Learning, SSL & Feature Selection

Readings:

·      Ch 14 of Scaling Up Machine Learning

·      Ch 15 of Scaling Up Machine Learning

·      Ch 17 of Scaling Up Machine Learning

 

Lecture #11: November 17, 2014

Topic: Sampling & Sketching

Readings:

·      A Space Efficient Streaming Algorithm for Triangle Counting Using the Birthday Paradox (M. Jha et al., KDD'13 Best Student Paper Award)

·      Graph Sample and Hold: A Framework for Big-Graph Analytics (N.K. Ahmed et al., KDD'14 paper)

·      Simple and Deterministic Matrix Sketching (Edo Liberty, KDD'13 best research paper)

·      Optional: Sampling for Big Data (Cormode & Duffield, KDD'14 Tutorial)

 

Lecture #12: November 24, 2014

Topic: Vowpal Wabbit

Guest Lecturer: Alekh Agarwal, MSR NYC

Readings:

·      A Reliable Effective Terascale Linear Learning System (Alekh Agarwal et al., arXiv:1110.4198 [cs.LG], 2011)

·      Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (John Duchi, Elad Hazan, and Yoram Singer, JMLR 2011)

·      Feature Hashing for Large Scale Multitask Learning (Kilian Weinberger et al., ICML 2009)

·      Hash Kernels for Structured Data (Qinfeng Shi et al., AISTATS 2009)

 

Lecture #13: December 1, 2014

Topic #1: Visualization of Big Data

Guest Lecturer: Yifan Hu (Yahoo! Labs) on Visualizing Graphs and Text Data

Readings:

 

Topic #2: Class project presentations

 

Lecture #14: December 8, 2014

No class; Tina is at NIPS.