[Fall 2014 -- 16:198:598]

Topics in Artificial Intelligence (16:198:598) -- Machine Learning with Large-scale Data

Lecture #1: September 8, 2014

Topic: Introduction & Overview

Readings:

· Ch 1 of Scaling Up Machine Learning

· Ch 2 of MMDS on MapReduce and the New Software Stack

· New Templates for Scalable Data Analysis (A. Ahmed, A. Smola, and M. Weimer, WWW 2012 Tutorial)

Lecture #2: September 15, 2014

Topic: Statistical Queries & their Uses on Distributed Platforms

Readings:

· [Classic Paper] Efficient Noise-Tolerant Learning From Statistical Queries (M. Kearns, JACM 1998)

· Map-Reduce for Machine Learning on Multicore (C.T. Chu et al., NIPS 2006)

· Stochastic Gradient Boosted Distributed Decision Trees (J. Ye et al., CIKM 2009)

· Optional: Modeling with Hadoop (V. Narayanan & M. Bhandarkar, KDD’11 Tutorial)

Lecture #3: September 22, 2014

Topic: Frameworks for Scaling Up Machine Learning, Part I

Readings:

· Ch 2 of Scaling Up Machine Learning

· Ch 3 of Scaling Up Machine Learning

· Ch 4 of Scaling Up Machine Learning

Lecture #4: September 29, 2014

Topic: Frameworks for Scaling Up Machine Learning, Part II

Readings:

· Pregel: A System for Large-scale Graph Processing (G. Malewicz et al., SIGMOD 2010)

· GraphLab: A New Framework For Parallel Machine Learning (Y. Low et al., UAI 2010)

· Parameter Server for Distributed Machine Learning (M. Li et al., NIPS 2013 Big Learning Workshop)

Lecture #5: October 6, 2014

Topic: Computing Reputation Scores for Nodes in Big Social Networks

Guest Lecturer: Vinayak Javaly (Lenddo) on PageRank-type Algorithm to Determine Creditworthiness

Readings:

· [Classic Paper] The Anatomy of a Large-Scale Hypertextual Web Search Engine (Sergey Brin and Lawrence Page, Stanford Technical Report 1998)

· [Classic Paper] The PageRank Citation Ranking: Bringing Order to the Web (Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, Stanford Technical Report 1999)

· [Classic Paper] MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat, OSDI 2004)

· PageRank beyond the Web (D. Gleich, arXiv:1407.5107 [cs.SI], July 2014)

· Local Graph Partitioning using PageRank Vectors (R. Andersen, F. Chung, and K. Lang, FOCS 2006)

· Fast Matrix Computations for Pairwise and Columnwise Commute Times and Katz Scores (F. Bonchi et al., Internet Mathematics 2012)

Lecture #6: October 13, 2014

Topic: Parallelizing SVMs & Learning to Rank

Readings:

· Ch 6 of Scaling Up Machine Learning

· Ch 7 of Scaling Up Machine Learning

· Ch 8 of Scaling Up Machine Learning

Lecture #7: October 20, 2014

Topic: Applications

Guest Lecturer #1: Kara Greenfield (MIT Lincoln Laboratory) on Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs

Guest Lecturer #2: James Fan (IBM Research) on Watson Beyond Jeopardy!: Challenges and Approaches

Readings:

· Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs (Greenfield and Campbell, ICASSP 2013)

· VizLinc: Integrating Information Extraction, Search, Graph Analysis, and Geo-location for the Visual Exploration of Large Data Sets (J.C. Acevedo-Aviles et al., KDD 2014: IDEA Workshop)

· Building Watson: An Overview of the DeepQA Project (D. Ferrucci et al., AI Magazine 2010)

· This is Watson (IBM Journal of Research and Development, Issues 3-4, 2012)

· Medical Relation Extraction with Manifold Models (C. Wang and J. Fan, ACL 2014)

Lecture #8: October 27, 2014

Topic: Graphical Models

Readings:

· Ch 10 of Scaling Up Machine Learning

· Ch 11 of Scaling Up Machine Learning

· Online Learning for Latent Dirichlet Allocation (Matthew Hoffman, David Blei, and Francis Bach, NIPS 2010)

o Supplemental material

o Code

Lecture #9: November 3, 2014

Topic: Graphical Models & Clustering

Readings:

· Reducing the Sampling Complexity of Topic Models (Aaron Li et al., KDD 2014 best research paper)

· Ch 12 of Scaling Up Machine Learning

· Ch 13 of Scaling Up Machine Learning

Lecture #10: November 10, 2014

Topic: Online Learning, Semi-Supervised Learning (SSL) & Feature Selection

Readings:

· Ch 14 of Scaling Up Machine Learning

· Ch 15 of Scaling Up Machine Learning

· Ch 17 of Scaling Up Machine Learning

Lecture #11: November 17, 2014

Topic: Sampling & Sketching

Readings:

· A Space Efficient Streaming Algorithm for Triangle Counting Using the Birthday Paradox (M. Jha et al., KDD’13 Best Student Paper Award)

· Graph Sample and Hold: A Framework for Big-Graph Analytics (N.K. Ahmed et al., KDD’14 paper)

· Simple and Deterministic Matrix Sketching (Edo Liberty, KDD’13 best research paper)

· Optional: Sampling for Big Data (Cormode & Duffield, KDD’14 Tutorial)

Lecture #12: November 24, 2014

Topic: Vowpal Wabbit

Guest Lecturer: Alekh Agarwal, MSR NYC

Readings:

· A Reliable Effective Terascale Linear Learning System (Alekh Agarwal et al., arXiv:1110.4198 [cs.LG], 2011)

· Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (John Duchi, Elad Hazan, and Yoram Singer, JMLR 2011)

· Feature Hashing for Large Scale Multitask Learning (Kilian Weinberger et al., ICML 2009)

· Hash Kernels for Structured Data (Qinfeng Shi et al., AISTATS 2009)

Lecture #13: December 1, 2014

Topic #1: Visualization of Big Data

Guest Lecturer: Yifan Hu (Yahoo! Labs) on Visualizing Graphs and Text Data

Readings:

· Emden R. Gansner, Yifan Hu, Stephen C. North: Interactive Visualization of Streaming Text Data with Dynamic Maps. J. Graph Algorithms Appl. 17(4): 515-540 (2013)

· Emden R. Gansner, Yifan Hu, Shankar Krishnan: COAST: A Convex Optimization Approach to Stress-Based Embedding. Graph Drawing 2013: 268-279

· Emden R. Gansner, Yifan Hu, Stephen C. North: A Maxent-Stress Model for Graph Layout. IEEE Trans. Vis. Comput. Graph. 19(6): 927-940 (2013)

· Marc Khoury, Yifan Hu, Shankar Krishnan, Carlos Eduardo Scheidegger: Drawing Large Graphs by Low-Rank Stress Majorization. Comput. Graph. Forum 31(3): 975-984 (2012)

· Yifan Hu, Emden R. Gansner, Stephen G. Kobourov: Visualizing Graphs and Clusters as Maps. IEEE Computer Graphics and Applications 30(6): 54-66 (2010)

Topic #2: Class project presentations

Lecture #14: December 8, 2014

No class; Tina is at NIPS.