[Fall 2014 -- 16:198:598]
Topics in Artificial Intelligence (16:198:598) -- Machine Learning with Large-scale Data
Lecture #1: September 8, 2014
Topic: Introduction & Overview
Readings:
• Ch 1 of Scaling Up Machine Learning
• Ch 2 of MMDS on MapReduce and the New Software Stack
• New Templates for Scalable Data Analysis (A. Ahmed, A. Smola, and M. Weimer, WWW 2012 Tutorial)
Lecture #2: September 15, 2014
Topic: Statistical Queries & their Uses on Distributed Platforms
Readings:
• [Classic Paper] Efficient Noise-Tolerant Learning From Statistical Queries (M. Kearns, JACM 1998)
• Map-Reduce for Machine Learning on Multicore (C.T. Chu et al., NIPS 2006)
• Stochastic Gradient Boosted Distributed Decision Trees (J. Ye et al., CIKM 2009)
• Optional: Modeling with Hadoop (V. Narayanan & M. Bhandarkar, KDD'11 Tutorial)
Lecture #3: September 22, 2014
Topic: Frameworks for Scaling Up Machine Learning, Part I
Readings:
• Ch 2 of Scaling Up Machine Learning
• Ch 3 of Scaling Up Machine Learning
• Ch 4 of Scaling Up Machine Learning
Lecture #4: September 29, 2014
Topic: Frameworks for Scaling Up Machine Learning, Part II
Readings:
• Pregel: A System for Large-scale Graph Processing (G. Malewicz et al., SIGMOD 2010)
• GraphLab: A New Framework For Parallel Machine Learning (Y. Low et al., UAI 2010)
• Parameter Server for Distributed Machine Learning (M. Li et al., NIPS 2013 Big Learning Workshop)
Lecture #5: October 6, 2014
Topic: Computing Reputation Scores for Nodes in Big Social Networks
Guest Lecturer: Vinayak Javaly (Lenddo) on PageRank-type Algorithm to Determine Creditworthiness
Readings:
• [Classic Paper] The Anatomy of a Large-Scale Hypertextual Web Search Engine (Sergey Brin and Lawrence Page, Stanford Technical Report 1998)
• [Classic Paper] The PageRank Citation Ranking: Bringing Order to the Web (Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, Stanford Technical Report 1999)
• [Classic Paper] MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat, OSDI 2004)
• PageRank beyond the Web (D. Gleich, arXiv:1407.5107 [cs.SI], July 2014)
• Local Graph Partitioning using PageRank Vectors (R. Andersen, F. Chung, and K. Lang, FOCS 2006)
• Fast Matrix Computations for Pairwise and Columnwise Commute Times and Katz Scores (F. Bonchi et al., Internet Mathematics 2012)
Lecture #6: October 13, 2014
Topic: Parallelizing SVMs & Learning to Rank
Readings:
• Ch 6 of Scaling Up Machine Learning
• Ch 7 of Scaling Up Machine Learning
• Ch 8 of Scaling Up Machine Learning
Lecture #7: October 20, 2014
Topic: Applications
Guest Lecturer #1: Kara Greenfield (MIT Lincoln Laboratory) on Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs
Guest Lecturer #2: James Fan (IBM Research) on Watson Beyond Jeopardy!: Challenges and Approaches
Readings:
• Developing and Evaluating Link Prediction Algorithms for Speaker Content Graphs (Greenfield and Campbell, ICASSP 2013)
• VizLinc: Integrating Information Extraction, Search, Graph Analysis, and Geo-location for the Visual Exploration of Large Data Sets (J.C. Acevedo-Aviles et al., KDD 2014: IDEA Workshop)
• Building Watson: An Overview of the DeepQA Project (D. Ferrucci et al., AI Magazine 2010)
• This is Watson (IBM Journal of Research and Development, Issues 3-4, 2012)
• Medical Relation Extraction with Manifold Models (C. Wang and J. Fan, ACL 2014)
Lecture #8: October 27, 2014
Topic: Graphical Models
Readings:
• Ch 10 of Scaling Up Machine Learning
• Ch 11 of Scaling Up Machine Learning
• Online Learning for Latent Dirichlet Allocation (Matthew Hoffman, David Blei, and Francis Bach, NIPS 2010)
  o Code
Lecture #9: November 3, 2014
Topic: Graphical Models & Clustering
Readings:
• Reducing the Sampling Complexity of Topic Models (Aaron Li et al., KDD 2014 best research paper)
• Ch 12 of Scaling Up Machine Learning
• Ch 13 of Scaling Up Machine Learning
Lecture #10: November 10, 2014
Topic: Online Learning, SSL & Feature Selection
Readings:
• Ch 14 of Scaling Up Machine Learning
• Ch 15 of Scaling Up Machine Learning
• Ch 17 of Scaling Up Machine Learning
Lecture #11: November 17, 2014
Topic: Sampling & Sketching
Readings:
• A Space Efficient Streaming Algorithm for Triangle Counting Using the Birthday Paradox (M. Jha et al., KDD'13 Best Student Paper Award)
• Graph Sample and Hold: A Framework for Big-Graph Analytics (N.K. Ahmed et al., KDD'14 paper)
• Simple and Deterministic Matrix Sketching (Edo Liberty, KDD'13 best research paper)
• Optional: Sampling for Big Data (Cormode & Duffield, KDD'14 Tutorial)
Lecture #12: November 24, 2014
Topic: Vowpal Wabbit
Guest Lecturer: Alekh Agarwal, MSR NYC
Readings:
• A Reliable Effective Terascale Linear Learning System (Alekh Agarwal et al., arXiv:1110.4198 [cs.LG], 2011)
• Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (John Duchi, Elad Hazan, and Yoram Singer, JMLR 2011)
• Feature Hashing for Large Scale Multitask Learning (Kilian Weinberger et al., ICML 2009)
• Hash Kernels for Structured Data (Qinfeng Shi et al., AISTATS 2009)
Lecture #13: December 1, 2014
Topic #1: Visualization of Big Data
Guest Lecturer: Yifan Hu (Yahoo! Labs) on Visualizing Graphs and Text Data
Readings:
Topic #2: Class project presentations
Lecture #14: December 8, 2014
No class; Tina is at NIPS.