[Spring 2015 – 01:198:443] Introduction to Data Science
Schedule / Syllabus (Subject to Change; Updated 3/7/2015)
Lecture #
Date
Topics
Reading Material
Material Assigned & Due
Notes
1
Thu 1/22
Overview & Introduction
Probability Review
Linear Algebra Review
Overview & Introduction:
• http://infolab.stanford.edu/~ullman/mmds/ch1.pdf
• http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
• http://www.quora.com/What-is-data-science
• http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
• http://nirvacana.com/thoughts/becoming-a-data-scientist/
• http://nyti.ms/10QarGu
• http://nyti.ms/WT9gDd
• http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
Probability:
Text (both of these):
• http://www.stat.cmu.edu/~hseltman/309/Book/chapter3.pdf
• http://www.ics.uci.edu/~smyth/courses/cs274/notes/notes1.pdf
Slides:
• http://snap.stanford.edu/class/cs246-2014/slides/ProbSection.pdf
• http://www.autonlab.org/tutorials/prob18.pdf (Pages 1 to 25)
• http://www.cs.princeton.edu/courses/archive/spring12/cos424/pdf/lecture02.pdf (Pages 1 to 23)
Optional:
• http://cs229.stanford.edu/section/cs229-prob.pdf
Linear Algebra:
Text:
• http://www.seas.upenn.edu/~jadbabai/ESE504/LAreview.pdf
• http://snap.stanford.edu/class/cs246-2014/slides/LinAlgSession.pdf
2
Mon 1/26
Data Visualization
• http://www.itl.nist.gov/div898/handbook/eda/eda.htm
• http://dataiap.github.io/dataiap/lectures/day2.pdf
• http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/02Data.pdf
• http://www.cc.gatech.edu/~agray/4245fall10/lecture25.pdf
• http://en.wikipedia.org/wiki/Exploratory_data_analysis
3
Thu 1/29
Data Wrangling & Pre-processing
• http://research.microsoft.com/en-us/um/people/nath/docs/datawrangling_ivj2011.pdf
• http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/03Preprocessing.ppt
4
Mon 2/2
Finding Similar Items
• http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
5
Thu 2/5
6
Mon 2/9
Data Streams
• http://infolab.stanford.edu/~ullman/mmds/ch4.pdf
HW#1 Out
7
Thu 2/12
8
Mon 2/16
Decision Trees
Text (one of these):
• http://www.cs.princeton.edu/courses/archive/spring07/cos424/papers/mitchell-dectrees.pdf
• http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 6)
• http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf
• http://www.autonlab.org/tutorials/dtree18.pdf
• http://www.autonlab.org/tutorials/infogain11.pdf
9
Thu 2/19
Naïve Bayes
• http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf
• http://www.autonlab.org/tutorials/prob18.pdf (Pages 26 to 58)
10
Mon 2/23
HW#1 Due
11
Thu 2/26
Logistic Regression
12
Mon 3/2
13
Thu 3/5
No Class (Snow Day)
–
14
Mon 3/9
Linear Regression & Perceptron
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Chapter 3)
• http://i.stanford.edu/~ullman/pub/ch12.pdf
• http://www.autonlab.org/tutorials/introreg05.pdf
HW#2 Out
HW#1 Graded
15
Thu 3/12
In-class Project Pitches
Project Proposals Due
Mon 3/16
No Class (Spring Break)
Thu 3/19
16
Mon 3/23
Evaluation Techniques for Supervised Learning & Ensemble Methods
• http://robotics.stanford.edu/~ronnyk/accEst.ps
• http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
• http://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf
• http://www.springerlink.com/content/u0p06167n6173512/
HW#2 Due
Project Pitches & Proposals graded
17
Thu 3/26
Support Vector Machines
• http://select.cs.cmu.edu/class/10701-F09/readings/hearst98.pdf
• http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf
• http://www.autonlab.org/tutorials/svm15.pdf
18
Mon 3/30
k-means Clustering & Evaluation Techniques for Unsupervised Learning
Text (one of these):
• http://infolab.stanford.edu/~ullman/mmds/ch7.pdf
• http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf
• http://www.autonlab.org/tutorials/kmeans11.pdf
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Section 14.3)
HW#3 Out
19
Thu 4/2
Hierarchical Agglomerative Clustering & Spectral Clustering
• http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 9)
• http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
20
Mon 4/6
Dimensionality Reduction
• http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
• http://www.snl.salk.edu/~shlens/pca.pdf
HW#2 Graded
21
Thu 4/9
Frequent Itemsets
• http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
• http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Section 14.2)
22
Mon 4/13
Link Analysis
• http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
HW#3 Due
23
Thu 4/16
In-class Exam
24
Mon 4/20
Recommendation Systems
• http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
25
Thu 4/23
Mining Social-Network Graphs
• http://infolab.stanford.edu/~ullman/mmds/ch10.pdf
26
Mon 4/27
Project Presentations
Group 1 Project Presentations Due
HW#3 Graded
27
Thu 4/30
Map-Reduce
• http://infolab.stanford.edu/~ullman/mmds/ch2.pdf
Guest Lecturer
28
Mon 5/4
(Last day of class)
Group 2 Project Presentations Due
Exam Graded
Mon 5/11
Project Reports due
Thu 5/14
Final Grades Released