[Fall 2013 – 01:198:444] Introduction to Data Science
Schedule / Syllabus (Subject to Change)
Lecture #
Date
Topics
Reading Material
Material Assigned & Due
Notes
1
Thu 9/5
Overview & Introduction
• http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
• http://www.quora.com/What-is-data-science
• http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
• http://nirvacana.com/thoughts/becoming-a-data-scientist/
• http://nyti.ms/10QarGu
• http://nyti.ms/WT9gDd
• http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
2
Mon 9/9
Data Visualization
Text:
• http://www.itl.nist.gov/div898/handbook/eda/eda.htm
Slides:
• http://dataiap.github.io/dataiap/lectures/day2.pdf
• http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/02Data.pdf
• http://www.cc.gatech.edu/~agray/4245fall10/lecture25.pdf
Optional:
• http://en.wikipedia.org/wiki/Exploratory_data_analysis
3
Thu 9/12
Data Wrangling & Pre-processing
• http://research.microsoft.com/en-us/um/people/nath/docs/datawrangling_ivj2011.pdf
• http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/03Preprocessing.ppt
4
Mon 9/16
Finding Similar Items I
• http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
5
Thu 9/19
Finding Similar Items II
6
Mon 9/23
Probability Review
Text (both of these):
• http://www.stat.cmu.edu/~hseltman/309/Book/chapter3.pdf
• http://www.ics.uci.edu/~smyth/courses/cs274/notes/notes1.pdf
• http://www.stanford.edu/class/cs246/slides/ProbSession.pdf
• http://www.autonlab.org/tutorials/prob18.pdf (Pages 1 to 25)
• http://www.cs.princeton.edu/courses/archive/spring12/cos424/pdf/lecture02.pdf (Pages 1 to 23)
• http://cs229.stanford.edu/section/cs229-prob.pdf
HW#1 Out
Guest Lecturer: Vukosi Marivate
7
Thu 9/26
Working with Big Data
–
Guest Lecturer: Vinayak Javaly
8
Mon 9/30
Data Streams
• http://infolab.stanford.edu/~ullman/mmds/ch4.pdf
9
Thu 10/3
Linear Algebra Review
• http://www.seas.upenn.edu/~jadbabai/ESE504/LAreview.pdf
• http://www.stanford.edu/class/cs246/slides/LinAlgSession.pdf
10
Mon 10/7
Decision Trees
Text (one of these):
• http://www.cs.princeton.edu/courses/archive/spring07/cos424/papers/mitchell-dectrees.pdf
• http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 6)
• http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf
• http://www.autonlab.org/tutorials/dtree18.pdf
• http://www.autonlab.org/tutorials/infogain11.pdf
HW#1 Due
11
Thu 10/10
Naïve Bayes
• http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf
• http://www.autonlab.org/tutorials/prob18.pdf (Pages 26 to 58)
12
Mon 10/14
Logistic Regression
13
Thu 10/17
Evaluation Techniques
• http://robotics.stanford.edu/~ronnyk/accEst.ps
• http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
14
Mon 10/21
Linear Regression
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Chapter 3)
• http://www.autonlab.org/tutorials/introreg05.pdf
HW#2 Out
HW#1 Graded
15
Thu 10/24
Perceptron
In-class Project Pitches
• http://i.stanford.edu/~ullman/pub/ch12.pdf
Project Proposals Due
16
Mon 10/28
Ensemble Methods
• http://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf
• http://www.springerlink.com/content/u0p06167n6173512/
17
Thu 10/31
Support Vector Machines
• http://select.cs.cmu.edu/class/10701-F09/readings/hearst98.pdf
• http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf
• http://www.autonlab.org/tutorials/svm15.pdf
18
Mon 11/4
Clustering: k-means
Text (one of these):
• http://infolab.stanford.edu/~ullman/mmds/ch7.pdf
• http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf
• http://www.autonlab.org/tutorials/kmeans11.pdf
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Section 14.3)
HW#2 Due
Project Pitches & Proposals Graded
19
Thu 11/7
Clustering: Hierarchical Agglomerative Clustering
• http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 9)
20
Mon 11/11
Dimensionality Reduction
• http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
• http://www.snl.salk.edu/~shlens/pca.pdf
HW#3 Out
21
Thu 11/14
Frequent Itemsets
• http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
• http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
• http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Section 14.2)
22
Mon 11/18
Link Analysis
• http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
HW#2 Graded
23
Thu 11/21
Mining Social-Network Graphs
• http://infolab.stanford.edu/~ullman/mmds/ch10.pdf
24
Mon 11/25
Recommendation Systems
• http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
HW#3 Due
25
Thu 11/28
No class (Thanksgiving)
26
Mon 12/2
Anomaly Detection
• http://www.cs.umn.edu/tech_reports_upload/tr2007/07-017.pdf
27
Thu 12/5
Project Presentations
Group 1 Project Presentations Due
28
Mon 12/9
Group 2 Project Presentations Due
HW#3 Graded
Thu 12/19
Project Reports Due
Mon 12/23
Final Grades Released