[Fall 2013 – 01:198:444] Introduction to Data Science

 

Schedule / Syllabus (Subject to Change)

 

Lecture #

Date

Topics

Reading Material

Material Assigned & Due

Notes

1

Thu 9/5

Overview & Introduction

       http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

       http://www.quora.com/What-is-data-science

       http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist

       http://nirvacana.com/thoughts/becoming-a-data-scientist/

       http://nyti.ms/10QarGu

       http://nyti.ms/WT9gDd

       http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

 

 

2

Mon 9/9

Data Visualization

Text:

       http://www.itl.nist.gov/div898/handbook/eda/eda.htm

Slides:

       http://dataiap.github.io/dataiap/lectures/day2.pdf

       http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/02Data.pdf

       http://www.cc.gatech.edu/~agray/4245fall10/lecture25.pdf

Optional:

       http://en.wikipedia.org/wiki/Exploratory_data_analysis

 

 

3

Thu 9/12

Data Wrangling & Pre-processing

Text:

       http://research.microsoft.com/en-us/um/people/nath/docs/datawrangling_ivj2011.pdf

Slides:

       http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/03Preprocessing.ppt

 

 

4

Mon 9/16

Finding Similar Items I

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

 

 

5

Thu 9/19

Finding Similar Items II

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

 

 

6

Mon 9/23

Probability Review

Text (both of these):

       http://www.stat.cmu.edu/~hseltman/309/Book/chapter3.pdf

       http://www.ics.uci.edu/~smyth/courses/cs274/notes/notes1.pdf

Slides:

       http://www.stanford.edu/class/cs246/slides/ProbSession.pdf

       http://www.autonlab.org/tutorials/prob18.pdf (Pages 1 to 25)

       http://www.cs.princeton.edu/courses/archive/spring12/cos424/pdf/lecture02.pdf (Pages 1-23)

Optional:

       http://cs229.stanford.edu/section/cs229-prob.pdf

HW#1 Out

Guest Lecturer: Vukosi Marivate

7

Thu 9/26

Working with Big Data

 

Guest Lecturer: Vinayak Javaly

8

Mon 9/30

Data Streams

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch4.pdf

 

 

9

Thu 10/3

Linear Algebra Review

Text:

       http://www.seas.upenn.edu/~jadbabai/ESE504/LAreview.pdf

Slides:

       http://www.stanford.edu/class/cs246/slides/LinAlgSession.pdf

 

Guest Lecturer: Vukosi Marivate

10

Mon 10/7

Decision Trees

Text (one of these):

       http://www.cs.princeton.edu/courses/archive/spring07/cos424/papers/mitchell-dectrees.pdf

       http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 6)

       http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf

Slides:

       http://www.autonlab.org/tutorials/dtree18.pdf

       http://www.autonlab.org/tutorials/infogain11.pdf

HW#1 Due

 

11

Thu 10/10

Naïve Bayes

Text:

       http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf

Slides:

       http://www.autonlab.org/tutorials/prob18.pdf (Pages 26 to 58)

 

 

12

Mon 10/14

Logistic Regression

Text:

       http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf

 

 

13

Thu 10/17

Evaluation Techniques

Text (both of these):

       http://robotics.stanford.edu/~ronnyk/accEst.ps

       http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html

 

Guest Lecturer: Vukosi Marivate

14

Mon 10/21

Linear Regression

Text:

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Chapter 3)

Slides:

       http://www.autonlab.org/tutorials/introreg05.pdf

HW#2 Out

HW#1 Graded

15

Thu 10/24

Perceptron

In-class Project Pitches

Text:

       http://i.stanford.edu/~ullman/pub/ch12.pdf

Project Proposals Due

 

16

Mon 10/28

Ensemble Methods

 

Text (both of these):

       http://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf

       http://www.springerlink.com/content/u0p06167n6173512/

 

 

17

Thu 10/31

Support Vector Machines

Text (both of these):

       http://select.cs.cmu.edu/class/10701-F09/readings/hearst98.pdf

       http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf

Slides:

       http://www.autonlab.org/tutorials/svm15.pdf

 

 

18

Mon 11/4

Clustering: k-means

Text (one of these):

       http://infolab.stanford.edu/~ullman/mmds/ch7.pdf

       http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf

Slides:

       http://www.autonlab.org/tutorials/kmeans11.pdf

Optional:

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf  (Section 14.3)

HW#2 Due

Project Pitches & Proposals Graded

19

Thu 11/7

Clustering: Hierarchical Agglomerative Clustering

Text:

       http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 9)

 

 

20

Mon 11/11

Dimensionality Reduction

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch11.pdf

Optional:

       http://www.snl.salk.edu/~shlens/pca.pdf

HW#3 Out

 

21

Thu 11/14

Frequent Itemsets

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch6.pdf

Optional:

       http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf  (Section 14.2)

 

 

22

Mon 11/18

Link Analysis

       http://infolab.stanford.edu/~ullman/mmds/ch5.pdf

 

HW#2 Graded

23

Thu 11/21

Mining Social-Network Graphs

       http://infolab.stanford.edu/~ullman/mmds/ch10.pdf

 

 

24

Mon 11/25

Recommendation Systems

       http://infolab.stanford.edu/~ullman/mmds/ch9.pdf

HW#3 Due

 

25

Thu 11/28

No class (Thanksgiving)

 

 

26

Mon 12/2

Anomaly Detection

       http://www.cs.umn.edu/tech_reports_upload/tr2007/07-017.pdf

 

 

27

Thu 12/5

Project Presentations

Group 1 Project Presentations Due

 

28

Mon 12/9

Project Presentations

Group 2 Project Presentations Due

HW#3 Graded

Thu 12/19

 

 

Project Reports Due

 

Mon 12/23

 

 

 

Final Grades Released