[Spring 2015 – 01:198:443] Introduction to Data Science

 

Schedule / Syllabus (Subject to Change)

 

Lecture #

Date

Topics

Reading Material

Material Assigned & Due

Notes

1

Thu 1/22

Overview & Introduction

Probability Review

Linear Algebra Review

Overview & Introduction:

       http://infolab.stanford.edu/~ullman/mmds/ch1.pdf

       http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

       http://www.quora.com/What-is-data-science

       http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist

       http://nirvacana.com/thoughts/becoming-a-data-scientist/

       http://nyti.ms/10QarGu

       http://nyti.ms/WT9gDd

       http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

Probability:

      Text (both of these):

       http://www.stat.cmu.edu/~hseltman/309/Book/chapter3.pdf

       http://www.ics.uci.edu/~smyth/courses/cs274/notes/notes1.pdf

      Slides:

       http://snap.stanford.edu/class/cs246-2014/slides/ProbSection.pdf

       http://www.autonlab.org/tutorials/prob18.pdf (Pages 1 to 25)

       http://www.cs.princeton.edu/courses/archive/spring12/cos424/pdf/lecture02.pdf (Pages 1-23)

      Optional:

       http://cs229.stanford.edu/section/cs229-prob.pdf

Linear Algebra:

      Text:

       http://www.seas.upenn.edu/~jadbabai/ESE504/LAreview.pdf

      Slides:

       http://snap.stanford.edu/class/cs246-2014/slides/LinAlgSession.pdf

 

 

2

Mon 1/26

Data Visualization

Text:

       http://www.itl.nist.gov/div898/handbook/eda/eda.htm

Slides:

       http://dataiap.github.io/dataiap/lectures/day2.pdf

       http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/02Data.pdf

       http://www.cc.gatech.edu/~agray/4245fall10/lecture25.pdf

Optional:

       http://en.wikipedia.org/wiki/Exploratory_data_analysis

 

 

3

Thu 1/29

Data Wrangling & Pre-processing

Text:

       http://research.microsoft.com/en-us/um/people/nath/docs/datawrangling_ivj2011.pdf

Slides:

       http://www.cs.uiuc.edu/homes/hanj/cs412/bk3_slides/03Preprocessing.ppt

 

 

4

Mon 2/2

Finding Similar Items

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

 

 

5

Thu 2/5

Data Streams

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch4.pdf

 

 

6

Mon 2/9

Decision Trees

Text (one of these):

       http://www.cs.princeton.edu/courses/archive/spring07/cos424/papers/mitchell-dectrees.pdf

       http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 6)

       http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf

Slides:

       http://www.autonlab.org/tutorials/dtree18.pdf

       http://www.autonlab.org/tutorials/infogain11.pdf

HW#1 Out

7

Thu 2/12

Naïve Bayes

Text:

       http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf

Slides:

       http://www.autonlab.org/tutorials/prob18.pdf (Pages 26 to 58)

 

8

Mon 2/16

Logistic Regression

Text:

       http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf

 

 

9

Thu 2/19

Linear Regression

Text:

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf (Chapter 3)

Slides:

       http://www.autonlab.org/tutorials/introreg05.pdf

 

10

Mon 2/23

Evaluation Techniques for Supervised Learning

Text (both of these):

       http://robotics.stanford.edu/~ronnyk/accEst.ps

       http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html

HW#1 Due

 

11

Thu 2/26

Ensemble Methods

 

Text (both of these):

       http://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf

       http://www.springerlink.com/content/u0p06167n6173512/

 

 

12

Mon 3/2

Perceptron & Neural Networks

Text:

       http://i.stanford.edu/~ullman/pub/ch12.pdf

       http://csc.lsu.edu/~jianhua/nn.pdf

 

 

13

Thu 3/5

Support Vector Machines

Text (both of these):

       http://select.cs.cmu.edu/class/10701-F09/readings/hearst98.pdf

       http://research.microsoft.com/en-us/um/people/cburges/papers/SVMTutorial.pdf

Slides:

       http://www.autonlab.org/tutorials/svm15.pdf

 

14

Mon 3/9

Map-Reduce

       http://infolab.stanford.edu/~ullman/mmds/ch2.pdf

HW#2 Out

HW#1 Graded

15

Thu 3/12

In-class Project Pitches

Project Proposals Due

 

Mon 3/16

No class (Spring Break)

 

 

Thu 3/19

No class (Spring Break)

 

 

16

Mon 3/23

k-means Clustering, Evaluation Techniques for Unsupervised Learning

Text (one of these):

       http://infolab.stanford.edu/~ullman/mmds/ch7.pdf

       http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf

Slides:

       http://www.autonlab.org/tutorials/kmeans11.pdf

Optional:

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf  (Section 14.3)

HW#2 Due

Project Pitches & Proposals graded

17

Thu 3/26

Hierarchical Agglomerative Clustering, Spectral Clustering

Text:

       http://robotics.stanford.edu/~nilsson/MLBOOK.pdf (Chapter 9)

       http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf

 

 

18

Mon 3/30

Dimensionality Reduction

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch11.pdf

Optional:

       http://www.snl.salk.edu/~shlens/pca.pdf

HW#3 Out

 

19

Thu 4/2

Frequent Itemsets

Text:

       http://infolab.stanford.edu/~ullman/mmds/ch6.pdf

Optional:

       http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

       http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf  (Section 14.2)

 

 

20

Mon 4/6

Link Analysis

       http://infolab.stanford.edu/~ullman/mmds/ch5.pdf

 

HW#2 Graded

21

Thu 4/9

Recommendation Systems

       http://infolab.stanford.edu/~ullman/mmds/ch9.pdf

 

 

22

Mon 4/13

Exam Review

 

HW#3 Due

 

23

Thu 4/16

In-class Exam

 

 

24

Mon 4/20

Mining Social-Network Graphs

       http://infolab.stanford.edu/~ullman/mmds/ch10.pdf

 

 

25

Thu 4/23

Anomaly Detection

       http://www.cs.umn.edu/tech_reports_upload/tr2007/07-017.pdf

 

26

Mon 4/27

Project Presentations

Group 1 Project Presentations Due

HW#3 Graded

27

Thu 4/30

TBD

Guest lecturer

 

28

Mon 5/4

Project Presentations

 

Group 2 Project Presentations Due

Exam Graded

Mon 5/11

 

 

Project Reports Due

 

Thu 5/14

 

 

 

Final Grades Released