Fall 2019: CS 6220 Data Mining Techniques, CRN 12564, cross-listed with
Fall 2019: DS 5230 Unsupervised Machine Learning and Data Mining, CRN 15043
Lecture time: Tuesdays 11:45 AM – 1:25 PM and Thursdays 2:50 – 4:30 PM

Place: West Village G, Room 102

Instructor: Tina Eliassi-Rad

Office hours: Tuesdays 1:30 – 3:00 PM in Kariotis Hall, Room 304

TA: Govind Bhala

Office hours: Wednesdays 4:50 PM – 6:20 PM in West Village F, Room 116
Also available by appointment. Email: bhala.g [at] husky [dot] neu [dot] edu

TA: Hui “Sophie” Wang

Office hours: Thursdays 10:00 AM – 11:30 AM in Behrakis Health Sciences Center, Room 210
Also available by appointment. Email: wang.hui1 [at] husky [dot] neu [dot] edu

This 4-credit graduate-level course covers data mining and unsupervised learning. Its prerequisites are:
This course does not have a designated textbook. The readings are assigned in the syllabus (see below). Here are some textbooks (all optional) related to the course.
Lec # 
Date 
Topic 
Readings & Notes
1 
R 9/5 
Introduction and Overview 
o Chapter 1 of http://eliassi.org/mmdsbookv2L.pdf
o http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

2 
T 9/10 
Density Estimation 
o http://ned.ipac.caltech.edu/level5/March02/Silverman/Silver_contents.html
o http://eliassi.org/Sheather_StatSci_2004.pdf
o Optional: Sections 6.6–6.9 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

3 
R 9/12 
Frequent Itemsets & Association Rules 
o Chapter 6 of http://eliassi.org/mmdsbookv2L.pdf
o Optional: Sections 6.1–6.6 of http://wwwusers.cs.umn.edu/~kumar/dmbook/ch6.pdf

4 
T 9/17 
Frequent Itemsets & Association Rules 
o Chapter 6 of http://eliassi.org/mmdsbookv2L.pdf
o Optional: Sections 6.1–6.6 of http://wwwusers.cs.umn.edu/~kumar/dmbook/ch6.pdf

5 
R 9/19 
Social Bots (Guest lecturer: 
o The Rise of Social Bots: https://arxiv.org/abs/1407.5225
o Online Human-Bot Interactions: Detection, Estimation, and Characterization: https://arxiv.org/abs/1703.03107
o Arming the Public with Artificial Intelligence to Counter Social Bots: https://arxiv.org/abs/1901.00912
o Deception Strategies and Threats for Online Discussions: https://arxiv.org/abs/1906.11371
6 
T 9/24 
Finding Similar Items 
o Chapter 3 of http://eliassi.org/mmdsbookv2L.pdf
7 
R 9/26 
Finding Similar Items 
o Chapter 3 of http://eliassi.org/mmdsbookv2L.pdf
Homework #1 covers density estimation, frequent itemsets & association rules, plus finding similar items.
o out on Thursday September 26
o due on Sunday October 6 at 11:59 PM Eastern
o graded by Wednesday October 16

8 
T 10/1 
Mining Data Streams 
o Chapter 4 of http://eliassi.org/mmdsbookv2L.pdf
9 
R 10/3 
Mining Data Streams 
o Chapter 4 of http://eliassi.org/mmdsbookv2L.pdf
10 
T 10/8 
Mining Data Streams 
o Chapter 4 of http://eliassi.org/mmdsbookv2L.pdf
11 
R 10/10 
Dimensionality Reduction (PCA, SVD, CUR, 
o Chapter 11 of http://eliassi.org/mmdsbookv2L.pdf
o Section 14.5 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

12 
T 10/15 
Dimensionality Reduction (PCA, SVD, CUR, 
o Chapter 11 of http://eliassi.org/mmdsbookv2L.pdf
o Section 14.5 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
o https://alex.smola.org/papers/1999/MikSchSmoMuletal99.pdf

Homework #2 covers mining data streams and dimensionality reduction.
o out on Tuesday October 15
o due on Friday October 25 at 11:59 PM Eastern
o graded by Monday November 4

13 
R 10/17 
Dimensionality Reduction 
o t-SNE paper: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
o t-SNE website: https://lvdmaaten.github.io/tsne/
o UMAP paper: https://arxiv.org/abs/1802.03426
o UMAP website: https://umaplearn.readthedocs.io/en/latest/
o A nice presentation on UMAP: https://www.youtube.com/watch?v=nq6iPZVUxZU

14 
T 10/22 
Project Proposal Pitches (in-class)
o Proposals are due at 9:00 AM Eastern on Tuesday October 22; there are no late days for this assignment.
o Graded by Tuesday October 29
15 
R 10/24 
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM)
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1–7.3 of http://eliassi.org/mmdsbookv2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229notes7b.pdf
o http://cs229.stanford.edu/notes/cs229notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD96.final.frame.pdf

16 
T 10/29 
Midterm Exam (in-class)
Graded by Tuesday November 12 
17 
R 10/31 
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM)
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1–7.3 of http://eliassi.org/mmdsbookv2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229notes7b.pdf
o http://cs229.stanford.edu/notes/cs229notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD96.final.frame.pdf

18 
T 11/5 
Clustering: EM, K-medoids, Hierarchical Clustering, Evaluation Metrics and Practical Issues
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1–7.3 of http://eliassi.org/mmdsbookv2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229notes7b.pdf
o http://cs229.stanford.edu/notes/cs229notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD96.final.frame.pdf

Homework #3 covers clustering: k-means, Gaussian mixture models, expectation maximization, k-medoids, hierarchical clustering, and evaluation metrics.
o out on Tuesday November 5
o due on Friday November 15 at 11:59 PM Eastern
o graded by Monday November 25

19 
R 11/7 
Spectral Clustering 
o http://ai.stanford.edu/~ang/papers/nips01spectral.pdf
o http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf
20 
T 11/12 
Matrix Factorization 
o Section 14.6 of http://statweb.stanford.edu/~tibs/ElemStatLearn/
o http://papers.nips.cc/paper/1861algorithmsfornonnegativematrixfactorization.pdf
o Optional: http://www.sandia.gov/~tgkolda/pubs/pubfiles/TensorReview.pdf
21 
R 11/14 
Recommendation Systems (Guest lecturer: 
o Chapter 9 of http://eliassi.org/mmdsbookv2L.pdf
22 
T 11/19 
Recommendation Systems 
o Chapter 9 of http://eliassi.org/mmdsbookv2L.pdf
23 
R 11/21 
Link Analysis 
o Chapter 5 of http://eliassi.org/mmdsbookv2L.pdf
o Optional: http://bit.ly/2iYxo82
Homework #4 covers spectral clustering, matrix factorization, recommendation systems, and link analysis.
o out on Thursday November 21
o due on Sunday December 1 at 11:59 PM Eastern
o graded by Tuesday December 10

24 
T 11/26 
Link Analysis and 
o Chapter 5 of http://eliassi.org/mmdsbookv2L.pdf
o Optional: http://bit.ly/2iYxo82
o The final covers all the material since the beginning of the term.
25 
R 11/28 
Thanksgiving Break 

26 
T 12/3 
Final Exam (in-class)
Graded by Friday December 13
Project posters
o due on Wednesday December 4 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Friday December 7

27 
R 12/5 
Data Science and Ethics 
o https://www.ted.com/talks/damon_horowitz?language=en
o https://ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms
o https://www.propublica.org/article/machinebiasriskassessmentsincriminalsentencing
o https://hbr.org/2016/12/aguidetosolvingsocialproblemswithmachinelearning
Project reports
o due on Tuesday December 10 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday December 15

Final grades are due to the Registrar's Office on Monday December 16 at 2:00 PM Eastern.
A    93–100
A-   90–92
B+   87–89
B    83–86
B-   80–82
C+   77–79
C    73–76
C-   70–72
F    < 70