Spring 2022: DS 5230 Unsupervised Machine Learning and Data Mining, CRN 32874
Lecture time: Mondays & Wednesdays

Place: West Village G, Room 106

Instructor: Tina Eliassi-Rad

Office hours: Tuesdays 4:30 – 6:00 PM via Zoom

TA: Priya Garg

Office hours: Available by appointment. Email garg.p [at] northeastern [dot] edu

TA: Hani Haider

Office hours: Available by appointment. Email haider.sy [at] northeastern [dot] edu

TA: Oj Sindher

Office hours: Available by appointment. Email sindher.o [at] northeastern [dot] edu

This 4-credit graduate-level course covers data mining and unsupervised learning. Students are expected to have taken courses on or have knowledge of the following:
There is no specific textbook for this course. Readings are assigned in the syllabus (see below). Here are some textbooks (all optional) for this course. Those that are freely available online are listed first.
o Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015. (free online; visit this site and log in via your institutional account)
Lec # 
Date 
Topic 
Readings & Notes
1 
W 1/19 
Introduction and Overview 
o http://infolab.stanford.edu/~ullman/mmds/ch1n.pdf
o http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

2 
M 1/24 
Density Estimation 
o http://ned.ipac.caltech.edu/level5/March02/Silverman/Silver_contents.html
o http://eliassi.org/Sheather_StatSci_2004.pdf
o Optional: Section 6.6.1 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: https://dl.acm.org/doi/pdf/10.1145/3422622
3 
W 1/26 
No Class (Professor Away) 

4 
M 1/31 
Frequent Itemsets & Association Rules
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
5 
W 2/2 
Frequent Itemsets & Association Rules 
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

Homework #1 covers density estimation and frequent itemsets & association rules.
o out on Wednesday February 2
o due on Sunday February 13 at 11:59 PM Eastern
o graded by Wednesday February 23

6 
M 2/7 
Finding Similar Items 

7 
W 2/9 
Finding Similar Items 

8 
M 2/14 
Mining Data Streams 

9 
W 2/16 
Mining Data Streams 

10 
M 2/21 
No Class (Presidents' Day)

11 
W 2/23 
Mining Data Streams 

Homework #2 covers finding similar items and mining data streams.
o out on Wednesday February 23
o due on Sunday March 6 at 11:59 PM Eastern
o graded by Wednesday March 16

12 
M 2/28 
Dimensionality Reduction (SVD, CUR) 
o http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
o Chapter 3 of http://www.eliassi.org/FODSbook2019.pdf
13 
W 3/2 
Dimensionality Reduction (PCA, Kernel PCA, MDS, ISOMAP) 
o http://www.eliassi.org/ang/cs229notes10pca.pdf
o Section 14.5 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o https://alex.smola.org/papers/1999/MikSchSmoMuletal99.pdf
o https://en.wikipedia.org/wiki/Multidimensional_scaling
o http://www.eliassi.org/tenenbaumisomapScience2000.pdf (supplementary material)
14 
M 3/7 
Dimensionality Reduction 
o t-SNE paper: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
o t-SNE website: https://lvdmaaten.github.io/tsne/
o UMAP paper: https://arxiv.org/abs/1802.03426
o UMAP website: https://umap-learn.readthedocs.io/en/latest/
o A nice presentation on UMAP: https://www.youtube.com/watch?v=nq6iPZVUxZU
o Optional: https://www.jmlr.org/papers/volume22/20-1061/20-1061.pdf

Homework #3 covers dimensionality reduction.
o out on Monday March 7
o due on Thursday March 17 at 11:59 PM Eastern
o graded by Sunday March 27

15 
W 3/9 
Dimensionality Reduction (autoencoders) 
o https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
o https://www.deeplearningbook.org/contents/autoencoders.html
o A nice presentation on autoencoders: https://www.youtube.com/watch?v=R3DNKE3zKFk
o Optional: https://www.jeremyjordan.me/autoencoders/

Project proposals
o due on Thursday March 10 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday March 20

16 
M 3/14 
No Class (Spring Break) 

17 
W 3/16 
No Class (Spring Break) 

18 
M 3/21 
Nonnegative Matrix Factorization 
o http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
o Section 14.6 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://eliassi.org/papers/hendersonkdd2012.pdf
19 
W 3/23 
Midterm Exam (in-class)
Graded by Monday April 4 
20 
M 3/28 
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM)
o http://www.eliassi.org/ang/cs229notes7akmeans.pdf
o http://www.eliassi.org/ang/cs229notes7bmixtureofguassians.pdf
o http://www.eliassi.org/ang/cs229notes8em.pdf
o Sections 7.1-7.3 of http://infolab.stanford.edu/~ullman/mmds/ch7.pdf

21 
W 3/30 
Clustering: EM, K-medoids, Hierarchical Clustering, Evaluation Metrics and Practical Issues
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.4 of http://www.eliassi.org/FODSbook2019.pdf
o Section 14.3 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://www.eliassi.org/jain99dataclusteringreview.pdf
o Optional: http://www.eliassi.org/validity_survey.pdf
o Optional: http://www.eliassi.org/dbscan.pdf
o Optional: http://www.eliassi.org/ang/cs229notes9factoranalysis.pdf

22 
M 4/4 
Spectral Clustering 
o http://ai.stanford.edu/~ang/papers/nips01spectral.pdf
o http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf
o Optional: Section 7.5 of http://www.eliassi.org/FODSbook2019.pdf

23 
W 4/6 
Recommendation Systems 

24 
M 4/11 
Recommendation Systems 

Homework #4 covers clustering, matrix factorization, and recommendation systems.
o out on Monday April 11
o due on Thursday April 21 at 11:59 PM Eastern
o graded by Sunday May 1

25 
W 4/13 
Link Analysis 
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82
26 
M 4/18 
No Class (Patriots' Day)

27 
W 4/20 
Link Analysis 
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82

28 
M 4/25 
Review for the Final Exam 
The final covers all the material since the beginning of the term. 
29 
W 4/27 
Final Exam (in-class)
Graded by Sunday May 8th 
Project posters
o due on Sunday May 1 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Wednesday May 4

30 
M 5/2 
Poster Pitches (in-class)

31 
W 5/4 
Poster Pitches (in-class)

Project reports
o due on Thursday May 5th at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday May 8th

Final grades are due to the Registrar's Office on Tuesday May 10th.
A   93-100
A-  90-92
B+  87-89
B   83-86
B-  80-82
C+  77-79
C   73-76
C-  70-72
F   < 70