Spring 2022: DS 5230 - Unsupervised Machine Learning and Data Mining, CRN 32874
Lecture time: Mondays & Wednesdays

Place: West Village G, Room 106

Instructor: Tina Eliassi-Rad

Office hours: Tuesdays 4:30 – 6:00 PM via Zoom

TA: Priya Garg

Office hours: Available by appointment. Email garg.p [at] northeastern [dot] edu

TA: Hani Haider

Office hours: Available by appointment. Email haider.sy [at] northeastern [dot] edu

TA: Oj Sindher

Office hours: Available by appointment. Email sindher.o [at] northeastern [dot] edu

This 4-credit graduate-level course covers data mining and unsupervised learning. Students are expected to have taken courses on or have knowledge of the following:
There is no required textbook for this course. Readings are assigned in the syllabus (see below). Some optional textbooks are listed here; those that are freely available online are listed first.
o Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015. (free online; visit this site and log in via your institutional account)
Lec # 
Date 
Topic 
Readings & Notes
1 
W 1/19 
Introduction and Overview 
o http://infolab.stanford.edu/~ullman/mmds/ch1n.pdf
o http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

2 
M 1/24 
Density Estimation 
o http://ned.ipac.caltech.edu/level5/March02/Silverman/Silver_contents.html
o http://eliassi.org/Sheather_StatSci_2004.pdf
o Optional: Section 6.6.1 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: https://dl.acm.org/doi/pdf/10.1145/3422622
3 
W 1/26 
No Class (Professor Away) 

4 
M 1/31 
Frequent Itemsets & Association Rules 
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
5 
W 2/2 
Frequent Itemsets & Association Rules 
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

Homework #1 covers density estimation and frequent itemsets & association rules.
o out on Wednesday February 2
o due on Sunday February 13 at 11:59 PM Eastern
o graded by Wednesday February 23

6 
M 2/7 
Finding Similar Items 

7 
W 2/9 
Finding Similar Items 

8 
M 2/14 
Mining Data Streams 

9 
W 2/16 
Mining Data Streams 

10 
M 2/21 
No Class (Presidents Day) 

11 
W 2/23 
Mining Data Streams 

Homework #2 covers finding similar items and mining data streams.
o out on Wednesday February 23
o due on Sunday March 6 at 11:59 PM Eastern
o graded by Wednesday March 16

12 
M 2/28 
Dimensionality Reduction (SVD, CUR) 
o http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
o Chapter 3 of http://www.eliassi.org/FODSbook2019.pdf
13 
W 3/2 
Dimensionality Reduction (PCA, Kernel PCA, MDS, ISOMAP) 
o http://www.eliassi.org/ang/cs229notes10pca.pdf
o Section 14.5 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o https://alex.smola.org/papers/1999/MikSchSmoMuletal99.pdf
o https://en.wikipedia.org/wiki/Multidimensional_scaling
o http://www.eliassi.org/tenenbaumisomapScience2000.pdf (supplementary material)
14 
M 3/7 
Dimensionality Reduction 
o t-SNE paper: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
o t-SNE website: https://lvdmaaten.github.io/tsne/
o UMAP paper: https://arxiv.org/abs/1802.03426
o UMAP website: https://umap-learn.readthedocs.io/en/latest/
o A nice presentation on UMAP: https://www.youtube.com/watch?v=nq6iPZVUxZU
o Optional: https://www.jmlr.org/papers/volume22/20-1061/20-1061.pdf

Homework #3 covers dimensionality reduction.
o out on Monday March 7
o due on Thursday March 17 at 11:59 PM Eastern
o graded by Sunday March 27

15 
W 3/9 
Dimensionality Reduction (autoencoders) 
o https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
o https://www.deeplearningbook.org/contents/autoencoders.html
o A nice presentation on autoencoders: https://www.youtube.com/watch?v=R3DNKE3zKFk
o Optional: https://www.jeremyjordan.me/autoencoders/

Project proposals
o due on Thursday March 10 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday March 20

16 
M 3/14 
No Class (Spring Break) 

17 
W 3/16 
No Class (Spring Break) 

18 
M 3/21 
Nonnegative Matrix Factorization 
o http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
o Section 14.6 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://eliassi.org/papers/hendersonkdd2012.pdf
19 
W 3/23 
Midterm Exam (in-class)
Graded by Monday April 11 
20 
M 3/28 
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM)
o http://www.eliassi.org/ang/cs229notes7akmeans.pdf
o http://www.eliassi.org/ang/cs229notes7bmixtureofguassians.pdf
o http://www.eliassi.org/ang/cs229notes8em.pdf
o Sections 7.1-7.3 of http://infolab.stanford.edu/~ullman/mmds/ch7.pdf

21 
W 3/30 
Clustering: EM, K-medoids, Hierarchical Clustering, Evaluation Metrics and Practical Issues
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.4 of http://www.eliassi.org/FODSbook2019.pdf
o Section 14.3 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://www.eliassi.org/jain99dataclusteringreview.pdf
o Optional: http://www.eliassi.org/validity_survey.pdf
o Optional: http://www.eliassi.org/dbscan.pdf
o Optional: http://www.eliassi.org/ang/cs229notes9factoranalysis.pdf

22 
M 4/4 
Spectral Clustering 
o http://ai.stanford.edu/~ang/papers/nips01spectral.pdf
o http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf
o Optional: Section 7.5 of http://www.eliassi.org/FODSbook2019.pdf

23 
W 4/6 
Recommendation Systems 

24 
M 4/11 
Midterms returned and 
o Any regrading request for the midterm must be made at the end of the lecture.
o Midterms will be collected at the end of the lecture.
25 
W 4/13 
Recommendation Systems 

Homework #4 covers clustering, matrix factorization, and recommendation systems.
o out on Wednesday April 13
o due on Friday April 22 at 11:59 PM Eastern
o graded by Sunday May 1

26 
M 4/18 
No Class (Patriots Day) 

27 
W 4/20 
Recommendation Systems 

28 
M 4/25 
Link Analysis 
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82

29 
W 4/27 
Link Analysis 
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82

30 
M 5/2 
Final Exam (in-class)
Graded by Friday May 6th 
31 
W 5/4 
No Class

Project posters and reports
o due on Wednesday May 4th at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday May 8th

Final grades are due to the Registrar's Office on Monday May 9th at 9:00 AM Eastern.
A     93-100
A-    90-92
B+    87-89
B     83-86
B-    80-82
C+    77-79
C     73-76
C-    70-72
F     < 70