Spring 2022: DS 5230 -- Unsupervised Machine Learning and Data Mining, CRN 32874
Lecture time: Mondays & Wednesdays
|
Place: West Village G, Room 106
|
Instructor: Tina Eliassi-Rad
|
Office hours: Tuesdays 4:30 – 6:00 PM via Zoom
|
TA: Priya Garg
|
Office hours: Available by appointment. Email garg.p [at] northeastern [dot] edu.
|
TA: Hani Haider
|
Office hours: Available by appointment. Email haider.sy [at] northeastern [dot] edu.
|
TA: Oj Sindher
|
Office hours: Available by appointment. Email sindher.o [at] northeastern [dot] edu.
|
This 4-credit graduate-level course covers data mining and unsupervised learning. Students are expected to have taken courses on or have knowledge of the following:
There is no specific textbook for this course. Readings are assigned in the syllabus (see below). The textbooks listed here are all optional; those that are freely available online are listed first.
o Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015. (free online; visit this site and log in via your institutional account)
Lec # |
Date |
Topic |
Readings & Notes |
1 |
W 1/19 |
Introduction and Overview |
o http://infolab.stanford.edu/~ullman/mmds/ch1n.pdf
o http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
|
2 |
M 1/24 |
Density Estimation |
o http://ned.ipac.caltech.edu/level5/March02/Silverman/Silver_contents.html
o http://eliassi.org/Sheather_StatSci_2004.pdf
o Optional: Section 6.6.1 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: https://dl.acm.org/doi/pdf/10.1145/3422622 |
3 |
W 1/26 |
No Class (Professor Away) |
|
4 |
M 1/31 |
Frequent Itemsets & Association Rules |
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf |
5 |
W 2/2 |
Frequent Itemsets & Association Rules |
o http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
|
Homework #1 covers density estimation and frequent itemsets & association rules.
o out on Wednesday February 2
o due on Sunday February 13 at 11:59 PM Eastern
o graded by Wednesday February 23 |
|||
6 |
M 2/7 |
Finding Similar Items |
|
7 |
W 2/9 |
Finding Similar Items |
|
8 |
M 2/14 |
Mining Data Streams |
|
9 |
W 2/16 |
Mining Data Streams |
|
10 |
M 2/21 |
No Class (Presidents Day) |
|
11 |
W 2/23 |
Mining Data Streams |
|
Homework #2 covers finding similar items and mining data streams.
o out on Wednesday February 23
o due on Sunday March 6 at 11:59 PM Eastern
o graded by Wednesday March 16 |
|||
12 |
M 2/28 |
Dimensionality Reduction (SVD, CUR) |
o http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
o Chapter 3 of http://www.eliassi.org/FODS-book-2019.pdf |
13 |
W 3/2 |
Dimensionality Reduction (PCA, Kernel PCA, MDS, ISOMAP) |
o http://www.eliassi.org/ang/cs229-notes10-pca.pdf
o Section 14.5 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o https://alex.smola.org/papers/1999/MikSchSmoMuletal99.pdf
o https://en.wikipedia.org/wiki/Multidimensional_scaling
o http://www.eliassi.org/tenenbaum-isomap-Science2000.pdf (supplementary material) |
14 |
M 3/7 |
Dimensionality Reduction |
o t-SNE paper: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
o t-SNE website: https://lvdmaaten.github.io/tsne/
o UMAP paper: https://arxiv.org/abs/1802.03426
o UMAP website: https://umap-learn.readthedocs.io/en/latest/
o A nice presentation on UMAP: https://www.youtube.com/watch?v=nq6iPZVUxZU
o Optional: https://www.jmlr.org/papers/volume22/20-1061/20-1061.pdf
|
Homework #3 covers dimensionality reduction.
o out on Monday March 7
o due on Thursday March 17 at 11:59 PM Eastern
o graded by Sunday March 27 |
|||
15 |
W 3/9 |
Dimensionality Reduction (autoencoders) |
o https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
o https://www.deeplearningbook.org/contents/autoencoders.html
o A nice presentation on autoencoders: https://www.youtube.com/watch?v=R3DNKE3zKFk
o Optional: https://www.jeremyjordan.me/autoencoders/
|
Project proposals
o due on Thursday March 10 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday March 20 |
|||
16 |
M 3/14 |
No Class (Spring Break) |
|
17 |
W 3/16 |
No Class (Spring Break) |
|
18 |
M 3/21 |
Non-negative Matrix Factorization |
o http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
o Section 14.6 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://eliassi.org/papers/henderson-kdd2012.pdf |
19 |
W 3/23 |
Midterm Exam (in-class) |
Graded by Monday April 11 |
20 |
M 3/28 |
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM) |
o http://www.eliassi.org/ang/cs229-notes7a-kmeans.pdf
o http://www.eliassi.org/ang/cs229-notes7b-mixture-of-guassians.pdf
o http://www.eliassi.org/ang/cs229-notes8-em.pdf
o Sections 7.1-7.3 of http://infolab.stanford.edu/~ullman/mmds/ch7.pdf
|
21 |
W 3/30 |
Clustering: EM, K-medoids, Hierarchical Clustering, Evaluation Metrics and Practical Issues |
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.4 of http://www.eliassi.org/FODS-book-2019.pdf
o Section 14.3 of https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
o Optional: http://www.eliassi.org/jain99data-clustering-review.pdf
o Optional: http://www.eliassi.org/validity_survey.pdf
o Optional: http://www.eliassi.org/dbscan.pdf
o Optional: http://www.eliassi.org/ang/cs229-notes9-factor-analysis.pdf
|
22 |
M 4/4 |
Spectral Clustering |
o http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
o http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf
o Optional: Section 7.5 of http://www.eliassi.org/FODS-book-2019.pdf
|
23 |
W 4/6 |
Recommendation Systems |
|
24 |
M 4/11 |
Midterms returned and |
o Any regrading request for the midterm must be made at the end of the lecture.
o Midterms will be collected at the end of the lecture. |
25 |
W 4/13 |
Recommendation Systems |
|
Homework #4 covers clustering, matrix factorization, and recommendation systems.
o out on Wednesday April 13
o due on Friday April 22 at 11:59 PM Eastern
o graded by Sunday May 1 |
|||
26 |
M 4/18 |
No Class (Patriots Day) |
|
27 |
W 4/20 |
Recommendation Systems |
|
28 |
M 4/25 |
Link Analysis |
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82
|
29 |
W 4/27 |
Link Analysis |
o http://infolab.stanford.edu/~ullman/mmds/ch5.pdf
o Optional: http://bit.ly/2iYxo82
|
30 |
M 5/2 |
Final Exam (in-class) |
Graded by Friday May 6th |
31 |
W 5/4 |
No Class |
|
Project posters and reports
o due on Wednesday May 4th at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday May 8th |
|||
Final grades are due to the Registrar's Office on Monday May 9th at 9:00 AM Eastern. |
A | 93-100 |
A- | 90-92 |
B+ | 87-89 |
B | 83-86 |
B- | 80-82 |
C+ | 77-79 |
C | 73-76 |
C- | 70-72 |
F | < 70 |