Fall 2019: CS 6220 -- Data Mining Techniques, CRN 12564, cross-listed with
Fall 2019: DS 5230 -- Unsupervised Machine Learning and Data Mining, CRN 15043
Lecture time: Tuesdays 11:45 AM – 1:25 PM & Thursdays 2:50 – 4:30 PM
|
Place: West Village G, Room 102
|
Instructor: Tina Eliassi-Rad
|
Office hours: Tuesdays 1:30 – 3:00 PM in Kariotis Hall, Room 304
|
TA: Govind Bhala
|
Office hours: Wednesdays 4:50 PM – 6:20 PM in West Village F, Room 116
Also available by appointment. Email bhala.g [at] husky [dot] neu [dot] edu
|
TA: Hui “Sophie” Wang
|
Office hours: Thursdays 10:00 AM – 11:30 AM in Behrakis Health Sciences Center, Room 210
Also available by appointment. Email wang.hui1 [at] husky [dot] neu [dot] edu
|
This 4-credit graduate-level course covers data mining and unsupervised learning. Its prerequisites are:
This course does not have a designated textbook. The readings are assigned in the syllabus (see below). Here are some textbooks (all optional) related to the course.
Lec # |
Date |
Topic |
Readings & Notes |
1 |
R 9/5 |
Introduction and Overview |
o Chapter 1 of http://eliassi.org/mmds-book-v2L.pdf
o http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
|
2 |
T 9/10 |
Density Estimation |
o http://ned.ipac.caltech.edu/level5/March02/Silverman/Silver_contents.html
o http://eliassi.org/Sheather_StatSci_2004.pdf
o Optional: Sections 6.6-6.9 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
|
3 |
R 9/12 |
Frequent Itemsets & Association Rules |
o Chapter 6 of http://eliassi.org/mmds-book-v2L.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
|
4 |
T 9/17 |
Frequent Itemsets & Association Rules |
o Chapter 6 of http://eliassi.org/mmds-book-v2L.pdf
o Optional: Sections 6.1-6.6 of http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
|
5 |
R 9/19 |
Social Bots (Guest lecturer: |
o The Rise of Social Bots: https://arxiv.org/abs/1407.5225
o Online Human-Bot Interactions: Detection, Estimation, and Characterization: https://arxiv.org/abs/1703.03107
o Arming The Public with Artificial Intelligence to Counter Social Bots: https://arxiv.org/abs/1901.00912
o Deception Strategies and Threats for Online Discussions: https://arxiv.org/abs/1906.11371 |
6 |
T 9/24 |
Finding Similar Items |
o Chapter 3 of http://eliassi.org/mmds-book-v2L.pdf |
7 |
R 9/26 |
Finding Similar Items |
o Chapter 3 of http://eliassi.org/mmds-book-v2L.pdf |
Homework #1 covers density estimation, frequent itemsets & association rules, plus finding similar items.
o out on Thursday September 26
o due on Sunday October 6 at 11:59 PM Eastern
o graded by Wednesday October 16 |
|||
8 |
T 10/1 |
Mining Data Streams |
o Chapter 4 of http://eliassi.org/mmds-book-v2L.pdf |
9 |
R 10/3 |
Mining Data Streams |
o Chapter 4 of http://eliassi.org/mmds-book-v2L.pdf |
10 |
T 10/8 |
Mining Data Streams |
o Chapter 4 of http://eliassi.org/mmds-book-v2L.pdf |
11 |
R 10/10 |
Dimensionality Reduction (PCA, SVD, CUR, |
o Chapter 11 of http://eliassi.org/mmds-book-v2L.pdf
o Section 14.5 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
|
12 |
T 10/15 |
Dimensionality Reduction (PCA, SVD, CUR, |
o Chapter 11 of http://eliassi.org/mmds-book-v2L.pdf
o Section 14.5 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
o https://alex.smola.org/papers/1999/MikSchSmoMuletal99.pdf
|
Homework #2 covers mining data streams and dimensionality reduction.
o out on Tuesday October 15
o due on Friday October 25 at 11:59 PM Eastern
o graded by Monday November 4 |
|||
13 |
R 10/17 |
Dimensionality Reduction |
o t-SNE paper: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
o t-SNE website: https://lvdmaaten.github.io/tsne/
o UMAP paper: https://arxiv.org/abs/1802.03426
o UMAP website: https://umap-learn.readthedocs.io/en/latest/
o A nice presentation on UMAP: https://www.youtube.com/watch?v=nq6iPZVUxZU
|
14 |
T 10/22 |
Project Proposal Pitches (in-class) |
o Proposals are due at 9:00 AM Eastern on Tuesday October 22; there are no late days for this assignment.
o Graded by Tuesday October 29 |
15 |
R 10/24 |
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM) |
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.3 of http://eliassi.org/mmds-book-v2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229-notes7b.pdf
o http://cs229.stanford.edu/notes/cs229-notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
|
16 |
T 10/29 |
Midterm Exam (in-class) |
Graded by Tuesday November 12 |
17 |
R 10/31 |
Clustering: K-means, Gaussian Mixture Models, Expectation Maximization (EM) |
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.3 of http://eliassi.org/mmds-book-v2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229-notes7b.pdf
o http://cs229.stanford.edu/notes/cs229-notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
|
18 |
T 11/5 |
Clustering: EM, K-medoids, Hierarchical Clustering, Evaluation Metrics and Practical Issues |
o Chapter 9 of http://robotics.stanford.edu/~nilsson/MLBOOK.pdf
o Sections 7.1-7.3 of http://eliassi.org/mmds-book-v2L.pdf
o Chapter 8 of https://www.cs.cornell.edu/jeh/book2016June9.pdf
o Section 14.3 of http://statweb.stanford.edu/~tibs/ElemStatLearn
o http://cs229.stanford.edu/notes/cs229-notes7b.pdf
o http://cs229.stanford.edu/notes/cs229-notes8.pdf
o Optional: https://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
o Optional: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
o Optional: http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
|
Homework #3 covers clustering: k-means, Gaussian mixture models, expectation maximization, k-medoids, hierarchical clustering, and evaluation metrics.
o out on Tuesday November 5
o due on Friday November 15 at 11:59 PM Eastern
o graded by Monday November 25 |
|||
19 |
R 11/7 |
Spectral Clustering |
o http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
o http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf |
20 |
T 11/12 |
Matrix Factorization |
o Section 14.6 of http://statweb.stanford.edu/~tibs/ElemStatLearn/
o http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
o Optional: http://www.sandia.gov/~tgkolda/pubs/pubfiles/TensorReview.pdf |
21 |
R 11/14 |
Recommendation Systems (Guest lecturer: |
o Chapter 9 of http://eliassi.org/mmds-book-v2L.pdf |
22 |
T 11/19 |
Recommendation Systems |
o Chapter 9 of http://eliassi.org/mmds-book-v2L.pdf |
23 |
R 11/21 |
Link Analysis |
o Chapter 5 of http://eliassi.org/mmds-book-v2L.pdf
o Optional: http://bit.ly/2iYxo82 |
Homework #4 covers spectral clustering, matrix factorization, recommendation systems, and link analysis.
o out on Thursday November 21
o due on Sunday December 1 at 11:59 PM Eastern
o graded by Tuesday December 10 |
|||
24 |
T 11/26 |
Link Analysis and |
o Chapter 5 of http://eliassi.org/mmds-book-v2L.pdf
o Optional: http://bit.ly/2iYxo82
o The final covers all the material since the beginning of the term. |
25 |
R 11/28 |
Thanksgiving Break |
|
26 |
T 12/3 |
Final Exam (in-class) |
Graded by Friday December 13 |
Project posters
o due on Wednesday December 4 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Friday December 7 |
|||
27 |
R 12/5 |
Data Science and Ethics |
o https://www.ted.com/talks/damon_horowitz?language=en
o https://ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms
o https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
o https://hbr.org/2016/12/a-guide-to-solving-social-problems-with-machine-learning |
Project reports
o due on Tuesday December 10 at 11:59 PM Eastern; there are no late days for this assignment.
o graded by Sunday December 15 |
|||
Final grades are due to the Registrar's Office on Monday December 16 at 2:00 PM Eastern. |
A | 93-100 |
A- | 90-92 |
B+ | 87-89 |
B | 83-86 |
B- | 80-82 |
C+ | 77-79 |
C | 73-76 |
C- | 70-72 |
F | < 70 |