[Fall 2014] Topics in Artificial Intelligence: Machine Learning with Large-scale Data
General Information
Overview
This graduate-level course covers machine-learning algorithms, programming environments, and software frameworks that are designed to effectively deal with large-scale (i.e., big) data.
Prerequisites: A previous course on machine learning or data mining, and strong knowledge of algorithms and programming (Java, C, and scripting/dynamic languages).
Textbook
Resources
- (textbook) Kevin Murphy, Machine Learning: A Probabilistic Perspective. ISBN 0262018020, MIT Press, 2012.
- (textbook) Christopher Bishop, Pattern Recognition and Machine Learning. ISBN 0387310738, Springer, 2006.
- (textbook) Tom Mitchell, Machine Learning. ISBN 0070428077, McGraw-Hill, 1997.
- (textbook, free on-line) Trevor Hastie, Robert Tibshirani and Jerome Friedman, Elements of Statistical Learning. ISBN 0387952845, Springer, 2009 (2nd edition).
- (textbook, free on-line) David MacKay, Information Theory, Inference, and Learning Algorithms. ISBN 0521642981, Cambridge University Press, 2003.
- (textbook, free on-line) Roberto Battiti and Mauro Brunato, The LION Way: Machine Learning plus Intelligent Optimization. Lionsolver, Inc., 2013.
- Probability Review (David Blei, Princeton)
- Probability Theory Review (Arian Maleki and Tom Do, Stanford)
- Linear Algebra Tutorial (C.T. Abdallah, Penn)
- Linear Algebra Review and Reference (Zico Kolter and Chuong Do, Stanford)
- Statistical Data Mining Tutorials (Andrew Moore, Google/CMU)
- Theoretical CS Cheat Sheet (Princeton)
Grading
You will be evaluated based on student presentations (40%) and a substantial semester-long project (60%). The project must include at least one big data set, at least one learning/mining algorithm, and a real-world application. For the project, you will need to prepare a proposal, give a presentation at the end of the semester, and write a final report. More details will be provided in class.
Notes, Policies, and Guidelines
- We will use the class Sakai site for announcements, assignments, and your contributions.
- When emailing me about the course, begin the subject line with [f14 cs598].
- For your Hadoop-based jobs, you can use the DCS Hadoop cluster. For big non-Hadoop jobs, you can use aurora.cs. If you don't have accounts on these machines, let me know.
- Course projects must be done individually.
- Any regrading request must be submitted in writing within one week of the material being returned. The request must describe the grading error precisely and concisely.
- Refresh your knowledge of the university's policies on academic integrity and plagiarism. There is zero tolerance for cheating!
Some Similar Courses in Other Universities