- Semester: Spring 2015
- Course number: 01:198:443
- Course title: Introduction to Data Science
- Credits: 3
- Lecture: Mondays & Thursdays 12:00pm-1:20pm
- Location: Busch Campus, SEC-205
- Course website: here and in Sakai
- Instructor: Tina Eliassi-Rad
- Office: CBIM 8
- Office hours: Mondays 1:30pm-2:30pm
- Teaching assistant: Chetan Tonde
- Office: CBIM (cubicle near printer room)
- Office hours: Thursdays 3:00pm-5:00pm
Advances in technology have allowed us to collect massive amounts of data. A data scientist is a person who has the skills, knowledge, and ability to extract actionable knowledge from data, whether for the good of society, the advancement of science, or profit in business. This course will cover the topics needed to solve data-science problems, which include data preparation (collection & integration), data characterization & presentation, data analysis (experimentation & observational studies), and data products.
Syllabus / Schedule
This course does not have a designated textbook. The readings are assigned in the syllabus.
Here are some textbooks (all optional) related to the course.
- Anand Rajaraman, Jurij Leskovec, and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge University Press. 2014. (free online)
- Foster Provost, Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-analytic Thinking. ISBN 1449361323.
- Tom Mitchell. Machine Learning. ISBN 0070428077.
- Christopher Bishop. Pattern Recognition and Machine Learning. ISBN 0387310738.
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020.
- Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. ISBN 1107422221.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. Elements of Statistical Learning. ISBN 0387952845. (free online)
- David J. Hand, Heikki Mannila, Padhraic Smyth. Principles of Data Mining. ISBN 026208290X.
- Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790.
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. ISBN 0321321367.
- Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. ISBN 0123748569.
- Class project (40%), where you solve a data-science problem from data preparation to data product
  - Proposal report (10%) -- 2 pages maximum plus 4-minute in-class pitch -- due on Thu 3/12. Should include answers to the following questions:
    - What is the problem?
    - Why is it interesting and important?
    - Why is it hard? Why have previous approaches failed?
    - What are the key components of your approach?
    - What data sets and metrics will be used to validate the approach?
  - Class presentation (12%) -- 6-minute presentations -- due on Mon 4/27 (Group 1) and Mon 5/4 (Group 2). Groups will be determined later.
  - Final report (18%) -- 6 pages maximum -- due on Mon 5/11.
- Three homework assignments (36% total; 12% per HW)
  - HW#1 out on Mon 2/9; due on Mon 2/23; graded by Mon 3/9.
  - HW#2 out on Mon 3/9; due on Mon 3/23; graded by Mon 4/6.
  - HW#3 out on Mon 3/30; due on Mon 4/13; graded by Mon 4/27.
- In-class exam (24%) on Thu 4/16; graded by Mon 5/4
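As a rough illustration of how the percentages above add up, the sketch below combines per-component scores into one course score. It assumes each component is scored on a 0-100 scale and that components are combined as a simple weighted average; the syllabus does not spell out the exact computation, and the names `WEIGHTS` and `course_score` are hypothetical.

```python
# Component weights (percent of the final grade), copied from the
# grading breakdown above. The dictionary keys are hypothetical names.
WEIGHTS = {
    "proposal": 10,      # class project: proposal report
    "presentation": 12,  # class project: class presentation
    "final_report": 18,  # class project: final report
    "hw1": 12,
    "hw2": 12,
    "hw3": 12,
    "exam": 24,
}


def course_score(scores):
    """Return the weighted course score in [0, 100].

    `scores` maps each component name in WEIGHTS to a score in [0, 100].
    Assumption: components are combined as a plain weighted average.
    """
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {name: 80 for name in WEIGHTS}
    example["proposal"] = 90
    print(course_score(example))
```

The project components (10 + 12 + 18) account for 40 points, the homeworks for 36, and the exam for 24, so the weights sum to 100.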
Notes, Policies, and Guidelines
We will use the class Sakai site for announcements, assignments, and your contributions.
- Homework must be done individually. Late homework is accepted up to 4 days after the deadline, with a penalty of 20% for each late day.
- The class project can be done either individually or in groups of two.
- Any regrading request must be submitted in writing within one week of the material being returned. The request must describe the grading error precisely and concisely.
- When emailing me or the TA about the course, begin the subject line with [sp15 cs443].
- Refresh your knowledge of the university's policy on academic integrity and plagiarism. There is zero tolerance for cheating.
- Letter grades will be assigned based on the Rutgers Undergraduate Grade Scale, which is as follows:
  - A in [90, 100]
  - B+ in [85, 89.99]
  - B in [80, 84.99]
  - C+ in [75, 79.99]
  - C in [70, 74.99]
  - D in [60, 69.99]
  - F in [0, 59.99]
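The late-homework penalty and the grade scale above can be sketched in a few lines. This is only an illustration under stated assumptions: the 20% penalty is taken to be a reduction of the earned score for each whole late day, scores that fall between two listed ranges (for example 89.995) are assigned by lower cutoff, and the function names are hypothetical. The instructor's actual computation may differ.

```python
# Lower cutoffs for each letter grade, taken from the scale above.
# Assumption: a score earns the highest letter whose cutoff it meets.
GRADE_CUTOFFS = [
    (90, "A"),
    (85, "B+"),
    (80, "B"),
    (75, "C+"),
    (70, "C"),
    (60, "D"),
    (0, "F"),
]


def letter_grade(score):
    """Map a course score in [0, 100] to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter


def late_adjusted(raw_score, days_late):
    """Apply the late policy to a homework score.

    Assumptions: `days_late` is a whole number of days, each late day
    removes 20% of the earned score, and submissions more than 4 days
    late are not accepted (score 0).
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 4:
        return 0.0
    return raw_score * (100 - 20 * days_late) / 100


if __name__ == "__main__":
    print(letter_grade(87.5))
    print(late_adjusted(90, 2))
```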
Resources & Recent Stories
- UC Berkeley's Data Science Resources
- Some software, tools, and data resources
- Claire Cain Miller, Data Science: The Numbers of Our Lives, New York Times, April 11, 2013.
- Steve Lohr. Sure, Big Data Is Great. But So Is Intuition, New York Times, December 29, 2012.
- Thomas H. Davenport and D.J. Patil. Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review, October 2012.
- Data Science: An Introduction (Wikibook)
- Shamanth Kumar, Fred Morstatter, Huan Liu. Twitter Data Analytics, Springer 2013.