- Semester: Fall 2013
- Course number: 01:198:444
- Course title: Introduction to Data Science
- Credits: 3
- Lecture: Mondays & Thursdays 12:00pm-1:20pm
- Location: Busch Campus, SEC-220
- Course website: here and in Sakai
- Instructor: Tina Eliassi-Rad
- Office: CBIM 8
- Office hours: Mondays 1:30pm-2:30pm
- Teaching assistant: Vukosi Marivate
- Office: Hill 486
- Office hours: Thursdays 3:00pm-4:00pm
Advances in technology have allowed us to collect massive amounts of data. A data scientist is a person who has the skills, knowledge, and ability to extract actionable knowledge from data, whether for the good of society, the advancement of science, or profit in business. This course covers the topics needed to solve data-science problems: data preparation (collection & integration), data characterization & presentation, data analysis (experimentation & observational studies), and data products.
Syllabus / Schedule
This course does not have a designated textbook. The readings are assigned in the syllabus.
Here are some textbooks (all optional) related to the course.
- Anand Rajaraman, Jure Leskovec, Jeffrey Ullman. Mining of Massive Datasets. Cambridge University Press, 2012. (free online)
- Foster Provost, Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-analytic Thinking. ISBN 1449361323.
- Tom Mitchell. Machine Learning. ISBN 0070428077.
- Christopher Bishop. Pattern Recognition and Machine Learning. ISBN 0387310738.
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020.
- Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. ISBN 1107422221.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. ISBN 0387952845. (free online)
- David J. Hand, Heikki Mannila, Padhraic Smyth. Principles of Data Mining. ISBN 026208290X.
- Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790.
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. ISBN 0321321367.
- Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. ISBN 0123748569.
Grading
- Class project (45%), in which you solve a data-science problem end to end, from data preparation to data product:
  - Proposal report (10%): 2 pages maximum plus a 5-minute in-class pitch; due Thu 10/24. The proposal should answer the following questions:
    - What is the problem?
    - Why is it interesting and important?
    - Why is it hard? Why have previous approaches failed?
    - What are the key components of your approach?
    - What data sets and metrics will be used to validate the approach?
  - Class presentation (15%): 8-minute presentation; due Thu 12/5 (Group 1) and Mon 12/9 (Group 2). Groups will be determined later.
  - Final report (20%): 6 pages maximum; due Thu 12/19.
- Three homework assignments (45% total; 15% each):
  - HW#1 out Mon 9/23; due Mon 10/7; graded by Mon 10/21.
  - HW#2 out Mon 10/21; due Mon 11/4; graded by Mon 11/18.
  - HW#3 out Mon 11/11; due Mon 11/25; graded by Mon 12/9.
- Class participation (10%)
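Because the weights are nested (three project components plus three homeworks), it may help to see how they combine into a single course grade. The sketch below simply multiplies each component score by its weight from the breakdown above and sums the results. It assumes every component is scored on a 0-100 scale; the dictionary keys and the `course_grade` function are hypothetical names for illustration, not part of any course tool.

```python
# Hypothetical sketch: weights copied from the grading breakdown (they sum to 100%).
# Assumes each component score is on a 0-100 scale.
WEIGHTS = {
    "proposal": 0.10,       # project proposal report + pitch
    "presentation": 0.15,   # project class presentation
    "final_report": 0.20,   # project final report
    "hw1": 0.15,
    "hw2": 0.15,
    "hw3": 0.15,
    "participation": 0.10,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade (0-100) from per-component scores (0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weighted sum over every graded component.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Example: full marks everywhere except an 80 on the proposal.
example = {name: 100.0 for name in WEIGHTS}
example["proposal"] = 80.0
print(course_grade(example))  # about 98, since 0.10 * 20 = 2 points are lost
```

Note that the proposal, worth 10% of the course, costs 2 course points when it loses 20 of its own points.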
Notes, Policies, and Guidelines
We will use the class Sakai site for announcements, assignments, and your contributions.
- Homework assignments must be done individually. Late homework is accepted up to 4 days after the deadline, with a penalty of 20% for each late day.
- The class project can be done either individually or in groups of two.
- Any regrade request must be submitted in writing within one week of the material being returned, and must state the grading error precisely and concisely.
- Review the university's academic integrity policy, including its rules on plagiarism. There is zero tolerance for cheating.
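The late-homework rule in the policies above leaves open exactly how the 20% is applied. The sketch below assumes one reading: the penalty is 20% of the earned score per whole late day, applied linearly (so a submission 4 days late keeps 20% of its score), and anything more than 4 days late receives zero because it is not accepted. `late_adjusted_score` is a hypothetical helper for illustration, and the official interpretation rests with the instructor.

```python
# Hypothetical sketch of the late-homework policy.
# Assumptions (not stated in the policy): the 20% penalty is taken from the
# earned score linearly per whole late day, and work more than 4 days late scores 0.
PENALTY_PER_DAY = 0.20
MAX_DAYS_LATE = 4


def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Return the homework score after applying the assumed late penalty."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_DAYS_LATE:
        return 0.0  # not accepted after the 4-day window
    return raw_score * (1 - PENALTY_PER_DAY * days_late)


for days in range(6):
    print(days, late_adjusted_score(90.0, days))
```

Under this reading a 90 becomes 72 after one late day; a compounding penalty (multiplying by 0.8 per day) would be an equally plausible alternative.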
Resources & Recent Stories
- UC Berkeley's Data Science Resources
- Some software, tools, and data resources
- Claire Cain Miller, Data Science: The Numbers of Our Lives, New York Times, April 11, 2013.
- Steve Lohr, Sure, Big Data Is Great. But So Is Intuition, New York Times, December 29, 2012.
- Thomas H. Davenport and D.J. Patil. Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review, October 2012.
- Data Science: An Introduction (Wikibook)
- Shamanth Kumar, Fred Morstatter, Huan Liu. Twitter Data Analytics. Springer, 2013.