Advances in technology have allowed us to collect massive amounts of data. A data scientist is a person who has the skills, knowledge, and ability to extract actionable knowledge from that data, whether for the good of society, the advancement of science, or profit in business. This course covers the topics needed to solve data-science problems: data preparation (collection and integration), data characterization and presentation, data analysis (experimentation and observational studies), and data products.

Syllabus / Schedule


This course does not have a designated textbook. The readings are assigned in the syllabus. Here are some textbooks (all optional) related to the course.


The class requires the ability to work with abstract mathematical concepts such as those covered in 01:198:112, 01:198:205, and 01:198:206. You need an introductory-level background in algorithms, probability, and linear algebra. You also need programming skills for data manipulation and analysis (e.g., Python, MATLAB, or R) and for Web programming (e.g., HTML, CSS, or JavaScript). The specific programming language is mostly your choice.

Grading Policies

Notes, Policies, and Guidelines

Resources & Recent Stories