CS365: Foundations of Data Science (Spring’23)

Info

Instructor

Teaching Fellow

Piazza website

Github

Prerequisites

Students taking this class must have taken:

  • CS 112
  • CS 131 (MA293)
  • CS 132 (MA242) 
  • and CS 237 (MA581) or equivalent.

This year the prerequisites will be strictly enforced. CS 330 is highly recommended but is not a prerequisite.

Syllabus

Topics will include probability, information theory, linear algebra, calculus, Fourier analysis, and graph theory, with a strong focus on their applicability to analyzing datasets. In addition, two lectures will be devoted to data management, specifically the classic relational model, SQL, and Datalog. A detailed syllabus is available on Piazza.

Textbooks

There will be assigned readings from the following books, all of which are available online as PDFs:

  1. Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
  2. Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravi Kannan.
  3. Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David.
  4. Introduction to Probability for Data Science by Stanley Chan.

Programming

The class assumes familiarity with programming. The recommended languages are Python 3 and Julia; R and MATLAB are also good choices. Other languages (C, C++, Java, etc.) are allowed but not recommended for this class.

Lectures

Note: the assigned readings are listed at the end of each lecture. Readings marked with a magnifying glass are mandatory; the rest are optional material for those who are interested and have the time.

  • Lecture 1 (1/19): data visualization – introduction, class logistics, types of data, basics of data visualization
    Slides available here.
  • Lecture 2 (1/25): probability I – review of prerequisite material and other basic concepts through problem solving
    Slides available here.
  • Lecture 3 (1/26): probability II – convergence of random variables, Markov’s inequality
    Slides available here.
  • Lecture 4 (2/1): probability III – weak law of large numbers, confidence intervals, a randomized algorithm for estimating π, central limit theorem
    Slides available here.
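As an illustrative sketch of the Lecture 3 material (not taken from the course slides), the Python snippet below checks Markov's inequality, P(X ≥ a) ≤ E[X]/a for a nonnegative random variable X and a > 0, on simulated data. The helper name markov_check, the Exponential(1) distribution, and the seed are arbitrary choices made for illustration. Because the inequality also holds for the empirical distribution of any nonnegative sample, the printed tail fraction should never exceed the printed bound.

```python
import random

def markov_check(samples, a):
    """Return (empirical P(X >= a), Markov bound mean/a) for nonnegative samples.

    markov_check is a hypothetical helper written for this illustration.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if any(x < 0 for x in samples):
        raise ValueError("Markov's inequality needs nonnegative samples")
    n = len(samples)
    tail = sum(1 for x in samples if x >= a) / n   # fraction of samples at least a
    bound = (sum(samples) / n) / a                 # sample mean divided by a
    return tail, bound

rng = random.Random(365)  # arbitrary fixed seed for reproducibility
samples = [rng.expovariate(1.0) for _ in range(100_000)]  # Exponential(1), mean 1

for a in (1, 2, 4):
    tail, bound = markov_check(samples, a)
    print(f"a={a}: P(X>=a) ~ {tail:.4f} <= E[X]/a ~ {bound:.4f}")
```

For Exponential(1) the true tail is e^(-a), so the bound 1/a is quite loose; Markov's inequality trades tightness for requiring nothing beyond nonnegativity and a finite mean.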
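A minimal sketch of the π-estimation idea from Lecture 4, assuming the common quarter-circle construction (the slides may present it differently): draw n points uniformly in the unit square; the fraction p̂ that lands inside the quarter disk x² + y² ≤ 1 estimates π/4. By the weak law of large numbers, 4·p̂ converges to π as n grows, and the central limit theorem justifies the approximate 95% confidence interval 4·p̂ ± 1.96·4·sqrt(p̂(1 − p̂)/n). The function name estimate_pi and the seed are hypothetical.

```python
import math
import random

def estimate_pi(n, rng):
    """Monte Carlo estimate of pi with a CLT-based ~95% confidence interval.

    estimate_pi is a hypothetical helper written for this illustration.
    """
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()   # uniform point in the unit square
        if x * x + y * y <= 1.0:            # inside the quarter disk of radius 1
            hits += 1
    p_hat = hits / n                        # estimates P(inside) = pi / 4
    pi_hat = 4 * p_hat
    se = 4 * math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of pi_hat
    z = 1.96                                # ~97.5th percentile of N(0, 1)
    return pi_hat, (pi_hat - z * se, pi_hat + z * se)

rng = random.Random(2023)  # arbitrary fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    pi_hat, (lo, hi) = estimate_pi(n, rng)
    print(f"n={n:>9}: pi_hat={pi_hat:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

The interval width shrinks roughly like 1/sqrt(n), so each 100-fold increase in n narrows it by about a factor of 10.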

Assignments
