Info
Instructor
- When: Tue, Thu 5pm-6.15pm
- Where: CAS-313
- Prof: Babis Tsourakakis
- Email: ctsourak@bu.edu
- Office hours (CDS 912): Tue 11am-noon, Thu 10.30-11.30am
Teaching Fellow
- TF: Mr. Tiany Chen
- Email: ctony@bu.edu
- Labs: schedule
- Office hours (CDS 908): Mon 4-5.30pm, Thu 3-4.30pm
Github
Prerequisites
Students taking this class must have taken:
- CS 112
- CS 131 (MA293)
- CS 132 (MA242)
- and CS 237 (MA581) or equivalent.
This year the prerequisites will be strictly enforced. CS 330 is highly recommended but not a prerequisite.
Syllabus
Topics will include probability, information theory, linear algebra, calculus, Fourier analysis, and graph theory, with a strong focus on their applicability to analyzing datasets. Finally, two lectures will be devoted to data management, specifically the classic relational model, SQL, and Datalog. A detailed syllabus is available on Piazza.
Textbooks
There will be assigned readings from the following books, which are available online (click for the PDF):
- Machine Learning: A Probabilistic Perspective [M] by Kevin Murphy
- Mathematics for Machine Learning [DFO] by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
- Foundations of Data Science [BHK] by Avrim Blum, John Hopcroft, and Ravi Kannan
- Understanding Machine Learning: From Theory to Algorithms [SD] by Shai Shalev-Shwartz and Shai Ben-David
- Introduction to Probability for Data Science [SC] by Stanley Chan
Programming
The class assumes familiarity with programming. The recommended languages for this class are Python 3 and Julia. R, Mathematica, and MATLAB are also recommended. Other languages (C, C++, Java, etc.) are welcome but not recommended for this class.
Lectures
Note: at the end of each lecture, you will find the assigned readings. The readings marked with a magnifying glass are mandatory. The rest is optional material for those who are interested and have the time to devote.
Part I: Core Concepts in Data Science (Probability, Linear Algebra, Optimization)
- Introduction (1/18)
Slides available here
- PART 1A: Probability and Statistics
Slides available here and Julia notebook here
Readings: [SC] Chapters 1-5 and [M] Chapter 2
- PART 1B: Linear Algebra, SVD, PCA
- PART 1C: Vector Calculus and Optimization
Slides are available here
Readings: [DFO] Chapters 5 and 7
Part II: Data Science in Action
- Topic 1: Data streams
Slides available here
- Topic 2: Dimensionality reduction
- Topic 3: EM Algorithm
Slides available here
Readings: What is the expectation maximization algorithm?
Optional reading: mixtures of Gaussians (Andrew Ng’s notes)
- Topic 4: Markov Chains
Slides available here
Readings: [BHK] 4.1, 4.8
- Topic 5: Time Series
Slides available here
- Topic 6: What is learning? The Perceptron algorithm
Slides from CMU available here
Readings: [SD] Chapters 2, 3 and 9.1.2
- Topic 7: Unsupervised learning
Readings: [BHK] Chapter 7
- k-means demo and a YouTube video
- Spectral graph theory and its applications
- Densest subgraph problem tutorial