https://people.csail.mit.edu/moitra/408c.html
18.408 Theoretical Foundations for Deep Learning
Spring 2021
Deep learning has sparked a revolution across machine learning. It has led to major advances in vision, speech, playing strategic games, and the sciences. And yet it remains largely a mystery: we do not understand why the algorithms we use work so well in practice.
In this class we will explore theoretical foundations for deep learning, emphasizing the following themes: (1) Approximation: What sorts of functions can be represented by deep networks, and does depth provably increase the expressive power? (2) Optimization: Essentially all optimization problems we want to solve in practice are non-convex. What frameworks can be used to analyze such problems? (3) Beyond-Worst-Case Analysis: Deep networks can memorize worst-case data, so why do they generalize well on real-world data? For this and related questions, our starting point will often be natural data-generative models. The theory of deep learning is still very much a work in progress. Our goal in this course is merely to explain some of the key questions that drive this area, and to take a critical look at where the existing theory falls short.
We will cover topics such as: Barron's theorem, depth separations, landscape analysis, implicit regularization, neural tangent kernels, generalization bounds, data poisoning attacks and frameworks for proving lower bounds against deep learning.
Announcement: Homework 2 is now up, and is due May 7th. Please submit via Gradescope.
Announcement: The final project description is up here. Please email me if you'd like to chat about potential project topics.
Announcement: You should join the course slack where we will post all lectures, notes, homeworks, etc.
Course Information
- Instructor: Ankur Moitra
- Harvard Sister Seminar: CS 229br: Advanced Topics in the Theory of Machine Learning, by Boaz Barak
- Lectures: Wednesday 12-3, on Zoom
- Prerequisites: A course in algorithms (6.046/18.410 or equivalent) and probability (6.041/18.440 or equivalent). You will need a strong background in algorithms, probability and linear algebra.
- Assessment: Students will be expected to solve two problem sets and complete a research-oriented final project. The project may be either a survey or original research; students are encouraged to find connections between the course material and their own research interests.
Course Outline
Here is a tentative outline for the course:
- Approximation Theory
  - Universal Approximation and the Fourier Transform
  - Barron's Theorem
  - Depth Separations and Connections to Circuit Complexity
- Optimization Theory
  - Convex Optimization and Acceleration
  - Escaping Local Minima
  - Implicit Regularization
  - Neural Tangent Kernels
  - The Mean Field View
  - Landscape Analysis and Matrix Completion
  - Langevin Dynamics and Metastability
- Statistical Theory
  - Data-independent and Data-dependent Generalization Bounds
  - Kernel Ridge Regression
  - The Interpolation Regime and Double Descent
  - Dictionary Learning
  - Student-Teacher Networks
  - Lower Bounds from Cryptography
Instructor Notes
- Lecture 1: Universal Approximation and Barron's Theorem [pdf]
- Lecture 2: Barron's Theorem (continued) and Depth Separations [pdf]
- Lecture 3: Neural Tangent Kernels and Least Squares [pdf]
- Lecture 4: Implicit Regularization and Mirror Descent [pdf]
- Lecture 5: Generalization Bounds and PAC Learnability [pdf]
- Lecture 6: Stability, Regularization and Privacy [pdf]
- Lecture 7: Double Descent and Proper vs. Improper Learning [pdf]
- Lecture 8: The Low Degree Algorithm, Intersections of Halfspaces and Statistical Queries [pdf]
- Lecture 9: More Lower Bounds, Landscape Design and Filtered PCA [pdf]
- Lecture 10: An Interlude [pdf]
- Lecture 11: Adversarial Examples and Data Poisoning [pdf]
- Lecture 12: GANs, Equilibria and Compressed Sensing [pdf]
- Lecture 13: Deep Reinforcement Learning [pdf]
Problem Sets
Materials and Links
Here are links to some other resources you might find helpful. Some of the material will come from relevant lectures in these courses:
- Matus Telgarsky's Deep Learning Theory, UIUC
- Sanjeev Arora's Theoretical Foundations of Deep Learning, Princeton (see also this book)
- Rong Ge's Algorithmic Aspects of Machine Learning, Duke
- Moritz Hardt's Convex Optimization and Approximation, UC Berkeley
- Elchanan Mossel's Mathematical Aspects of Deep Learning, MIT
- Roger Grosse's Neural Networks and Deep Learning, Toronto