NYU CS-GY 6763 (3943)
Algorithmic Machine Learning
and Data Science
Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.
Lectures: Mondays, 5:00-7:30pm, 2 Metrotech Center, Room 9011
Professor office hours: Weekly on Wednesdays, 4:15-5:45pm. Zoom link.
TA office hours: Tuesdays, 2:30-4:00pm. Zoom link.
Syllabus: here.
Grading breakdown: Problem Sets 50%, Midterm 15%, Final project OR final exam 25%, Participation 10%. Final project guidelines: here.
Problem Sets: Problem sets must be turned in via Gradescope on NYU Brightspace. While not required, I highly encourage students to prepare problem sets in LaTeX; you can use this template. If you write problems by hand, scan and upload them as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately. See the syllabus for details.
Prerequisites: This course is mathematically rigorous and is intended for graduate students and advanced undergraduates. Formally, we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is also necessary. See the syllabus for more details, and email me if you have questions about your preparation for the course.
Resources: There is no textbook to purchase. Course material will consist of annotated course notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.
Problem Sets:
Problem Set 1 (due Monday, Feb 14th by 11:59pm ET).
Problem Set 2 (due Friday, Mar 11th by 11:59pm ET).
Problem Set 3, UScities.txt (due Monday, April 11th by 11:59pm ET).
Problem Set 4 (due Monday, May 2nd, by 11:59pm ET).
Week # | Topic | Reading | Homework |
---|---|---|---|
The Power of Randomness | | | |
1. 1/24 | Random variables and concentration, Markov's and Chebyshev's inequalities, applications to mark-and-recapture and frequent item detection. | | |
2. 1/31 | Count-Sketch, the median trick, Chernoff bounds and the union bound, with applications. | | |
3. 2/7 | High-dimensional geometry and the Johnson-Lindenstrauss lemma. | | |
4. 2/14 | Linear sketching, sparse recovery, and L_0 sampling. | | |
2/21 | No class: Presidents Day. | | |
5. 2/28 | Locality-sensitive hashing and approximate nearest neighbor search. | | |
Optimization | | | |
6. 3/7 | Convexity, gradient descent, and conditioning. | | |
3/14 | No class: Spring Break. | | |
7. 3/21 | Midterm exam (first half of class); positive semidefiniteness and preconditioning (second half). | | |
8. 3/28 | Online gradient descent, stochastic gradient descent, and online learning with the multiplicative weights algorithm. | | |
Spectral Methods and Randomized Numerical Linear Algebra | | | |
9. 4/4 | Singular value decomposition, Krylov methods. | | |
10. 4/11 | Sketching for linear regression, ε-nets. | | |
11. 4/18 | Sparse recovery and compressed sensing. | | |
12. 4/25 | Spectral graph theory, spectral clustering, generative models for networks. | | |
13. 5/2 | Spectral sparsification. | | |