NYU CS-GY 6763 (3943)
Algorithmic Machine Learning
and Data Science

Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.


Course Team:

Rajesh Jayaram
Professor

Aarshvi Gajjar
Course Assistant

Teal Witter
Course Assistant

Lectures: Mondays, 5:00-7:30pm, 2 MetroTech Center, Room 9011

Professor office hours: Weekly on Wednesdays, 4:15-5:45pm. Zoom link.

TA office hours: Tuesdays, 2:30-4:00pm. Zoom link.

Syllabus: here.

Grading breakdown: Problem Sets 50%, Midterm 15%, Final project OR final exam 25%, Participation 10%
Final project guidelines: here.

Problem Sets: Problem sets must be turned in via Gradescope on NYU Brightspace. While not required, I highly encourage students to prepare problem sets in LaTeX. You can use this template for LaTeX. If you do write problems by hand, scan and upload as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately. See the syllabus for details.

Prerequisites: This course is mathematically rigorous, and is intended for graduate students and advanced undergraduates. Formally we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is also necessary. See the syllabus for more details and email me if you have questions about your preparation for the course.

Resources: There is no textbook to purchase. Course material will consist of annotated lecture slides and notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.

Problem Sets:
Problem Set 1 (due Monday, Feb 14th by 11:59pm ET).
Problem Set 2 (due Friday, Mar 11th by 11:59pm ET).
Problem Set 3, UScities.txt (due Monday, April 11 by 11:59pm ET).
Problem Set 4 (due Monday, May 2nd, by 11:59pm ET).

Week # Topic Reading Homework
The Power of Randomness
1. 1/24 Random variables and concentration, Markov's and Chebyshev's inequalities, applications to mark-and-recapture and frequent item detection (a toy Count-Min Sketch appears after the readings below).
  • Lecture 1 Slides (annotated)

  • Probability review! None of this should be new, but you might want to brush up if you haven't taken prob/stat in a while. Indu recommends this resource for review.
  • Typed lecture notes covering another application of Markov's inequality to analyzing hashing, and proving universality of a random linear hash function.
  • Here is the original Count-Min Sketch paper. I encourage you to look through it for further background on the problem, and for additional details of the proof.
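To make the frequent items discussion concrete, here is a minimal Python sketch of the Count-Min idea (not a reference implementation from the course); the table width, depth, and hash construction are illustrative choices, and Markov's inequality is what bounds the per-row overestimate.

```python
import numpy as np

class CountMinSketch:
    """Toy Count-Min Sketch: d hash rows of width w. Every estimate is an
    overcount, and Markov's inequality bounds the excess in each row."""

    def __init__(self, w=1000, d=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w, self.d = w, d
        self.p = 2**31 - 1                         # a Mersenne prime
        self.a = rng.integers(1, self.p, size=d)   # per-row hash parameters
        self.b = rng.integers(0, self.p, size=d)
        self.table = np.zeros((d, w), dtype=np.int64)

    def _buckets(self, x):
        # simple (a*x + b) mod p mod w hash for each of the d rows
        return (self.a * (hash(x) % self.p) + self.b) % self.p % self.w

    def add(self, x, count=1):
        for row, col in enumerate(self._buckets(x)):
            self.table[row, col] += count

    def estimate(self, x):
        # minimum over rows: never an underestimate of the true frequency
        return min(self.table[row, col] for row, col in enumerate(self._buckets(x)))

# Example usage on a tiny stream.
cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a", "b"]:
    cms.add(token)
print(cms.estimate("a"))  # at least 3, and exactly 3 with high probability
```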
2. 1/31 Count-Sketch, Median Trick, Chernoff Bounds & Union Bound with applications (a toy Count-Sketch appears after the readings below).
  • Lecture 2 notes (annotated).

  • A useful list of inequalities, especially the Bernoulli, exponential, logarithmic, and binomial sections.
  • Additional reading on concentration bounds can be found in Terry Tao's notes.
  • My favorite proof of the Union Bound can be found here.
  • Here is the original Count-Sketch paper. Warning: This paper is a little more challenging to read than the Count-Min paper, but you may find it extra rewarding if you are interested in algorithms research.
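As a companion to the Count-Min example above, here is a minimal Count-Sketch with the median trick; the width, depth, and hash functions are again illustrative rather than the exact versions analyzed in lecture.

```python
import numpy as np

def make_countsketch(w=1000, d=5, seed=0):
    """Toy Count-Sketch: each of d rows hashes an item to one of w buckets and
    to a random +/-1 sign; the frequency estimate is the median over rows
    (the "median trick", analyzed via a Chernoff bound plus a union bound)."""
    rng = np.random.default_rng(seed)
    p = 2**31 - 1
    # independent hash parameters for the bucket and for the sign in each row
    a_bkt, b_bkt = rng.integers(1, p, size=d), rng.integers(0, p, size=d)
    a_sgn, b_sgn = rng.integers(1, p, size=d), rng.integers(0, p, size=d)
    table = np.zeros((d, w))
    rows = np.arange(d)

    def _hashes(x):
        hx = hash(x) % p
        buckets = (a_bkt * hx + b_bkt) % p % w
        signs = 2 * ((a_sgn * hx + b_sgn) % p % 2) - 1
        return buckets, signs

    def update(x, count=1):
        buckets, signs = _hashes(x)
        table[rows, buckets] += signs * count

    def estimate(x):
        buckets, signs = _hashes(x)
        return np.median(signs * table[rows, buckets])

    return update, estimate

update, estimate = make_countsketch()
for token in ["a", "b", "a", "c", "a"]:
    update(token)
print(estimate("a"))  # close to the true count of 3
```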
3. 2/7 High-Dimensional Geometry and the Johnson-Lindenstrauss Lemma
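No readings are posted yet for this lecture, so here is a small numerical illustration of the Johnson-Lindenstrauss lemma; the dimensions and number of points below are arbitrary illustrative choices.

```python
import numpy as np

# Johnson-Lindenstrauss in action: a random Gaussian projection to
# k ~ O(log n / eps^2) dimensions approximately preserves pairwise distances.
rng = np.random.default_rng(0)
n, dim, k = 200, 10_000, 400            # illustrative sizes
X = rng.standard_normal((n, dim))       # n points in dim dimensions

Pi = rng.standard_normal((dim, k)) / np.sqrt(k)   # JL projection matrix
Y = X @ Pi                                        # projected points

# Distances before vs. after projection should agree up to a small factor.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    print(f"pair ({i},{j}): distance ratio after projection = {ratio:.3f}")
```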
4. 2/14 Linear Sketching, Sparse Recovery, and L_0 Sampling.
2/21 No Class: Presidents Day.
5. 2/28 Locality sensitive hashing and approximate nearest neighbor search (a toy MinHash example appears after the readings below).
  • Lecture 5 Slides (annotated).

  • Good overview of similarity estimation and locality sensitive hashing in Chapter 3 here.
  • These lecture notes are also helpful. These resources use slightly different language (and a slightly different version of MinHash) than I used in class.
  • If you want to learn more about worst-case runtime guarantees (like the Indyk-Motwani result mentioned in class) take a look at Chris Musco's typed lecture notes.
  • Also, see this paper, which describes the original Indyk-Motwani result.
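To accompany the MinHash discussion (which, as noted above, differs slightly between these resources and lecture), here is a minimal Python version; the number of hash functions and the hash family are illustrative.

```python
import numpy as np

def minhash_signature(items, num_hashes=200, seed=0):
    """Toy MinHash: hash every element of a set under num_hashes random hash
    functions and keep each minimum. The fraction of coordinates on which two
    signatures agree estimates the Jaccard similarity of the underlying sets."""
    rng = np.random.default_rng(seed)
    p = 2**31 - 1
    a = rng.integers(1, p, size=num_hashes)
    b = rng.integers(0, p, size=num_hashes)
    hashed = np.array([(a * (hash(x) % p) + b) % p for x in items])
    return hashed.min(axis=0)

A = {"the", "quick", "brown", "fox"}
B = {"the", "quick", "brown", "dog"}
sig_A, sig_B = minhash_signature(A), minhash_signature(B)
print("estimated Jaccard:", np.mean(sig_A == sig_B))  # roughly 3/5
print("true Jaccard:     ", len(A & B) / len(A | B))  # exactly 3/5
```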
Optimization
6. 3/7 Convexity, Gradient Descent, and Conditioning (a short gradient descent example appears after the readings below).
  • Lecture 6 Slides (annotated)

  • If you need to freshen up on linear algebra, now is a good time! This quick reference from Stanford mostly covers what we need.
  • Good book on optimization, freely available online through NYU libraries.
  • Excellent lecture notes from Aleksander Mądry for more reading on analyzing gradient descent.
  • Moritz Hardt's lecture notes with proofs of gradient descent convergence in all the regimes discussed in class.
  • Sébastien Bubeck's convex optimization book mentioned in class. Fairly technical, but great reference.
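As a complement to the convergence analyses in the notes above, here is a bare-bones gradient descent on a least-squares objective; the step size 1/L (with L the smoothness constant) matches the standard smooth analysis, and the problem sizes are illustrative.

```python
import numpy as np

def gradient_descent(grad, x0, step_size, num_steps):
    """Plain gradient descent: x_{t+1} = x_t - eta * grad(x_t)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad(x)
    return x

# Example objective: f(x) = 0.5 * ||Ax - b||^2 with gradient A^T (Ax - b).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
grad = lambda x: A.T @ (A @ x - b)

# For L-smooth f, step size 1/L guarantees progress; here L = lambda_max(A^T A).
L = np.linalg.eigvalsh(A.T @ A).max()
x_hat = gradient_descent(grad, np.zeros(10), 1.0 / L, 500)
print(np.linalg.norm(A @ x_hat - b))  # approaches the least-squares residual
```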
3/14 No Class: Spring Break
7. 3/21 Midterm Exam (first half of class)

Positive Semidefiniteness and Preconditioning (second half).
8. 3/28 Online Gradient Descent, Stochastic Gradient Descent, and Online Learning with the Multiplicative Weights Algorithm.
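Since no readings are posted yet for this lecture, here is a minimal multiplicative weights (exponential weights) update over n experts; the learning rate and the random loss model are illustrative.

```python
import numpy as np

def multiplicative_weights(losses, eta=0.1):
    """Toy multiplicative weights: maintain a distribution over n experts,
    scale each weight by exp(-eta * loss) every round, and renormalize.
    Returns the algorithm's cumulative expected loss and final weights."""
    w = np.ones(losses.shape[1]) / losses.shape[1]
    total_loss = 0.0
    for loss in losses:               # loss[i] = loss of expert i this round
        total_loss += w @ loss        # expected loss of the randomized strategy
        w = w * np.exp(-eta * loss)
        w /= w.sum()
    return total_loss, w

rng = np.random.default_rng(0)
T, n = 500, 10
losses = rng.uniform(0.0, 1.0, size=(T, n))
alg_loss, _ = multiplicative_weights(losses)
best_expert_loss = losses.sum(axis=0).min()
# Regret (alg_loss - best_expert_loss) should grow only like sqrt(T log n).
print(alg_loss, best_expert_loss)
```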
Spectral Methods and Randomized Numerical Linear Algebra
9. 4/4 Singular value decomposition, Krylov methods
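As a warm-up for the SVD and Krylov discussion, here is a minimal power iteration for the top singular vector; Krylov methods such as Lanczos refine this basic recurrence, and the iteration count is illustrative.

```python
import numpy as np

def power_method(A, num_iters=100, seed=0):
    """Power iteration for the top right singular vector of A: repeatedly apply
    A^T A and renormalize. Converges geometrically when there is a gap between
    the top two singular values; Krylov/Lanczos methods accelerate this."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A.T @ (A @ v)
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(A @ v)   # estimate of the top singular value
    return sigma, v

A = np.random.default_rng(1).standard_normal((100, 30))
sigma_est, _ = power_method(A)
sigma_true = np.linalg.svd(A, compute_uv=False)[0]
print(sigma_est, sigma_true)  # should agree to several digits
```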
10. 4/11 Sketching for Linear Regression, ε-nets (a sketch-and-solve regression example appears after the readings below).
  • Lecture 10 Slides (annotated)

  • Chris's written notes on sketched regression and ε-nets.
  • Jelani Nelson's course notes from Harvard with a lot more on randomized linear algebra, including methods for sparse JL sketching and randomized low-rank approximation.
  • ε-net arguments are used all over the place in learning theory, algorithm design, and high dimensional probability. Here's an example of how they appear in a different context.
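To make the sketched regression idea concrete, here is a toy sketch-and-solve comparison; a dense Gaussian sketch is used purely for simplicity (sparse or subsampled sketches, as in the notes above, are preferred in practice), and all sizes are illustrative.

```python
import numpy as np

# Sketch-and-solve regression: replace min_x ||Ax - b|| by min_x ||S(Ax - b)||
# for a random sketch S with far fewer rows than A.
rng = np.random.default_rng(0)
n, d, m = 10_000, 30, 600                       # m = number of sketch rows
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)    # dense Gaussian sketch
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched solution's residual should be within a small factor of optimal.
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```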
11. 4/18 Sparse Recovery and Compressed Sensing.
12. 4/25 Spectral graph theory, spectral clustering, generative models for networks
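Here is a small spectral partitioning demo in the spirit of this lecture: a planted two-community random graph is split by the sign of the Laplacian's second eigenvector (the Fiedler vector). Graph sizes and edge probabilities are illustrative.

```python
import numpy as np

# Planted two-community graph: dense within communities, sparse across.
rng = np.random.default_rng(0)
n = 40                                          # 20 nodes per community
P = np.full((n, n), 0.05)
P[:20, :20] = P[20:, 20:] = 0.5
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric adjacency, no self-loops

# Spectral partition: sign of the second-smallest Laplacian eigenvector.
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian D - A
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]
clusters = (fiedler > 0).astype(int)
print(clusters)  # should (mostly) separate the first 20 nodes from the last 20
```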
13. 5/2 Spectral Sparsification