NYU CS-GY 6763 (3943)
Algorithmic Machine Learning
and Data Science

Advanced theory course exploring contemporary computational methods that enable machine learning and data science at scale.


Course Team:

Rajesh Jayaram
Professor

Aarshvi Gajjar
Course Assistant

Teal Witter
Course Assistant

Lectures: Mondays, 5:00-7:30pm, 2 MetroTech Center, Room 9011

Professor office hours: Weekly on Wednesdays, 4:15-5:45pm. Zoom link.

TA office hours: Tuesdays, 2:30-4:00pm. Zoom link.

Syllabus: here.

Grading breakdown: Problem Sets 50%, Midterm 15%, Final project OR final exam 25%, Participation 10%
Final project guidelines: here.

Problem Sets: Problem sets must be turned in via Gradescope on NYU Brightspace. While not required, I highly encourage students to prepare problem sets in LaTeX. You can use this template for LaTeX. If you do write problems by hand, scan and upload as a PDF. Collaboration is allowed on homework, but solutions and code must be written independently. Writing should not be done in parallel, and students must list collaborators for each problem separately. See the syllabus for details.

Prerequisites: This course is mathematically rigorous, and is intended for graduate students and advanced undergraduates. Formally we require previous courses in machine learning, algorithms, and linear algebra. Experience with probability and random variables is also necessary. See the syllabus for more details and email me if you have questions about your preparation for the course.

Resources: There is no textbook to purchase. Course material will consist of annotated lecture slides and notes, as well as assorted online resources, including papers, notes from other courses, and publicly available surveys. Please refer to the course webpage before and after lectures to keep up-to-date as new resources are posted.

Problem Sets:
Problem Set 1 (due Monday, Feb 14th by 11:59pm ET).
Problem Set 2 (due Friday, Mar 11th by 11:59pm ET).
Problem Set 3, UScities.txt (due Monday, April 11 by 11:59pm ET).
Problem Set 4 (due Monday, May 2nd, by 11:59pm ET).

Week # Topic Reading Homework
The Power of Randomness
1. 1/24 Random variables and concentration, Markov's and Chebyshev's inequalities, applications to mark-and-recapture and frequent item detection (a toy Count-Min Sketch appears after the readings below).
  • Lecture 1 Slides (annotated)

  • Probability review! None of this should be new, but you might want to brush up if you haven't taken prob/stat in a while. Indu recommends this resource for review.
  • Typed lecture notes covering another application of Markov's inequality to analyzing hashing, and proving universality of a random linear hash function.
  • Here is the original Count-Min Sketch paper. I encourage you to look through it for further background on the problem, and for additional details of the proof.
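To make the frequent items discussion concrete, here is a minimal Python sketch of the Count-Min idea (not a reference implementation from the course); the table width, depth, and hash construction are illustrative choices, and Markov's inequality is what bounds the per-row overestimate.

```python
import numpy as np

class CountMinSketch:
    """Toy Count-Min Sketch: d hash rows of width w. Every estimate is an
    overcount, and Markov's inequality bounds the excess in each row."""

    def __init__(self, w=1000, d=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w, self.d = w, d
        self.p = 2**31 - 1                         # a Mersenne prime
        self.a = rng.integers(1, self.p, size=d)   # per-row hash parameters
        self.b = rng.integers(0, self.p, size=d)
        self.table = np.zeros((d, w), dtype=np.int64)

    def _buckets(self, x):
        # simple (a*x + b) mod p mod w hash for each of the d rows
        return (self.a * (hash(x) % self.p) + self.b) % self.p % self.w

    def add(self, x, count=1):
        for row, col in enumerate(self._buckets(x)):
            self.table[row, col] += count

    def estimate(self, x):
        # minimum over rows: never an underestimate of the true frequency
        return min(self.table[row, col] for row, col in enumerate(self._buckets(x)))

# Example usage on a tiny stream.
cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a", "b"]:
    cms.add(token)
print(cms.estimate("a"))  # at least 3, and exactly 3 with high probability
```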
2. 1/31 Count-Sketch, Median Trick, Chernoff Bounds & Union Bound with applications (a toy Count-Sketch appears after the readings below).
  • Lecture 2 notes (annotated).

  • A useful list of inequalities, especially the Bernoulli, exponential, logarithmic, and binomial sections.
  • Additional reading on concentration bounds can be found in Terry Tao's notes.
  • My favorite proof of the Union Bound can be found here.
  • Here is the original Count-Sketch paper. Warning: This paper is a little more challenging to read than the Count-Min paper, but you may find it extra rewarding if you are interested in algorithms research.
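As a companion to the Count-Min example above, here is a minimal Count-Sketch with the median trick; the width, depth, and hash functions are again illustrative rather than the exact versions analyzed in lecture.

```python
import numpy as np

def make_countsketch(w=1000, d=5, seed=0):
    """Toy Count-Sketch: each of d rows hashes an item to one of w buckets and
    to a random +/-1 sign; the frequency estimate is the median over rows
    (the "median trick", analyzed via a Chernoff bound plus a union bound)."""
    rng = np.random.default_rng(seed)
    p = 2**31 - 1
    # independent hash parameters for the bucket and for the sign in each row
    a_bkt, b_bkt = rng.integers(1, p, size=d), rng.integers(0, p, size=d)
    a_sgn, b_sgn = rng.integers(1, p, size=d), rng.integers(0, p, size=d)
    table = np.zeros((d, w))
    rows = np.arange(d)

    def _hashes(x):
        hx = hash(x) % p
        buckets = (a_bkt * hx + b_bkt) % p % w
        signs = 2 * ((a_sgn * hx + b_sgn) % p % 2) - 1
        return buckets, signs

    def update(x, count=1):
        buckets, signs = _hashes(x)
        table[rows, buckets] += signs * count

    def estimate(x):
        buckets, signs = _hashes(x)
        return np.median(signs * table[rows, buckets])

    return update, estimate

update, estimate = make_countsketch()
for token in ["a", "b", "a", "c", "a"]:
    update(token)
print(estimate("a"))  # close to the true count of 3
```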
3. 2/7 High-Dimensional Geometry and the Johnson-Lindenstrauss Lemma
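No readings are posted yet for this lecture, so here is a small numerical illustration of the Johnson-Lindenstrauss lemma; the dimensions and number of points below are arbitrary illustrative choices.

```python
import numpy as np

# Johnson-Lindenstrauss in action: a random Gaussian projection to
# k ~ O(log n / eps^2) dimensions approximately preserves pairwise distances.
rng = np.random.default_rng(0)
n, dim, k = 200, 10_000, 400            # illustrative sizes
X = rng.standard_normal((n, dim))       # n points in dim dimensions

Pi = rng.standard_normal((dim, k)) / np.sqrt(k)   # JL projection matrix
Y = X @ Pi                                        # projected points

# Distances before vs. after projection should agree up to a small factor.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    print(f"pair ({i},{j}): distance ratio after projection = {ratio:.3f}")
```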
4. 2/14 Linear Sketching, Sparse Recovery, and L_0 Sampling.
2/21 No Class: Presidents Day.
5. 2/28 Locality sensitive hashing and approximate nearest neighbor search (a toy MinHash example appears after the readings below).
  • Lecture 5 Slides (annotated).

  • Good overview of similarity estimation and locality sensitive hashing in Chapter 3 here.
  • These lecture notes are also helpful. These resources use slightly different language (and a slightly different version of MinHash) than I used in class.
  • If you want to learn more about worst-case runtime guarantees (like the Indyk-Motwani result mentioned in class) take a look at Chris Musco's typed lecture notes.
  • Also, see this paper, which describes the original Indyk-Motwani result.
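To accompany the MinHash discussion (which, as noted above, differs slightly between these resources and lecture), here is a minimal Python version; the number of hash functions and the hash family are illustrative.

```python
import numpy as np

def minhash_signature(items, num_hashes=200, seed=0):
    """Toy MinHash: hash every element of a set under num_hashes random hash
    functions and keep each minimum. The fraction of coordinates on which two
    signatures agree estimates the Jaccard similarity of the underlying sets."""
    rng = np.random.default_rng(seed)
    p = 2**31 - 1
    a = rng.integers(1, p, size=num_hashes)
    b = rng.integers(0, p, size=num_hashes)
    hashed = np.array([(a * (hash(x) % p) + b) % p for x in items])
    return hashed.min(axis=0)

A = {"the", "quick", "brown", "fox"}
B = {"the", "quick", "brown", "dog"}
sig_A, sig_B = minhash_signature(A), minhash_signature(B)
print("estimated Jaccard:", np.mean(sig_A == sig_B))  # roughly 3/5
print("true Jaccard:     ", len(A & B) / len(A | B))  # exactly 3/5
```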
Optimization
6. 3/7 Convexity, Gradient Descent, and Conditioning (a short gradient descent example appears after the readings below).
  • Lecture 6 Slides (annotated)

  • If you need to freshen up on linear algebra, now is a good time! This quick reference from Stanford mostly covers what we need.
  • Good book on optimization, freely available online through NYU libraries.
  • Excellent lecture notes from Aleksander Mądry for more reading on analyzing gradient descent.
  • Moritz Hardt's lecture notes with proofs of gradient descent convergence in all the regimes discussed in class.
  • Sébastien Bubeck's convex optimization book mentioned in class. Fairly technical, but great reference.
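As a complement to the convergence analyses in the notes above, here is a bare-bones gradient descent on a least-squares objective; the step size 1/L (with L the smoothness constant) matches the standard smooth analysis, and the problem sizes are illustrative.

```python
import numpy as np

def gradient_descent(grad, x0, step_size, num_steps):
    """Plain gradient descent: x_{t+1} = x_t - eta * grad(x_t)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad(x)
    return x

# Example objective: f(x) = 0.5 * ||Ax - b||^2 with gradient A^T (Ax - b).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
grad = lambda x: A.T @ (A @ x - b)

# For L-smooth f, step size 1/L guarantees progress; here L = lambda_max(A^T A).
L = np.linalg.eigvalsh(A.T @ A).max()
x_hat = gradient_descent(grad, np.zeros(10), 1.0 / L, 500)
print(np.linalg.norm(A @ x_hat - b))  # approaches the least-squares residual
```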
3/14 No Class: Spring Break
7. 3/21 Midterm Exam (first half of class)

Positive Semidefiniteness and Preconditioning (second half).
8. 3/28 Online Gradient Descent, Stochastic Gradient Descent, and Online Learning with the Multiplicative Weights Algorithm.
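Since no readings are posted yet for this lecture, here is a minimal multiplicative weights (exponential weights) update over n experts; the learning rate and the random loss model are illustrative.

```python
import numpy as np

def multiplicative_weights(losses, eta=0.1):
    """Toy multiplicative weights: maintain a distribution over n experts,
    scale each weight by exp(-eta * loss) every round, and renormalize.
    Returns the algorithm's cumulative expected loss and final weights."""
    w = np.ones(losses.shape[1]) / losses.shape[1]
    total_loss = 0.0
    for loss in losses:               # loss[i] = loss of expert i this round
        total_loss += w @ loss        # expected loss of the randomized strategy
        w = w * np.exp(-eta * loss)
        w /= w.sum()
    return total_loss, w

rng = np.random.default_rng(0)
T, n = 500, 10
losses = rng.uniform(0.0, 1.0, size=(T, n))
alg_loss, _ = multiplicative_weights(losses)
best_expert_loss = losses.sum(axis=0).min()
# Regret (alg_loss - best_expert_loss) should grow only like sqrt(T log n).
print(alg_loss, best_expert_loss)
```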
Spectral Methods and Randomized Numerical Linear Algebra
9. 4/4 Singular value decomposition, Krylov methods
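As a warm-up for the SVD and Krylov discussion, here is a minimal power iteration for the top singular vector; Krylov methods such as Lanczos refine this basic recurrence, and the iteration count is illustrative.

```python
import numpy as np

def power_method(A, num_iters=100, seed=0):
    """Power iteration for the top right singular vector of A: repeatedly apply
    A^T A and renormalize. Converges geometrically when there is a gap between
    the top two singular values; Krylov/Lanczos methods accelerate this."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A.T @ (A @ v)
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(A @ v)   # estimate of the top singular value
    return sigma, v

A = np.random.default_rng(1).standard_normal((100, 30))
sigma_est, _ = power_method(A)
sigma_true = np.linalg.svd(A, compute_uv=False)[0]
print(sigma_est, sigma_true)  # should agree to several digits
```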
10. 4/11 Sketching for Linear Regression, ε-nets (a sketch-and-solve regression example appears after the readings below).
  • Lecture 10 Slides (annotated)

  • Chris's written notes on sketched regression and ε-nets.
  • Jelani Nelson's course notes from Harvard with a lot more on randomized linear algebra, including methods for sparse JL sketching and randomized low-rank approximation.
  • ε-net arguments are used all over the place in learning theory, algorithm design, and high dimensional probability. Here's an example of how they appear in a different context.
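To make the sketched regression idea concrete, here is a toy sketch-and-solve comparison; a dense Gaussian sketch is used purely for simplicity (sparse or subsampled sketches, as in the notes above, are preferred in practice), and all sizes are illustrative.

```python
import numpy as np

# Sketch-and-solve regression: replace min_x ||Ax - b|| by min_x ||S(Ax - b)||
# for a random sketch S with far fewer rows than A.
rng = np.random.default_rng(0)
n, d, m = 10_000, 30, 600                       # m = number of sketch rows
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)    # dense Gaussian sketch
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched solution's residual should be within a small factor of optimal.
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```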
11. 4/18 Sparse Recovery and Compressed Sensing.
12. 4/25 Spectral graph theory, spectral clustering, generative models for networks
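Here is a small spectral partitioning demo in the spirit of this lecture: a planted two-community random graph is split by the sign of the Laplacian's second eigenvector (the Fiedler vector). Graph sizes and edge probabilities are illustrative.

```python
import numpy as np

# Planted two-community graph: dense within communities, sparse across.
rng = np.random.default_rng(0)
n = 40                                          # 20 nodes per community
P = np.full((n, n), 0.05)
P[:20, :20] = P[20:, 20:] = 0.5
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric adjacency, no self-loops

# Spectral partition: sign of the second-smallest Laplacian eigenvector.
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian D - A
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]
clusters = (fiedler > 0).astype(int)
print(clusters)  # should (mostly) separate the first 20 nodes from the last 20
```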
13. 5/2 Spectral Sparsification