Rajesh Jayaram

Senior Research Scientist, Google Research NYC

About

I am a Senior Research Scientist at Google NYC in the Algorithms and Optimization Group. I received my PhD in computer science from Carnegie Mellon University in the summer of 2021, where I was fortunate to be advised by David Woodruff. Before that, I received my bachelor's degree from Brown University in May 2017.

My academic research background is in sublinear algorithms and high-dimensional geometry — specifically sketching, streaming, and distributed algorithms. At Google, I apply this theoretical foundation to practical problems, including industry-scale vector search, neural embedding model design and training, and inference-time scaling for mathematics and science.

Research Interests

High-Dimensional Geometry & Neural Embeddings

Leveraging techniques from high-dimensional geometry and probability to study the expressivity of neural embedding models and the complexity of geometric optimization tasks (MST, EMD, etc.).
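
As a concrete (and deliberately naive) illustration of one such task, the sketch below computes the exact Earth Mover's Distance between two equal-size point sets by solving the underlying minimum-cost perfect matching with SciPy's Hungarian-algorithm routine. This cubic-time baseline is illustrative only, and is not drawn from any of the papers listed below:

    # Illustrative sketch: exact EMD between equal-size point sets A, B in R^d,
    # viewed as a min-cost perfect matching and solved in O(n^3) time.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def emd(A: np.ndarray, B: np.ndarray) -> float:
        cost = cdist(A, B)                        # pairwise Euclidean distances
        rows, cols = linear_sum_assignment(cost)  # optimal matching (Hungarian)
        return float(cost[rows, cols].sum())

    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
    print(emd(A, B))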

Inference-Time Scaling for Science

Developing inference-time scaling methods to solve complex problems in mathematics and science, and accelerating scientific progress with tools such as automated reviewing (see our ICML and STOC collaborations).

Theory & Practice of Nearest Neighbor Search

Developing faster algorithms for NNS in both theory and practice, especially for complex metrics (e.g., Earth Mover's Distance, Chamfer distance). This work fuels the development of production-grade vector search systems serving billions of queries, for both (classic) single-vector and multi-vector models.
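
For intuition, the (one-sided) Chamfer distance relaxes EMD by letting each point of A independently choose its nearest neighbor in B, i.e. CH(A, B) = sum over a in A of min over b in B of ||a - b||. Here is a deliberately brute-force quadratic-time sketch (illustrative only; it does not reproduce the near-linear time algorithm from the NeurIPS 2023 paper below):

    # Illustrative brute-force Chamfer distance in O(|A| * |B| * d) time.
    import numpy as np

    def chamfer(A: np.ndarray, B: np.ndarray) -> float:
        # dists[i, j] = ||A[i] - B[j]||; sum each row's minimum.
        dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return float(dists.min(axis=1).sum())

    rng = np.random.default_rng(1)
    A, B = rng.normal(size=(128, 16)), rng.normal(size=(128, 16))
    print(chamfer(A, B))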

Sketching & Sublinear Algorithms

Design and analysis of sublinear algorithms for big data, including streaming, distributed computing, and information compression. These techniques have broad applications, from efficient attention mechanisms in large language models to massively parallel clustering and graph algorithms.
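
As a small, self-contained taste of the area, here is a toy version of the classic AMS/CountSketch-style estimator for the second frequency moment F2 = sum_i f_i^2 of a stream. All names are illustrative, and the hashing is simulated with a seeded RNG rather than the 4-wise independent families the formal guarantees require:

    # Toy sketch (illustrative only) estimating F2 = sum_i f_i^2 of a stream.
    import numpy as np

    class F2Sketch:
        def __init__(self, width: int, seed: int = 0):
            self.rng = np.random.default_rng(seed)
            self.width = width
            self.table = np.zeros(width)
            self._bucket, self._sign = {}, {}

        def _hash(self, item):
            # Lazily assign each item a bucket and a random +/-1 sign.
            if item not in self._bucket:
                self._bucket[item] = int(self.rng.integers(self.width))
                self._sign[item] = float(self.rng.choice([-1.0, 1.0]))
            return self._bucket[item], self._sign[item]

        def update(self, item, delta: float = 1.0):
            h, s = self._hash(item)
            self.table[h] += s * delta  # supports insertions and deletions

        def estimate(self) -> float:
            # E[sum_j table[j]^2] = F2; variance shrinks as width grows.
            return float((self.table ** 2).sum())

    sk = F2Sketch(width=256)
    for x in [1, 2, 2, 3, 3, 3]:
        sk.update(x)
    print(sk.estimate())  # true F2 = 1^2 + 2^2 + 3^2 = 14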

Publications

Preprints

CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
With João Veneroso, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, and Daniel Cer.

2026

Near-Optimal Directed Euclidean Spanners in High Dimensions
With Shyamal Patel, Cliff Stein, Erik Waingarten, and Tian Zhang.
STOC 2026

2025

Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
With Chong You, Ananda Theertha Suresh, Robin Nittka, Felix Yu, and Sanjiv Kumar.
NeurIPS 2025
Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
With Lorenzo Beretta, Vincent Cohen-Addad, and Erik Waingarten.
FOCS 2025
Metric Embeddings Beyond Bi-Lipschitz Distortion via Sherali-Adams
With Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, and Silvio Lattanzi.
COLT 2025
Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
With Jie Gao, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, and Erik Waingarten.
ICML 2025
Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
With Laxman Dhulipala, Lars Gottesbüren, and Jakub Łącki.
VLDB 2025
Near-Optimal Spectral Density Estimation via Explicit and Implicit Deflation
With Rajarshi Bhattacharjee, Cameron Musco, Christopher Musco, and Archan Ray.
SODA 2025
Massively Parallel Minimum Spanning Tree in General Metric Spaces
With Amir Azarmehr, Soheil Behnezhad, Jakub Łącki, Vahab Mirrokni, and Peilin Zhong.
SODA 2025

2024

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
With Laxman Dhulipala, Majid Hadian, Jason Lee, and Vahab Mirrokni.
NeurIPS 2024
Efficient Centroid-Linkage Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N Gowda, D Ellis Hershkowitz, and Jakub Łącki.
NeurIPS 2024
Metric Clustering and MST with Strong and Weak Distance Oracles
With MohammadHossein Bateni, Prathamesh Dharangutte, and Chen Wang.
COLT 2024
Parallel and Sequential Hardness of Hierarchical Graph Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Kishen Gowda, D Ellis Hershkowitz, and Jakub Łącki.
ICALP 2024
Dynamic PageRank: Algorithms and Lower Bounds
With Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski.
ICALP 2024
Data-Dependent LSH for the Earth Mover's Distance
With Erik Waingarten and Tian Zhang.
STOC 2024
HyperAttention: Long-Context Attention in Near-Linear Time
With Insu Han, Amin Karbasi, Vahab Mirrokni, David Woodruff, and Amir Zandieh.
ICLR 2024
Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
With Vahab Mirrokni, Shyam Narayanan, and Peilin Zhong.
SODA 2024
Fully Dynamic Consistent k-Center Clustering
With Christoph Grunau, Bernhard Haeupler, Jakub Łącki, and Václav Rozhoň.
SODA 2024
Streaming Algorithms with Few State Changes
With David Woodruff and Samson Zhou.
PODS 2024

2023

A Near-Linear Time Algorithm for the Chamfer Distance
With Ainesh Bakshi, Piotr Indyk, Sandeep Silwal, and Erik Waingarten.
NeurIPS 2023
Streaming Euclidean MST to a Constant Factor
With Vincent Cohen-Addad, Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2023
Optimal Fully Dynamic k-Centers Clustering
With MohammadHossein Bateni, Hossein Esfandiari, and Vahab Mirrokni.
SODA 2023
Merged with a paper by Hendrik Fichtenberger, Monika Henzinger, and Andreas Wiese.
Differentially Oblivious Relational Database Operators
With Lianke Qin, Elaine Shi, Zhao Song, Danyang Zhuo, and Shumo Chu.
VLDB 2023

2022

Stars: Tera-Scale Graph Building for Clustering and Learning
With CJ Carey, Jonathan Halcrow, Vahab Mirrokni, Warren Schudy, and Peilin Zhong.
NeurIPS 2022
New Streaming Algorithms for High Dimensional EMD and MST
With Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2022
Truly Perfect Samplers for Data Streams and Sliding Windows
With David Woodruff and Samson Zhou.
PODS 2022

2021

An Optimal Algorithm for Triangle Counting in a Stream
With John Kallaugher.
APPROX 2021
Learning and Testing Junta Distributions with Subcube Conditioning
With Xi Chen, Amit Levi, and Erik Waingarten.
COLT 2021
In-Database Regression in Input Sparsity Time
With Alireza Samadian, David Woodruff, and Peng Ye.
ICML 2021
When is Approximate Counting for Conjunctive Queries Tractable?
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
STOC 2021

2020

Testing Positive Semi-Definiteness via Random Submatrices
With Ainesh Bakshi and Nadiia Chepurko.
FOCS 2020
A Framework for Adversarially Robust Streaming Algorithms
With Omri Ben-Eliezer, David Woodruff, and Eylon Yogev.
PODS 2020 & Journal of the ACM
PODS 2020 Best Paper Award; invited to the Journal of the ACM; 2021 SIGMOD Research Highlight; invited to HALG 2021.
Span Recovery for Deep Neural Networks with Applications to Input Obfuscation
With Qiuyi Zhang and David Woodruff.
ICLR 2020

2019

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
With Huaian Diao, Zhao Song, Wen Sun, and David Woodruff.
NeurIPS 2019
Towards Optimal Moment Estimation in Streaming and Distributed Models
With David Woodruff.
APPROX 2019
Learning Two Layer Rectified Neural Networks in Polynomial Time
With Ainesh Bakshi and David Woodruff.
COLT 2019
Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
PODS 2019 & Journal of the ACM
PODS 2019 Best Paper Award; invited to the Journal of the ACM; 2021 SIGMOD Research Highlight.
Weighted Reservoir Sampling from Distributed Streams
With Gokarna Sharma, Srikanta Tirthapura, and David P. Woodruff.
PODS 2019

2018

Perfect Lp Sampling in a Data Stream
With David Woodruff.
FOCS 2018 & SIAM Journal on Computing
Data Streams with Bounded Deletions
With David Woodruff.
PODS 2018

2017

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
With Barna Saha.
ICALP 2017

Dissertation

PhD Thesis, Carnegie Mellon University, May 2021
Committee: David Woodruff (advisor), Anupam Gupta (CMU), Andrej Risteski (CMU), Alexandr Andoni (Columbia), Jelani Nelson (Berkeley)

Teaching

I taught as an Adjunct Professor at NYU's Tandon School of Engineering.

Spring 2022: NYU CS-GY 6763 — Algorithmic Machine Learning and Data Science

Workshops

I co-organized the Approximate Nearest Neighbor Search (ANNS) workshop at FOCS 2025, with the goal of “Bridging Theoretical Foundations and Industrial Frontiers” for the central ANNS problem.

I co-organized the Industry Workshop at FOCS 2024, with the goal of bridging techniques between theory and practice.

I co-organized the Workshop on Robust Streaming, Sketching, and Sampling at STOC 2021; a full recording of the workshop is available online.

I co-organized the Workshop on Algorithms for Large Data (Online) (WALDO 2021), which took place August 23–25, 2021.