STOC 2021 Workshop: Robust Streaming, Sketching, and Sampling

When Thursday, June 24, 9:00 AM to 12:00 noon, US Eastern time
Where The workshop will be held virtually as part of STOC 2021.

Organizing Committee


Randomized streaming and sketching algorithms are a widely studied topic. However, the vast majority of such algorithms only guarantee correctness in the static setting, where the data is worst-case but fixed in advance. In contrast, designing and analyzing sketching algorithms that are adversarially robust — meaning that they provide correct outputs even if the data is chosen by an adaptive adversary — has received surprisingly little attention until recently. This has significantly changed within the last year, with a flurry of new results on the adversarially robust setting. A number of these results demonstrate strong connections to and/or implications for a plethora of other areas, including online learning, differential privacy, adaptive data analysis, and statistics.

The goal of this workshop is to present the results in this emerging area in a unified and approachable fashion. Given the nascent field of study, this workshop should appeal both to experts as well as to newcomers interested in streaming and the aforementioned related areas. The line of work highlighted in this workshop studies the robustness of fundamental problems in the streaming regime, such as norm Lp estimation, heavy hitters, distance estimation, and high dimensional geometry. The works presented devise new sketching algorithms and techniques, in addition to analyzing the robustness of existing algorithms. Furthermore, they include results with a complexity-theoretic flavour, such as a separation between the classical and robust streaming models.

The workshop will consist of a series of invited talks that generally do not assume prior knowledge, and provide the most relevant background in-talk. Researchers new to this topic should be able to get a taste of the state of the art and hear about challenges left open for future research.


Opening Remarks
Adversarially Robust Streaming Algorithms

A streaming algorithm is given a long sequence of items and seeks to compute or approximate some function of this sequence using a small amount of memory. A body of work has been developed over the last two decades, resulting in optimal streaming algorithms for a number of problems. I will start by surveying some of these problems. I will then investigate the adversarial robustness of streaming algorithms. An algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems do not admit sublinear-space deterministic algorithms; on the other hand, space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. I will discuss recent work showing for a number of streaming problems, one can obtain algorithms that are adversarially robust with a small overhead in their memory requirements. I will then discuss followup work achieving almost no overhead for a large number of problems in the streaming model, including any problem with a so-called efficient difference estimator.

Based on joint works with Omri Ben-Eliezer, Rajesh Jayaram, Eylon Yogev, and with Samson Zhou.

Adversarial Laws of Large Numbers for Sampling Algorithms

Laws of large numbers guarantee that given a large enough sample from some population, the expectation of any bounded function is well-estimated by its expectation in the sample. We discuss laws of large numbers in sequential sampling processes that can affect the environment they are acting upon and interact with it, according to the model by Ben-Eliezer and Yogev (2020). We characterize the function-classes which admit a uniform law of large numbers in this model: these are exactly the classes that are online learnable. This characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. We further discuss the relationship with additional notions from learning theory such as the Rademacher complexity, covering numbers, Littlestone's dimension and shattering dimensions for real-valued functions classes.

This is based on joint works with Noga Alon, Omri Ben-Eliezer, Adam Block, Shay Moran, Moni Naor, Alexander Rakhlin and Eylon Yogev.

Break (10 minutes)

Adversarially Robust Streaming Algorithms via Differential Privacy

A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters.

Joint work with Avinatan Hassidim, Yishay Mansour, Yossi Matias, and Uri Stemmer.

Separating Adaptive Streaming from Oblivious Streaming

By combining techniques from learning theory with cryptographic tools from the bounded storage model, we separate the oblivious streaming model from the adversarially-robust streaming model. Specifically, we present a streaming problem for which every adversarially-robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space. This is the first general separation between the capabilities of these two models, resolving one of the central open questions in adversarial robust streaming.

Joint work with Haim Kaplan, Yishay Mansour, Kobbi Nissim.

A General Framework for Adaptive Data Structures

In this talk, we will overview a simple framework for constructing adaptive data structures from non-adaptive ones. Abstractly, given a randomized data structure providing a high probability accuracy guarantee for any fixed input, we produce a data structure with comparable accuracy and runtime guarantees for any (possibly adaptively) chosen input to the data structure. We consider the concrete setting of Distance Estimation where one is given a database of n vectors {x_1,...,x_n} and the goal is construct a data structure which when given a query point q outputs {d_1,...,d_n} satisfying d_i ≈ ‖q - x_i‖. We instantiate the framework for distance estimation in ℓ_p norms for 0 < p ≤ 2 where we recover near-optimal query times and space complexities. Finally, for the Euclidean setting (p = 2), we show how to improve upon the update times of the aforementioned data structure through the use of pseudorandom structured matrices.

Joint work with Jelani Nelson.

Concluding Remarks & Open Questions