Research Themes

My research spans machine learning, information theory, wireless communication, decentralized control, and game theory. Here are some of the key areas I focus on, each with a couple of recent papers. Overall, much of my work involves looking for simplified models that help build intuition about how systems behave.

Machine Learning

Most of my work in this area explores the fundamental limits of machine learning, with a specific interest in overparameterized models and their generalization properties, as well as the fundamentals of in-context learning in LLMs. For this work, I tend to focus on creating simple caricature models that make the phenomena of interest possible to explore and understand. I also have some work in online learning (for example, in games) as well as practical applications in communications.

Synthetic Error Injection Fails to Elicit Self-Correction In Language Models

arXiv preprint: 2512.02389 • 2025

Authors

David Wu, Shreyas Kapur, Anant Sahai, Stuart Russell

Abstract

Reinforcement learning has become the dominant paradigm for eliciting reasoning and self-correction capabilities in large language models, but its computational expense motivates exploration of alternatives. Inspired by techniques from autonomous driving and robotics, we investigate whether supervised learning with synthetic error injection can induce self-correction abilities in language models. Our approach inserts artificial errors into reasoning chains, masks them, and supervises the model to recognize and correct these mistakes. Despite the intuitive appeal of this method, we find that it fails to significantly improve performance even on simple synthetic tasks across multiple models. Moreover, even when the model catches its own error, it often parrots the original mistake. We find that the distribution shift of synthetic errors to on-policy errors significantly degrades the error-correction capabilities of the fine-tuned model, even with good synthetic coverage of on-policy errors. Our results help explain why on-policy reinforcement learning methods have proven uniquely effective for eliciting self-correction.
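A minimal sketch of the data-construction step described in this abstract, assuming step-level (rather than token-level) masking and a hypothetical fixed correction phrase; `inject_error`, `CORRECTION`, and the example chain are illustrative names, not the paper's code. The injected wrong step gets mask 0 so the model is never trained to produce it, while the correction and the redone step are supervised.

```python
from typing import List, Tuple

# Hypothetical correction phrase; the paper's actual format is not reproduced here.
CORRECTION = "Wait, that step is wrong."


def inject_error(steps: List[str], wrong_step: str, position: int,
                 correction: str = CORRECTION) -> Tuple[List[str], List[int]]:
    """Insert a synthetic error before steps[position], followed by a correction.

    Returns the augmented reasoning chain and a per-step loss mask:
    0 = excluded from the loss (the injected error), 1 = supervised.
    """
    if not 0 <= position < len(steps):
        raise ValueError("position must index an existing step")
    augmented = steps[:position] + [wrong_step, correction] + steps[position:]
    mask = [1] * len(augmented)
    mask[position] = 0  # mask the error so the model only learns to recover from it
    return augmented, mask


if __name__ == "__main__":
    chain = ["2 + 3 = 5", "5 * 4 = 20", "20 - 1 = 19"]
    print(inject_error(chain, "5 * 4 = 24", position=1))
```

In a real fine-tuning pipeline the mask would be expanded to token level and applied to the cross-entropy loss; that step is omitted here.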

Different simultaneous mechanisms for in-context recall have distinct learning dynamics

ICML 3rd Workshop on High-dimensional Learning Dynamics (HiLD) • 2025

Authors

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy problem, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted to do so with the corresponding in-context label. A closer look at this task makes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue applying the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task and observed a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs. second-task-token performance.
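A minimal sketch of how traces of this kind might be generated, assuming orthogonal system matrices (so states neither blow up nor decay) and a hypothetical `<open_k>` token as the symbolic label; the paper's exact dimensions, label vocabulary, and segment lengths are not reproduced. Each time a label reappears, the trace resumes system k from the last state it reached, which is the associative-recall step the abstract describes.

```python
import numpy as np


def random_orthogonal(dim: int, rng: np.random.Generator) -> np.ndarray:
    # QR of a Gaussian matrix, with column signs fixed, gives a random orthogonal
    # matrix; orthogonal dynamics keep states from exploding or decaying.
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def make_trace(num_systems=3, dim=4, num_segments=6, seg_len=5, seed=0):
    """Interleave segments from several linear systems x[t+1] = A_k x[t].

    Each segment starts with a hypothetical symbolic label "<open_k>" and then
    continues system k from the last state it reached earlier in the trace.
    """
    rng = np.random.default_rng(seed)
    systems = [random_orthogonal(dim, rng) for _ in range(num_systems)]
    states = [rng.standard_normal(dim) for _ in range(num_systems)]
    trace = []  # (label, observation) pairs; exactly one of the two is None
    for _ in range(num_segments):
        k = int(rng.integers(num_systems))
        trace.append((f"<open_{k}>", None))
        for _ in range(seg_len):
            states[k] = systems[k] @ states[k]  # advance only the resumed system
            trace.append((None, states[k].copy()))
    return trace, systems
```

Predicting the first observation after a label requires recalling system k's last state (mechanism 1), while later observations in the segment only need the previous token and the system's dynamics (mechanism 2).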

Wireless

My work in wireless is largely split between two areas: cognitive radio writ large (including spectrum sharing as well as intelligence in radio systems) and ultra-reliable low-latency communication (URLLC) systems, such as those that might be relevant for industrial control.

From Foe to Friend: The Surprising Turn of Mega Constellations in Radio Astronomy

ACM Workshop on Hot Topics in Networks • 2024

Authors

Ali Abedi, Joshua Sanz, Mariya Zheleva, Anant Sahai

Abstract

Cheap spaceflight has ushered in an explosive growth era for Low Earth Orbit (LEO) satellites. While this has brought us LEO satellite megaconstellations for ubiquitous high-speed data, it has also enabled a proliferation of nanosatellites (e.g., CubeSats) launched by diverse organizations. An unfortunate side effect is harmful interference to sensitive receivers like those of radio astronomy: no place on Earth is safe. How can we enjoy the fruits of the satellite revolution without blinding ourselves to the secrets of the universe? Networking is the key. This paper proposes InOrbitNet, which aggregates and backhauls traffic from low-capability nanosatellites using highly-capable LEO megaconstellations. By simulating LEO and nanosatellite orbit transitions, we show that orders-of-magnitude reductions in latency and significant increases in capacity are possible as compared to the current non-networked direct-to-ground approach. But more importantly, because LEO megaconstellations are highly capable and tightly managed, this consolidation of RF footprints also allows radio astronomy to be protected from interference.

Using Signals of Opportunity to Establish Trust in Distributed Spectrum Monitoring Systems

Workshop on Signal Reuse for Spectrum Efficiency at the IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN) • 2024

Authors

Ali Abedi, Joshua Sanz, Anant Sahai

Abstract

We present an automated approach for generating or verifying metadata associated with an RF sensor, such as the sensor location, using signals of opportunity (SoOp) in the environment. As an illustration of the approach, we demonstrate how existing signals from low-earth-orbit satellites can be employed to verify the location of a radio sensor. One application that can significantly benefit from SoOp-derived metadata labeling is distributed spectrum sensing, given that the sensors already cover a wide frequency range. Distributed spectrum monitoring is crucial for comprehensive coverage of large geographical areas over an extended period, and crowdsourcing emerges as a promising solution for establishing large distributed sensor networks. However, a persistent challenge in realizing these systems is the lack of trust in the data provided by each spectrum sensor. The compromise of a radio sensor due to incorrect metadata, such as its location, poses a significant challenge. Our proposed methods offer a solution to mitigate this problem.

Information Theory

Information theory is a rich area and my work has touched upon different aspects of it, including its intersections with control theory and machine learning. Early in my career, I had a particularly deep interest in the nature and role of delay within information theory.

Lower Bounds for Multiclass Classification with Overparameterized Linear Models

IEEE International Symposium on Information Theory • 2023

Authors

David Wu, Anant Sahai

Abstract

Subramanian et al. introduced an asymptotic Gaussian-features model for overparameterized multiclass classification in which the number of classes, training points, and parameters all go to infinity. They provided some achievable regions where min-norm interpolating classifiers successfully generalize asymptotically, and they conjectured the full form of the region based on a heuristic analysis. Here, we introduce a converse for such min-norm interpolating classifiers in their model that fully matches their conjectured regions. The key technical tool is a variant of the Hanson-Wright concentration inequality that applies to the sparse bilinear forms that arise.
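For concreteness, a min-norm interpolating multiclass classifier can be sketched as the minimum-Frobenius-norm solution of X W = Y with one-hot labels Y, followed by an argmax over classes. The isotropic Gaussian features and the sizes below are illustrative assumptions, not the asymptotic scaling regime analyzed in the paper.

```python
import numpy as np


def fit_min_norm_multiclass(X: np.ndarray, y: np.ndarray, num_classes: int) -> np.ndarray:
    """Minimum-Frobenius-norm W with X @ W = one-hot(y); X is n x d with d > n."""
    Y = np.eye(num_classes)[y]      # n x k one-hot label matrix
    return np.linalg.pinv(X) @ Y    # pseudoinverse gives the min-norm interpolator


def predict(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    # Predicted class = largest score among the k outputs
    return np.argmax(X @ W, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 50, 500, 5                      # illustrative sizes (overparameterized: d > n)
    X_train = rng.standard_normal((n, d))
    y_train = rng.integers(0, k, size=n)
    W = fit_min_norm_multiclass(X_train, y_train, k)
    print("train accuracy:", np.mean(predict(W, X_train) == y_train))
```

Training accuracy is perfect by construction because the classifier interpolates; the question the paper studies is when such classifiers also generalize, which this snippet does not address.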

Harmless Interpolation of Noisy Data in Regression

IEEE Journal on Selected Areas in Information Theory • 2020

Authors

Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, Anant Sahai

Abstract

A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise. We characterize the fundamental generalization (mean-squared) error of any interpolating solution in the presence of noise, and show that this error decays to zero with the number of features. Thus, overparameterization can be explicitly beneficial in ensuring harmless interpolation of noise. We discuss two root causes for poor generalization that are complementary in nature: signal "bleeding" into a large number of alias features, and overfitting of noise by parsimonious feature selectors. For the sparse linear model with noise, we provide a hybrid interpolating scheme that mitigates both these issues and achieves order-optimal MSE over all possible interpolating solutions.
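The noise-only part of this argument can be checked numerically. The sketch below sets the true signal to zero, fits the min-norm interpolator to pure noise using isotropic Gaussian features (an assumption made for illustration), and reports ||w||^2, which equals the excess test MSE for isotropic test features. For this ensemble the expected value has a closed form, sigma^2 * n / (d - n - 1), so the cost of interpolating noise shrinks as the number of features d grows. The sketch does not capture signal "bleeding", the complementary failure mode described above; `noise_fit_error` is a hypothetical helper name.

```python
import numpy as np


def noise_fit_error(n: int, d: int, sigma: float = 1.0, trials: int = 20, seed: int = 0) -> float:
    """Average ||X^+ eps||^2 over random draws: the test MSE of the min-norm
    interpolator when the true signal is zero and test features are N(0, I_d)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, d))        # n training points, d > n features
        eps = sigma * rng.standard_normal(n)   # pure noise labels
        w = np.linalg.pinv(X) @ eps            # min-norm solution of X w = eps
        total += float(w @ w)
    return total / trials


if __name__ == "__main__":
    for d in (100, 500, 2000):
        print(d, noise_fit_error(50, d), 50 / (d - 50 - 1))  # empirical vs. closed form
```

Each fitted w interpolates the noise exactly (X w = eps), yet its norm, and hence the damage on fresh test points, falls as d increases.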

Control

My work in control has largely focused on the intersections with information theory, with a particular interest in the Witsenhausen counterexample and the role of delay.

Wireless Channel Dynamics and Robustness for Ultra-Reliable Low-Latency Communications

IEEE Journal on Selected Areas in Communications • 2019

Authors

Vasuki Narasimha Swamy, Paul Rigge, Gireeja Ranade, Borivoje Nikolic, Anant Sahai

Abstract

Interactive, immersive, and other timing-critical applications demand ultra-reliable low-latency communication (URLLC). To build wireless communication systems that can support these applications, understanding the relevant characteristics of the wireless medium is paramount. Although wireless channel characteristics and dynamics have been extensively studied, it is important to revisit these concepts in the context of the strict demands of low latency and ultra-high reliability. In this paper, we bring a modeling approach from robust control to wireless communication: the wireless channel characteristics are given a nominal model around which we allow for some quantified uncertainty. We propose certain key URLLC-relevant parameters along which the model uncertainty is to be bounded. To validate the nominal model of spatially independent quasi-static Rayleigh fading, we take an in-depth look at the spatial and temporal correlations based on Jakes' model. We find that although the Rayleigh fading process is band-limited, the quasi-static assumption is not safe for relay selection even well within a single coherence time. We also find that under reasonable conditions, the spatial correlation of channels provides a fading distribution that is not too far off from an independent spatial fading model. In addition, we look at the impact of these channel models on cooperative communication-based systems. We find that while spatial-diversity-based techniques are necessary to combat the adverse effects of fading, time-diversity-based techniques are necessary to be robust against unmodeled errors. Robust URLLC systems need to operate with both an adequate SNR margin and a time margin through repetitions.
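As a rough illustration of the temporal side, Jakes' (Clarke's) model gives a unit-power Rayleigh channel the autocorrelation rho(tau) = J0(2 * pi * f_D * tau), where f_D = v * f_c / c is the maximum Doppler shift. The sketch computes J0 with a midpoint-rule integral to avoid extra dependencies, and reports 1 - rho^2, the variance of the part of h(t + tau) that h(t) cannot predict. The function names, speed, and carrier frequency are illustrative assumptions.

```python
import numpy as np


def bessel_j0(x: float, num_points: int = 2000) -> float:
    # J0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) dtheta, via the midpoint rule
    theta = (np.arange(num_points) + 0.5) * np.pi / num_points
    return float(np.mean(np.cos(x * np.sin(theta))))


def jakes_correlation(tau: float, speed_mps: float, carrier_hz: float, c: float = 3e8) -> float:
    """Autocorrelation of a unit-power Rayleigh channel at lag tau under Jakes' model."""
    f_doppler = speed_mps * carrier_hz / c  # maximum Doppler shift in Hz
    return bessel_j0(2 * np.pi * f_doppler * tau)


def innovation_variance(rho: float) -> float:
    # For jointly Gaussian h(t), h(t + tau) with correlation rho, the part of
    # h(t + tau) not predictable from h(t) has variance 1 - rho^2 (unit power).
    return 1.0 - rho ** 2


if __name__ == "__main__":
    rho = jakes_correlation(1e-3, speed_mps=3.0, carrier_hz=2.4e9)
    print(rho, innovation_variance(rho))
```

With these illustrative numbers (f_D = 24 Hz, a 1 ms lag), rho is about 0.994, yet roughly 1% of the channel power is still unpredictable, which can matter when reliability targets are far stricter than 1%.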

Control Capacity

IEEE Transactions on Information Theory • 2019

Authors

Gireeja Ranade, Anant Sahai

Abstract

Feedback control actively dissipates uncertainty from a dynamical system by means of actuation. We develop a notion of “control capacity” that gives a fundamental limit (in bits) on the rate at which a controller can dissipate the uncertainty from a system, i.e., stabilize to a known fixed point. We give a computable single-letter characterization of control capacity for memoryless stationary scalar multiplicative actuation channels. Control capacity allows us to answer questions of stabilizability for scalar linear systems: a system with actuation uncertainty is stabilizable if and only if the control capacity is larger than the log of the unstable open-loop eigenvalue. For second-moment senses of stability, we recover the classic uncertainty threshold principle result. However, our definition of control capacity can quantify the stabilizability limits for any moment of stability. Our formulation parallels the notion of Shannon's communication capacity and thus yields both a strong converse and a way to compute the value of side information in control.
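A minimal sketch of the second-moment case, assuming the simplest setting the abstract covers: a scalar system x[n+1] = a * x[n] + B[n] * u[n] with no additive noise and an i.i.d. actuation gain B[n] of mean mu and standard deviation sigma. Under a linear controller u[n] = -g * a * x[n], E[x^2] is multiplied each step by a^2 * E[(1 - g * B)^2]; the factor E[(1 - g * B)^2] is minimized at g = mu / (mu^2 + sigma^2), where it equals sigma^2 / (mu^2 + sigma^2). Taking -1/2 * log2 of that factor gives 1/2 * log2(1 + mu^2 / sigma^2) bits per step, and stabilizability reduces to comparing this with log2|a|, the uncertainty-threshold-principle form the abstract mentions. The paper's definition is more general (any moment of stability, general actuation channels); the function names here are hypothetical.

```python
import math


def second_moment_control_capacity(mu: float, sigma: float) -> float:
    """Bits per step by which the best linear controller shrinks E[x^2].

    Per-step factor E[(1 - g*B)^2] is minimized at g = mu / (mu^2 + sigma^2),
    giving sigma^2 / (mu^2 + sigma^2); capacity = -0.5 * log2(factor).
    """
    if sigma == 0:
        # Deterministic gain: any a can be cancelled if mu != 0; no control if mu == 0.
        return math.inf if mu != 0 else 0.0
    return 0.5 * math.log2(1.0 + mu ** 2 / sigma ** 2)


def stabilizable_in_mean_square(a: float, mu: float, sigma: float) -> bool:
    # E[x^2] contracts iff a^2 * sigma^2 / (mu^2 + sigma^2) < 1, i.e. log2|a| < capacity
    if a == 0:
        return True  # the open-loop system already reaches 0 in one step
    return math.log2(abs(a)) < second_moment_control_capacity(mu, sigma)
```

For example, with mu = 1 and sigma = 0.5 the capacity is 0.5 * log2(5), about 1.16 bits, so a = 2 is mean-square stabilizable while a = 3 (log2 3, about 1.58 bits) is not.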

Game Theory

I have been interested in game-theoretic perspectives on regulation of intelligent radios, as well as the intersections with machine learning more generally.

On the Impossibility of Convergence of Mixed Strategies with Optimal No-Regret Learning

Mathematics of Operations Research • 2024

Authors

Vidya Muthukumar, Soham Phade, Anant Sahai

Abstract

We study the limiting behavior of the mixed strategies that result from optimal no-regret learning in a repeated game setting where the stage game is any 2×2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Follow-the-Regularized-Leader with optimism or adaptive step sizes. Finally, we provide partial evidence that the monotonicity and mean-based assumptions can be removed or relaxed. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence, and demonstrate a crucial difference in outcomes between using the opponent's mixtures and realizations to make updates.
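To see the kind of dynamics in question, the sketch below runs Hedge (exponential weights), one standard mean-based, monotonic no-regret algorithm, for both players in matching pennies, a 2×2 competitive game whose unique Nash equilibrium mixes 50/50. Updates use the opponent's realized actions rather than its mixture, matching the setting the abstract highlights. The step-size schedule, horizon, and seed are illustrative assumptions, and a single simulation illustrates rather than proves the paper's result.

```python
import numpy as np

# Matching pennies from the row player's view: +1 on a match, -1 otherwise.
# The column player's payoff is the negative (zero-sum).
ROW_PAYOFF = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])


def hedge_probs(cumulative: np.ndarray, eta: float) -> np.ndarray:
    # Exponential weights: p_i proportional to exp(eta * cumulative_i);
    # subtracting the max avoids overflow without changing the result.
    w = np.exp(eta * (cumulative - cumulative.max()))
    return w / w.sum()


def play(T: int = 5000, seed: int = 0) -> np.ndarray:
    """Both players run Hedge on realized payoffs; returns row player's P(action 0) per round."""
    rng = np.random.default_rng(seed)
    cum_row = np.zeros(2)  # cumulative payoff each row action would have earned
    cum_col = np.zeros(2)  # same for the column player
    row_probs = np.empty(T)
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(2) / t)  # decreasing step size (illustrative choice)
        p = hedge_probs(cum_row, eta)
        q = hedge_probs(cum_col, eta)
        row_probs[t - 1] = p[0]
        i = rng.choice(2, p=p)  # realized actions, not mixtures, drive the updates
        j = rng.choice(2, p=q)
        cum_row += ROW_PAYOFF[:, j]
        cum_col += -ROW_PAYOFF[i, :]
    return row_probs
```

Plotting `play()` typically shows the row player's mixture continuing to swing around 1/2 late in play instead of settling on it.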

Blind Interactive Learning of Modulation Schemes: Multi-Agent Cooperation Without Co-Design

IEEE Access • 2020

Authors

Anant Sahai, Joshua Sanz, Vignesh Subramanian, Caryn Tran, Kailas Vodrahalli

Abstract

We examine the problem of learning to cooperate in the context of wireless communication. In our setting, two agents must learn modulation schemes that enable them to communicate across a power-constrained additive white Gaussian noise channel. We investigate whether learning is possible under different levels of information sharing between distributed agents which are not necessarily co-designed. We employ the "Echo" protocol, a blind interactive learning protocol where an agent hears, understands, and repeats (echoes) back the message received from another agent, simultaneously training itself to communicate. To capture the idea of cooperation between "not necessarily co-designed" agents we use two different populations of function approximators: neural networks and polynomials. We also include interactions between learning agents and non-learning agents with fixed modulation protocols such as QPSK and 16QAM. We verify the universality of the Echo learning approach, showing that it succeeds regardless of the inner workings of the agents. In addition to matching the communication expectations of others, we show that two learning agents can collaboratively invent a successful communication approach from independent random initializations. We complement our simulations with an implementation of the Echo protocol in software-defined radios. To explore the continuum of co-design, we study how learning is impacted by different levels of information sharing between agents, including sharing training symbols, losses, and full gradients. We find that co-design (increased information sharing) accelerates learning. Learning higher-order modulation schemes is a more difficult task, and the beneficial effect of co-design becomes more pronounced as the task becomes harder.
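In an echo round trip, one agent sends a preamble it knows, the other decodes it and re-modulates what it heard, and the first agent can compare what comes back with what it sent; that comparison is the kind of feedback a learner can train on. The sketch below shows only this round-trip bookkeeping between two fixed, non-learning QPSK agents. The neural-network and polynomial learners and their update rules from the paper are not reproduced, and the Gray mapping, the Es/N0 SNR convention, and the function names are assumptions.

```python
import numpy as np


def qpsk_modulate(bits: np.ndarray) -> np.ndarray:
    # Gray-mapped QPSK: bit pair (b0, b1) -> ((1 - 2*b0) + 1j*(1 - 2*b1)) / sqrt(2), unit energy
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)


def qpsk_demodulate(symbols: np.ndarray) -> np.ndarray:
    # Minimum-distance decision for this mapping: signs of the real and imaginary parts
    bits = np.empty((symbols.size, 2), dtype=int)
    bits[:, 0] = (symbols.real < 0).astype(int)
    bits[:, 1] = (symbols.imag < 0).astype(int)
    return bits.reshape(-1)


def awgn(symbols: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    # Complex AWGN for unit-energy symbols at the given Es/N0 (N0/2 per dimension)
    noise_std = np.sqrt(10 ** (-snr_db / 10) / 2)
    noise = rng.standard_normal(symbols.shape) + 1j * rng.standard_normal(symbols.shape)
    return symbols + noise_std * noise


def echo_round_trip_ber(num_bits: int = 2000, snr_db: float = 10.0, seed: int = 0) -> float:
    """Bit error rate the original speaker sees after the listener echoes the preamble."""
    rng = np.random.default_rng(seed)
    preamble = rng.integers(0, 2, size=num_bits)                         # known to the speaker
    heard = qpsk_demodulate(awgn(qpsk_modulate(preamble), snr_db, rng))  # listener's decoding
    echoed = qpsk_demodulate(awgn(qpsk_modulate(heard), snr_db, rng))    # echoed back, re-decoded
    return float(np.mean(echoed != preamble))
```

At 0 dB each hop flips about 16% of bits (Q(1)), so the round trip sees about 27%; in a learning version, this round-trip discrepancy would be what the agents try to drive down.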
