Research
My research lies at the intersection of machine learning, artificial intelligence, and human decision-making. I am broadly interested in understanding and improving how humans interact with ML/AI systems such as recommender and agentic systems. Below are the main threads of my research, along with representative publications.
Dynamics of Human-AI Interaction
Feedback loops in recommender systems, adaptive user preferences, and the trade-off between long-term engagement and utility.
- Adaptive preferences: recommendations for users whose tastes evolve [ALT’24, NeurIPS’22]
- System-2 recommenders: disentangling utility from engagement using temporal point processes [FAccT’24]
- Misalignment and limited attention: harnessing user behavior in ranking [WINE’25]
- Advertising and behavior models: near-optimal ad optimization [ICLR’26]
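To illustrate the kind of feedback loop this thread studies, here is a minimal toy simulation (not from any of the papers above; all parameters are illustrative): a popularity-greedy recommender keeps showing whichever item has the most clicks, and exposure itself drifts the user's preference toward that item, so an early tie-break gets amplified.

```python
import random

def simulate(drift=0.05, steps=200, seed=1):
    """Toy exposure feedback loop: recommending an item shifts preference toward it."""
    random.seed(seed)
    prefs = [0.5, 0.5]      # user's true click probabilities for items 0 and 1
    clicks = [0, 0]
    for _ in range(steps):
        # greedy recommender: always show the currently more-clicked item
        item = 0 if clicks[0] >= clicks[1] else 1
        if random.random() < prefs[item]:
            clicks[item] += 1
        # exposure drifts the preference for the shown item upward (adaptive preferences)
        prefs[item] = min(1.0, prefs[item] + drift * (1 - prefs[item]))
    return prefs, clicks

prefs, clicks = simulate()
```

In this sketch item 0 wins the initial tie-break, is recommended forever after, and its preference drifts toward 1.0 while item 1 is never shown again, a caricature of the engagement-driven lock-in these papers analyze.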
Learning from Human Feedback
Algorithms that learn from pairwise comparisons, choice behavior, and noisy or adversarial responses, including dueling bandits, ranking, and peer prediction.
- Dueling bandits: batched, non-stationary, and corrupted settings [NeurIPS’22, NeurIPS’23, ICML’22, TMLR’25]
- Choice bandits: learning from discrete choice behavior [NeurIPS’20]
- Ranking from comparisons: robust aggregation and spectral methods [ICML’20, ICML’18, AISTATS’22]
- Peer prediction: eliciting truthful information from heterogeneous agents [EC’17, TEAC’20, EC’16]
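As a flavor of the problems in this thread, here is a minimal sketch (not the method of any paper above) of aggregating noisy pairwise comparisons into a ranking via Borda-style win fractions, with comparisons simulated from a Bradley-Terry model:

```python
import math
import random

def borda_rank(n_items, comparisons):
    """Rank items by empirical win fraction over observed pairwise comparisons.

    comparisons: list of (winner, loser) index pairs.
    """
    wins = [0] * n_items
    appearances = [0] * n_items
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    score = [wins[i] / appearances[i] if appearances[i] else 0.0
             for i in range(n_items)]
    return sorted(range(n_items), key=lambda i: -score[i])

# Simulate noisy comparisons from a Bradley-Terry model with known utilities.
random.seed(0)
utilities = [3.0, 2.0, 1.0, 0.0]      # item 0 is truly best
comparisons = []
for _ in range(2000):
    i, j = random.sample(range(4), 2)
    p_i_wins = 1.0 / (1.0 + math.exp(utilities[j] - utilities[i]))
    comparisons.append((i, j) if random.random() < p_i_wins else (j, i))

ranking = borda_rank(4, comparisons)
```

With enough comparisons, the win-fraction ordering recovers the true utility ordering with high probability; the dueling-bandit and robust-ranking papers above study what happens when comparisons are scarce, non-stationary, or adversarially corrupted.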
Responsible AI Design
Model calibration, creator incentives on platforms, and benchmarking of LLMs and agentic systems.
- Creator incentives: cooperative game-theoretic approach to platform design [AISTATS’26] — Oral presentation, top 10 of ~2100 submissions
- Utility vs. engagement: disentangling what users want from what keeps them scrolling [FAccT’24]
- Peer prediction and fairness: incentive-compatible mechanisms for heterogeneous populations [TEAC’20]
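On the calibration side of this thread, a standard diagnostic is expected calibration error (ECE); below is a minimal binned implementation on toy data (the binning scheme and toy numbers here are illustrative, not taken from the papers above):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: |accuracy - avg. confidence| per bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # clamp conf == 1.0 into top bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# A perfectly calibrated toy model: 3 of 4 predictions at confidence 0.75 are correct.
ece = expected_calibration_error([0.75] * 4, [True, True, True, False])  # → 0.0
```

A well-calibrated model's stated confidence matches its empirical accuracy in every bin, so its ECE is zero; miscalibrated models over- or under-state their confidence and accumulate a positive gap.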
Algorithms & ML Theory
Submodular optimization, stochastic bandits, streaming and parallel algorithms.
- Submodular optimization: learning-augmented and stochastic settings [NeurIPS’24, SODA’19]
- Stochastic optimization with bandits: semi-bandit learning for monotone problems [FOCS’24]
- Streaming and memory-limited learning: sharp memory-regret trade-offs [COLT’22, COLT’17]
- Parallel algorithms: near-linear work max-flow [SODA’24]
- Hierarchical clustering: sublinear algorithms [NeurIPS’22]
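As a small concrete instance of the first topic in this thread, the classic greedy algorithm for cardinality-constrained monotone submodular maximization (shown here on max coverage, where it achieves the well-known (1 - 1/e) approximation; the example data is illustrative):

```python
def greedy_max_coverage(sets, k):
    """Greedy (1 - 1/e)-approximation for max coverage under a cardinality constraint.

    sets: dict mapping set name -> set of covered elements.
    Repeatedly picks the set with the largest marginal coverage gain.
    """
    covered, chosen = set(), []
    for _ in range(k):
        best, gain = None, 0
        for name, elems in sets.items():
            if name in chosen:
                continue
            g = len(elems - covered)    # marginal gain of adding this set
            if g > gain:
                best, gain = name, g
        if best is None:                # no remaining set adds coverage
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}
chosen, covered = greedy_max_coverage(sets, 2)   # picks "c" (gain 4), then "a" (gain 3)
```

The marginal-gain greedy rule is the workhorse that the learning-augmented and stochastic variants in the papers above extend to settings where the objective is only partially known or must be learned online.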