Adaptive Sampling Networks
Co-authored with Navneel Singhal.
Adaptive Sampling Networks explore a simple question: can the decoding strategy of a language model be learned, instead of fixed by hand-tuned heuristics like temperature, top-k, or nucleus sampling?
Problem
Most LLM deployments treat decoding as a hyperparameter choice. The same sampling rule is applied across prompts, uncertainty regimes, and output distributions.
A fixed rule is simple and predictable, but rigid. A sampler should be able to respond to the shape of the probability distribution it receives.
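To make the rigidity concrete, here is a minimal sketch of a conventional fixed sampler: temperature, then top-k, then nucleus filtering. The function names, the order of the filters, and the example logits are illustrative assumptions, not code from this project.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fixed_sampler_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Apply temperature, then top-k, then nucleus filtering, and renormalize."""
    probs = softmax(logits, temperature)
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    mass = sum(probs[i] for i in order)
    out = [0.0] * len(probs)
    for i in order:
        out[i] = probs[i] / mass
    return out

# Hypothetical logits: one confident distribution, one nearly flat one.
peaked = [8.0, 1.0, 0.5, 0.0]
flat = [1.0, 0.9, 0.8, 0.7]
print(fixed_sampler_probs(peaked, temperature=0.7, top_k=2))
print(fixed_sampler_probs(flat, temperature=0.7, top_k=2))
```

Both calls use the same temperature and the same k, even though one distribution is sharply peaked and the other is almost uniform. The settings never look at the distribution they are applied to.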
Approach
We used a lightweight network that transforms the model’s logits before sampling.
The design keeps the base model frozen and learns a distribution-level transformation over logits. A key constraint is permutation equivariance: reordering the vocabulary reorders the output in the same way, so the sampler responds to probability structure, not token identity.
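One way to satisfy permutation equivariance is to apply the same small function to every token, feeding it that token's log-probability plus order-independent summaries of the whole distribution. The sketch below assumes that construction. The class name, the chosen features (entropy and maximum log-probability), the layer sizes, and the random untrained weights are all hypothetical, and this is not necessarily the architecture used in the project. Training is omitted.

```python
import math
import random

def log_softmax(logits):
    """Log-probabilities; fsum makes the normalizer independent of token order."""
    m = max(logits)
    log_z = m + math.log(math.fsum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

class EquivariantLogitTransform:
    """Hypothetical sampler head: one shared per-token MLP over
    [token log-prob, distribution entropy, max log-prob]."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        # Random weights stand in for learned parameters.
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]

    def __call__(self, logits):
        logp = log_softmax(logits)
        # Permutation-invariant summaries of the distribution.
        entropy = -math.fsum(math.exp(lp) * lp for lp in logp)
        top = max(logp)
        out = []
        for lp in logp:
            feats = (lp, entropy, top)
            h = [math.tanh(sum(w * f for w, f in zip(row, feats)) + b)
                 for row, b in zip(self.w1, self.b1)]
            # Residual form: the network outputs a correction to the log-prob.
            out.append(lp + sum(w * v for w, v in zip(self.w2, h)))
        return out

transform = EquivariantLogitTransform()
logits = [2.0, 0.5, -1.0, 0.1]
perm = [2, 0, 3, 1]

# Permute then transform, versus transform then permute.
permuted_first = transform([logits[i] for i in perm])
transformed = transform(logits)
permuted_last = [transformed[i] for i in perm]

equivariant = all(math.isclose(a, b, abs_tol=1e-9)
                  for a, b in zip(permuted_first, permuted_last))
print(equivariant)
```

Because no weight is indexed by token id, the transform cannot memorize specific tokens. It can only react to where a token sits within the distribution.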
Why It Matters
Decoding is part of model behavior.
If the sampler changes reliability, diversity, correctness, or instruction-following, then it belongs in the same conversation as evaluation and post-training. It is a small part of the system, but it can affect the behavior users actually see.