Verbalized Sampling

Verbalized Sampling (VS) is a training-free prompting strategy that improves the diversity of large language model (LLM) outputs by asking the model to generate several responses, each with a self-assigned probability. Introduced in October 2025 by researchers at Stanford University, Northeastern University, and West Virginia University, VS counteracts typicality bias. This is the tendency of human annotators to systematically favor familiar, fluent text over novel but equally valid alternatives, a preference that becomes embedded in preference data during reinforcement learning from human feedback (RLHF).[^c1] The technique recovers approximately 66.8% of the model's pre-alignment creativity without fine-tuning and without sacrificing factual accuracy.[^c2]

A related but independently developed method, Verbalized Rejection Sampling (VRS), was introduced four months earlier by researchers at the University of Tübingen. VRS addresses a different problem, LLM sampling bias when generating faithful samples from known distributions, and it relies on a natural-language accept-reject mechanism rather than on distribution verbalization.[^c7]
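The VS loop can be sketched in three steps: build a prompt that asks for k candidates with probabilities, parse and renormalize the verbalized distribution, then draw from it. The Python sketch below is illustrative only. The prompt wording, the JSON reply schema, and the helper names (`build_vs_prompt`, `parse_verbalized_distribution`, `sample_from_verbalized`) are assumptions, not the prompt or code from the original work, and a hard-coded string stands in for an actual model reply.

```python
import json
import random


def build_vs_prompt(query: str, k: int = 5) -> str:
    """Ask for k candidate responses, each paired with a verbalized probability.

    Hypothetical wording and JSON schema; not the exact prompt from the original work.
    """
    return (
        f"Generate {k} different responses to the request below. "
        "Return only a JSON list of objects, each with a 'text' field and a "
        "'probability' field estimating how likely that response is.\n\n"
        f"Request: {query}"
    )


def parse_verbalized_distribution(raw: str) -> list[tuple[str, float]]:
    """Parse the model's JSON reply and renormalize the stated probabilities."""
    items = json.loads(raw)
    if not items:
        raise ValueError("the reply contained no candidate responses")
    # Clamp negative values to zero; models do not always return valid probabilities.
    pairs = [(str(item["text"]), max(float(item["probability"]), 0.0)) for item in items]
    total = sum(p for _, p in pairs)
    if total == 0:
        # Fall back to uniform weights if every stated probability is zero.
        return [(text, 1.0 / len(pairs)) for text, _ in pairs]
    return [(text, p / total) for text, p in pairs]


def sample_from_verbalized(pairs: list[tuple[str, float]], rng: random.Random) -> str:
    """Draw one response in proportion to its renormalized verbalized probability."""
    texts = [text for text, _ in pairs]
    weights = [p for _, p in pairs]
    return rng.choices(texts, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(build_vs_prompt("Write a one-line joke about coffee."))
    # Hard-coded stand-in for a model reply; its probabilities deliberately sum to 1.2.
    reply = json.dumps([
        {"text": "Decaf? That's just bean water with commitment issues.", "probability": 0.5},
        {"text": "My coffee and I have a strong relationship.", "probability": 0.3},
        {"text": "Espresso yourself.", "probability": 0.4},
    ])
    dist = parse_verbalized_distribution(reply)
    print(sample_from_verbalized(dist, random.Random(0)))
```

Renormalizing matters because verbalized probabilities need not sum to one. Whether to draw a single candidate, as here, or keep all k responses is a design choice left to the application.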

The study of mode collapse in language models has expanded beyond typicality bias to encompass several independent mechanisms. Preference collapse is driven by a bias in the KL-regularized RLHF objective that algorithmically suppresses minority preferences. Self-recognition effects cause post-trained models to sharply reduce output entropy when generating their own responses. In reinforcement learning with verifiable rewards (RLVR), token-level entropy imbalance pushes the policy toward premature determinism during training. At the representation level, geometric collapse confines the model's internal trajectory to low-dimensional regions of its state space during generation. At the input level, emergent retokenization symmetry shows that alternative token segmentations of the same prompt can elicit outputs that conventional sampling does not reach.[^c3][^c4][^c5]

Empirical studies of narrative diversity report that frontier models converge on a generic "mean" story, and that common mitigations such as temperature scaling and negative prompting fail to correct it. New diversity measures, such as the Decan metric, show that mode collapse can be quantified with a single forward pass and that it worsens monotonically across successive alignment stages. Training-side approaches such as ReDiPO demonstrate that base-model diversity can be reintroduced through carefully constructed preference data without modifying the optimization objective. A theme common to all these lines of inquiry is the need to move beyond single-point estimates, whether of outputs, uncertainty, or preferences, toward distribution-aware methods that better capture the full range of what language models can produce.
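The KL-regularization point can be made concrete with a toy calculation. Over a finite set of responses, the objective E_π[r] - β·KL(π || π_ref) is maximized by the standard closed form π*(y) ∝ π_ref(y)·exp(r(y)/β). The Python sketch below is an illustration under assumed numbers, not the analysis from the cited work. It assumes that 60% of annotators prefer response A over B, fits a Bradley-Terry reward to that preference, uses a uniform reference policy, and shows how the minority response's share shrinks as β decreases.

```python
import math


def kl_regularized_optimum(
    pi_ref: dict[str, float], reward: dict[str, float], beta: float
) -> dict[str, float]:
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite response set.

    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta).
    """
    # Subtract the largest scaled reward before exponentiating for numerical stability;
    # the constant cancels during normalization.
    m = max(reward[y] / beta for y in pi_ref)
    unnorm = {y: pi_ref[y] * math.exp(reward[y] / beta - m) for y in pi_ref}
    z = sum(unnorm.values())
    return {y: w / z for y, w in unnorm.items()}


# Assumed toy setup: 60% of annotators prefer A over B.
# Under a Bradley-Terry model, P(A > B) = sigmoid(r_A - r_B), so r_A - r_B = log(0.6 / 0.4).
p_majority = 0.6
reward = {"A": math.log(p_majority / (1 - p_majority)), "B": 0.0}
pi_ref = {"A": 0.5, "B": 0.5}  # uniform reference policy (an assumption)

for beta in (1.0, 0.5, 0.1):
    pi_star = kl_regularized_optimum(pi_ref, reward, beta)
    print(f"beta={beta}: P(A)={pi_star['A']:.3f}, P(B)={pi_star['B']:.3f}")
```

With β = 1 and a uniform reference, the optimum reproduces the 60/40 split exactly. At β = 0.5 it assigns about 0.692 to A, and at β = 0.1 about 0.983, leaving B with under 2%. A non-uniform π_ref would tilt the result further, since the optimum multiplies the reference probabilities directly.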

The verbalized interface paradigm has also been extended beyond diversity recovery. Verbalized Action Masking (VAM) applies the same idea to reinforcement learning post-training: it verbalizes action constraints in the prompt to guide exploration in domains such as chess. Because the paradigm covers distribution elicitation, sampling-bias correction, and action-space constraint, it suggests that natural language can serve as a versatile interface for controlling LLM behavior at inference time, complementing weight-based and logit-based approaches.
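The core idea of stating an action constraint in the prompt, rather than enforcing it on logits, can be sketched in a few lines. This Python sketch is a hypothetical illustration, not the VAM method itself. The helper names, the prompt text, and the hand-picked subset of opening moves in standard algebraic notation are all assumptions, and how the original work builds masks or treats out-of-mask outputs during training is not modeled here.

```python
def build_masked_prompt(position: str, allowed_moves: list[str]) -> str:
    """Verbalize the action mask: tell the policy which moves it may choose from.

    Hypothetical prompt wording; not taken from the VAM work.
    """
    return (
        f"Position: {position}\n"
        f"Choose exactly one move from this list: {', '.join(allowed_moves)}.\n"
        "Reply with the move only."
    )


def within_mask(response: str, allowed_moves: list[str]) -> bool:
    """Check whether the model's reply is one of the verbalized allowed moves."""
    return response.strip() in allowed_moves


# Hypothetical example: a hand-picked subset of White's legal first moves.
allowed = ["e4", "d4", "Nf3", "c4"]
print(build_masked_prompt("standard starting position, White to move", allowed))
print(within_mask(" Nf3\n", allowed))  # True
print(within_mask("Qh5", allowed))     # False
```

Because the mask lives in the prompt text, narrowing or widening the allowed set requires no change to the model's weights or decoding code, which is the property the paradigm relies on.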

Uncertainty quantification (UQ) for LLMs addresses the challenge of assessing how reliable a model's outputs are, with methods spanning verbalized confidence, sampling-based consistency, and hybrid approaches. A systematic taxonomy sorts black-box uncertainty estimation methods into five families (verbalization-based, sampling-based, explanation-based, multi-agent, and hybrid) and finds that hybrid methods, which combine multiple signals, perform best overall. Large-scale benchmarks show that verbalized confidence can be systematically biased on reasoning-intensive tasks, while answer consistency across samples yields the most reliable calibration in long-form question answering.[^c6]
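Sampling-based consistency and a simple hybrid can be sketched as follows. Several answers are sampled for the same question, the share that agrees with the majority answer serves as a confidence score, and that score is then blended with the model's verbalized confidence. This Python sketch is an assumption-laden illustration. It uses exact match after crude normalization, which only suits short answers (long-form answers would need a semantic-similarity comparison instead), and the fixed linear blend in `hybrid_confidence` is a hypothetical combination rule, not one taken from the cited taxonomy.

```python
from collections import Counter


def normalize_answer(answer: str) -> str:
    """Crude normalization: lowercase, strip surrounding whitespace and trailing periods."""
    return answer.strip().lower().rstrip(".")


def consistency_confidence(samples: list[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize_answer(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def hybrid_confidence(consistency: float, verbalized: float, weight: float = 0.5) -> float:
    """Blend sampling consistency with verbalized confidence (hypothetical linear rule)."""
    return weight * consistency + (1 - weight) * verbalized


# Five sampled answers to the same question (made-up inputs for illustration).
samples = ["Paris", "paris", "Paris.", "Lyon", " Paris "]
answer, cons = consistency_confidence(samples)
print(answer, cons)  # paris 0.8
print(hybrid_confidence(cons, verbalized=0.9))
```

The `weight` parameter controls how much the consistency signal dominates; a practical system would tune it on held-out calibration data rather than fixing it at 0.5.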