Verbalized Sampling

Verbalized Sampling (VS) is a training-free prompting strategy that improves the diversity of large language model outputs by asking the model to generate multiple responses with self-assigned probabilities. Introduced by researchers at Stanford University, Northeastern University, and West Virginia University in October 2025, VS addresses mode collapse — the tendency of aligned language models to produce narrow, repetitive outputs — by counteracting typicality bias embedded in human preference data[^c1].

The technique requires no fine-tuning, model modifications, or access to internal logits. Instead, a prompt instructs the model to produce several candidate responses, each with a numerical probability, enabling sampling from the full distribution rather than collapsing to a single mode[^c2]. Empirical evaluations show that VS improves creative writing diversity by 1.6–2.1× over standard prompting without compromising factual accuracy or safety[^c3].

Subsequent research points to a second contributing factor in mode collapse: post-trained models implicitly recognize their own on-policy generations and sharply reduce output entropy in assistant-role contexts. This indicates that entropy collapse is not a uniform artifact but partly a self-recognition phenomenon[^c10].
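The prompting mechanism described above (several candidates, each with a self-assigned probability, followed by sampling) can be sketched roughly as follows. This is a minimal illustration, not the prompt or code from the VS paper: the instruction wording, the JSON reply format, the helper names `build_vs_prompt`, `parse_candidates`, and `sample_response`, and the example reply are all assumptions made for the sketch. No model is called; the reply is hard-coded.

```python
import json
import random


def build_vs_prompt(task: str, k: int = 5) -> str:
    """Wrap a task in a VS-style instruction (wording is illustrative only)."""
    return (
        f"{task}\n\n"
        f"Generate {k} different responses. Return a JSON list of objects, "
        'each with a "text" field and a "probability" field giving your '
        "estimate of how likely that response is."
    )


def parse_candidates(raw: str) -> list[tuple[str, float]]:
    """Parse the JSON reply and renormalize the verbalized probabilities."""
    items = json.loads(raw)
    texts = [item["text"] for item in items]
    weights = [max(float(item["probability"]), 0.0) for item in items]
    total = sum(weights)
    if total <= 0:
        # Fall back to uniform weights if the verbalized numbers are unusable.
        weights = [1.0] * len(texts)
        total = float(len(texts))
    return [(text, weight / total) for text, weight in zip(texts, weights)]


def sample_response(candidates: list[tuple[str, float]], rng: random.Random) -> str:
    """Draw one response according to the normalized probabilities."""
    texts, probs = zip(*candidates)
    return rng.choices(texts, weights=probs, k=1)[0]


# Hypothetical model reply; a real reply would come from an LLM call.
raw_reply = json.dumps([
    {"text": "A joke about espresso", "probability": 0.2},
    {"text": "A joke about decaf", "probability": 0.1},
    {"text": "A joke about baristas", "probability": 0.1},
])

prompt = build_vs_prompt("Tell me a joke about coffee.", k=3)
candidates = parse_candidates(raw_reply)
choice = sample_response(candidates, random.Random(0))
print(choice)
```

Verbalized probabilities need not sum to one, so the sketch renormalizes them before sampling. Whether to renormalize, truncate, or reject such replies is a design choice the source does not specify.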

The VS paper identifies the root cause of mode collapse as typicality bias in the human preference data used for reinforcement learning from human feedback (RLHF). Because of cognitive biases, human annotators systematically favor familiar, fluent, and predictable text. The paper quantifies this bias as a weight of α = 0.57–0.65 in the reward model, indicating a strong annotator preference for typical responses independent of correctness[^c4].
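A small numeric sketch can show why a typicality weight in the reward would concentrate the aligned policy. It rests on two assumptions that the text above does not state: that the learned reward has the form r = r_true + α·log π_ref, and that training reaches the standard KL-regularized optimum π* ∝ π_ref·exp(r/β). Under those assumptions, responses with equal true reward are distributed as π_ref raised to the power γ = 1 + α/β. The value β = 0.1 and the reference distribution are hypothetical, and α = 0.6 is simply a value inside the reported range.

```python
import math


def sharpened_policy(ref_probs: list[float], alpha: float, beta: float) -> list[float]:
    """Raise the reference distribution to gamma = 1 + alpha/beta and renormalize.

    Assumes all responses have equal true reward, so only typicality matters.
    """
    gamma = 1.0 + alpha / beta
    powered = [p ** gamma for p in ref_probs]
    normalizer = sum(powered)
    return [p / normalizer for p in powered]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical reference (pre-alignment) distribution over four responses.
ref = [0.4, 0.3, 0.2, 0.1]

# alpha lies inside the reported 0.57-0.65 range; beta is an assumed KL coefficient.
aligned = sharpened_policy(ref, alpha=0.6, beta=0.1)

print("reference entropy:", round(entropy(ref), 3))
print("aligned entropy:  ", round(entropy(aligned), 3))
print("aligned top-response mass:", round(aligned[0], 3))
```

Because γ > 1 whenever α > 0, the most typical response gains probability mass at the expense of the others, which is the collapse pattern the paragraph describes.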

Verbalized Sampling builds on a lineage of related research. Self-consistency decoding, introduced in 2022, samples multiple reasoning paths and aggregates them via majority voting[^c5]. Research on confidence elicitation has shown that hybrid approaches combining verbalized confidence with consistency-based methods achieve the best calibration[^c6]. Later work on difficulty-adaptive self-consistency addresses the computational costs of multi-path sampling by allocating resources based on question difficulty[^c7], while ConfTuner proposes fine-tuning methods that improve verbalized confidence calibration by up to 54.7%[^c8].
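The majority-voting step of self-consistency decoding mentioned above can be illustrated with a short sketch. The sampled answers are hard-coded placeholders for the final answers extracted from several reasoning paths, and the function name `majority_vote` is hypothetical. Returning the vote share alongside the answer is an assumption made here to show one simple consistency-based signal, not a method taken from the cited works.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most frequent final answer and its share of the votes."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Placeholder final answers extracted from five sampled reasoning paths.
sampled_answers = ["42", "42", "41", "42", "40"]

answer, agreement = majority_vote(sampled_answers)
print(answer, agreement)
```

The contrast with VS is that self-consistency collapses many samples into one answer, whereas VS keeps the candidate set and samples from it.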

Broader research on diversity loss in alignment has identified both data-level causes (typicality bias) and algorithmic causes. Preference collapse in RLHF, resulting from KL-based regularization, causes minority preferences to be systematically disregarded[^c9]. Together, these findings indicate that mode collapse has multiple contributing factors requiring complementary mitigations.

A community-maintained Model Context Protocol server provides VS prompt generation and response processing tools for over 20 language models, enabling practical integration into existing AI tooling pipelines[^c11].