Published on 08/09/2025
Token confidence: a per‑token score derived from the model's logprobs (in [1], the negative mean log‑probability of the top‑k candidate tokens), used as a reliability proxy [1].
Group confidence: aggregated confidence over a sliding window of adjacent tokens [1].
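Tail / lowest‑group confidence: mean confidence over the final tokens of a trace (tail) or the minimum group confidence along the trace (lowest group) [1].
Top‑x% filter: keep only the traces whose trace‑level confidence ranks in the top x% (e.g., top 10% or top 90%) [1].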
DeepConf is a test‑time method that scores reasoning traces using the model's internal confidence signals, filtering out or early‑stopping low‑confidence traces and concentrating the token budget on promising ones [1]. In multi‑trace settings such as self‑consistency, this yields more accurate voted answers with fewer generated tokens [1].
Confidence is computed per token from model logprobs and aggregated over sliding windows to obtain a more stable, local group confidence [1]. Statistics such as bottom‑10% group confidence, tail confidence, and lowest‑group confidence capture bottlenecks (low‑confidence stretches) in a trace [1].
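The statistics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: the input is assumed to be a list of top‑k logprobs per position (the shape most inference APIs return), windows are assumed to be full windows only, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def token_confidence(topk_logprobs: Sequence[float]) -> float:
    """Negative mean log-probability of the top-k candidates at one position.

    A peaked distribution (one likely token, very unlikely alternatives)
    gives a large value, i.e. high confidence.
    """
    if not topk_logprobs:
        raise ValueError("need at least one logprob")
    return -sum(topk_logprobs) / len(topk_logprobs)


def group_confidences(token_conf: Sequence[float], window: int) -> List[float]:
    """Mean token confidence over every full sliding window of `window` tokens.

    Assumption: a trace shorter than `window` forms a single group.
    """
    n = len(token_conf)
    if n == 0 or window <= 0:
        raise ValueError("need a non-empty trace and a positive window")
    if n <= window:
        return [sum(token_conf) / n]
    running = sum(token_conf[:window])
    groups = [running / window]
    for i in range(window, n):
        # Slide by one token: add the newest token, drop the oldest one.
        running += token_conf[i] - token_conf[i - window]
        groups.append(running / window)
    return groups


def lowest_group_confidence(groups: Sequence[float]) -> float:
    """Confidence of the weakest local stretch of the trace."""
    return min(groups)


def bottom_fraction_confidence(groups: Sequence[float], fraction: float = 0.1) -> float:
    """Mean of the lowest `fraction` of group confidences (bottom-10% by default)."""
    k = max(1, math.ceil(len(groups) * fraction))
    return sum(sorted(groups)[:k]) / k


def tail_confidence(token_conf: Sequence[float], tail_len: int) -> float:
    """Mean token confidence over the last `tail_len` tokens."""
    if tail_len <= 0:
        raise ValueError("tail_len must be positive")
    tail = token_conf[-tail_len:]
    return sum(tail) / len(tail)


# Usage on a toy trace whose middle stretch is uncertain.
tok = [4.0, 4.0, 1.0, 1.0, 4.0, 4.0]
groups = group_confidences(tok, window=2)       # [4.0, 2.5, 1.0, 2.5, 4.0]
print(lowest_group_confidence(groups))          # 1.0
print(tail_confidence(tok, tail_len=2))         # 4.0
```

Note how the toy trace has a high tail confidence but a low lowest‑group value: a global average would hide the weak middle stretch, which is the kind of bottleneck the windowed statistics are meant to expose.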
Offline: generate multiple complete traces, score each with a trace‑level confidence statistic, keep the top‑x%, and apply confidence‑weighted majority voting [1]. Online: during generation, monitor sliding‑window group confidence and stop a trace early once it falls below a threshold, saving tokens [1].
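Both modes can be sketched as follows. This is a simplified illustration, not the reference implementation: each finished trace is assumed to be reduced to an `(answer, confidence)` pair (for example its bottom‑10% or tail confidence), and the online stopping threshold is assumed, as a simplification, to be the value that roughly the top x% of a few warm‑up traces would clear. `Trace`, `offline_deepconf`, `stopping_threshold`, `should_stop`, and `run_online` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: (final answer, trace-level confidence).
Trace = Tuple[str, float]


def filter_top_percent(traces: Sequence[Trace], top_percent: float) -> List[Trace]:
    """Keep the ceil(N * top_percent / 100) most confident traces."""
    k = max(1, math.ceil(len(traces) * top_percent / 100))
    return sorted(traces, key=lambda t: t[1], reverse=True)[:k]


def weighted_majority_vote(traces: Sequence[Trace]) -> str:
    """Each trace votes for its answer with weight equal to its confidence."""
    votes: Dict[str, float] = defaultdict(float)
    for answer, conf in traces:
        votes[answer] += conf
    return max(votes, key=votes.get)


def offline_deepconf(traces: Sequence[Trace], top_percent: float = 10) -> str:
    """Offline mode: confidence filtering followed by weighted voting."""
    if not traces:
        raise ValueError("need at least one trace")
    return weighted_majority_vote(filter_top_percent(traces, top_percent))


def stopping_threshold(warmup_lowest_group: Sequence[float], top_percent: float) -> float:
    """Threshold that roughly the top `top_percent` of warm-up traces would clear."""
    ranked = sorted(warmup_lowest_group, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[k - 1]


def should_stop(token_conf_so_far: Sequence[float], window: int, threshold: float) -> bool:
    """Stop once the most recent full window's mean confidence drops below threshold."""
    if len(token_conf_so_far) < window:
        return False
    recent = token_conf_so_far[-window:]
    return sum(recent) / window < threshold


def run_online(token_conf_stream: Sequence[float], window: int, threshold: float) -> Tuple[int, bool]:
    """Simulate one trace: returns (tokens generated, whether it was stopped early)."""
    seen: List[float] = []
    for c in token_conf_stream:
        seen.append(c)
        if should_stop(seen, window, threshold):
            return len(seen), True
    return len(seen), False


# Seven low-confidence traces answer "7"; two high-confidence traces answer "42".
traces = [("7", 9.5), ("42", 9.0), ("42", 8.5)] + [("7", 2.0)] * 7
print(offline_deepconf(traces, top_percent=100))  # 7  (no filtering)
print(offline_deepconf(traces, top_percent=30))   # 42 (only the top 3 traces vote)
print(run_online([9.0, 9.0, 9.0, 2.0, 2.0, 9.0, 9.0], window=2, threshold=6.0))  # (4, True)
```

The toy data shows why filtering matters. Without it, the many low‑confidence traces outweigh the confident ones. With a top‑30% filter, only the three most confident traces vote. In the online sketch, a stopped trace simply ends early; whether such traces are then excluded from voting is a design choice left to the caller here.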
On AIME 2025, DeepConf@512 reaches up to 99.9% accuracy and cuts generated tokens by up to 84.7% relative to standard parallel thinking at an equal budget [1]. The other evaluated tasks show the same pattern: stronger filtering yields larger token savings at the cost of a controlled accuracy trade‑off [1].
| Method | Budget K | Tokens (×10^8) | Accuracy (%) | Notes |
|---|---|---|---|---|
| DeepConf‑low (top‑10%) | 512 | — | 99.9 | AIME; ↓84.7% tokens vs. standard [1] |
| DeepConf‑high (top‑90%) | 512 | — | ~99–100 | Higher coverage; smaller savings [1] |
| Majority Voting | 512 | — | ≤99.9 | No filtering; higher cost [1] |
Use the model's `logprobs` to derive per‑token confidence [1].
`τ` [1].
`enable_logprobs` [1].
Logprob‑based confidence can be miscalibrated for some models and domains; future work includes calibration strategies and studying how well the optimal window size and tail statistics generalize across tasks [1].
[1] Deep Think with Confidence (DeepConf), arXiv:2508.15260 (v1), 21 Aug 2025.