Can You Trust What I Think? Analyzing and Improving Verbalized Uncertainty and Factuality in Reasoning-Based Large Language Models

Abstract

Reasoning-based large language models often produce natural-language thinking traces alongside their answers, but it remains unclear whether the verbalized uncertainties expressed in these traces faithfully reflect the model's knowledge. We study this question on long-form, knowledge-intensive biography generation. Our pipeline decomposes thinking traces and responses into atomic facts, filters out planning content, labels factual reasoning by certainty, and aligns response facts to their supporting reasoning, enabling plan-based filtering, self-verification, and a classifier that predicts factuality from facts and their associated reasoning. Preliminary results suggest that high-certainty reasoning is more likely to be included in the response and to be correct, and that structured use of these signals can improve factuality, though broader validation across models and datasets will be needed.
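The pipeline stages in the abstract (decompose into atomic facts, filter planning content, label certainty, align response facts to reasoning, predict factuality) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: all names (`AtomicFact`, `label_certainty`, `is_planning`, `align`, `predict_factuality`), the hedge-word heuristic, and the word-overlap alignment are hypothetical stand-ins for the paper's learned components.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical hedge-word list; a stand-in for the paper's certainty labeler.
HEDGES = {"might", "maybe", "possibly", "not sure", "unsure"}

@dataclass
class AtomicFact:
    text: str
    source: str                              # "thinking" or "response"
    certainty: Optional[str] = None          # "high" or "low"
    support: Optional["AtomicFact"] = None   # aligned reasoning fact

def is_planning(text: str) -> bool:
    """Crude planning-content filter: drop meta statements about structure."""
    return text.lower().startswith(("i will", "let me", "first,"))

def label_certainty(fact: AtomicFact) -> AtomicFact:
    """Mark a reasoning fact low-certainty if it contains a hedge word."""
    lowered = fact.text.lower()
    fact.certainty = "low" if any(h in lowered for h in HEDGES) else "high"
    return fact

def align(response_facts: List[AtomicFact],
          reasoning_facts: List[AtomicFact]) -> List[AtomicFact]:
    """Naive alignment: link each response fact to the reasoning fact
    sharing the most words (a stand-in for semantic matching)."""
    for rf in response_facts:
        rf_words = set(rf.text.lower().split())
        best, best_overlap = None, 0
        for tf in reasoning_facts:
            overlap = len(rf_words & set(tf.text.lower().split()))
            if overlap > best_overlap:
                best, best_overlap = tf, overlap
        rf.support = best
    return response_facts

def predict_factuality(fact: AtomicFact) -> bool:
    """Toy classifier: trust a response fact only when its aligned
    reasoning was stated with high certainty."""
    return fact.support is not None and fact.support.certainty == "high"
```

Usage on a toy biography: planning statements are filtered out, remaining reasoning facts are certainty-labeled, and each response fact inherits trust from its aligned reasoning.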

Published in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2026), 40(48), 41534-41536. Presented in Singapore, January 2026.

Recommended citation: Xu, Tianruo Rose. (2026). Can You Trust What I Think? Analyzing and Improving Verbalized Uncertainty and Factuality in Reasoning-Based Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41534-41536. https://doi.org/10.1609/aaai.v40i48.42331