title: @phoebeyao: model confidence tracks a shared model-agnostic signal for fact recall, not true...
author: phoebeyao
contenttype: twitterarticle
published: 2026-04-01T17:14:29+00:00
source_url: https://x.com/phoebeyao/status/2039399882486861977
word_count: 116
model confidence tracks a shared model-agnostic signal for fact recall, not true self-knowledge.
we tested metacognitive confidence across 19 frontier models on a closed-book SQuAD task. f1 scores look reasonable (0.6–0.8), but confidence and accuracy are nearly uncorrelated between models.
the variance traces to a single shared difficulty heuristic learned during training. models differ only in their decision threshold. claude is cautious. gpt is eager.
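A toy reading of "one shared heuristic, different thresholds" is sketched below. It assumes every model sees the same hypothetical per-question difficulty score and reports confidence only when that score falls below its own threshold. The scores, thresholds, and the `ThresholdModel` class are all invented for illustration and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdModel:
    """Hypothetical model whose confidence reads only a shared difficulty score."""

    name: str
    threshold: float  # assumed model-specific decision threshold

    def is_confident(self, difficulty: float) -> bool:
        # Confident iff the shared score looks "easy enough" for this model.
        return difficulty < self.threshold


def confidence_profile(model: ThresholdModel, difficulties: list[float]) -> list[bool]:
    return [model.is_confident(d) for d in difficulties]


def is_nested(stricter: list[bool], looser: list[bool]) -> bool:
    """True if the looser model is confident wherever the stricter one is."""
    return all(loose for strict, loose in zip(stricter, looser) if strict)


# Hypothetical shared difficulty scores for five questions (0 = easy, 1 = hard).
difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]
cautious = ThresholdModel("cautious", threshold=0.4)
eager = ThresholdModel("eager", threshold=0.8)

cautious_profile = confidence_profile(cautious, difficulties)  # [True, True, False, False, False]
eager_profile = confidence_profile(eager, difficulties)        # [True, True, True, True, False]
print(is_nested(cautious_profile, eager_profile))  # True
```

Under this toy model the confident sets of any two models are nested. The threshold alone decides how many questions each one claims, and neither model's confidence looks at whether its own answer is actually correct.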
shifting one steering coefficient on mistral-7b recovers any target model's confidence profile at ~80% agreement.
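The sketch below shows the "one coefficient, measure agreement" idea in a heavily simplified form. The actual method steers mistral-7b's activations. Here, as an assumption, steering is reduced to adding a scalar coefficient to a hypothetical per-question signal and thresholding at zero. A grid search then picks the coefficient whose binary confidence profile best agrees with a target model's profile. `signal`, `target`, the grid, and all function names are illustrative, not the article's.

```python
def agreement(a: list[bool], b: list[bool]) -> float:
    """Fraction of questions on which two binary confidence profiles match."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def steered_profile(signal: list[float], coefficient: float) -> list[bool]:
    """Toy stand-in for steering: confident iff the shifted signal is positive."""
    return [s + coefficient > 0 for s in signal]


def fit_coefficient(signal, target_profile, grid):
    """Grid-search the single coefficient that best matches a target model's profile."""
    best = max(grid, key=lambda c: agreement(steered_profile(signal, c), target_profile))
    return best, agreement(steered_profile(signal, best), target_profile)


# Hypothetical per-question scalar signal from the source model (higher = easier).
signal = [0.8, 0.4, 0.0, -0.4, -0.8]
# Hypothetical binary confidence profile of some target model.
target = [True, True, True, False, False]
grid = [x / 10 for x in range(-10, 11)]

coefficient, score = fit_coefficient(signal, target, grid)
print(coefficient, score)  # 0.1 1.0
```

On real data a single scalar cannot match every question, which is why agreement would land below 1.0. The ~80% figure is the article's reported result and is not reproduced by this toy.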
full breakdown + methods in the article
Across 19 frontier models, metacognitive confidence on question-answering tasks tracks a shared difficulty heuristic and has only a weak relationship to actual performance.
Do models know what they...
Posted: 2026-04-01T17:14:29.000Z
Engagement: 0 likes, 2 retweets, 1 reply