Total time: 2:50
0.35 [0.00]
- I am FM, SRS at Sony AI here in Barcelona.
- In my talk this morning I proposed a new onto-epistemology of interpretability, which we then deployed to analyse chain-of-thought (CoT), following one of the provocations of this workshop.
- Other scholars have already shown that CoT provides a false sense of intelligibility, as its human-legible reasoning does not necessarily correspond to the model’s internal logic.
- In our short paper we identified three further possible epistemic fallacies embedded in CoT.
0.20 [0.35]
- First, CoT is about bridging human and machinic abstractions. However, humans might not have epistemic access to the features that are relevant for machines, and the two kinds of abstraction cannot be measured against each other.
- CoT imposes human reasoning on non-human processes, and the epistemic validity of this mapping is debatable.
0.55 [0.55]
- Second, in CoT, explanations are generated without any human input in the epistemic explanatory process.
- In the paper I presented this morning, we argued that only partial and situated explanations can ever be generated, and that these depend, among other things, on the observer.
- We challenged the assumption that some hidden explananda exist within AI models that can be recovered by an observer who is external to the interpretative process.
- Explanations are not inherent in the model but emerge through sustained engagement, and therefore depend on the human interpreter. From a semiotic angle, AI models might produce the sign, but they cannot directly access the referent, since signs require active interpretation by human users.
- So any automatic interpretation, which by definition excludes humans from the loop, might be a faulty epistemic object.
0.30 [1.50]
- Third, CoT relies on language as a vehicle for making the model intelligible. Language, however, does not simply report a latent process but reorganises what can count as a reason by imposing human semantic norms (like coherence and intentionality) onto traces that were never produced under those norms.
- Therefore, CoT might be epistemically flawed: the trace reads as if it carried a stable reference to internal mechanisms, while in fact it might not.
0.30 [2.20]
- CoT outputs are fictional narratives decoupled from any objective referent. They sustain an illusion of understanding without providing any access to model behaviour.
- One major consequence is that CoT encourages users to treat plausible narrative coherence as evidence that the machine's inner workings can be
- unequivocally understood without human input, and
- correctly reported in human language.