Total time: 9:50
0.30 [0.00]
Good morning everyone. Thanks Jean for the introduction; I am FM, SRS at Sony AI here in Barcelona.
My colleagues and I have been working on attribution - identifying which training data influence AI generations - and we spent some time with influence functions and other explainability methods,
and we found that a recurring, unquestioned assumption was lying behind most of them:
0.30 [0.30]
That XAI methods can truly capture a model's behaviour, and that solving interpretability is just a matter of more technical work
→ → →
However, this only holds if humans have epistemic access to the features that are epistemically relevant for AI models, and this is questionable
So we started looking into theories that could help us unpack this intuition further
0.25 [1.00]
XAI research is mostly grounded in positivism and representationalism, which assume
- that an objective reality exists out there, anterior to and independent of its representation
- that we can observe this reality without interfering with it
- that instrumentation is separate from what it measures
Last century, these assumptions were challenged by many scholars and particularly by Karen Barad.
0.30 [1.25]
Barad borrowed optical metaphors to illustrate different epistemologies
- Refraction assumes we can directly observe a pre-given reality through transparent tools
- Reflection still holds reality as pre-given, but admits that observation tools distort it
- And then there is diffraction - don't try to read too much into the diagram, I just wanted to show that it is very complex; indeed, Barad later developed it into a fully-fledged metaphysical framework they called agential realism
0.40 [1.55]
Agential realism is quite a complex beast, but let me share some of its core ideas:
- knowledge is not about representing a pre-existing reality, but about participating in its production through material and discursive practices.
- Tools are not neutral but shape what becomes visible.
- The subject, the object, and the tools are all part of an entanglement that co-determines knowledge. They do not exist prior to their relations, as in classical epistemology, but materialise through them and temporarily stabilise via provisional boundaries called agential cuts
0.55 [2.35]
We analysed common XAI methods against the Baradian optics. This was not a classification exercise but rather an analysis of what onto-epistemologies were assumed by these methods.
Let's start from refractive and reflective.
Many XAI methods are aligned with these two optics, as they assume that AI models contain pre-existing structures and internal mechanisms that are stable, knowable, discoverable, recoverable.
This is particularly the case for mechanistic interpretability, which assumes that models contain pre-existing true statements that can be uncovered, almost like archaeological work.
We called this an immanent ontology: explanations reveal what is already present in the model waiting to be disclosed.
1.10 [3.30]
The refractive reading sees XAI tools as lenses able to project sharp images of actual content.
For instance, Grad-CAM is assumed to identify "the most meaningful parts" of the image: meaning was in the model all along, and the method simply isolates and displays it.
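To make this concrete, here is a minimal Grad-CAM sketch in PyTorch; the ResNet-50 backbone and the placeholder input are assumptions for illustration, not the setup from the paper:

```python
# Minimal Grad-CAM sketch. Assumes a torchvision ResNet-50 and a
# placeholder input; illustrative only, not the paper's exact setup.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["v"] = output           # feature maps, shape [1, C, H, W]

def bwd_hook(module, grad_input, grad_output):
    gradients["v"] = grad_output[0]     # gradients w.r.t. those feature maps

# Hook the last convolutional stage.
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

image = torch.randn(1, 3, 224, 224)     # placeholder input
logits = model(image)
logits[0, logits.argmax()].backward()   # gradient of the top-scoring class

# Channel weights = global-average-pooled gradients; CAM = weighted sum.
weights = gradients["v"].mean(dim=(2, 3), keepdim=True)   # [1, C, 1, 1]
cam = F.relu((weights * activations["v"]).sum(dim=1))     # [1, H, W]
cam = cam / cam.max()   # the heatmap read as "the most meaningful parts"
```

The refractive reading takes this normalised heatmap at face value: a sharp image of meaning that was sitting in the model all along.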
→ → →
the reflective reading recognises that XAI tools can only approximate explananda that are still taken to pre-exist.
First, the model is simplified into forms that are easier to interpret, and then these simpler forms are interpreted with human eyes.
In methods like Circuits and LIME, the original model is replaced with a local replacement model composed of (supposedly) interpretable approximations.
However, even proponents of these methods acknowledge the epistemic gap between the replacement and the original models, and admit that they cannot guarantee that the replacement model has learned the same mechanisms as the original one.
So the model is changed prior to its discovery by the human interpreter, who does not simply observe but intervenes in the object they are studying - and this has important consequences.
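To make the replacement concrete, here is a minimal sketch of a LIME-style local surrogate; the black-box predict_proba interface, the Gaussian perturbation scale, and the kernel width are assumptions for illustration:

```python
# Minimal LIME-style local surrogate sketch. `predict_proba` is an assumed
# black-box interface returning [n_samples, n_classes] probabilities.
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_surrogate(predict_proba, x, n_samples=5000, sigma=0.75):
    # 1. Perturb the instance of interest (Gaussian noise, assumed scale).
    X = x + np.random.normal(scale=0.1, size=(n_samples, x.shape[0]))
    # 2. Query the original black box at the perturbed points.
    y = predict_proba(X)[:, 1]
    # 3. Weight samples by proximity to x (exponential kernel).
    w = np.exp(-np.linalg.norm(X - x, axis=1) ** 2 / sigma ** 2)
    # 4. Fit a weighted linear model: the interpretable replacement.
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
    # The "explanation" is read off the surrogate, not the original model.
    return surrogate.coef_
```

Every coefficient we end up interpreting belongs to the surrogate, not to the original model - which is exactly the epistemic gap at stake here.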
0.55 [4.40]
Local replacement models resemble the simulacra theorised by Jean Baudrillard - copies without an original.
There is a story in Simulacra and Simulation about a group of anthropologists who studied a population that had been isolated for eight centuries. Very soon, the anthropologists asked for this people to be returned to their original environment, because they saw them disintegrating immediately upon contact - so, in order for ethnology to live, its object must die.
AI models "explained" with these methods share the same fate: the original model dies when it is observed, and what remains to be interpreted is not the model but a simulacrum, the residue of operations that were supposed to make it interpretable.
These methods only reassure us that something meaningful for humans exists in systems where such meaning may not exist, and so operate as epistemological palliatives.
0.45 [5.35]
Furthermore, local replacement models still require interpretation. Researchers must find features in these replacement models and then assign them meaning by grouping them into patterns that they recognise.
However, researchers themselves note that clear, meaningful structures often do not emerge.
→ → →
And when they seem to emerge... well, they are often shaky. For example, in the paper we showed how features that researchers believed activated for "Michael Jordan" actually showed stronger activation for Indiana Jones, Michael Jackson, and even air conditioners.
0.45 [6.20]
In a diffractive reading, explaining an AI model is a relational co-production of interpretable phenomena - these phenomena emerge from entanglements of the interpreter, the model, the tool, and the context.
Explanations are not already present in the model, but emerge as material-discursive performances.
They are material because they are produced through technical artefacts and human interpreters.
They are discursive because they are articulated and stabilised through narratives like
- staging together traces from multiple explanations
- identifying patterned relations across them
0.30 [7.05]
Some XAI methods - like the ones in the slide - are well aligned with this onto-epistemology.
In interactive debugging, the explanation is materially realised through interaction loops between user, model, and interface, and discursively stabilised through iterative sense-making of outputs.
In counterfactual perturbations, the discursive part is about reasoning over alternative inputs and framing them as explanations of model behaviour.
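As one concrete form this can take, here is a minimal gradient-based counterfactual-perturbation sketch (in the spirit of Wachter et al.); the differentiable model, the input shape, and the loss weighting are assumptions for illustration:

```python
# Minimal counterfactual-perturbation sketch. Assumes a differentiable
# PyTorch classifier `model` and an input `x` with a batch dimension.
import torch
import torch.nn.functional as F

def counterfactual(model, x, target_class, steps=200, lr=0.05, lam=0.1):
    for p in model.parameters():        # freeze the model itself
        p.requires_grad_(False)
    x_cf = x.clone().detach().requires_grad_(True)
    optimiser = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        optimiser.zero_grad()
        logits = model(x_cf)
        # Push the prediction toward the target class...
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        # ...while keeping the counterfactual close to the original input.
        loss = loss + lam * torch.norm(x_cf - x)
        loss.backward()
        optimiser.step()
    return x_cf.detach()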
0.50 [7.35]
The differences between these three readings are not only onto-epistemological but also ethical.
If explanation is about uncovering something already inside the model, then the researcher’s role is ethically neutral.
However, co-producing an explanation - as in diffractive methods - is not neutral, and attention must be paid to identity:
- who is included in, and excluded from, the explanatory entanglements?
- and which institutions, funding, and power structures are part of them?
→ → → For instance, mechanistic interpretability is driven by several individuals and organisations linked to Effective Altruism and longtermism, who frame interpretability as something that must be pursued at all costs and, given their power, end up influencing the political agenda, as visible in America’s AI Action Plan
0.55 [8.25]
We closed the paper by offering some design recommendations, which we "tested" with the case study of a speculative prompt-to-music interface
First, as any interpretation is one among many - not the interpretation - the interface should foreground and facilitate multiple interpretations.
Second, as explanations are tied to the specific conditions under which they emerge, they vary across runs, so the interface should account for how interpretations evolve.
Third, the interface should also surface conflicting and uncertain explanations and treat them as productive.
You can see how these recommendations contrast with traditional positions in XAI, many of which seek explanatory unicity: singular, sanitised explanations of model behaviour.
0.30 [9.20]
To wrap up, our main contribution is a new onto-epistemological reading of interpretability, which is framed as a material–discursive performance.
However, we have also seen that ML practices often align with questionable epistemic traditions, so perspectives emerging at CHI, such as entanglement-based and more-than-human approaches, could offer a useful alternative across the field
for instance -- CoT > this afternoon