Encodes semantic ambiguity and multiple valid interpretations.
What It Does
Entropy.Ambiguity neurons activate when the model encounters or generates content that admits more than one valid interpretation. This covers lexical ambiguity ('bank' as financial institution or riverbank), syntactic ambiguity ('I saw the man with the telescope'), pragmatic ambiguity (statements that could be sincere or sarcastic), and referential ambiguity (unclear pronoun antecedents). They represent the model's detection of interpretive uncertainty.
How It Behaves
Ambiguity neurons concentrate in the middle layers, where the model is attempting to integrate multiple possible interpretations of ambiguous input before committing to one. They interact with Boolean.Polarity neurons (sarcasm involves polarity ambiguity) and Relation.Reference neurons (anaphoric ambiguity involves reference uncertainty). A key diagnostic finding: on inputs where a human expert would expect ambiguity, models that fail to activate Ambiguity neurons tend to commit to one interpretation without signaling uncertainty. This overconfident disambiguation produces systematic errors.
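The diagnostic described above can be sketched as a simple filter over probe results. This is a minimal illustration, not an established tool: the ProbeResult record, the flag_overconfident helper, the 0.5 threshold, and every activation value are assumptions made up for the example. It presumes you already have a per-input summary of Ambiguity neuron activation and a human annotation of whether ambiguity is expected.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    text: str
    expert_expects_ambiguity: bool  # human annotation (hypothetical field)
    ambiguity_activation: float     # summary activation of the neuron group (hypothetical units)


def flag_overconfident(results, threshold=0.5):
    """Return probes an expert marked ambiguous whose activation stays below threshold."""
    return [
        r for r in results
        if r.expert_expects_ambiguity and r.ambiguity_activation < threshold
    ]


# Placeholder values for illustration only; these are not measurements.
probes = [
    ProbeResult("I saw the man with the telescope", True, 0.12),
    ProbeResult("The bank approved the loan", False, 0.05),
    ProbeResult("Oh great, another meeting", True, 0.81),
]

flagged = flag_overconfident(probes)
for r in flagged:
    print(r.text)
```

Inputs that land in the flagged list are candidates for overconfident disambiguation: an expert expected ambiguity, but the neuron group stayed quiet. The threshold would need calibrating against the model and activation summary actually in use.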
Research Example
In Llama 3.1 8B, Entropy.Ambiguity neurons fire strongly on 'the bank was steep', where 'bank' is ambiguous between financial and geographical meanings, but deactivate once surrounding context disambiguates: adding 'the hikers climbed...' suppresses them as 'bank' resolves to the geographical meaning. The disambiguation process can therefore be tracked through Ambiguity neuron firing: the neurons fire while ambiguity exists and go quiet as it resolves.
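One way to make "fire while ambiguity exists and quiet as it resolves" concrete is to locate the token at which a per-token activation trace drops below a threshold and stays there. The sketch below assumes such a trace has already been extracted. The resolution_index helper, the 0.5 threshold, the token sequence, and the activation values are all hypothetical and chosen only to illustrate the logic; they are not measurements from Llama 3.1 8B.

```python
def resolution_index(activations, threshold=0.5):
    """Return the index of the first token after the last above-threshold activation.

    Returns None if the neurons never fired, or if they are still firing
    on the final token (ambiguity never resolved within the input).
    """
    last_active = None
    for i, a in enumerate(activations):
        if a >= threshold:
            last_active = i
    if last_active is None or last_active == len(activations) - 1:
        return None
    return last_active + 1


# Hypothetical token sequence and made-up activations, for illustration only.
tokens = ["the", "bank", "was", "steep", "so", "the", "hikers", "climbed", "slowly"]
trace = [0.1, 0.9, 0.8, 0.7, 0.6, 0.6, 0.2, 0.1, 0.1]

idx = resolution_index(trace)
if idx is not None:
    print(f"activation stays below threshold from token {idx}: {tokens[idx]!r}")
```

Taking the last above-threshold token, rather than the first drop, avoids reporting a resolution point when the activation dips briefly and then rises again.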