Encodes words as abstract symbolic units, independent of meaning.
What It Does
Symbol.Lexical neurons encode words as tokens: discrete symbolic units that carry identity but not meaning. They activate on the formal properties of a word: its token boundaries, its position in the vocabulary, and its surface form. Unlike Identity or Relation neurons, which encode what words mean, Lexical neurons encode only the fact that something is a word-symbol at all.
How It Behaves
Lexical neurons are most active in early layers, where the model is processing the raw token stream. They are among the neurons most strongly affected by tokenization: a word split across multiple tokens activates Lexical neurons differently from the same word encoded as a single token. They are essential infrastructure for all downstream semantic processing, because the model cannot process meaning without first having a stable token representation.
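To make the single-token versus multi-token distinction concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is not GPT-2's actual byte-pair encoding, and both vocabularies are hypothetical, chosen only to show how the same surface form can arrive as one token or as several.

```python
# Toy greedy longest-match tokenizer. This is an illustration, not GPT-2's BPE.
# Both vocabularies below are hypothetical.

def tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# Hypothetical vocabulary that contains the whole word.
vocab_whole = {"the", "photosynthesis"}
# Hypothetical vocabulary that contains only fragments of it.
vocab_pieces = {"the", "photo", "syn", "sis"}

single = tokenize("photosynthesis", vocab_whole)
split = tokenize("photosynthesis", vocab_pieces)
print(single)  # ['photosynthesis']
print(split)   # ['photo', 'syn', 'the', 'sis']
```

With the second vocabulary the model sees four token boundaries instead of one, which is the condition under which Lexical neurons are described as responding differently.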
Research Example
In GPT-2 Small, Symbol.Lexical neurons fire equally strongly on rare technical vocabulary ('photosynthesis') and common words ('the') when each is encountered as a single token, and more weakly when the same word is split across several tokens (for example, 'photosynthesis' split as 'photo-syn-the-sis'). This token-boundary sensitivity explains why tokenization choices affect model performance on specialized vocabulary.
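One way such a comparison could be summarized is to group recorded activations by the number of tokens each word occupied and average within each group. The sketch below assumes a per-word activation value has already been extracted. The function name, the record layout, and the numbers are all hypothetical placeholders, not measurements from GPT-2 Small.

```python
from statistics import mean


def mean_activation_by_token_count(records):
    """Average activation per group, keyed by how many tokens each word used.

    `records` is a list of (word, num_tokens, activation) tuples (assumed layout).
    """
    groups = {}
    for _word, num_tokens, activation in records:
        groups.setdefault(num_tokens, []).append(activation)
    return {n: mean(values) for n, values in sorted(groups.items())}


# Placeholder values for illustration only; they are NOT measurements.
records = [
    ("the", 1, 1.0),
    ("photosynthesis", 1, 1.0),
    ("photosynthesis", 4, 0.5),
]
summary = mean_activation_by_token_count(records)
print(summary)  # {1: 1.0, 4: 0.5}
```

A gap between the single-token group and the multi-token groups would be the signature of the token-boundary sensitivity described above. How a per-word activation is derived from several token positions (maximum, mean, or last token) is left open here.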