Sy.Lexical — Periodic Table of AI

Element 15

Sy.Lexical

Encodes words as abstract symbolic units, independent of meaning.

What It Does

Symbol.Lexical neurons encode words as tokens: discrete symbolic units that carry identity, not meaning. They activate on the formal properties of a word: its token boundaries, its position in the vocabulary, and its surface form. This sets them apart from Identity and Relation neurons, which encode what words mean; Lexical neurons encode only the fact that something is a word-symbol at all.
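
As a rough illustration of what "formal properties" means here, the sketch below represents a token purely by its surface form, vocabulary position, and boundaries. The `Token` class, the `tokenize` helper, and the toy `VOCAB` are hypothetical names invented for this example; they are not taken from any model or library, and real tokenizers work on subword units rather than whitespace-separated words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str   # surface form: the literal characters
    vocab_id: int  # position in the vocabulary
    start: int     # character offset where the token begins
    end: int       # character offset just past the token's last character


# Hypothetical toy vocabulary. The IDs are arbitrary: "cat" and "feline"
# are close in meaning but get unrelated IDs, because an ID records
# identity, not meaning.
VOCAB = {"the": 0, "cat": 1, "feline": 2, "sat": 3}


def tokenize(text: str) -> list[Token]:
    """Split on whitespace and record only formal properties of each word.

    Raises KeyError for words missing from VOCAB (assumption: a closed vocabulary).
    """
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)  # locate the word at or after pos
        end = start + len(word)
        tokens.append(Token(word, VOCAB[word], start, end))
        pos = end
    return tokens


tokens = tokenize("the cat sat")
```

Nothing in a `Token` says what the word means; that is the kind of information the entry attributes to Identity and Relation neurons instead.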

How It Behaves

Lexical neurons are most active in early layers, where the model processes the raw token stream. They are among the neurons most strongly affected by tokenization: a word split across multiple tokens activates them differently than the same word encoded as a single token. They are essential infrastructure for all downstream semantic processing, because a model cannot process meaning without first having a stable token representation.

Research Example

In GPT-2 Small, Symbol.Lexical neurons fire equally strongly on rare technical vocabulary ('photosynthesis') and common words ('the') when each is encountered as a single token, and show weaker activation when a word is split across several tokens (for example, 'photosynthesis' split as 'photo-syn-the-sis'). This token-boundary sensitivity helps explain why tokenization choices affect model performance on specialized vocabulary.
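
The single-token versus split distinction can be sketched with a toy tokenizer. This is a minimal illustration under stated assumptions: it uses greedy longest-match segmentation over two hypothetical vocabularies, not the byte-level BPE that GPT-2 actually uses, and the function name `greedy_tokenize` and both vocabularies are invented for the example. It shows only how the same word can arrive as one token or several depending on what the vocabulary contains; it does not measure neuron activations.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Segment a word by repeatedly taking the longest vocabulary match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches here: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


# Hypothetical vocabularies: one contains the whole word, one only fragments.
VOCAB_WITH_WORD = {"the", "photo", "syn", "sis", "photosynthesis"}
VOCAB_WITHOUT_WORD = {"the", "photo", "syn", "sis"}

single = greedy_tokenize("photosynthesis", VOCAB_WITH_WORD)
split = greedy_tokenize("photosynthesis", VOCAB_WITHOUT_WORD)
print(single)  # ['photosynthesis']
print(split)   # ['photo', 'syn', 'the', 'sis']
```

With the second vocabulary the word reaches the model as four separate symbols, one of which is the unrelated common word 'the', so no single token boundary lines up with the word itself.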

Other Symbol Elements

16 Markup: Encodes HTML, XML, Markdown, and formatting syntax.
17 Code: Encodes programming language constructs and syntax.
18 Morphology: Encodes word structure (prefixes, suffixes, and word formation).
19 Letter: Encodes individual characters and alphabetical sequences.
20 Operator: Encodes logical, mathematical, and relational operators.
21 General: Symbolic neurons not fitting a specific sub-type.