Canonical Naming

Avi Santoso

Canonical naming is a way for AI systems, particularly graph-based Retrieval-Augmented Generation (RAG) systems, to distinguish between concepts that look alike but are fundamentally different. For example, in “An apple a day keeps the doctor away” and “Apple just released a new MacBook Air,” the word “Apple” refers to a fruit in the first sentence and to a company in the second. Canonical names help an AI system capture these distinctions by defining semantic boundaries in the vector space: related mentions are grouped under a single entity, while unrelated meanings are kept separate.
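As a rough illustration, resolving a mention to a canonical name can be treated as a nearest-match problem. The sketch below assumes that each canonical entity is described by a short, hand-written context prototype. It uses bag-of-words cosine similarity as a stand-in for the learned embeddings a real RAG pipeline would use. The entity names, prototype descriptions, and the `resolve` function are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical canonical entities, each paired with a hand-written context
# prototype. A real system would use learned embeddings instead.
CANONICAL_ENTITIES = {
    "apple (fruit)": "fruit eat day doctor health tree juice pie red green",
    "Apple Inc.": "company released macbook iphone air laptop product announced tech",
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve(sentence: str) -> str:
    """Return the canonical name whose prototype is most similar to the sentence."""
    context = bag_of_words(sentence)
    return max(
        CANONICAL_ENTITIES,
        key=lambda name: cosine(context, bag_of_words(CANONICAL_ENTITIES[name])),
    )


print(resolve("An apple a day keeps the doctor away"))   # apple (fruit)
print(resolve("Apple just released a new MacBook Air"))  # Apple Inc.
```

The surrounding words decide the match, not the word “Apple” itself. In this toy setup, “day” and “doctor” pull the first sentence toward the fruit, while “released,” “MacBook,” and “Air” pull the second toward the company.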

The specificity of these boundaries is user-defined. Depending on how fine-grained the naming is, a system can treat “Apple company,” “Apple iPhone,” and “Apple MacBook” either as one entity or as separate entities. A broad canonical name might group all Apple products together, while narrower names might separate them into MacBooks, iPhones, and Vision Pro. This flexibility lets metadata and attributes such as speed, simplicity, and price be attached at the appropriate level: company-wide, product-line-wide, or model-specific.
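One way to picture user-defined granularity is as a hierarchy that gets truncated at a chosen depth. The sketch below is a minimal illustration under that assumption. Each mention maps to a path from broad to narrow, and the `depth` parameter sets how specific the canonical name is. Attributes are attached at different levels and inherited downward. The hierarchy, the attribute values, and the functions `canonical_name` and `attributes_for` are hypothetical and exist only for illustration.

```python
# Hypothetical hierarchy: each mention maps to a path from broadest to narrowest.
HIERARCHY = {
    "Apple company": ("Apple",),
    "Apple iPhone": ("Apple", "iPhone"),
    "Apple MacBook": ("Apple", "MacBook"),
    "MacBook Air": ("Apple", "MacBook", "MacBook Air"),
}

# Hypothetical attributes attached at company, product-line, and model level.
ATTRIBUTES = {
    ("Apple",): {"simplicity": "high"},
    ("Apple", "MacBook"): {"price": "premium"},
    ("Apple", "MacBook", "MacBook Air"): {"speed": "fast"},
}


def canonical_name(mention: str, depth: int) -> str:
    """Truncate the mention's path to `depth` levels; a smaller depth means a broader entity."""
    return " / ".join(HIERARCHY[mention][:depth])


def attributes_for(mention: str, depth: int) -> dict:
    """Merge attributes from the root down to the chosen level (narrower levels override)."""
    path = HIERARCHY[mention][:depth]
    merged = {}
    for i in range(1, len(path) + 1):
        merged.update(ATTRIBUTES.get(path[:i], {}))
    return merged


mentions = ["Apple company", "Apple iPhone", "Apple MacBook"]
print({canonical_name(m, 1) for m in mentions})          # {'Apple'}
print(sorted({canonical_name(m, 2) for m in mentions}))  # ['Apple', 'Apple / MacBook', 'Apple / iPhone']
print(attributes_for("MacBook Air", 3))  # {'simplicity': 'high', 'price': 'premium', 'speed': 'fast'}
```

At depth 1 all three mentions collapse into a single “Apple” entity. At depth 2 they become three separate entities. The same hierarchy also determines which attributes a given entity inherits.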

In short, canonical naming is how the creators of an AI system define entity boundaries. By deciding how general or specific each concept should be, they shape how the system categorizes, retrieves, and processes information.