Canonical Naming

Canonical naming lets AI systems, particularly graph-based Retrieval-Augmented Generation (RAG) systems, distinguish between concepts that look alike on the surface but mean fundamentally different things. For example, in “An apple a day keeps the doctor away” and “Apple just released a new MacBook Air”, the word “apple” refers to a fruit in the first sentence and a company in the second. Canonical names help the system keep these meanings apart by defining semantic boundaries in the vector space: related mentions are grouped under a single entity, while unrelated meanings stay separate.
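To make the apple example concrete, here is a minimal sketch of picking a canonical name from context. It is a toy: the canonical names, their context words, and the `canonical_name` function are all made up for illustration, and a bag-of-words cosine similarity stands in for the learned embeddings a real system would use.

```python
import math
from collections import Counter

# Hypothetical canonical names, each described by a few context words.
# A real system would use learned embeddings instead of word lists.
CANONICAL = {
    "apple (fruit)": "fruit eat doctor tree orchard day healthy",
    "Apple Inc. (company)": "company released macbook iphone laptop product new",
}

def bag(text):
    """Turn text into a lowercase bag-of-words vector."""
    return Counter(text.lower().replace(".", " ").split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def canonical_name(sentence):
    """Return the canonical name whose context profile is closest to the sentence."""
    vec = bag(sentence)
    return max(CANONICAL, key=lambda name: cosine(vec, bag(CANONICAL[name])))

fruit = canonical_name("An apple a day keeps the doctor away")
company = canonical_name("Apple just released a new MacBook Air")
```

Here the surrounding words, not the word “apple” itself, decide which entity a mention belongs to, which is the boundary that canonical names are meant to draw.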

The specificity of these boundaries is user-defined: a system can treat “Apple company,” “Apple iPhone,” and “Apple MacBook” either as parts of the same entity or as separate entities, depending on how fine-grained the naming is. A broad canonical name might group all Apple products together, while a narrow one might separate them into MacBooks, iPhones, and Vision Pro. This flexibility allows metadata and attributes like speed, simplicity, and price to be attached at the appropriate level: company-wide, product-wide, or model-specific.
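The granularity choice described above can be sketched as a small entity hierarchy. This is an illustrative assumption, not a prescribed schema: the `Entity` class, the `LEVELS` list, the `at_granularity` helper, and the attribute values are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical granularity levels, ordered from broadest to narrowest.
LEVELS = ["company", "product", "model"]

@dataclass
class Entity:
    canonical_name: str
    level: str
    parent: Optional["Entity"] = None
    attributes: dict = field(default_factory=dict)

    def lookup(self, key):
        """Find an attribute here or on the nearest broader entity that defines it."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        return None

def at_granularity(entity, level):
    """Climb toward the root until the entity is no finer than `level`."""
    target = LEVELS.index(level)
    node = entity
    while LEVELS.index(node.level) > target and node.parent is not None:
        node = node.parent
    return node

# Attribute values below are placeholders, not real product data.
apple = Entity("Apple Inc.", "company", attributes={"price": "company-wide placeholder"})
macbook = Entity("Apple MacBook", "product", apple, {"simplicity": "product-wide placeholder"})
macbook_air = Entity("Apple MacBook Air", "model", macbook, {"speed": "model-specific placeholder"})

broad = at_granularity(macbook_air, "company").canonical_name
narrow = at_granularity(macbook_air, "model").canonical_name
```

With a broad setting, a mention of the MacBook Air resolves to the company-level entity; with a narrow setting, it stays its own entity. In both cases `lookup` still finds attributes attached at broader levels.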

In short, canonical naming is how AI system creators define entity boundaries, deciding how generalized or specific a concept should be, which in turn shapes how the system categorizes, retrieves, and processes information.
