AI Knowledge Base 2026

AI Glossary 2026

Clear definitions for the era of Agentic AI and Spatial Intelligence.

Agentic Business

Managed Agents

Managed Agents are AI agents deployed and operated through a managed infrastructure platform, where the provider handles hosting, scaling, monitoring, and operational continuity, so developers do not have to build and maintain their own infrastructure stack. The concept gained mainstream attention when Anthropic launched Claude Managed Agents in April 2026, allowing developers to run Claude-powered agents without managing servers. A managed agent platform typically provides automatic scaling for variable workloads, built-in logging and distributed tracing, Role-Based Access Control (RBAC) for enterprise governance, and OpenTelemetry integration for security monitoring and SIEM pipelines. Managed agents represent a maturation of the AI agent space: from proof-of-concept experiments running locally to production-grade systems embedded in enterprise workflows. This shift reduces the DevOps expertise required to ship agents, enabling non-engineering teams (operations, finance, marketing, legal) to own and operate their own AI workflows. The managed layer also introduces governance controls such as group spend limits and audit trails that help AI agents meet enterprise security and compliance requirements.

Trust & Sovereignty

MCP Authorization

MCP authorization is the control layer that decides which tools, data sources and actions an MCP client may use through an MCP server. The Model Context Protocol is powerful because it gives AI systems a standard way to reach files, databases, APIs and internal workflows. That same power becomes risky when authorization is vague: an agent may discover a tool, but the system still has to know which user it is acting for, which permissions apply, how long access lasts and whether the requested action is allowed in that context. Strong MCP authorization separates identity, consent, scope and runtime enforcement. It can use OAuth, short-lived tokens, tenant-aware roles, per-tool scopes and server-side approval checks, but the important part is where the decision lives. It should not be hidden in a prompt or left to model judgment; it needs to be enforced by protocol, infrastructure and logs. In production agent systems, MCP authorization turns natural-language requests into bounded system actions. The agent can still get work done, but it cannot freely cross into sensitive data, privileged APIs or destructive operations just because a user phrased a request convincingly.
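
The separation of identity, scope and runtime enforcement described above can be sketched as a server-side check that runs before any tool executes. This is a minimal illustration, not part of the MCP specification or any SDK: `AccessToken`, `REQUIRED_SCOPE`, `authorize` and the tool and scope names are all hypothetical.

```python
import time
from dataclasses import dataclass

# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPE = {
    "files.read": "files:read",
    "files.delete": "files:delete",
}


@dataclass(frozen=True)
class AccessToken:
    user_id: str        # the user the agent is acting for
    tenant_id: str      # tenant the token was issued in
    scopes: frozenset   # per-tool scopes granted at consent time
    expires_at: float   # Unix timestamp; short-lived by design


def authorize(token, tool, resource_tenant, now=None):
    """Return (allowed, reason). Intended to run server-side, before the tool."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False, "token expired"
    if token.tenant_id != resource_tenant:
        return False, "tenant mismatch"
    required = REQUIRED_SCOPE.get(tool)
    if required is None:
        # Unknown tools are denied by default rather than left to the model.
        return False, "unknown tool"
    if required not in token.scopes:
        return False, f"missing scope {required}"
    return True, "ok"
```

The decision depends only on the token and the request, never on how the prompt was phrased, and each returned reason can be written to an audit log.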

AI Safety & Guardrails

Mechanistic Interpretability

Mechanistic interpretability is a field of AI safety research that reverse-engineers the internal computations of neural networks. Where conventional explainability only relates a model's inputs to its outputs, mechanistic interpretability opens up the model itself, identifying the individual circuits, features, and activation patterns that produce a given answer. The goal is not to observe what a model says, but to understand the mechanisms inside it that generate that behaviour. In practice, the field draws on techniques such as analysing activations, isolating interpretable features with sparse autoencoders, and intervening directly on individual components to test what each one does. This yields a causal account of model behaviour rather than a merely correlational one, letting researchers point to the specific internal structure responsible for an output. The discipline matters most wherever trust, safety, and accountability are at stake. It makes it possible to surface hidden misaligned incentives, deceptive behaviour, or unexpected capabilities before a model is deployed in production. As systems grow more capable and more autonomous, the ability to inspect their inner workings shifts from a research curiosity to a core requirement of responsible AI development.
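
The idea of intervening on individual components can be illustrated with a toy zero-ablation experiment. The sketch below assumes a tiny hand-built network with one hidden ReLU layer; the function names and weights are invented for illustration, and real interpretability work operates on far larger models with dedicated tooling.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, w_in, w_out, ablate=None):
    """Tiny one-hidden-layer network. `ablate` zeroes one hidden unit."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in w_in])
    if ablate is not None:
        hidden[ablate] = 0.0  # the intervention: knock out a single unit
    return sum(w * h for w, h in zip(w_out, hidden))


def ablation_effects(x, w_in, w_out):
    """Change in output caused by removing each hidden unit in turn."""
    baseline = forward(x, w_in, w_out)
    return [baseline - forward(x, w_in, w_out, ablate=i)
            for i in range(len(w_in))]
```

A unit whose removal changes the output is causally involved in producing it for that input. A unit with zero effect is not, no matter how strongly its activation correlates with the output elsewhere.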

Agentic Infrastructure

Model Efficiency

Model Efficiency describes how much useful quality an AI model delivers per unit of compute, tokens, time, and budget. It is not simply about choosing the smallest or cheapest model; it is about choosing the most efficient model for a specific job: one that reliably clears the quality bar without unnecessary inference spend, latency, or context-window usage. In production AI systems, model efficiency is measured across several signals: answer quality, error rate, latency, tokens per task, cost per accepted outcome, energy or GPU consumption, and stability under load. A highly efficient model may outperform a frontier model for routine classification, research preparation, summarization, or drafting because it achieves the required result with fewer resources. For critical architecture decisions, legal-risk analysis, or complex code review, a stronger model may still be the efficient choice because failure is more expensive than compute. The concept is closely related to model routing, inference optimization, and model-selection policy, but it names the evaluation standard behind those decisions. For businesses, model efficiency becomes essential once AI moves from experiments into repeatable workflows: it reveals where quality is being overpaid for and where leaner models can deliver the same business value.
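
One of the signals above, cost per accepted outcome, can be made concrete with a short sketch. It assumes each candidate model is described by a hypothetical dictionary with `name`, `cost_per_task` and `acceptance_rate` keys. A real evaluation would also weigh latency, stability and the cost of failure.

```python
def cost_per_accepted_outcome(cost_per_task, acceptance_rate):
    """Average spend needed to obtain one result that clears the quality bar."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return cost_per_task / acceptance_rate


def most_efficient(candidates, quality_bar):
    """Cheapest model per accepted outcome among those that meet the bar."""
    eligible = [c for c in candidates if c["acceptance_rate"] >= quality_bar]
    if not eligible:
        return None  # no candidate is good enough for this job
    best = min(
        eligible,
        key=lambda c: cost_per_accepted_outcome(
            c["cost_per_task"], c["acceptance_rate"]
        ),
    )
    return best["name"]
```

Raising `quality_bar` for a critical task removes cheaper models from the candidate set, which is how a stronger model can become the efficient choice.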

Inference & Engineering

Model Migration

Model migration is the planned move from one AI model or model version to another — for example when a provider retires an existing model, a stronger version ships, or cost, latency, or compliance requirements change. Unlike an automatic fallback that only kicks in during an outage, migration is a deliberately orchestrated project with a test phase, side-by-side measurement, and a fixed cut-over date. A typical migration starts by inventorying every place the old model is called, then evaluates the new model in parallel against real prompts and quality criteria, adjusts system prompts and parameters, and finally switches over in a controlled way — often gradually through feature flags or a canary share of traffic. Because models behave differently, swapping the model name is rarely enough on its own: tone, formatting, tool calls, and the cost profile all have to be re-verified before and after the change. A well-planned migration keeps deprecation deadlines from turning into frantic last-minute scrambles and ensures an application's quality and behavior stay stable across the switch.
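
The gradual cut-over through a canary share of traffic can be sketched with deterministic hashing, so a given request ID always lands on the same model while the share is being raised. The function name and the `model-old` and `model-new` identifiers are placeholders, not real model names.

```python
import hashlib


def pick_model(request_id, canary_percent,
               old_model="model-old", new_model="model-new"):
    """Send roughly `canary_percent` percent of requests to the new model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return new_model if bucket < canary_percent else old_model
```

Raising `canary_percent` step by step (for example 1, 10, 50, 100) widens exposure, and setting it back to 0 acts as an immediate rollback.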

Agentic Infrastructure

Model Quality Drift

Model Quality Drift is the measurable decline in AI output quality during real-world operation. A system that performed well at launch can produce weaker results weeks or months later, even when serving the same use case. Common causes include shifts in input data, changing user behavior, prompt template updates, toolchain changes, or upstream model updates from providers. In production, drift often appears first as higher correction effort, more hallucinations, lower classification accuracy, or slower completion in agent workflows. The key point is that drift is not a one-off bug; it is an ongoing operational risk. That is why teams need continuous quality control with explicit metrics such as task success rate, error rate, response consistency, and process-level business KPIs. Mature teams combine offline evaluations on fixed benchmark sets with online monitoring in live traffic. When quality drops beyond defined thresholds, they trigger mitigations such as prompt rollback, guardrail tuning, model routing changes, or targeted fine-tuning. This keeps AI performance governable over time instead of relying on luck.
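
The threshold logic for online monitoring can be sketched as a rolling window over task outcomes, compared against a baseline measured at launch. `DriftMonitor` and its parameters are hypothetical, and a production setup would track several metrics rather than a single success rate.

```python
from collections import deque


class DriftMonitor:
    """Flags drift when the rolling success rate falls too far below baseline."""

    def __init__(self, baseline_rate, max_drop, window):
        self.baseline_rate = baseline_rate     # success rate at launch
        self.max_drop = max_drop               # tolerated absolute decline
        self.outcomes = deque(maxlen=window)   # most recent task outcomes

    def record(self, success):
        self.outcomes.append(bool(success))

    def current_rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifted(self):
        # Judge only on a full window to avoid alarms from tiny samples.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.baseline_rate - self.current_rate() > self.max_drop
```

When `drifted()` returns true, the caller can trigger one of the mitigations named above, such as a prompt rollback or a routing change.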

Agentic Infrastructure

Model Routing

Model routing is the practice of automatically directing incoming requests or tasks to the most appropriate AI model based on task type, required quality, cost constraints, and latency requirements. In modern AI agent stacks, there is no longer a single model at the center. Instead, frontier models, open-source alternatives, and specialized systems work in concert, with model routing determining which model handles which request. Typical routing strategies include task-based routing (complex reasoning tasks go to powerful frontier models such as Claude Opus or GPT-5.5, while simpler classification or summarization tasks go to smaller, cheaper models), cost-based routing (requests below a complexity threshold are automatically redirected to lower-cost open-source models such as DeepSeek V4 or Llama 4), latency-aware routing (time-sensitive requests are sent to models with the lowest response-time profile), and fallback routing (when a primary model fails or is overloaded, a backup model automatically takes over without interrupting the workflow). In AI agent architectures like OpenClaw, model routing is a critical infrastructure component: it provides the flexibility to balance performance and cost across different models while maintaining provider independence.
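
The routing strategies can be combined in a small policy function plus a fallback loop. This is a sketch under stated assumptions: the model labels (`fast-model`, `small-model`, `frontier-model`, `backup-frontier-model`), the 0.7 complexity threshold and the `call` callback are placeholders, not real endpoints or recommended values.

```python
def choose_route(task_type, complexity, latency_sensitive):
    """Return an ordered list of models: primary first, fallbacks after."""
    if latency_sensitive:
        # Latency-aware routing: prefer the lowest response time.
        return ["fast-model", "small-model"]
    if task_type == "reasoning" or complexity >= 0.7:
        # Task-based routing: hard work goes to the strongest model.
        return ["frontier-model", "backup-frontier-model"]
    # Cost-based routing: routine work goes to the cheaper model first.
    return ["small-model", "frontier-model"]


def call_with_fallback(route, call):
    """Try each model in order; `call(model)` is supplied by the caller."""
    last_error = None
    for model in route:
        try:
            return model, call(model)
        except Exception as exc:  # failure or overload: try the next model
            last_error = exc
    raise RuntimeError("all models in route failed") from last_error
```

Returning an ordered list instead of a single model keeps the routing decision and the fallback behavior in one place.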

Agentic Business

Multi-Agent System

A multi-agent system is an AI architecture in which several specialized agents work together on one goal. Instead of asking one model to plan, research, execute, check and report every step, the system splits work across roles: a planner decomposes the task, research agents gather context, coding or data agents take action, and reviewer agents validate the result. The key feature is not simply having many agents; it is the coordination layer between them. That layer defines task handoffs, shared state, tool permissions, failure handling, cost controls and stop conditions. Multi-agent systems become useful when a workflow is too complex for a single prompt or a linear automation. They can run work in parallel, route steps to different models based on capability or price, and cross-check outputs before humans see them. In production, however, they need a disciplined runtime with logging, observability, permissions and human approval points. Without those controls, a multi-agent setup can quickly become expensive, hard to debug and operationally unsafe.
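
The coordination layer can be sketched as a loop that owns the shared state, the handoff order, the cost budget and the stop conditions. The sketch assumes each agent is a plain function that reads the state and returns updates. `run_workflow`, the role names and the `approved` flag are hypothetical.

```python
def run_workflow(steps, task, budget):
    """Run (role, agent, cost) steps in order over one shared state dict."""
    state = {"task": task, "spent": 0.0, "history": [], "stopped": None}
    for role, agent, cost in steps:
        # Cost control: refuse to start a step the budget cannot cover.
        if state["spent"] + cost > budget:
            state["stopped"] = f"budget exceeded before {role}"
            break
        state.update(agent(state))      # handoff: agent reads and extends state
        state["spent"] += cost
        state["history"].append(role)   # simple audit trail of who ran
        # Stop condition: a reviewer can veto the result.
        if state.get("approved") is False:
            state["stopped"] = "reviewer rejected"
            break
    return state
```

Keeping budget and stop checks in the loop rather than inside the agents means no single agent can overspend or skip review.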

Agentic Infrastructure

Mixture-of-Experts (MoE)

Mixture-of-Experts (MoE) is a neural network architecture in which a model consists of multiple specialized sub-networks called experts, paired with a learned gating mechanism that dynamically routes each input token to the most relevant subset of those experts. Rather than activating all parameters for every token, an MoE model selects only a small number of experts per token in each MoE layer, typically one to eight out of a pool that ranges from eight to several hundred, dramatically reducing active compute while preserving or even increasing overall model capacity. Google Brain popularized this design with the Switch Transformer, and Mistral AI brought it to the open-source community with Mixtral 8x7B and Mixtral 8x22B. Today, Gemini 1.5 Pro, DeepSeek V3, and GLM-5 rely on MoE architectures, and GPT-4 is widely reported, though not officially confirmed, to do the same. MoE enables scaling total parameter counts to hundreds of billions or even trillions without a proportional rise in inference cost: a 700B-parameter MoE model may activate only 40 to 70 billion parameters per token, bringing per-token compute close to that of a far smaller dense model. The key tradeoff is memory: all expert weights must reside in VRAM or RAM during inference even if only a fraction are used, and routing complexity requires careful load-balancing engineering. MoE is now a foundational pattern in frontier AI, enabling the knowledge capacity of a massive model at a cost structure closer to a compact one. Anthropic, Google DeepMind, Meta, and Zhipu AI all invest heavily in MoE research. At Context Studios, understanding MoE is essential when advising clients on GPU infrastructure for self-hosted deployments, since active and total parameter counts diverge significantly.
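
The gating step can be sketched in a few lines of plain Python. This shows one common variant, in which the router's top-k logits are passed through a softmax and used to weight the selected experts' outputs. The function names are illustrative, experts are modeled as simple callables, and real implementations work on batched tensors and add load-balancing terms.

```python
import math


def top_k_gate(router_logits, k):
    """Pick the k highest-scoring experts and softmax over their logits."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts; unselected experts never run."""
    gates = top_k_gate(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())
```

Only `k` experts run per token, which is where the compute saving comes from. The full `experts` list still has to be held in memory, which is the tradeoff described above.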

Agentic Business

Multi-Agent Communication

Multi-agent communication encompasses the protocols, mechanisms, and patterns through which multiple AI agents interact, exchange information, and coordinate tasks. In complex AI systems, specialized agents frequently collaborate: an orchestrator coordinates sub-agents for research, writing, quality checking, and publishing. The dominant communication models are direct orchestration (a parent agent invokes sub-agents and integrates their outputs), MCP (Model Context Protocol) from Anthropic as a standardized tool-call protocol between agents and external services, A2A (Agent2Agent Protocol) from Google as an open standard for peer-to-peer agent communication, and message-queue-based systems for asynchronous communication. Critical design decisions include synchronous versus asynchronous communication (synchronous is simpler, asynchronous scales better), push versus pull delivery, error handling (what happens when a sub-agent fails or times out?), and state management (how is shared context kept consistent across agent boundaries?). Every agent-to-agent interface must be explicitly specified, versioned, and tested independently. As a real-world example, a content creation multi-agent system might consist of a Research Agent (fetches current data via MCP), a Writing Agent (receives the research output and generates a draft), a Quality Agent (checks the draft against editorial rules), and a Publishing Agent. Without clear communication contracts, multi-agent systems become brittle and difficult to debug.
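
An explicitly specified and versioned interface between agents can be sketched as a typed message envelope plus a validation step that each receiving agent runs. The envelope fields, message kinds and version string are assumptions made for illustration. They are not taken from MCP or A2A.

```python
from dataclasses import dataclass

SUPPORTED_VERSIONS = {"1.0"}

# Hypothetical contract: required payload fields per message kind.
REQUIRED_FIELDS = {
    "research_result": ("sources", "summary"),
    "draft": ("text",),
}


@dataclass(frozen=True)
class AgentMessage:
    schema_version: str
    sender: str
    recipient: str
    kind: str
    payload: dict


def validate(message):
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    if message.schema_version not in SUPPORTED_VERSIONS:
        errors.append(f"unsupported schema version: {message.schema_version}")
    if message.kind not in REQUIRED_FIELDS:
        errors.append(f"unknown message kind: {message.kind}")
        return errors
    for name in REQUIRED_FIELDS[message.kind]:
        if name not in message.payload:
            errors.append(f"missing payload field: {name}")
    return errors
```

Because the contract lives in data rather than in prompts, it can be unit-tested on its own and versioned when a payload changes shape.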

Reasoning & Reliability

Multimodal AI

Multimodal AI refers to artificial intelligence systems capable of processing, understanding, and generating information across multiple data modalities — including text, images, audio, video, and structured data — within a single unified model. Unlike unimodal systems specialized for one data type, multimodal AI models can reason across modalities simultaneously: describing an image, answering questions about a video, transcribing and analyzing speech, or generating images from text descriptions. The transformer architecture, pioneered by Google Brain and later refined by OpenAI, DeepMind, and Anthropic, proved to be a natural fit for multimodal learning through attention mechanisms that operate uniformly over diverse token sequences. Landmark multimodal models include OpenAI's GPT-4V and GPT-4o, Google DeepMind's Gemini 1.5 and 2.0, Anthropic's Claude 3 family, and Meta's Llama 3.2 Vision. ByteDance's Seedance 2.0 represents multimodal AI applied to video generation, accepting both text and image inputs. The practical applications of multimodal AI span healthcare (analyzing medical images and clinical notes together), manufacturing (combining sensor data with visual inspection), retail (product search by image), and media (automatic video captioning and scene understanding). Multimodal AI is rapidly becoming the default paradigm for foundation models, as real-world intelligence inherently spans multiple senses and data streams. At Context Studios, we deploy multimodal AI in client applications ranging from document intelligence pipelines that process both text and embedded images to product visualization tools that combine customer descriptions with generated imagery.
