Agentic Engineering is not vibe coding with a cleaner title. It is the operating model that turns AI-generated code from a lucky sprint into a reviewable system: scoped context, explicit plans, small changes, security gates, and proof before merge.
That distinction matters because the phrase "vibe coding" was always useful as a warning label. Andrej Karpathy introduced the term in February 2025 for a style where the developer can almost forget the code exists. That is fun for experiments. It is reckless as an operating model for software that touches customers, money, security, or a production data path.
A May 17, 2026 David Ondrej episode, "Stop Vibe Coding, Start Agentic Engineering – Micky," sharpened the split. The guest describes a workflow where AI writes most of the code, but the human still designs the harness: source-of-truth context, small PRs, review loops, package rules, and explicit handoff. The useful lesson is not the 95% AI-generated-code claim. The useful lesson is the discipline around the agent.
Why the slogan is not enough
Vibe coding optimizes for the first working demo. Agentic Engineering optimizes for the second month of maintenance.
That sounds less glamorous, but it is where the value is. A prompt can create a login screen, a dashboard, or a CRUD flow. The hard part starts when the code has to survive schema drift, weird user input, permissions, deployment, tests, feature flags, rollback, and another engineer touching it two months later.
The old mistake was to compare human typing speed against model typing speed. That comparison is finished. Models can generate code faster. The new question is whether a team can direct that generation inside a system that makes bad changes visible before they become expensive.
Agentic Engineering keeps the human in the role that matters: intent, scope, architecture, policy, and acceptance criteria. The agent becomes a fast implementation worker, not the owner of the problem. That is the same operating logic behind 5 Claude Skills for Structured AI Development: the skill is not magic; the repeatable loop is the product.
A better definition is simple: Agentic Engineering is software delivery where AI agents do substantial implementation work inside human-designed constraints, evidence gates, and review loops.
The Agentic Engineering operating model
The operating model has six parts.
First, the task is decomposed before the agent writes. A good prompt is not "build the feature." It is a scoped change with the files, interfaces, data assumptions, and out-of-scope boundaries made visible. If the task cannot fit into a small PR, the task is too large for one agent pass.
Second, the codebase remains the source of truth. Documentation helps, but the real contract is the code that compiles, the tests that run, the routes that exist, and the schemas that production uses. The Ondrej episode spends real time on giving the agent source code from the packages and repositories it must work with. That is the right instinct: agents improve when they read the actual system instead of guessing from stale docs.
Third, the harness decides what tools the agent can use. A model without tools predicts text. A coding agent inside a harness can read files, search symbols, run tests, open browsers, call APIs, and create diffs. That is power, so it needs boundaries.
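One way to picture that boundary is a harness that only dispatches tools on an explicit allowlist. This is a minimal sketch, not the API of any real agent runtime: the `Harness` class, the `ToolNotAllowed` exception, and the three tool names are all hypothetical.

```python
from typing import Callable


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside its allowlist."""


class Harness:
    """Hypothetical harness: dispatches only tools that were explicitly granted."""

    def __init__(self, tools: dict[str, Callable[..., str]], allowed: set[str]):
        self._tools = tools
        self._allowed = allowed

    def call(self, name: str, *args: str) -> str:
        # Check permission first, so a denied tool never runs.
        if name not in self._allowed:
            raise ToolNotAllowed(f"tool '{name}' is not permitted for this task")
        if name not in self._tools:
            raise KeyError(f"tool '{name}' is not registered")
        return self._tools[name](*args)


# Illustrative stand-ins; real tools would read files, run tests, and so on.
tools = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda target: f"ran tests in {target}",
    "deploy": lambda env: f"deployed to {env}",
}

# This task may read and test, but not deploy.
harness = Harness(tools, allowed={"read_file", "run_tests"})
print(harness.call("read_file", "app/models.py"))
```

A call to `harness.call("deploy", "prod")` raises `ToolNotAllowed` here. The point of the design is that the permission lives in the harness, not in the prompt.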
Fourth, the plan becomes an accountability object. Plans are not paperwork. They let the human inspect whether the agent understood the size of the change. If the plan implies a 9,000-line PR, the plan failed. Split it before implementation.
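A plan can be checked mechanically before anyone reads it in detail. The sketch below assumes the agent returns its plan as structured steps with its own size estimates. The `PlanStep` shape and the default thresholds (400 lines, 10 files) are illustrative assumptions, not recommendations from the episode.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str
    files: list[str]
    estimated_lines: int  # the agent's own estimate, to be sanity-checked


def plan_problems(steps: list[PlanStep], max_lines: int = 400, max_files: int = 10) -> list[str]:
    """Return reasons the plan should be split; an empty list means it fits."""
    problems = []
    total_lines = sum(step.estimated_lines for step in steps)
    touched = {path for step in steps for path in step.files}
    if total_lines > max_lines:
        problems.append(f"estimated {total_lines} changed lines exceeds {max_lines}")
    if len(touched) > max_files:
        problems.append(f"{len(touched)} files touched exceeds {max_files}")
    return problems


plan = [
    PlanStep("add reset-token column", ["db/migrations/0042_reset_token.py"], 30),
    PlanStep("rewrite the whole auth module", ["auth/service.py"], 9000),
]
for problem in plan_problems(plan):
    print("split before implementing:", problem)
```

The estimate will be wrong in detail, but it is usually right about the order of magnitude, and that is the signal the human needs.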
Fifth, review happens as a loop, not a ceremony. The best teams run tests, static checks, second-agent critique, and human review as a sequence. Our piece on Reviewmaxxing for Agent PRs goes deeper on this: token volume is not the advantage; review quality is.
Sixth, handoff is designed. When one agent or session stops, the next operator needs a concise state summary, known failures, changed files, commands run, and open decisions. That is why agent runtimes are starting to look like operating systems, a pattern we covered in Hermes v0.14: Agent Runtimes Become Operating Systems.
Context budgets beat giant prompts
Context engineering is the quiet skill behind Agentic Engineering.
A giant context window does not mean every task should consume a giant context window. The larger the prompt becomes, the more the agent has to separate signal from noise. The practical move is to give the agent the exact code, contract, error, and acceptance test it needs for the next step.
This is where production teams should be stricter than demo builders. They should maintain small working sets: one feature slice, one service layer, one interface, one migration, one failing test, one review target. Large tasks should become a sequence of small agent runs with clean handoff between them.
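A small working set can be enforced with a budget instead of good intentions. The sketch below picks context items in priority order until a token budget is spent. The `ContextItem` type is hypothetical, and the four-characters-per-token estimate is a rough assumption; a real setup would use the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # lower number = more essential to the next step


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def build_context_pack(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Pick items in priority order, skipping any that would exceed the budget."""
    pack, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            pack.append(item)
            used += cost
    return pack


items = [
    ContextItem("old_design_doc", "x" * 4000, priority=9),
    ContextItem("failing_test", "x" * 40, priority=1),
    ContextItem("service_interface", "x" * 80, priority=2),
]
pack = build_context_pack(items, budget=100)
print([item.name for item in pack])
```

With a budget of 100 estimated tokens, the failing test and the interface fit and the old design doc is left out.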
OpenAI's May 8, 2026 write-up on running Codex safely makes the same production point from another angle. Sandboxes, approvals, managed network access, identity, rules, and telemetry exist because a capable coding agent is not just a text generator. It is an actor inside a development environment.
The May 14, 2026 Codex mobile update adds a second signal. OpenAI describes live state, approvals, diffs, terminal output, test results, Remote SSH, hooks, scoped access tokens, and secure relay infrastructure in Work with Codex from anywhere. Those are not vibe features. They are control-plane features.
Agentic Engineering starts when teams treat context, tools, and approvals as architecture.
Review loops are the real productivity multiplier
The productivity story is not "AI writes 95% of the code." The productivity story is "the team can safely review more useful changes per week."
A bad agent workflow creates a bigger review burden. It floods the branch with plausible code, hides mistakes in a large diff, and leaves humans to reverse-engineer intent. A good Agentic Engineering workflow reduces review burden by making intent and evidence visible from the start.
That means every agent task should produce at least four artifacts: the plan, the diff, the validation evidence, and the remaining risks. If one is missing, the task is not done.
The validation evidence can be tests, type checks, lint, route checks, browser screenshots, security scan output, or a small manual acceptance note. The exact evidence depends on the feature, but the rule is universal: no proof, no merge.
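The four-artifact rule is simple enough to encode as a gate. This is a minimal sketch with a hypothetical `AgentTaskResult` record; how the artifacts are actually collected depends on the team's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTaskResult:
    """Hypothetical record of what one agent task handed back."""
    plan: str = ""
    diff: str = ""
    evidence: list[str] = field(default_factory=list)  # test, lint, or scan output
    remaining_risks: str = ""  # "none identified" counts; an empty string does not


def missing_artifacts(result: AgentTaskResult) -> list[str]:
    """List which of the four required artifacts are absent."""
    missing = []
    if not result.plan.strip():
        missing.append("plan")
    if not result.diff.strip():
        missing.append("diff")
    if not any(item.strip() for item in result.evidence):
        missing.append("validation evidence")
    if not result.remaining_risks.strip():
        missing.append("remaining risks")
    return missing


def ready_to_merge(result: AgentTaskResult) -> bool:
    # No proof, no merge: every artifact must be present.
    return not missing_artifacts(result)


result = AgentTaskResult(plan="add rate limit", diff="+ limiter = ...", remaining_risks="none identified")
print(missing_artifacts(result))  # ['validation evidence']
```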
This is why deterministic workflow shells matter. In Archon Workflow Marketplace: Deterministic AI Coding at Scale, the useful idea is not another agent prompt. It is the workflow shape: plan, implement, validate, review, approve, and only then create the PR. Determinism gives AI speed a safety rail.
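The workflow shape can be sketched as a shell that refuses out-of-order stages. This is an illustration of the idea only. It is not Archon's API, and the `Workflow` class and stage names are hypothetical.

```python
STAGES = ["plan", "implement", "validate", "review", "approve", "create_pr"]


class Workflow:
    """Hypothetical workflow shell: stages must complete in a fixed order."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def complete(self, stage: str) -> None:
        done = len(self.completed)
        expected = STAGES[done] if done < len(STAGES) else None
        if stage != expected:
            raise RuntimeError(f"expected stage {expected!r}, got {stage!r}")
        self.completed.append(stage)


workflow = Workflow()
workflow.complete("plan")
workflow.complete("implement")
try:
    workflow.complete("create_pr")  # skips validate, review, and approve
except RuntimeError as error:
    print(error)
```

The agent can be as fast as it likes inside a stage. It cannot skip one.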
For teams adopting Agentic Engineering, the first metric should not be lines of code generated. Track PR size, review findings, rollback rate, test pass rate, time-to-fix after review, and how often an agent finishes with clear evidence. Those metrics expose whether the workflow is improving engineering, not just producing more text.
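Most of those metrics fall out of a few fields per pull request. The sketch below assumes a hypothetical `PullRequest` record filled in from the team's own tooling. Time-to-fix is left out because it needs timestamps that this toy record does not carry.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    lines_changed: int
    review_findings: int
    tests_passed: bool
    rolled_back: bool
    has_evidence: bool  # did the agent finish with clear validation evidence?


def workflow_metrics(prs: list[PullRequest]) -> dict[str, float]:
    """Summarize review-oriented metrics over a batch of agent PRs."""
    if not prs:
        raise ValueError("need at least one PR")
    n = len(prs)
    return {
        "median_pr_size": float(median(pr.lines_changed for pr in prs)),
        "findings_per_pr": sum(pr.review_findings for pr in prs) / n,
        "test_pass_rate": sum(pr.tests_passed for pr in prs) / n,
        "rollback_rate": sum(pr.rolled_back for pr in prs) / n,
        "evidence_rate": sum(pr.has_evidence for pr in prs) / n,
    }


prs = [
    PullRequest(100, 1, True, False, True),
    PullRequest(200, 0, True, False, True),
    PullRequest(300, 2, True, False, False),
    PullRequest(1000, 5, False, True, False),
]
print(workflow_metrics(prs))
```

The median is used for PR size so that one oversized PR shows up as an outlier to investigate, not as a shift in the headline number.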
Security rules belong in the workflow
The Ondrej episode includes a practical security rule: do not let agents install packages younger than 14 days. Treat that as a policy pattern, not a universal law.
The underlying idea is sound. AI agents are excellent at grabbing a dependency that appears to solve the problem. That creates supply-chain risk when the package is new, obscure, typo-squatted, or not maintained. A human might pause. An agent may optimize for completion unless the harness blocks it.
Agentic Engineering turns that pause into policy. Package-age rules, allowlists, denied domains, read-only modes, secret scanning, dependency review, and approval gates should live inside the workflow. They should not depend on someone remembering to be careful at 23:00.
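A package-age rule reduces to one comparison once the publish date is known. The sketch below assumes the harness has already looked up that date from the package registry; the lookup itself is left out. The function name is hypothetical, and the 14-day default mirrors the rule from the episode.

```python
from datetime import datetime, timedelta, timezone


def package_needs_review(published_at: datetime, now: datetime, min_age_days: int = 14) -> bool:
    """True when the release is younger than the age threshold."""
    return now - published_at < timedelta(days=min_age_days)


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
fresh = datetime(2026, 5, 15, tzinfo=timezone.utc)   # 5 days old
settled = datetime(2026, 4, 1, tzinfo=timezone.utc)  # 49 days old

print(package_needs_review(fresh, now))    # True
print(package_needs_review(settled, now))  # False
```

Passing `now` in explicitly keeps the check deterministic and easy to test. The threshold is a parameter because the right value is a team policy, not a constant.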
That is also the core lesson from "Security Harnesses, Not Vibes: Vercel deepsec." Security review works when it is repeatable: scan, investigate, revalidate, enrich with ownership context, and export a fix path. Agentic Engineering applies the same pattern to everyday feature work.
A simple starter policy works well:
- agents may not install new packages without an explicit approval note;
- packages below a defined age threshold require human review;
- shell commands that touch secrets, production data, auth, billing, or deployment require approval;
- network access is allowlisted where possible;
- every external write is logged and reversible;
- the final answer must list changed files, checks run, and unresolved risks.
Those rules do not slow good agents down. They stop weak agent runs from becoming incidents.
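Part of that starter policy can be sketched as a filter on the shell commands an agent proposes. The patterns below are hypothetical and deliberately crude: they match substrings, so they will sometimes ask for approval when none is needed. That is the safer direction to be wrong in.

```python
import re

# Hypothetical patterns for sensitive areas; tune these per repository.
SENSITIVE = re.compile(r"(secret|\.env|prod|auth|billing|deploy)", re.IGNORECASE)
# Commands that add a dependency through a common package manager.
INSTALL = re.compile(r"^\s*(pip|npm|pnpm|yarn|cargo)\s+(install|add)\b")


def review_command(command: str, approved: bool = False) -> str:
    """Return 'allow' or 'needs_approval' for a shell command the agent proposes."""
    if approved:
        return "allow"
    if INSTALL.search(command) or SENSITIVE.search(command):
        return "needs_approval"
    return "allow"


for command in ["pytest tests/unit", "npm install left-pad", "cat .env"]:
    print(command, "->", review_command(command))
```

A human approval flips `approved` to `True` for that one command. Logging the decision alongside the command would cover the audit half of the policy.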
A practical checklist for teams
Teams do not need a huge platform to start practicing Agentic Engineering. They need a small repeatable loop.
Start with the task brief. Write the user outcome, affected surfaces, constraints, non-goals, and acceptance tests. If the brief takes five minutes, it will usually save an hour of cleanup.
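The brief can be a small structured record, so an incomplete one is easy to spot. This is a sketch with a hypothetical `TaskBrief` type; the field names simply follow the list above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    user_outcome: str
    affected_surfaces: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    acceptance_tests: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Name the sections that are still empty, so the brief can be sent back."""
        gaps = [] if self.user_outcome.strip() else ["user_outcome"]
        for name in ("affected_surfaces", "constraints", "non_goals", "acceptance_tests"):
            if not getattr(self, name):
                gaps.append(name)
        return gaps


brief = TaskBrief(
    user_outcome="Users can reset their password by email",
    affected_surfaces=["auth/reset.py", "templates/reset_email.html"],
    non_goals=["changing the login flow"],
)
print(brief.gaps())  # ['constraints', 'acceptance_tests']
```

A brief with gaps is not necessarily wrong, but an empty `acceptance_tests` list is a strong hint that nobody has decided what "done" means yet.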
Then define the context pack. Include the relevant files, source contracts, API references, data shapes, and failing output. Exclude unrelated docs and old conversations. The goal is not maximum context. The goal is minimum sufficient context.
Next, ask for a plan before code. Review the plan for scope creep, hidden migrations, broad refactors, and missing tests. If the plan is too large, split it. This is where human judgment compounds.
After implementation, require evidence. The agent should run the smallest meaningful checks and report the result. If the check cannot run, the agent should say why and name the fallback evidence.
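Evidence is easier to trust when the harness captures it, not the agent's summary. The sketch below runs one check with Python's standard `subprocess` module and records whether it passed, failed, or could not run. The `run_check` name and the result shape are assumptions for illustration.

```python
import subprocess
import sys


def run_check(name: str, command: list[str], timeout: int = 120) -> dict:
    """Run one check and capture the outcome as reportable evidence."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        # The tool is not installed: say so instead of silently skipping.
        return {"check": name, "status": "could_not_run", "detail": f"{command[0]} not found"}
    except subprocess.TimeoutExpired:
        return {"check": name, "status": "could_not_run", "detail": f"timed out after {timeout}s"}
    status = "passed" if proc.returncode == 0 else "failed"
    # Keep only the last few lines; full logs belong in an attachment.
    tail = (proc.stdout + proc.stderr).strip().splitlines()[-5:]
    return {"check": name, "status": status, "detail": "\n".join(tail)}


# A stand-in command so the example runs anywhere Python does.
print(run_check("smoke", [sys.executable, "-c", "print('2 passed')"]))
```

A `could_not_run` result is the cue for the agent to name fallback evidence, such as a manual acceptance note, so the gap is reported and not hidden.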
Then run review as a loop. Use a second agent, static tools, and a human reviewer. Ask for risk, not praise. A useful reviewer says what could break, not just what changed.
Finally, capture handoff. The next person should know the objective, changed files, decisions made, commands run, checks passed, checks skipped, and remaining risks. Without handoff, the team loses the compounding benefit of agent work.
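A handoff note can be generated from the same facts the session already tracked. This is a sketch with a hypothetical `Handoff` record; the plain-text layout is one possible format, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    objective: str
    changed_files: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    checks_passed: list[str] = field(default_factory=list)
    checks_skipped: list[str] = field(default_factory=list)
    remaining_risks: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the note; empty sections are printed as 'none', never omitted."""
        sections = [
            ("Changed files", self.changed_files),
            ("Decisions made", self.decisions),
            ("Commands run", self.commands_run),
            ("Checks passed", self.checks_passed),
            ("Checks skipped", self.checks_skipped),
            ("Remaining risks", self.remaining_risks),
        ]
        lines = [f"Objective: {self.objective}"]
        for title, entries in sections:
            lines.append(f"{title}:")
            # An explicit "none" shows the section was considered, not forgotten.
            lines.extend(f"  - {entry}" for entry in (entries or ["none"]))
        return "\n".join(lines)


note = Handoff(
    objective="Add rate limiting to the login endpoint",
    changed_files=["auth/login.py", "tests/test_login.py"],
    commands_run=["pytest tests/test_login.py"],
    checks_passed=["unit tests"],
    checks_skipped=["load test (no staging access)"],
)
print(note.to_text())
```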
That checklist is the practical difference between vibe coding and Agentic Engineering. Vibe coding asks whether the app appears to work. Agentic Engineering asks whether the change is scoped, evidenced, reviewable, secure, and maintainable.
FAQ
What is Agentic Engineering?
Agentic Engineering is software delivery where AI agents implement meaningful parts of the work inside human-designed constraints, review loops, and evidence gates. It combines context discipline, scoped planning, tool permissions, validation, and handoff.
The goal is not to remove engineers. The goal is to let engineers direct more implementation safely.
How is Agentic Engineering different from vibe coding?
Vibe coding optimizes for fast output from natural language prompts. Agentic Engineering optimizes for reliable delivery: small scopes, source-grounded context, test evidence, security policy, review, and maintainable handoff.
The same model can be used in both modes. The difference is the operating system around the model.
Should teams let AI agents write most of the code?
Teams can let agents write a large share of code when scope, tests, review, and rollback are strong. The percentage matters less than the control loop around the work.
A small, evidenced agent change is safer than a large change that no human has reviewed. A giant agent PR without proof is just expensive cleanup.
What is a good first Agentic Engineering workflow?
Start with one bounded workflow: brief, context pack, plan, implementation, validation, second review, human approval, and handoff. Use it on low-risk internal features before expanding to customer-facing or security-sensitive work.
Measure PR size, review findings, test pass rate, and rollback rate before scaling the pattern.
What security controls should Agentic Engineering include?
At minimum, use package approval, secret scanning, network limits, protected paths, command approval, audit logs, and a final changed-files report. Add stricter rules for auth, billing, production data, and external posting.
The point is to encode caution into the workflow instead of relying on memory.
Conclusion: keep the speed, add the system
Agentic Engineering does not reject vibe coding because vibe coding is useless. It rejects the idea that a demo workflow should become a production workflow unchanged.
The best teams will keep the speed and add the system: scoped tasks, small context packs, source-grounded plans, security rules, review loops, validation evidence, and clean handoff. That is how AI coding stops being a stunt and becomes an engineering capability.
If your team is moving from AI demos to dependable software delivery, Context Studios can help you design the operating loop: agents, policies, review gates, and production workflows that ship faster without pretending risk disappeared.