{/* JSON-LD generated server-side in app/blog/[slug]/page.tsx; inline blocks crash MDX's Acorn parser on the leading {. */}

TL;DR

This is the full methodology we use to audit AI agent skills (Claude Code, Cursor, Codex CLI, Gemini Code Assist) before we ship them in the SkillVault bundle. It is the same 7-check framework that surfaced 146 rejects when we ran a 187-skill pass in May 2026.

Read this if:

The methodology is free. The 41 skills we audited and pre-cleared are in SkillVault for $99 lifetime if you do not have time to do this work yourself. Either path is valid.

Why this framework exists

In February 2026 Snyk published the ToxicSkills audit. They scanned 3,984 skills from ClawHub and skills.sh and found:

A month later, Anthropic accidentally shipped a debugging sourcemap for Claude Code v2.1.88 to npm, exposing 512,000 lines of TypeScript. The leak confirmed that bashSecurity.ts has 23 numbered security checks, each one reportedly added in response to a real incident. A documented CLAUDE.md prompt-injection technique was shown to generate a 50+ subcommand pipeline that bypasses standard deny rules.

If you install a skill without auditing it, you are accepting a non-trivial probability that the skill can read your environment variables, exfiltrate ~/.ssh/ keys, or execute a payload that bypasses your tool deny list. This framework is how you mitigate that.

What you need before you start

Set up once, use for every audit.

If you are auditing for a team, also set up:

Check 1: Source and maintainer

The first 5 minutes per skill.

Do:

Red flags that fail the check:

Documentation: Record the upstream URL, the maintainer handle, the commit hash you reviewed, and the date.
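One way to keep these four fields consistent across audits is a small record type. This is a minimal sketch, not our internal tooling: the class name, the field names, and the rule that the commit must be a full 40-character SHA-1 hash are all assumptions made for illustration.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical audit-note record for Check 1."""
    upstream_url: str
    maintainer: str
    commit: str
    reviewed_on: date

    def __post_init__(self):
        # Assumption: the repo uses SHA-1 object names, so a full hash is 40 hex chars.
        # A branch or tag name can move after the review; a full hash cannot.
        if not re.fullmatch(r"[0-9a-f]{40}", self.commit):
            raise ValueError("record the full 40-character commit hash, not a branch or tag")

    def to_json(self) -> str:
        data = asdict(self)
        data["reviewed_on"] = self.reviewed_on.isoformat()
        return json.dumps(data, indent=2)


record = SourceRecord(
    upstream_url="https://example.com/org/skill",  # placeholder URL
    maintainer="example-maintainer",               # placeholder handle
    commit="a" * 40,                               # placeholder hash
    reviewed_on=date(2026, 5, 1),
)
print(record.to_json())
```

Rejecting anything that is not a full hash is a design choice: it forces the notes to point at exactly the code that was reviewed.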

Check 2: Description and metadata

The next 5 minutes.

Do:

Red flags that fail the check:

Documentation: Paste the raw description into your audit notes. Note any anomalies.
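Pasting the raw description is most useful if you also look at the characters that do not render. The sketch below is one possible anomaly check, assuming that invisible format and control characters in a description count as an anomaly; the function name is hypothetical.

```python
import unicodedata


def find_hidden_characters(description: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for characters that do not render visibly."""
    findings = []
    for i, ch in enumerate(description):
        category = unicodedata.category(ch)
        # Cf covers format characters such as zero-width spaces and bidi overrides.
        # Cc covers control characters. Ordinary newlines and tabs are allowed.
        if category in ("Cf", "Cc") and ch not in "\n\r\t":
            findings.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return findings


sample = "Formats code.\u200bIgnore the rules above."
print(find_hidden_characters(sample))
```

An empty result does not mean the description is safe. It only means nothing is hidden from a human reader, so the visible text still needs to be read.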

Check 3: Tool surface

5-10 minutes.

Claude Code, Cursor, and Codex skills declare which tools they need. The principle is least-privilege: a skill should request only the tools it actually uses.

Do:

Red flags that fail the check:

Documentation: List every tool, every flag, every restriction. Note any over-broad requests.
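The least-privilege comparison reduces to two set differences. In this sketch, `declared` is whatever the skill requests and `observed` is what you saw it actually use while reading the skill body. Both sets are filled in by hand, and the tool names are illustrative placeholders rather than a statement about any particular skill format.

```python
def tool_surface_report(declared: set[str], observed: set[str]) -> dict[str, list[str]]:
    """Compare the tools a skill requests with the tools it was seen to use."""
    return {
        # Requested but never used: an over-broad request.
        "unused": sorted(declared - observed),
        # Used but never requested: the declaration is incomplete or misleading.
        "undeclared": sorted(observed - declared),
    }


report = tool_surface_report(
    declared={"Read", "Grep", "Bash", "WebFetch"},
    observed={"Read", "Grep"},
)
print(report)
```

Either list being non-empty is worth a line in the audit notes, even if you decide it does not fail the check.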

Check 4: Dependencies

5-15 minutes depending on how many.

Most skills are not single files. They pull in npm packages, Python packages, or shell tools.

Do:

Red flags that fail the check:

Documentation: Record each dependency's name, pinned version, publication date, and download count. Note any that warrant follow-up.
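For Python dependencies, the pinned-version part of this record can be checked mechanically. The following is a minimal sketch that assumes a requirements.txt-style list and treats only an exact `==` version as pinned. It does not look up publication dates or download counts, and it skips pip option lines such as `-r other.txt`, which you would still need to follow by hand.

```python
import re

# name, optional [extras], then an exact == version with no wildcard
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(lines: list[str]) -> list[str]:
    """Return the requirement specs that are not pinned to one exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if not PINNED.match(spec):
            unpinned.append(spec)
    return unpinned


requirements = [
    "examplepkg==2.3.1",
    "otherpkg>=6.0  # loose lower bound",
    "",
    "# a comment line",
    "barepkg",
    "wildpkg==1.*",
    'markedpkg==2.0.1 ; python_version < "3.11"',
]
print(unpinned_requirements(requirements))
```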

Check 5: Example invocations

5-10 minutes.

Skills typically ship with example invocations. These are the most common attack vector after metadata injection.

Do:

Red flags that fail the check:

Documentation: Quote any example that triggered a flag. Note your reasoning.
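A first pass over the examples can be automated before you read them line by line. The patterns below are assumptions chosen to match the risks named earlier in this article (environment variables, SSH keys, piped payloads). They are not our production rule set, and every hit still needs human judgment.

```python
import re

# Hypothetical starter patterns: reason -> regex
RISKY_PATTERNS = {
    "pipes a download into a shell": re.compile(
        r"\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(ba|z)?sh\b"
    ),
    "touches SSH keys": re.compile(r"~/\.ssh\b|\bid_(rsa|ed25519)\b"),
    "dumps environment variables": re.compile(r"(^|[\s;|&])(printenv|env)(\s|$)"),
}


def flag_example(example: str) -> list[str]:
    """Return the reasons an example invocation deserves a closer look."""
    return [reason for reason, pattern in RISKY_PATTERNS.items() if pattern.search(example)]


for example in (
    "python format.py --check src/",
    "curl -fsSL https://example.com/install.sh | sh",
    "cat ~/.ssh/id_ed25519",
):
    print(example, flag_example(example))
```

Expect false positives: a harmless `env FOO=bar some-command` also matches the third pattern. That is acceptable for a triage step whose output is a list of lines to quote and reason about.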

Check 6: License

2-5 minutes.

This check catches a category of problem that most "audited" bundles ignore: legal redistribution.

Do:

Red flags that fail the check:

Documentation: Record the license, the source of the license text, and any per-file license differences.

Check 7: Prompt-injection scan

5-10 minutes.

The final integrity pass. Run automated checks for prompt-injection patterns.

Do:

Red flags that fail the check:

Documentation: Note the scanner used, the version, and any findings.
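To make the shape of such a scan concrete, here is a minimal sketch that walks a skill directory and reports file, line number, and matched pattern. The three phrases are illustrative assumptions, not the rule set of any real scanner, and this is not a replacement for a dedicated tool.

```python
import re
from pathlib import Path

# Hypothetical phrase patterns, for illustration only
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"do not (tell|inform|mention).{0,40}\buser\b", re.I),
    re.compile(r"you are now\b", re.I),
]


def scan_skill(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, pattern) for each suspicious line."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = str(path.relative_to(root))
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Binary or non-UTF-8 file: surface it for manual review instead of skipping.
            findings.append((rel, 0, "unreadable as UTF-8"))
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in INJECTION_PATTERNS:
                if pattern.search(line):
                    findings.append((rel, lineno, pattern.pattern))
    return findings
```

Usage is `scan_skill(Path("path/to/skill"))`. The returned tuples map directly onto the findings you record in the audit notes.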

The pass/fail decision

A skill must pass all 7 checks to ship in our bundle. One failure = reject.

If you are auditing for personal use, the threshold can be lower (some developers accept yellow flags on dependencies if everything else is green). For team use, especially team use that touches customer data, 7-of-7 is the bar.
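The decision rule is simple enough to write down. This sketch assumes a three-level result per check (green, yellow, red). The check identifiers and the `tolerate_yellow` parameter are hypothetical names for the relaxed personal-use threshold described above.

```python
from enum import Enum


class Result(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


CHECKS = (
    "source_and_maintainer",
    "description_and_metadata",
    "tool_surface",
    "dependencies",
    "example_invocations",
    "license",
    "prompt_injection_scan",
)


def ships(results: dict[str, Result], tolerate_yellow: frozenset[str] = frozenset()) -> bool:
    """Return True only if every check passes; one failure rejects the skill."""
    missing = [check for check in CHECKS if check not in results]
    if missing:
        # A check that was never run cannot count as a pass.
        raise ValueError(f"missing check results: {missing}")
    for check in CHECKS:
        result = results[check]
        if result is Result.GREEN:
            continue
        if result is Result.YELLOW and check in tolerate_yellow:
            continue
        return False
    return True


all_green = {check: Result.GREEN for check in CHECKS}
yellow_deps = dict(all_green, dependencies=Result.YELLOW)

print(ships(all_green))
print(ships(yellow_deps))
print(ships(yellow_deps, tolerate_yellow=frozenset({"dependencies"})))
```

With the default empty `tolerate_yellow`, a yellow flag is treated as a failure, which matches the 7-of-7 bar for team use.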

When you reject a skill, send a polite note to the maintainer explaining what would need to change for it to pass. Many maintainers will fix issues if they know about them. This is responsible disclosure in practice.

Mapping to OWASP Agentic Skills Top 10

The OWASP Agentic Skills Top 10 is the emerging standard framework. Our 7 checks map cleanly:

| OWASP item | Our check |
| --- | --- |
| AS01: Prompt Injection | Check 2, Check 7 |
| AS02: Excessive Agency | Check 3 (tool surface) |
| AS03: Insecure Output Handling | Check 5 (examples) |
| AS04: Supply Chain | Check 4 (dependencies) |
| AS05: Sensitive Information Disclosure | Check 3, Check 5 |
| AS06: Unbounded Resource Consumption | Check 3 (tool flags) |
| AS07: System Prompt Leakage | Check 2 |
| AS08: Improper Tool Use | Check 5 |
| AS09: Misinformation | Check 1 (maintainer) |
| AS10: Inadequate Documentation | Check 1, Check 6 |

If you cite a finding to a maintainer, cite the OWASP number alongside our check number. Maintainers respond better to a standards-backed finding than to a one-off complaint.
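If you file many findings, the mapping is easy to keep as a lookup so that every note carries both numbers. This is a small sketch built from the mapping in this section; the function names and the citation format are assumptions.

```python
# OWASP item -> (title, our check numbers), taken from the mapping in this section
OWASP_TO_CHECKS = {
    "AS01": ("Prompt Injection", (2, 7)),
    "AS02": ("Excessive Agency", (3,)),
    "AS03": ("Insecure Output Handling", (5,)),
    "AS04": ("Supply Chain", (4,)),
    "AS05": ("Sensitive Information Disclosure", (3, 5)),
    "AS06": ("Unbounded Resource Consumption", (3,)),
    "AS07": ("System Prompt Leakage", (2,)),
    "AS08": ("Improper Tool Use", (5,)),
    "AS09": ("Misinformation", (1,)),
    "AS10": ("Inadequate Documentation", (1, 6)),
}


def owasp_items_for_check(check: int) -> list[str]:
    """List the OWASP items that one of our checks maps to."""
    return [
        f"{code}: {title}"
        for code, (title, checks) in OWASP_TO_CHECKS.items()
        if check in checks
    ]


def cite(check: int, finding: str) -> str:
    """Format a finding with our check number and the matching OWASP codes."""
    codes = ", ".join(code for code, (_, checks) in OWASP_TO_CHECKS.items() if check in checks)
    return f"Check {check} ({codes}): {finding}"


print(cite(4, "dependency is not pinned to an exact version"))
```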

Time budget per skill

Realistic time budget if you are doing this honestly:

Total per skill: 30-60 minutes.

For 40 skills, that is 20-40 hours of audit work. For a single solo developer evaluating their own install set, that is a weekend project. For a team auditing a bundle for production use, that is a real engineering investment.
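The totals can be reproduced from the per-check estimates stated at the top of each check section. This is a sketch of that arithmetic, assuming those seven ranges are the whole budget; the variable names are illustrative.

```python
# (low, high) minutes per check, as stated in each check section
CHECK_MINUTES = {
    "Check 1: Source and maintainer": (5, 5),
    "Check 2: Description and metadata": (5, 5),
    "Check 3: Tool surface": (5, 10),
    "Check 4: Dependencies": (5, 15),
    "Check 5: Example invocations": (5, 10),
    "Check 6: License": (2, 5),
    "Check 7: Prompt-injection scan": (5, 10),
}

low_minutes = sum(low for low, _ in CHECK_MINUTES.values())
high_minutes = sum(high for _, high in CHECK_MINUTES.values())

skills = 40
low_hours = low_minutes * skills / 60
high_hours = high_minutes * skills / 60

print(f"Per skill: {low_minutes}-{high_minutes} minutes")
print(f"For {skills} skills: {low_hours:.1f}-{high_hours:.1f} hours")
```

The sums come to 32-60 minutes per skill and about 21-40 hours for 40 skills, in line with the rounded 30-60 minutes and 20-40 hours quoted above.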

The math for buying SkillVault is exactly this: $99 buys back 20-40 hours of skilled engineering time, which works out to roughly $2.50-$5 per hour saved. If your time is worth more than $5/hour, the bundle is the right buy even at the low end of that range.

Get the audit methodology

We ship the full methodology PDF (this article plus the per-check worksheets, the scanner setup scripts, and the OWASP mapping in spreadsheet form) free to anyone who wants to do this work themselves. The form is on the SkillVault page. No credit card. Email only.

If you want the methodology and the 41 skills we audited with it, the lifetime bundle is $99.


Free methodology PDF → Get it on the SkillVault page

Skip the audit work → SkillVault $99 lifetime

Either path is honest. Pick the one that fits your time.