2026-06-05

Anthropic’s Reference Harness for Autonomous Vulnerability Discovery

Anthropic’s new defending-code-reference-harness repo is interesting because it turns “Claude for security” from a vague demo into a concrete workflow. If you build with Claude Code or are experimenting with agentic security tooling, this is the kind of repo that shows how far the model can be pushed when the task is tightly scoped and the environment is carefully sandboxed.

Key Points

The repo is a reference implementation for autonomous vulnerability discovery and remediation with Claude.
It packages both:
- Claude Code skills for interactive work like /quickstart, /threat-model, /vuln-scan, /triage, /patch, and /customize
- a pipeline harness that runs the full recon → find → verify → report → patch loop autonomously
The harness is explicitly aimed at memory-safety vulnerabilities in C/C++ code and uses Docker plus AddressSanitizer (ASAN) in the reference setup.
Anthropic is explicit that this is not a product and is not maintained; it is offered as a reference implementation.
The repo points readers to a managed option, Claude Security, for teams that want a hosted product that scans repositories, reduces false positives with multi-stage verification, and manages findings through triage, fix validation, and rapid fix generation.
The skills are meant to help with the human-in-the-loop side of the workflow:
- build a threat model
- run scoped scans
- triage findings
- draft patches
The components differ in how much access they need, and the riskier ones are more tightly guarded:
- some skills only read and write files
- /customize edits harness code and runs validation commands
- the pipeline itself can execute target code
- it refuses to run outside a gVisor sandbox unless explicitly overridden
Anthropic recommends starting small:
- Day 1: threat model + static scan + triage
- Day 2: run the reference pipeline on a C/C++ library
- Days 3–5: customize it for your target
- Week 2: move into autonomous scanning, triage, and patching
The repo includes guidance for setup, sandboxing, customization, patching, troubleshooting, and safeguards for dangerous cyber work.
It also makes a strong point that the best security teams are the ones that get hands-on quickly instead of trying to design the perfect pipeline first.
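To make the recon → find → verify → report → patch loop concrete, here is a minimal Python sketch of how such a pipeline could gate findings before they reach the patch stage. It is not the repo's code. `Finding`, `Stages`, `run_pipeline`, and the stage callables are hypothetical names. The only idea carried over from the reference setup is that "verified" means a proof of concept actually crashes under AddressSanitizer, detected here by the `ERROR: AddressSanitizer` string that ASAN prints.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Substring AddressSanitizer prints when it detects a memory error,
# e.g. "==1234==ERROR: AddressSanitizer: heap-buffer-overflow ..."
ASAN_MARKER = "ERROR: AddressSanitizer"


@dataclass
class Finding:
    """A candidate bug from the 'find' stage (hypothetical shape)."""
    location: str
    description: str
    verified: bool = False
    asan_report: str = ""
    patch: Optional[str] = None


@dataclass
class Stages:
    """Pluggable stage implementations; all names here are hypothetical."""
    recon: Callable[[str], List[str]]          # target -> attack surfaces
    find: Callable[[str], List[Finding]]       # surface -> candidate findings
    reproduce: Callable[[Finding], str]        # finding -> sanitizer output of a PoC run
    patch: Callable[[Finding], Optional[str]]  # verified finding -> proposed fix


def run_pipeline(target: str, stages: Stages) -> List[Finding]:
    """recon -> find -> verify -> report -> patch, keeping only reproduced bugs."""
    report: List[Finding] = []
    for surface in stages.recon(target):
        for finding in stages.find(surface):
            # Verify: a finding counts only if its proof of concept actually
            # triggers an ASAN report (in a real harness, inside the sandbox).
            # Anything else is dropped as a likely false positive.
            output = stages.reproduce(finding)
            if ASAN_MARKER in output:
                finding.verified = True
                finding.asan_report = output
                report.append(finding)
    # Patch: only verified findings reach the patch stage.
    for finding in report:
        finding.patch = stages.patch(finding)
    return report


if __name__ == "__main__":
    # Toy stages: one finding reproduces under ASAN, the other does not.
    demo = Stages(
        recon=lambda target: [f"{target}:parse_header"],
        find=lambda surface: [
            Finding(surface, "length field trusted before memcpy"),
            Finding(surface, "suspected off-by-one, does not reproduce"),
        ],
        reproduce=lambda f: (
            "==1==ERROR: AddressSanitizer: heap-buffer-overflow"
            if "memcpy" in f.description else "exit status 0"
        ),
        patch=lambda f: f"add bounds check in {f.location}",
    )
    for f in run_pipeline("libdemo", demo):
        print(f.location, "|", f.description, "|", f.patch)
```

Keeping the stages injectable leaves the loop itself small. The hard parts live in the stage implementations, such as building the target with `-fsanitize=address` and running proofs of concept inside the sandbox. The key design choice is that nothing reaches the report or patch stage without a reproduced crash.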

My Take

What strikes me is that this repo is less about “look, Claude can find bugs” and more about operationalizing the boring, hard parts: scoping, triage, verification, sandboxing, and patch generation. That’s the real story here. Anyone can wave around an agent that claims to discover vulnerabilities; far fewer teams can make the loop reliable enough to use repeatedly.

I think the most useful idea here is the split between interactive skills and the autonomous harness. That division feels practical. In real security workflows, you usually want a human to frame the problem and inspect the output first, then let automation take over only where it’s earned trust. The repo seems designed around that assumption, and I like that a lot.

The sandboxing story also matters. It’s easy to get dazzled by autonomous security agents and forget that they are literally running code from the target. Anthropic’s insistence on gVisor isolation and explicit approvals is the right kind of unsexy engineering. If you’re building anything similar, I’d treat that as the lesson, not an optional detail.
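The fail-closed behavior described in the Key Points (refuse to run outside gVisor unless explicitly overridden) can be sketched in a few lines of Python. This illustrates the pattern only and is not the harness's actual check. The `HARNESS_SANDBOX` and `HARNESS_ALLOW_UNSANDBOXED` variables and `require_sandbox` are hypothetical. Trusting an environment variable is also a simplification, since a real guard should inspect the runtime itself.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variables; a real harness should verify the
# runtime itself rather than trust a variable anyone can set.
SANDBOX_VAR = "HARNESS_SANDBOX"
OVERRIDE_VAR = "HARNESS_ALLOW_UNSANDBOXED"


class SandboxError(RuntimeError):
    """Raised when target code would run outside the expected sandbox."""


def require_sandbox(env: Optional[Mapping[str, str]] = None) -> str:
    """Fail closed: return the run mode, or raise if no sandbox and no override."""
    env = os.environ if env is None else env
    if env.get(SANDBOX_VAR) == "gvisor":
        return "sandboxed"
    if env.get(OVERRIDE_VAR) == "1":
        # An explicit, deliberate opt-out; callers should log this loudly.
        return "unsandboxed-override"
    # Default path: no sandbox detected and no override, so refuse to run.
    raise SandboxError(
        f"refusing to execute target code: {SANDBOX_VAR} is not 'gvisor' "
        f"and {OVERRIDE_VAR} is not set to '1'"
    )


if __name__ == "__main__":
    try:
        print("run mode:", require_sandbox())
    except SandboxError as exc:
        print("blocked:", exc)
```

The important property is the default: with no configuration at all, the function refuses rather than runs. Running unsandboxed requires a separate, deliberate opt-out.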

At the same time, I’d be cautious about overgeneralizing this to “Claude can autonomously secure any codebase.” The repo itself says the harness is a reference, not something that works out of the box everywhere. That honesty is refreshing. I think the customization step is where the real effort lives, and perhaps that’s where many teams will discover that their stack, build system, or vuln class needs substantial adaptation.

I’d actually try this if I were using Claude Code on a security-heavy repo, especially for a C/C++ target or a codebase where crash reproduction matters. I’d start with the interactive flow first, because that seems like the fastest way to understand how Claude behaves on security tasks before letting the autonomous pipeline loose. If I were evaluating it for a team, I’d care most about false positive rates, reproducibility, and how often the patching stage produces something I’d trust without major cleanup.

The bigger takeaway: this is a serious, grounded attempt to make agentic security workflows reproducible rather than magical. That’s a good sign for Claude developers who want useful automation instead of flashy demos.

Reference: GitHub - anthropics/defending-code-reference-harness: Skills for threat modeling, scanning, triage, patching, plus an autonomous scanning harness you can /customize
