Vulnerability Research

Vulnerability research is the systematic process of identifying, analyzing, and validating weaknesses in software, hardware, infrastructure, or operational processes that could be exploited to compromise confidentiality, integrity, availability, or expected system behavior. It combines threat modeling, code review, dynamic testing, reverse engineering, and exploit validation to understand whether a weakness is real, how severe it is, and how it can be remediated.

Before reaching for AI, start with classic DevSecOps methods that are deterministic and cheap:

Scanning Docker images and dependencies using tools like Dependabot Security Updates (GitHub), Grype (Anchore), or Syft (Anchore)
Securing package managers using minimumReleaseAge or equivalent
Running audits like npm audit in the CI pipeline
Locking dependency versions and enforcing SHA pinning
Rebuilding Docker images regularly to pick up patched base images and OS packages after newly disclosed vulnerabilities
Using linters to catch code patterns that lead to common attacks like XSS or SQL injection
Running static analysis tools
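
As one illustration of a cheap deterministic check from the list above, here is a minimal sketch that flags GitHub Actions steps not pinned to a full commit SHA. It assumes that "SHA pinning" refers to `uses:` references in workflow files, which the text does not specify; the function name, the regular expressions, and the sample SHA are placeholders made up for this example.

```python
import re

# Matches `uses: owner/repo@ref` lines in a GitHub Actions workflow file.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^@\s]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    unpinned = []
    for action, ref in USES_RE.findall(workflow_text):
        if action.startswith("docker://"):
            continue  # image digests use a different format; out of scope for this sketch
        if not FULL_SHA_RE.match(ref):
            unpinned.append((action, ref))
    return unpinned


if __name__ == "__main__":
    # The 40-character SHA below is a placeholder, not a real commit.
    sample = (
        "steps:\n"
        "  - uses: actions/checkout@v4\n"
        "  - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567 # pinned\n"
    )
    for action, ref in find_unpinned_actions(sample):
        print(f"not pinned to a commit SHA: {action}@{ref}")
```

A check like this can fail the pipeline before any model is involved, which keeps the AI layer focused on what deterministic tools cannot see.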

AI-assisted audits are the next layer to consider once the deterministic foundations are in place. With AI, we can scan our codebase more thoroughly for vulnerabilities that deterministic tools miss.

Keep in mind: finding vulnerabilities with AI must cost defenders less than it would cost attackers to exploit them.

Signal vs. noise

In the early days of AI-assisted vulnerability scanning, around 2024–2025, scans produced more noise than real findings. Many of the detected issues could have been found more simply and cheaply by existing tools.

The situation is changing, driven by competition between companies like Anthropic and OpenAI to supply AI capabilities to the defense industry, governments, and militaries. Claude Mythos Preview (Anthropic's Frontier Red Team, 2026-04-07) shows what the latest Anthropic model found in real codebases:

A 27-year-old integer overflow in OpenBSD’s TCP stack
A 16-year-old codec bug in FFmpeg that had survived extensive automated fuzzing
A 17-year-old remote code execution path in FreeBSD’s NFS implementation

Projects that signal major labs now take defensive capability seriously:

Project Glasswing (Anthropic, 2026-04-14)
Claude Code Security (Anthropic, 2026-04-16)
Codex Security (OpenAI)

It is worth following the AI security research tag on Simon Willison's Weblog, which tracks this space closely.

…

What about weaker models?

There is an important caveat: many of these bugs can also be found by weaker models when the scope is narrow enough. Models that fail to spot a vulnerability in a large codebase often succeed when pointed directly at the relevant function. This effect — sometimes called peepholing — means that model capability matters less than how well you scope the task.
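
To make the scoping idea concrete, here is a minimal sketch that pulls a single function out of a module so that only that function is handed to the model. It assumes the audited code is Python and uses the standard `ast` module; the helper name `extract_function_source` is hypothetical.

```python
import ast


def extract_function_source(module_source, function_name):
    """Return the source text of a function with the given name, or None if absent."""
    tree = ast.parse(module_source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == function_name:
            # get_source_segment slices the original text using the node's position info.
            return ast.get_source_segment(module_source, node)
    return None


if __name__ == "__main__":
    module = (
        "import os\n\n"
        "def helper():\n"
        "    return 1\n\n"
        "def load(path):\n"
        "    return open(path).read()\n"
    )
    print(extract_function_source(module, "load"))
```

Decorators are not part of the returned segment, so a real pipeline would likely want to include them along with any callers or types the function depends on.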

Real capability exists, but it is early and uneven. AI augments a security review; it does not replace it. A false positive is annoying; a false negative that you trusted as a clean result is dangerous.

…

Doing it yourself

How do you make AI useful rather than producing unnecessary noise? A few practices:

Use classic detection methods first. Start by ensuring your repository has automated, deterministic vulnerability detection for dependencies and the codebase. Reserve AI and token usage for the hardest cases that deterministic tools miss — every detectable vulnerability left unaddressed in the pipeline is also noise for the model, and that costs tokens.
Narrow the scope. Point the agent at a specific attack surface (an authentication flow, a query layer, a deserialization path) rather than the entire repository. The more relevant context it has, the more precisely it can reason; the more unrelated code you give it, the more it will hallucinate. Good application architecture makes this easier.
Name the vulnerability classes you care about. Injection flaws, authentication bypasses, insecure direct object references, privilege escalation, race conditions — naming them explicitly yields better results than a generic “find security issues” prompt. Use ready-made checklists like API Security Best Practices and work through them one by one.
Demand structured output. Ask for each finding to include severity, file path, line range, vulnerability class, explanation, and a suggested fix. Structured output makes it much easier to triage and verify.
Verify every finding manually. LLMs hallucinate proof-of-concept exploits and misread control flow. A finding is a hypothesis, not a confirmed bug.
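
The structured-output and manual-verification practices above can be sketched as a small schema plus a parser that rejects malformed findings instead of guessing. The field names and the JSON-array shape are assumptions chosen for this example, not a format any particular model or tool guarantees.

```python
import json
from dataclasses import dataclass

SEVERITIES = ("critical", "high", "medium", "low")
REQUIRED_FIELDS = (
    "severity", "file_path", "line_start", "line_end",
    "vulnerability_class", "explanation", "suggested_fix",
)


@dataclass(frozen=True)
class Finding:
    severity: str
    file_path: str
    line_start: int
    line_end: int
    vulnerability_class: str
    explanation: str
    suggested_fix: str
    confirmed: bool = False  # a finding stays a hypothesis until a human verifies it


def parse_findings(raw_json):
    """Split model output into well-formed findings and a list of rejection reasons."""
    try:
        items = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc}"]
    if not isinstance(items, list):
        return [], ["expected a JSON array of findings"]

    findings, rejected = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            rejected.append(f"item {index}: not an object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in item]
        if missing:
            rejected.append(f"item {index}: missing {', '.join(missing)}")
            continue
        if item["severity"] not in SEVERITIES:
            rejected.append(f"item {index}: unknown severity {item['severity']!r}")
            continue
        start, end = item["line_start"], item["line_end"]
        if not (isinstance(start, int) and isinstance(end, int) and 1 <= start <= end):
            rejected.append(f"item {index}: invalid line range")
            continue
        findings.append(Finding(**{name: item[name] for name in REQUIRED_FIELDS}))
    return findings, rejected
```

Keeping `confirmed` false by default means nothing downstream can treat a model's claim as a verified bug by accident.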

A starting prompt for a focused, single-class audit:

You are a security researcher performing a focused audit.
Audit only for the ONE vulnerability class specified below — do not report anything else.

Vulnerability class: [e.g. SQL injection / authentication bypass / privilege escalation]

For each finding: severity (critical/high/medium/low), file path, line range,
explanation, and a minimal fix.
Do not report issues already covered by static analysis or linters.
Do not report issues that are clearly handled by framework-level defenses.

<code>
...
</code>
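
One way to keep this prompt honest in practice is to fill it programmatically and refuse inputs that break the single-class, narrow-scope rule. The sketch below follows the prompt wording above; the function name, the character budget, and the check for multiple classes are hypothetical choices for illustration.

```python
PROMPT_TEMPLATE = """You are a security researcher performing a focused audit.
Audit only for the ONE vulnerability class specified below; do not report anything else.

Vulnerability class: {vulnerability_class}

For each finding: severity (critical/high/medium/low), file path, line range,
explanation, and a minimal fix.
Do not report issues already covered by static analysis or linters.
Do not report issues that are clearly handled by framework-level defenses.

<code>
{code}
</code>
"""

MAX_CODE_CHARS = 20_000  # hypothetical budget; tune to your model and cost limits


def build_audit_prompt(vulnerability_class, code, max_chars=MAX_CODE_CHARS):
    """Fill the template for exactly one vulnerability class and a bounded code excerpt."""
    vulnerability_class = vulnerability_class.strip()
    if not vulnerability_class or "/" in vulnerability_class or "," in vulnerability_class:
        raise ValueError("name exactly one vulnerability class")
    if len(code) > max_chars:
        raise ValueError(
            f"code excerpt is {len(code)} characters; narrow the scope below {max_chars}"
        )
    return PROMPT_TEMPLATE.format(vulnerability_class=vulnerability_class, code=code)
```

Failing loudly on an oversized excerpt pushes the scoping decision back to the person running the audit rather than silently sending the whole repository.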

…

CI workflow

The tooling changes quickly, but the underlying pattern is stable.

Run deterministic tools first. Grype, Dependabot, npm audit, and linters cover the well-defined surface cheaply and reliably. Only send to AI what they miss — issues already detectable by static tools are noise for the model.
Trigger on security-sensitive changes. Run AI-assisted analysis on pull requests that touch authentication, payments, or data access layers, and consider a nightly full-repo scan. Scanning everything on every commit is usually too expensive to be useful.
Feed the diff, not the whole codebase. Give the model the changed files plus enough surrounding context. Diff-only scanning keeps token costs proportional to the amount of new code introduced.
Get structured findings. Return findings as structured data — JSON works well. This makes it straightforward to gate on severity, route to a review queue, or track trends over time.
Gate on severity, not on volume. Block the pull request only on critical findings. Route lower-severity findings to a review queue and address them on a schedule. If you block on everything, developers will learn to ignore the scanner.
Treat AI findings like notes from a junior reviewer. Worth reading, but not merge-blocking without a human confirming the issue is real.
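
The trigger and gating steps above can be sketched in a few lines. The path patterns and function names are hypothetical, and findings are assumed to be dictionaries with a `severity` key, as in a JSON report.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; adjust to where auth, payment, and data access code lives.
SENSITIVE_PATTERNS = ("src/auth/*", "src/payments/*", "src/db/*")


def should_run_ai_audit(changed_files, patterns=SENSITIVE_PATTERNS):
    """True if any changed file falls under a security-sensitive path."""
    return any(
        fnmatchcase(path, pattern) for path in changed_files for pattern in patterns
    )


def split_by_gate(findings, blocking_severities=("critical",)):
    """Separate findings that block the pull request from those sent to the review queue."""
    blocking = [f for f in findings if f.get("severity") in blocking_severities]
    queued = [f for f in findings if f.get("severity") not in blocking_severities]
    return blocking, queued


if __name__ == "__main__":
    changed = ["README.md", "src/auth/session.py"]
    if should_run_ai_audit(changed):
        # In a real pipeline these would come from the model's structured output.
        findings = [{"severity": "critical"}, {"severity": "low"}]
        blocking, queued = split_by_gate(findings)
        print(f"{len(blocking)} blocking, {len(queued)} queued for review")
```

Even for the blocking list, a human should confirm each finding before the merge is actually refused, in line with the junior-reviewer framing above.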

…
