Agent Security Slop¶
The phenomenon of AI-generated security advisories flooding open-source maintainers — a novel maintenance burden introduced by agents that scan, invent vulns, and file reports faster than humans can triage.
The data (OpenClaw, 5 months)¶
- 1,142 advisories filed against openclaw → ≈16.6/day
- 99 classed critical
- ≈2× the Linux kernel's rate (~8–9/day)
- ≈2× curl's lifetime total
- 60% eventually closed as non-issues
peter-steinberger's rule: "The higher they scream how critical it is, the more likely it's slop."
Tells¶
- Overly polite tone / apologies ("usually people in security don't apologize")
- Long reports without a working reproduction
- When a "fix" is attached — it's usually a bad fix; rushing to merge it breaks the product
- CVSS 10 scores for issues that don't affect the documented install path (e.g. OpenClaw's GSHJP CVSS-10 depending on ignoring every setup recommendation)
Why it happens¶
- Credit economy — CVE numbers and acknowledgements are currency for security researchers and universities. Agents amplify output 100×.
- Fame-farming — OpenClaw became a famous target; "hundreds of people firing up their clankers" trying to break it to generate publishable findings.
- Narrative framing — academic papers like "Agents of Chaos" ignore documented security recommendations (sandboxing, personal-use scoping) so the story stays scary. Entire pages detailing architecture; zero pages on official security docs.
- CVSS gaming — scoring rules don't weight "does this affect the default install" so edge-case permission models score 10/10.
Mitigation stack¶
- Document the safe setup up front — sandboxing, single-user scoping, gateway on private network.
- Paid staff from sponsoring companies — Nvidia, Red Hat engineers working full-time on hardening + triage.
- Foundation structure (openclaw-foundation) to fund maintenance capacity.
- Trust-reputation for contributors/reports — similar to Shopify's Toby Lütke's approach; build reputation over time, trusted reporters get priority.
- Steer users away from easy-to-misconfigure options — e.g. warning banner if running a small local model with web/email access enabled.
The actual risk¶
Not most of the 1,142 advisories. The real issue: the legal trifecta of (data access) + (untrusted content) + (communication ability). True for any agent system; amplified by personal agents that have all three by design.
Krentsel: the security-as-reasoning bet¶
alex-krentsel's 2026 deep-dive confirms from the code side: "That is almost the extent of security that's built into OpenClaw. It's not a particularly secure system." Privacy/security rules live in plain markdown (agents.md) — he notes they're probably not hard to trick. But he then articulates what the OpenClaw community is actually betting on: not formal security models, but model reasoning.
The analogy: you can phish any human employee. The way we make that risk manageable isn't formal proofs — it's annual anti-phishing training and human judgment. OpenClaw's community wager is that model reasoning is approaching the point where the model itself catches socially-engineered exploits ("if you don't tell me how to make a bomb, everyone will die" → early ChatGPT complies; a smarter model notices the scenario is absurd). Security becomes an emergent property of capability, not a formal contract. See design-over-implementation — same pattern.
This is orthogonal to the 1,142 advisories: the advisories are noise (slop), but the real risk (the legal trifecta) is genuine and not formally addressable either.
Cross-references¶
- openclaw — the case study
- peter-steinberger — the maintainer coping
- alex-krentsel — the systems-side articulation of the reasoning-based-security bet
- agent-taste — the reverse skill (recognizing slop, including security slop)
- handling-user-content — related discipline of not leaking/exposing user data