2026-06

Building dark-pattern audit agents for AI-generated UI

TL;DR — Vibe coding generates checkouts, consent banners, and pricing pages faster than anyone can review them, and AI-generated UI reproduces the web's manipulative defaults without anyone intending to. I built two surfaces — a Claude skill, daylight, for design time, and a plugin, floodlight, for shipped pages — that share one methodology and one test: would this design survive being honestly described to the user? This is the problem, the build, what dogfooding three real interfaces turned up, and what I'd fix next.

The problem

For most of the web's history, deceptive design and a review backlog shared a bottleneck: humans produced the interface, so humans could, in principle, review it before it shipped. That assumption broke this year. Cursor, v0, Lovable, and a hundred Claude skills now generate flows faster than any design or legal review can keep up with — and they generate them from a training corpus saturated with manipulation, so they ship deceptive defaults at a volume and velocity no review process was built to catch.

The tooling meant to catch it doesn't close the gap. Automated detectors flag the shape of a known pattern, but none can tell whether a given design is actually manipulative, because that turns on context an interface alone doesn't reveal. Meanwhile the stakes have stopped being abstract — regulators in the EU and US are now fining and rule-making against exactly these patterns. So the thing that's missing is a judgment layer: something that reasons about manipulation the way a careful designer would, but runs at the speed the generation runs. That gap is what these two tools are built to fill.

Where this breaks today

A dark pattern isn't a style. It's a design that only works because the user can't quite see what it's doing: the pre-checked box, the buried "reject," the trial that quietly converts. Harry Brignull named them in 2010, Mathur and colleagues found them on roughly 11% of about 11,000 shopping sites in 2019, and they've been a known failure mode of the web ever since.

The new wrinkle is supply. AI now generates the flow logic and the copy, and because it learned from that same web, it ships the manipulation by default — the pricing table where the costliest plan is pre-selected, the dismiss link worded to make you feel foolish, the cookie banner with a bright "Accept all" and a ghosted "Manage." Nobody designed those to be cynical. The model absorbed them as the statistically normal way to build a paywall and reproduced them. The audit problem isn't "catch the cynical growth team." It's "catch the defaults the model inherited."

That reframing matters, because it tells you where the audit belongs: in the generation loop, at AI speed, not in a quarterly review nobody has time for.

The tool gap

There are automated detectors (AidUI, UIGuard, and more recent multimodal LLM approaches like DPDGPT), and they hit respectable accuracy at classifying known pattern types from pixels and text. But two limits are structural. Rule-based and pre-trained detectors struggle to adapt to emerging patterns, and more fundamentally, the same mechanism is honest or manipulative depending on context: an "Only 2 left" badge is fine if the warehouse has two and deceptive if the number is hardcoded. Google's own 2025 research on deceptive design makes the point plainly: design mechanisms aren't inherently deceptive; whether one is deceptive turns on whether the claim is accurate and the user's interest is aligned.
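
A minimal sketch of that second limit, in Python. The Badge type, its live_stock field, and both functions are hypothetical names made up for illustration; none of them comes from the detectors named above. The form-only check gives the same answer for two badges that differ only in where the number comes from.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Badge:
    text: str                  # what the user sees
    live_stock: Optional[int]  # real inventory count, or None if the number is hardcoded


SCARCITY = re.compile(r"only\s+(\d+)\s+left", re.IGNORECASE)


def flags_by_form(badge: Badge) -> bool:
    """Form-only check: fires on any scarcity wording, whether or not it is true."""
    return SCARCITY.search(badge.text) is not None


def is_deceptive(badge: Badge) -> bool:
    """Context check: deceptive only when the claim is not backed by real stock."""
    match = SCARCITY.search(badge.text)
    if match is None:
        return False
    claimed = int(match.group(1))
    return badge.live_stock is None or badge.live_stock != claimed


honest = Badge("Only 2 left", live_stock=2)
hardcoded = Badge("Only 2 left", live_stock=None)

print(flags_by_form(honest), flags_by_form(hardcoded))  # True True
print(is_deceptive(honest), is_deceptive(hardcoded))    # False True
```

The context the second function needs (the real stock count) is exactly what a screenshot or DOM snapshot does not contain, which is why the sketch has to take it as an input.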

So you can't classify by form. You have to reason about the case. The test I built both tools around is disclosure: write one plain sentence describing what the element is actually doing — including the part the user isn't meant to notice — and ask whether a fully-informed person would still go along with it.

  • "We worded the decline option to make you feel foolish for choosing it." No one informed wants that. Dark pattern.
  • "We turned on security-alert emails by default." A fully-informed user generally wants those. Persuasion. Not a finding.

Persuasion survives the sentence because it works in full view. Manipulation collapses, because it needs the concealment. That's a reasoning move, not a feature a classifier can spot — which is the whole reason an agent earns its place here.
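
One way to record the test is sketched below. DisclosureCheck and its field names are assumptions for illustration, not the format either tool actually writes. The informed_user_accepts flag is a judgment supplied by the reviewer or agent; the code only keeps the sentence and the judgment together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisclosureCheck:
    element: str
    disclosure: str              # one plain sentence, including the part meant to go unnoticed
    informed_user_accepts: bool  # a judgment call, not something the code computes

    def verdict(self) -> str:
        # Persuasion survives being described; manipulation does not.
        return "persuasion" if self.informed_user_accepts else "dark pattern"


checks = [
    DisclosureCheck(
        element="decline link",
        disclosure="We worded the decline option to make you feel foolish for choosing it.",
        informed_user_accepts=False,
    ),
    DisclosureCheck(
        element="security-alert email default",
        disclosure="We turned on security-alert emails by default.",
        informed_user_accepts=True,
    ),
]

# Only checks that fail the test become findings.
findings = [c for c in checks if c.verdict() == "dark pattern"]
for c in checks:
    print(f"{c.element}: {c.verdict()}")
```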

What I built

Two surfaces, one methodology, two stages of the lifecycle.

daylight is a skill for design time. Point it at a component or flow you're building or reviewing; it finds the consequential choices, runs the disclosure test at each, classifies each finding under the autonomy-violation type it maps to, and rates it. It follows the flow across files, and it always looks for the exit, because a missing or harder cancel path is the finding most easily missed when the audit sees only a single signup component.

floodlight is a plugin for shipped UI. Give it a URL; it opens a real browser, renders the page at desktop and mobile widths, reads the live DOM, and runs read-only probes the static read can't: reload to see whether a countdown resets, count the clicks between "subscribe" and "cancel," check whether a "Reject all" control exists at all. It observes and probes only — it never submits, purchases, or consents, because an audit that buys the subscription to test cancellation has become the harm it was measuring.
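
The probes reduce to small checks over what the browser observed. The sketch below is an assumption about how they could be expressed, not floodlight's code. Every name in it is hypothetical, and driving the browser is left out so the example stays self-contained.

```python
from html.parser import HTMLParser


class ControlLabels(HTMLParser):
    """Collects the visible text of <button> and <a> elements in a DOM snapshot."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self._depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("button", "a"):
            self._depth += 1
            if self._depth == 1:
                self._buf = []

    def handle_endtag(self, tag):
        if tag in ("button", "a") and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so " Reject  all " reads as "Reject all".
                self.labels.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def has_reject_all(html: str) -> bool:
    """True if any button or link is labelled 'Reject all' (case-insensitive)."""
    parser = ControlLabels()
    parser.feed(html)
    parser.close()
    return any(label.lower().startswith("reject all") for label in parser.labels)


def countdown_resets(before_reload_s: int, after_reload_s: int) -> bool:
    """A real deadline keeps falling across a reload; a fake one jumps back up.
    Assumes a few seconds passed between the two readings."""
    return after_reload_s > before_reload_s


def click_asymmetry(clicks_to_subscribe: int, clicks_to_cancel: int) -> int:
    """Extra clicks the exit costs compared with the entrance."""
    return clicks_to_cancel - clicks_to_subscribe


FORBIDDEN_ACTIONS = {"submit", "purchase", "consent"}


def assert_read_only(action: str) -> None:
    """Refuse any action that would change state on the audited site."""
    if action in FORBIDDEN_ACTIONS:
        raise PermissionError(f"audit is read-only; refusing to {action}")


banner = '<div><button>Accept all</button><a href="#">Manage</a></div>'
print(has_reject_all(banner))      # False
print(countdown_resets(540, 600))  # True
print(click_asymmetry(2, 7))       # 5
```

The guard is deliberately a hard failure rather than a warning: an audit that can never submit cannot turn into the harm it is measuring.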

Both write to the same methodology.md, so "confirmshaming, Major" means the same thing whether it surfaced in a Figma handoff or on a competitor's checkout.
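
A shared vocabulary could look like the sketch below. The type and field names are hypothetical, and filing confirmshaming under manipulation is my assumption for the example rather than something methodology.md is quoted as saying. The violation types and severity labels are the ones used elsewhere in this post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Violation(Enum):
    DECEPTION = "deception"
    MANIPULATION = "manipulation"
    DISTORTION = "distortion"


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    AI_UNCERTAIN = "AI-uncertain"


@dataclass(frozen=True)
class Finding:
    pattern: str
    violation: Violation
    severity: Severity
    surface: str                      # "daylight" or "floodlight"
    rules_cited: Tuple[str, ...] = ()  # context only; there is deliberately no "compliant" field

    def label(self) -> str:
        return f"{self.pattern}, {self.severity.value}"


from_handoff = Finding("confirmshaming", Violation.MANIPULATION, Severity.MAJOR, "daylight")
from_live_page = Finding(
    "confirmshaming", Violation.MANIPULATION, Severity.MAJOR, "floodlight",
    rules_cited=("DSA Article 25",),
)

# Same label regardless of which surface produced the finding.
print(from_handoff.label())
print(from_handoff.label() == from_live_page.label())
```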

Standards

The methodology is organized around the three autonomy violations the EU Digital Services Act prohibits in Article 25 — deception, manipulation, and distortion — because they're a cleaner spine than the long pattern list, and they're enforced: the Commission has already issued a €200M DSA fine to Temu. The Digital Fairness Act, confirmed in the EU's 2030 Consumer Agenda adopted in November 2025 with a proposal expected late 2026, takes direct aim at the patterns this audit targets most — cancellation flows, drip pricing, addictive design. In the US, the FTC's click-to-cancel rule was vacated on procedural grounds in July 2025 and re-opened for fresh rulemaking in January 2026, but cancellation symmetry is still enforced under Section 5 and ROSCA, and California's CPRA already voids consent obtained through a dark pattern.

The tools surface the relevant rule as context. They never declare a design lawful or compliant — that's a determination for counsel, not an agent. Cite, never certify.

Dogfooding

A specimen built from a neutral brief

I started where the honest test is: on something I generated myself. I gave a model a plain brief — a signup-and-pricing modal for a free trial, plus the marketing-site cookie banner — and built it the obvious way, planting nothing. Then I audited the result.

Eight findings (three Critical, three Major, two AI-uncertain), plus one item cleared. The Criticals all clustered at the money-and-consent moments, and every one was a place where the visible copy and the actual mechanism pointed in opposite directions. The subhead promised "No commitment, cancel anytime" over a cancellation that ran through a business-hours chat queue. A reassuring 14-day trial sat above a $348 auto-charge rendered in the faintest text on the screen. "Improve your experience" sat above ad-sharing toggled on by default. The one I cleared, a pre-checked "email me about security activity," is the discrimination that matters: a skill that flags every default is noise.

The run also taught me something about the skill. It caught the cancellation roach-motel only because the fine print happened to mention the cancel mechanism; on a bare signup component the exit lives elsewhere or doesn't exist yet, and the most important finding would have been missed. I made "always locate the exit" a hard step, not a suggestion.
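
In code terms, a hard step means the audit cannot finish without an answer to "where is the exit?". The sketch below is an assumption about how that could be enforced. Flow, ExitNotLocated, and require_exit are hypothetical names, and the file names are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Flow:
    name: str
    files: Tuple[str, ...]
    exit_path: Optional[str]  # where cancellation lives, or None if it was not located


class ExitNotLocated(Exception):
    """Raised when an audit would otherwise finish without having found the cancel path."""


def require_exit(flow: Flow) -> str:
    """Return the located exit, or stop the audit instead of silently skipping the step."""
    if not flow.exit_path:
        raise ExitNotLocated(
            f"{flow.name}: no cancel path found in {len(flow.files)} file(s); audit is incomplete"
        )
    return flow.exit_path


with_fine_print = Flow("trial signup", ("Signup.tsx", "fineprint.md"), "business-hours chat queue")
bare_component = Flow("bare signup", ("Signup.tsx",), None)

print(require_exit(with_fine_print))
try:
    require_exit(bare_component)
except ExitNotLocated as err:
    print(err)
```

Raising rather than returning an empty result means a bare signup component produces a visible open item, not a clean report.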

Two real pricing flows

Then I pointed the methodology at two live pages chosen for contrast — a mature SaaS and a conversion-aggressive consumer subscription.

Notion came out clean, and that's the result that proves the test discriminates rather than just flagging everyone. Cancellation is self-serve and documented in plain words. Auto-renewal is disclosed. The refund policy is unusually consumer-forward, explicitly honoring the EU's 14-day right. The one asymmetry — you're charged immediately when you add a seat mid-cycle but get no credit when you remove one — is stated openly and leaves you the seat's value, so by the disclosure test it isn't a dark pattern at all. It's a billing-fairness question. The skill's job there is to not inflate it.

NordVPN was busier, but in an instructive way: nothing is hidden. The headline is "$3.09 per month," while the transaction is $83.43 billed upfront; the true total sits in the sub-line. The struck-through "73% off" anchor turns out to be measured against the renewal price — a number no new buyer pays — which the page itself notes in small text. The renewal jump is disclosed; the auto-renew terms are in a footnote. The 30-day money-back guarantee is prominent, and the add-ons are clearly labeled, not pre-checked. So the findings landed at Major and AI-uncertain, not Critical: the manipulation here is hierarchy, not concealment — the flattering number foregrounded, the true cost rendered small.
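
The gap between headline and transaction is plain arithmetic on the two numbers above. The sketch assumes the upfront charge is exactly the monthly rate times the term, with no tax or fees, and that "73% off" is taken against a monthly rate. The term and the anchor price it prints are implied by those assumptions, not figures quoted from the page.

```python
from decimal import Decimal

headline_monthly = Decimal("3.09")  # the number in the headline
billed_upfront = Decimal("83.43")   # the number actually charged
discount = Decimal("0.73")          # the struck-through "73% off"

# How many months the upfront charge covers at the headline rate.
implied_months = billed_upfront / headline_monthly

# The per-month price the discount must be measured against.
implied_anchor = (headline_monthly / (1 - discount)).quantize(Decimal("0.01"))

print(implied_months)  # 27
print(implied_anchor)  # 11.44
```

Under those assumptions the "$3.09" headline stands for a 27-month commitment, and the discount is measured against roughly $11.44 a month.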

The pattern that emerged

Three interfaces, three different places on the same scale. The specimen earned Criticals because it actually hid things. NordVPN earned Majors and AI-uncertains because it discloses everything and buries it. Notion came out clean. A naïve scanner that flags every pre-checked box and urgency cue would have screamed at all three equally. The thing that separates them — does the design need concealment to work? — is exactly the judgment automated detection can't make, and exactly what the disclosure test is for.

The runs also drew the line between my two surfaces sharply. Most of NordVPN's open questions were prominence — is the renewal price buried under the discount badge? — and prominence is a rendered-page judgment that fetched markup flattens. That's not a weakness of the design-time skill; it's the precise seam where daylight hands off to floodlight, which can see the visual weight and resolve it.
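
That handoff can be pictured as a split over open questions. OpenQuestion, needs_rendering, and hand_off are hypothetical names for illustration, not the tools' interface, and the second question is an example of my own.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OpenQuestion:
    text: str
    needs_rendering: bool  # True when the answer depends on visual weight, not markup


def hand_off(questions: List[OpenQuestion]) -> Tuple[List[OpenQuestion], List[OpenQuestion]]:
    """Split questions into those daylight can settle and those floodlight must."""
    stays_with_daylight = [q for q in questions if not q.needs_rendering]
    goes_to_floodlight = [q for q in questions if q.needs_rendering]
    return stays_with_daylight, goes_to_floodlight


questions = [
    OpenQuestion("Is the renewal price buried under the discount badge?", needs_rendering=True),
    OpenQuestion("Are the add-ons pre-checked?", needs_rendering=False),
]

stays, handed = hand_off(questions)
print([q.text for q in handed])
```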

Takeaways

Cite, never certify. The agent surfaces the rule and the reasoning. Whether a real user was actually deceived, and whether that's unlawful, needs user testing and counsel.

Most real manipulation is disclosed. Outright concealment is legally risky, so mature products rarely hide the unfavorable fact — they render it small. An audit that only catches lies will clear most of the web. The hierarchy is the pattern.

Inherited, not intended. AI generates the next web's defaults. If those defaults are manipulative because the training web was, the manipulation compounds silently unless the fix is in the loop that produces it.

Property, not phase. Honesty isn't a review step near launch. It's a property of the work, owed to the person using it, and it belongs in the tools that build it.

Next. Run floodlight live across a wider set — cancellation flows, consent banners, AI-generated landing pages — and test whether running daylight during prototyping measurably reduces the deceptive defaults that reach ship.

Sources