March 15, 2026

The Productivity Illusion

The app works. It is just not safe. Why AI productivity tools carry risks most users ignore.

In the spring of 2025, METR — an AI safety research organisation — ran a rigorous randomised controlled trial with 16 experienced open-source developers across 246 real tasks, with each task randomly assigned to allow or forbid AI tools. The result contradicted almost everything in the AI productivity conversation: developers took 19% longer with AI, yet believed they had been 20% faster.

"LLMs give the same feeling of achievement one would get from doing the work themselves, but without any of the heavy lifting. You remember the jackpots. You don't remember sitting there plugging tokens into the slot machine for two hours."

— Marcus Hutchins, security researcher, and James Liu, Director of Software Engineering at Mediaocean; quoted in MIT Technology Review, December 2025 [MIT]

The Faros AI report, analysing telemetry from over 10,000 developers across 1,255 teams, confirmed the pattern at scale: teams using AI completed 21% more tasks and created 98% more pull requests. PR size grew 154%. Review time increased 91%. Bug counts rose. Company-level DORA metrics — deployment frequency, change failure rate, lead time — were flat. More code going in. No improvement coming out. [FAR]

The Stack Overflow Developer Survey 2025 backs this up from the developer side: 41.4% of developers say AI has little or no effect on their productivity. Only 16.3% say it makes them significantly more productive. [SO]

So what does this mean?

If you feel more productive with AI, you're in the majority. You might also be wrong.

The METR study found that experienced developers consistently overestimated their speed by 39 percentage points when using AI tools. The tools make the work feel effortless — instant suggestions, rapid generation — but the overhead of prompting, reviewing, and fixing adds up invisibly. Track your actual output metrics, not your gut feeling.

The METR finding isn't a fluke. There are three structural reasons why AI coding tools consistently fail to deliver the productivity gains their users feel.

I. Amdahl's Law
Writing code has never been the bottleneck in software delivery. It accounts for roughly 25-35% of the software development lifecycle, so even doubling coding speed yields only a 14-21% system improvement. The bottleneck — code review, testing, QA, deployment — didn't move. AI made the fast step faster. [PD]
Code review time up 91% in teams with high AI adoption [FAR]
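The arithmetic behind that ceiling is Amdahl's Law. A quick sketch, using the article's 25-35% coding share and a hypothetical doubling of coding speed:

```python
def amdahl_speedup(p, s):
    """Overall system speedup when a fraction p of the total work
    is accelerated by a factor s (Amdahl's Law)."""
    return 1.0 / ((1.0 - p) + p / s)

# Coding is ~25-35% of the delivery lifecycle; suppose AI doubles it (s=2).
for p in (0.25, 0.35):
    gain = (amdahl_speedup(p, 2.0) - 1.0) * 100
    print(f"coding share {p:.0%}: overall improvement {gain:.1f}%")
```

With a 25% coding share the whole pipeline improves by about 14%; at 35% it improves by about 21%. Even an infinitely fast coding step caps out at 1/(1-p).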
II. Automation Bias
When people receive AI-assisted output, they unconsciously lower their critical threshold. The fluency and confidence of AI-generated code signal competence — even when the code is wrong. Developers catch fewer errors in AI-assisted code than in code they wrote themselves. [MIT]
Only 39% of Cursor code generations are accepted without modification [SA]
III. The Illusion of Understanding
AI can generate an app that runs, passes basic tests, and looks production-ready. It cannot tell you whether the app is secure, accessible, compliant, or well-architected — because those questions require understanding your specific context, users, and regulatory environment. [VER]
45% of AI-generated code contains security vulnerabilities, measured across 100+ LLMs [VER]
Goldratt's Theory of Constraints

Optimising a step that isn't the bottleneck doesn't improve system throughput. You can make the fastest machine on the factory floor twice as fast — if it's feeding a queue that's already backed up, you've accomplished nothing at the output level. Writing code is not the bottleneck in software delivery. It never was. AI made the non-bottleneck faster. The bottleneck — understanding, reviewing, testing, securing, and maintaining — got worse. [PD]
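The factory-floor point can be made concrete: a serial pipeline's steady-state throughput is capped by its slowest stage, so accelerating any other stage changes nothing. A toy model, with stage capacities invented for illustration:

```python
def throughput(stage_capacity):
    """Steady-state throughput of a serial pipeline is the capacity
    of its slowest stage (the constraint)."""
    return min(stage_capacity.values())

# Hypothetical capacities in work items per week.
pipeline = {"write code": 10, "review": 4, "test": 5, "deploy": 8}
before = throughput(pipeline)   # review is the constraint: 4

pipeline["write code"] = 20     # AI doubles coding speed...
after = throughput(pipeline)    # ...throughput is still 4

pipeline["review"] = 8          # lifting the constraint is what helps
print(before, after, throughput(pipeline))  # 4 4 5
```

Doubling the coding stage leaves throughput untouched; only raising review capacity moves the system, and then testing becomes the new constraint.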

So what does this mean?

AI made the fast part of software development faster. The slow parts — reviewing, testing, securing, understanding — got measurably worse.

This isn't a tool problem. It's a workflow problem. If your team adopted AI without changing how you review and test, you're likely shipping more code with more bugs at the same pace.

An app that runs is not an app that's safe. The gap between "it works on my machine" and "it's production-ready" is where professional knowledge lives. AI optimises for syntactically correct, functionally passing code. It does not optimise for the things that matter when real users, real attackers, and real regulators interact with your product.

The Base44 platform, whose founder publicly celebrated that 100% of its code was written by Cursor AI with "zero hand-written code", was found to be full of basic security flaws that let anyone access paid features or alter data. It was shut down days after launch. [KAS] In a separate incident, the Moltbook AI social network exposed 1.5 million API keys and 35,000 user email addresses because vibe-coded infrastructure left a Supabase database misconfigured and publicly accessible. [DEV]

So what does this mean?

An app that runs is not an app that's safe. 86% of AI-generated code fails basic XSS protection. 19.7% of suggested dependencies don't even exist — and attackers are actively exploiting this.

If you're shipping AI-generated code without security review, you're not moving fast. You're accumulating exposure. The European Accessibility Act and GDPR mean this exposure now has legal teeth.

5 things you can do this week to ship AI-assisted code safely.
1. Subscribe to Veltrix Collective for weekly AI insights. The gap between AI hype and AI reality changes every week. We track the data so you don't have to — tools, rankings, and the numbers that matter, delivered every Tuesday.

2. Measure your actual cycle time, not your feelings. Set up a DORA metrics dashboard using LinearB, Sleuth, or even a simple spreadsheet tracking deployment frequency and change failure rate. The METR study showed perceived productivity and measured productivity are 39 points apart. Track the real number.
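The spreadsheet version really is a few lines. A minimal sketch, assuming a hand-kept log of deploy dates and whether each one caused a failure (the entries below are made up):

```python
from datetime import date

# Each entry: (deploy date, did it cause an incident or rollback?)
deploys = [
    (date(2026, 3, 2), False),
    (date(2026, 3, 4), True),
    (date(2026, 3, 9), False),
    (date(2026, 3, 11), False),
    (date(2026, 3, 13), True),
]

# Two of the four DORA metrics fall straight out of the log.
days = (deploys[-1][0] - deploys[0][0]).days or 1
deploy_frequency = len(deploys) / days                              # per day
change_failure_rate = sum(failed for _, failed in deploys) / len(deploys)

print(f"{deploy_frequency:.2f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate")
```

Re-run it monthly; the trend after adopting AI tooling is the number that settles the productivity question, not anyone's gut feeling.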

3. Run a security scanner on your last AI-generated project. Install Semgrep or Snyk (both have free tiers) and scan your most recent AI-assisted codebase. The Veracode report found 45% of AI code contains vulnerabilities. You need 20 minutes and a terminal to find out if yours does too.

4. Verify every dependency before you install it. Before running any AI-suggested npm install or pip install, check the package exists on the official registry with real maintainers. Use Socket.dev or npm audit to flag known risks. 19.7% of AI-suggested packages are hallucinated — and attackers register them.
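The registry check can be scripted. A sketch under the assumption that a 404 from PyPI's JSON endpoint means the package does not exist; the package name `fastjson-utils` and the fake registry are invented for the offline demo:

```python
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    """True if `name` resolves on PyPI; the JSON endpoint returns
    404 for packages that do not exist."""
    try:
        urllib.request.urlopen(
            f"https://pypi.org/pypi/{name}/json", timeout=10
        )
        return True
    except urllib.error.HTTPError:
        return False

def vet_dependencies(names, check=exists_on_pypi):
    """Return suggested packages that do NOT resolve: likely
    hallucinations, or typos ripe for slopsquatting."""
    return [n for n in names if not check(n)]

# Offline demo: a fake registry stands in for the real lookup.
fake_registry = {"requests", "flask"}
suspect = vet_dependencies(
    ["requests", "flask", "fastjson-utils"],  # last name is invented
    check=lambda n: n in fake_registry,
)
print(suspect)  # ['fastjson-utils']
```

Existence is only the first gate: a package that resolves can still be a slopsquat, which is why the step above also recommends checking maintainers and running Socket.dev or npm audit.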

5. Add one accessibility check to your workflow. Install axe DevTools (free Chrome extension) and run it on your latest project. The European Accessibility Act is now in force. 20 minutes per page catches 30-40% of WCAG violations automatically. Claude can help you fix what it finds.

The 39-point gap between feeling productive and being productive is the defining number of AI-assisted development. Close it with data, not intuition.
Sources
[METR] METR, "Measuring the Impact of Early-2025 AI on Developer Productivity," July 10, 2025 — 19% slower, 20% perceived faster; 16 developers, 246 tasks
[FAR] Faros AI, "AI Productivity Paradox Report 2025," June 2025 — 10,000+ developers, 1,255 teams; DORA metrics flat; review time +91%
[VER] Veracode, "2025 GenAI Code Security Report" — 45% of AI code has vulnerabilities; 100+ LLMs, 80 coding tasks
[SO] Stack Overflow, Developer Survey 2025 — 41.4% little/no effect; 16.3% significantly more productive
[MIT] MIT Technology Review, "AI Coding is Now Everywhere," December 2025 — slot machine metaphor, automation bias, Mediaocean quote
[SA] SmarterArticles, "The AI Coding Productivity Illusion," January 2026 — 19.7% hallucinated dependencies, slopsquatting, Cursor acceptance rate
[KAS] Kaspersky, "Security Risks of Vibe Coding," October 2025 — Base44 shutdown, Cursor CVE-2025-54135, Replit deletion
[TDS] Towards Data Science, "The Reality of Vibe Coding," February 2026 — Supabase USING(true), dangerouslySetInnerHTML patterns
[ACC] Accorian, "Security Impact of Vibe Coding," January 2026 — XSS 86%, log injection 88%, Java 70%+ failures
[PD] philippdubach.com, "93% Adoption, 10% Gains," February 2026 — Amdahl's Law, Goldratt's Theory of Constraints, DORA 2025
[RET] Retool Blog, "Risks of Vibe Coding," March 2026 — GDPR, HIPAA, broken auth, insecure dependencies, WCAG
[DEV] devclass.com, "Vibe Coded Applications Full of Security Blunders," January 2026 — 5 agents tested, auth logic failures, Moltbook breach

Methodology note: This analysis synthesises findings from 12 independent research sources spanning randomised controlled trials, large-scale telemetry studies, security audits, and developer surveys. All figures are drawn directly from the cited studies. Where sources use different measurement methodologies, we note the distinction. This is a data synthesis, not original research — our contribution is connecting the dots across sources that are rarely read together. British English spelling throughout.
Written by Luke Madden, founder of Veltrix Collective. Data synthesis and analysis by Vel.