March 15, 2026

The Productivity Illusion

The app works. It is just not safe. Why AI productivity tools carry risks most users ignore.

In the spring of 2025, METR — an AI safety research organisation — ran a rigorous randomised controlled trial with 16 experienced open-source developers across 246 real tasks, with each task randomly assigned to allow or forbid AI tools. The result contradicted almost everything in the AI productivity conversation: developers took 19% longer with AI, yet believed they had been 20% faster.

"LLMs give the same feeling of achievement one would get from doing the work themselves, but without any of the heavy lifting. You remember the jackpots. You don't remember sitting there plugging tokens into the slot machine for two hours."

— Marcus Hutchins, security researcher, and James Liu, Director of Software Engineering at Mediaocean; quoted in MIT Technology Review, December 2025 [MIT]

The Faros AI report, analysing telemetry from over 10,000 developers across 1,255 teams, confirmed the pattern at scale: teams using AI completed 21% more tasks and created 98% more pull requests. PR size grew 154%. Review time increased 91%. Bug counts rose. Company-level DORA metrics — deployment frequency, change failure rate, lead time — were flat. More code going in. No improvement coming out. [FAR]

The Stack Overflow Developer Survey 2025 backs this up from the developer side: 41.4% of developers say AI has little or no effect on their productivity. Only 16.3% say it makes them significantly more productive. [SO]

So what does this mean?

If you feel more productive with AI, you're in the majority. You might also be wrong.

The METR study found that experienced developers consistently overestimated their speed by 39 percentage points when using AI tools. The tools make the work feel effortless — instant suggestions, rapid generation — but the overhead of prompting, reviewing, and fixing adds up invisibly. Track your actual output metrics, not your gut feeling.

The METR finding isn't a fluke. There are three structural reasons why AI coding tools consistently fail to deliver the productivity gains their users feel.

I. Amdahl's Law
Writing code has never been the bottleneck in software delivery. It accounts for roughly 25-35% of the software development lifecycle, so even doubling coding speed yields only a 14-21% system improvement. The bottleneck — code review, testing, QA, deployment — didn't move. AI made the fast step faster. [PD]
Code review time up 91% in teams with high AI adoption [FAR]
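The arithmetic behind that ceiling is Amdahl's Law. A quick sketch, using the article's 25-35% coding share and a hypothetical doubling of coding speed:

```python
def amdahl_speedup(p, s):
    """Overall system speedup when a fraction p of the total work
    is accelerated by a factor s (Amdahl's Law)."""
    return 1.0 / ((1.0 - p) + p / s)

# Coding is ~25-35% of the delivery lifecycle; suppose AI doubles it (s=2).
for p in (0.25, 0.35):
    gain = (amdahl_speedup(p, 2.0) - 1.0) * 100
    print(f"coding share {p:.0%}: overall improvement {gain:.1f}%")
```

With a 25% coding share the whole pipeline improves by about 14%; at 35% it improves by about 21%. Even an infinitely fast coding step caps out at 1/(1-p).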
II. Automation Bias
When people receive AI-assisted output, they unconsciously lower their critical threshold. The fluency and confidence of AI-generated code signal competence — even when the code is wrong. Developers catch fewer errors in AI-assisted code than in code they wrote themselves. [MIT]
Only 39% of Cursor code generations are accepted without modification [SA]
III. The Illusion of Understanding
AI can generate an app that runs, passes basic tests, and looks production-ready. It cannot tell you whether the app is secure, accessible, compliant, or well-architected — because those questions require understanding your specific context, users, and regulatory environment. [VER]
45% of AI-generated code contains security vulnerabilities, measured across 100+ LLMs [VER]
Goldratt's Theory of Constraints

Optimising a step that isn't the bottleneck doesn't improve system throughput. You can make the fastest machine on the factory floor twice as fast — if it's feeding a queue that's already backed up, you've accomplished nothing at the output level. Writing code is not the bottleneck in software delivery. It never was. AI made the non-bottleneck faster. The bottleneck — understanding, reviewing, testing, securing, and maintaining — got worse. [PD]
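The factory-floor point can be made concrete: a serial pipeline's steady-state throughput is capped by its slowest stage, so accelerating any other stage changes nothing. A toy model, with stage capacities invented for illustration:

```python
def throughput(stage_capacity):
    """Steady-state throughput of a serial pipeline is the capacity
    of its slowest stage (the constraint)."""
    return min(stage_capacity.values())

# Hypothetical capacities in work items per week.
pipeline = {"write code": 10, "review": 4, "test": 5, "deploy": 8}
before = throughput(pipeline)   # review is the constraint: 4

pipeline["write code"] = 20     # AI doubles coding speed...
after = throughput(pipeline)    # ...throughput is still 4

pipeline["review"] = 8          # lifting the constraint is what helps
print(before, after, throughput(pipeline))  # 4 4 5
```

Doubling the coding stage leaves throughput untouched; only raising review capacity moves the system, and then testing becomes the new constraint.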

So what does this mean?

AI made the fast part of software development faster. The slow parts — reviewing, testing, securing, understanding — got measurably worse.

This isn't a tool problem. It's a workflow problem. If your team adopted AI without changing how you review and test, you're likely shipping more code with more bugs at the same pace.

An app that runs is not an app that's safe. The gap between "it works on my machine" and "it's production-ready" is where professional knowledge lives. AI optimises for syntactically correct, functionally passing code. It does not optimise for the things that matter when real users, real attackers, and real regulators interact with your product.

The Base44 platform, whose founder publicly celebrated that 100% of its code was written by Cursor AI with "zero hand-written code", was found to be full of basic security flaws that let anyone access paid features or alter data. It was shut down days after launch. [KAS] In a separate incident, the Moltbook AI social network exposed 1.5 million API keys and 35,000 user email addresses because vibe-coded infrastructure left a Supabase database misconfigured and publicly accessible. [DEV]

So what does this mean?

An app that runs is not an app that's safe. 86% of AI-generated code fails basic XSS protection. 19.7% of suggested dependencies don't even exist — and attackers are actively exploiting this.

If you're shipping AI-generated code without security review, you're not moving fast. You're accumulating exposure. The European Accessibility Act and GDPR mean this exposure now has legal teeth.

5 things you can do this week to ship AI-assisted code safely.
1. Subscribe to Veltrix Collective for weekly AI insights. The gap between AI hype and AI reality changes every week. We track the data so you don't have to — tools, rankings, and the numbers that matter, delivered every Tuesday.

2. Measure your actual cycle time, not your feelings. Set up a DORA metrics dashboard using LinearB, Sleuth, or even a simple spreadsheet tracking deployment frequency and change failure rate. The METR study showed perceived productivity and measured productivity are 39 points apart. Track the real number.
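The spreadsheet version really is a few lines. A minimal sketch, assuming a hand-kept log of deploy dates and whether each one caused a failure (the entries below are made up):

```python
from datetime import date

# Each entry: (deploy date, did it cause an incident or rollback?)
deploys = [
    (date(2026, 3, 2), False),
    (date(2026, 3, 4), True),
    (date(2026, 3, 9), False),
    (date(2026, 3, 11), False),
    (date(2026, 3, 13), True),
]

# Two of the four DORA metrics fall straight out of the log.
days = (deploys[-1][0] - deploys[0][0]).days or 1
deploy_frequency = len(deploys) / days                              # per day
change_failure_rate = sum(failed for _, failed in deploys) / len(deploys)

print(f"{deploy_frequency:.2f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate")
```

Re-run it monthly; the trend after adopting AI tooling is the number that settles the productivity question, not anyone's gut feeling.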

3. Run a security scanner on your last AI-generated project. Install Semgrep or Snyk (both have free tiers) and scan your most recent AI-assisted codebase. The Veracode report found 45% of AI code contains vulnerabilities. You need 20 minutes and a terminal to find out if yours does too.

4. Verify every dependency before you install it. Before running any AI-suggested npm install or pip install, check the package exists on the official registry with real maintainers. Use Socket.dev or npm audit to flag known risks. 19.7% of AI-suggested packages are hallucinated — and attackers register them.
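The registry check can be scripted. A sketch under the assumption that a 404 from PyPI's JSON endpoint means the package does not exist; the package name `fastjson-utils` and the fake registry are invented for the offline demo:

```python
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    """True if `name` resolves on PyPI; the JSON endpoint returns
    404 for packages that do not exist."""
    try:
        urllib.request.urlopen(
            f"https://pypi.org/pypi/{name}/json", timeout=10
        )
        return True
    except urllib.error.HTTPError:
        return False

def vet_dependencies(names, check=exists_on_pypi):
    """Return suggested packages that do NOT resolve: likely
    hallucinations, or typos ripe for slopsquatting."""
    return [n for n in names if not check(n)]

# Offline demo: a fake registry stands in for the real lookup.
fake_registry = {"requests", "flask"}
suspect = vet_dependencies(
    ["requests", "flask", "fastjson-utils"],  # last name is invented
    check=lambda n: n in fake_registry,
)
print(suspect)  # ['fastjson-utils']
```

Existence is only the first gate: a package that resolves can still be a slopsquat, which is why the step above also recommends checking maintainers and running Socket.dev or npm audit.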

5. Add one accessibility check to your workflow. Install axe DevTools (free Chrome extension) and run it on your latest project. The European Accessibility Act is now in force. 20 minutes per page catches 30-40% of WCAG violations automatically. Claude can help you fix what it finds.

The 39-point gap between feeling productive and being productive is the defining number of AI-assisted development. Close it with data, not intuition.
Sources
[METR] METR, "Measuring the Impact of Early-2025 AI on Developer Productivity," July 10, 2025 — 19% slower, 20% perceived faster; 16 developers, 246 tasks
[FAR] Faros AI, "AI Productivity Paradox Report 2025," June 2025 — 10,000+ developers, 1,255 teams; DORA metrics flat; review time +91%
[VER] Veracode, "2025 GenAI Code Security Report" — 45% of AI code has vulnerabilities; 100+ LLMs, 80 coding tasks
[SO] Stack Overflow, Developer Survey 2025 — 41.4% little/no effect; 16.3% significantly more productive
[MIT] MIT Technology Review, "AI Coding is Now Everywhere," December 2025 — slot machine metaphor, automation bias, Mediaocean quote
[SA] SmarterArticles, "The AI Coding Productivity Illusion," January 2026 — 19.7% hallucinated dependencies, slopsquatting, Cursor acceptance rate
[KAS] Kaspersky, "Security Risks of Vibe Coding," October 2025 — Base44 shutdown, Cursor CVE-2025-54135, Replit deletion
[TDS] Towards Data Science, "The Reality of Vibe Coding," February 2026 — Supabase USING(true), dangerouslySetInnerHTML patterns
[ACC] Accorian, "Security Impact of Vibe Coding," January 2026 — XSS 86%, log injection 88%, Java 70%+ failures
[PD] philippdubach.com, "93% Adoption, 10% Gains," February 2026 — Amdahl's Law, Goldratt's Theory of Constraints, DORA 2025
[RET] Retool Blog, "Risks of Vibe Coding," March 2026 — GDPR, HIPAA, broken auth, insecure dependencies, WCAG
[DEV] devclass.com, "Vibe Coded Applications Full of Security Blunders," January 2026 — 5 agents tested, auth logic failures, Moltbook breach

Methodology note: This analysis synthesises findings from 12 independent research sources spanning randomised controlled trials, large-scale telemetry studies, security audits, and developer surveys. All figures are drawn directly from the cited studies. Where sources use different measurement methodologies, we note the distinction. This is a data synthesis, not original research — our contribution is connecting the dots across sources that are rarely read together. British English spelling throughout.
Written by Luke Madden, founder of Veltrix Collective. Data synthesis and analysis by Vel.