Veltrix
March 21, 2026

AI Bias Explained: How It Happens, Real Examples, and How to Detect It

What AI bias is, the 4 types, real-world cases like Amazon's hiring tool and facial recognition errors, and what companies are doing about it.

AI bias occurs when a model produces systematically skewed outputs that disadvantage certain groups — usually reflecting patterns in training data that encode historical inequity. The AI didn't decide to discriminate. The data it learned from reflects societies that did.

This matters because AI systems are increasingly used to make or inform consequential decisions: who gets a job interview, who qualifies for a loan, who receives medical care, who gets flagged by law enforcement. When those systems carry bias, the consequences aren't statistical artefacts — they're real harms to real people. And because AI operates at scale, a biased algorithm can discriminate against thousands of people before anyone notices.

The "garbage in, garbage out" framing undersells the problem. It's not just that biased data produces biased models. It's that AI models can amplify biases beyond what was in the original data, making them worse over time — especially in feedback loop situations where model outputs shape the data that trains the next model. MITCH

Bias enters AI systems in different ways at different stages of development. Understanding the source determines how it should be addressed.

Historical bias

The training data reflects historical inequities — not because it was chosen badly, but because history was unequal. A model trained on historical hiring decisions will learn that men were hired more often, because they were — but for reasons that have nothing to do with merit.

Example: Amazon's hiring AI trained on 10 years of hiring data in a male-dominated industry learned to downgrade women's resumes.

Representation bias

Some groups are underrepresented in training data. The model performs worse on them because it has seen fewer examples. Facial recognition trained predominantly on white faces has higher error rates for darker-skinned faces.

Example: NIST found false positive rates 10-100x higher for Black and East Asian faces in commercial facial recognition systems.
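Disparity of this kind is straightforward to measure once predictions are disaggregated by group. A minimal sketch of a false-positive-rate comparison, using invented group names and synthetic numbers rather than NIST's actual data:

```python
# Hypothetical sketch: measuring false positive rate (FPR) disparity across
# demographic groups. Groups and counts are illustrative, not real benchmarks.
from collections import defaultdict

def fpr_by_group(records):
    """records: iterable of (group, actual_match, predicted_match) tuples.
    FPR = false positives / actual negatives, computed per group."""
    fp = defaultdict(int)   # predicted a match where none existed
    neg = defaultdict(int)  # all true non-matches
    for group, actual, predicted in records:
        if not actual:
            neg[group] += 1
            if predicted:
                fp[group] += 1
    return {g: fp[g] / neg[g] for g in neg if neg[g]}

records = (
    [("group_a", False, False)] * 990 + [("group_a", False, True)] * 10 +
    [("group_b", False, False)] * 900 + [("group_b", False, True)] * 100
)
rates = fpr_by_group(records)
print(rates)  # group_b's FPR is 10x group_a's: 0.10 vs 0.01
```

The same pattern generalises to any error metric: compute it per group, not just in aggregate, and report the ratio between the best- and worst-served groups.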

Measurement bias

The outcome being predicted is itself a flawed proxy for the underlying concept. "Credit risk" modelled on past defaults inherits historical barriers to credit that made certain groups look riskier than they were.

Example: Healthcare cost algorithms used "future cost" as a proxy for "health need" — but Black patients had lower costs because they received less care, not because they were healthier.

Aggregation bias

A model trained on aggregate data may not work well for subgroups with different patterns. A diabetes prediction model trained on general population data may perform worse for specific ethnic groups with different risk profiles.

Example: General medical AI models often have significantly reduced accuracy for minority populations who weren't adequately represented in training cohorts.
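The aggregation problem is easy to demonstrate with a disaggregated evaluation. A toy sketch with synthetic numbers (not real clinical data): an aggregate accuracy figure can look healthy while a minority subgroup is badly served.

```python
# Illustrative sketch: aggregate accuracy can hide poor subgroup performance.
# The data is synthetic; the point is the evaluation pattern, not the numbers.
def accuracy(pairs):
    """pairs: list of (true_label, predicted_label)."""
    return sum(1 for y, yhat in pairs if y == yhat) / len(pairs)

majority = [(1, 1)] * 95 + [(1, 0)] * 5   # 95% correct on the majority group
minority = [(1, 1)] * 6 + [(1, 0)] * 4    # 60% correct on the minority group

overall = accuracy(majority + minority)
print(round(overall, 3))   # ~0.918 — looks fine in aggregate
print(accuracy(minority))  # 0.6 — the disaggregated view tells the truth
```

This is why evaluation protocols increasingly require per-subgroup reporting rather than a single headline metric.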

Amazon hiring AI (2018)
Bias found: Systematically downgraded resumes containing the word "women's" (e.g., "women's chess club").
Outcome: System scrapped. AMZN

COMPAS recidivism algorithm
Bias found: Black defendants twice as likely to be falsely flagged as high risk for reoffending.
Outcome: Ongoing use despite documented bias; legal challenges. PROP

Healthcare cost algorithm (Optum)
Bias found: Systematically assigned Black patients lower health need scores than equally sick white patients.
Outcome: Bias discovered and partially corrected. 17,000 patients were under-served. OBER

Facial recognition (multiple)
Bias found: False positive rates 10-100x higher for Black and East Asian faces vs white faces.
Outcome: Several US cities banned government use. Detroit PD paused use after wrongful arrest. NIST

Apple Card (Goldman Sachs) credit algorithm
Bias found: Women received significantly lower credit limits than men with similar credit profiles.
Outcome: NY DFS investigation; Goldman settled without finding deliberate discrimination. NYFS
Is ChatGPT biased?

Yes, in ways that have been documented. Studies have found ChatGPT produces more negative associations for Arab names than English names, generates higher-income professional roles for men and lower-income roles for women in hypothetical scenarios, and produces responses that reflect cultural biases in its predominantly English-language training data. OpenAI acknowledges bias as an ongoing challenge and publishes research on its mitigation efforts. But any model trained on human-generated text will reflect human biases to some degree — the question is whether those biases are measured, disclosed, and actively reduced.

Can AI bias be fixed?

Partially. Bias can be reduced through careful data curation, diverse training sets, fairness constraints during training, post-processing adjustments, and regular auditing. But there are mathematical limits: some fairness criteria are provably incompatible with each other — you can't simultaneously satisfy all commonly used fairness definitions. Eliminating bias entirely would require removing historical inequity from training data, which would require rewriting history. The practical goal is measurable improvement, transparency about remaining bias, and avoiding deployment in contexts where the bias creates serious harm.
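The incompatibility between fairness criteria is concrete, not abstract. A toy sketch with invented numbers: when two groups have different base rates, a classifier can satisfy demographic parity (equal selection rates) while failing equal opportunity (equal true positive rates).

```python
# Toy sketch of the fairness trade-off. All counts are invented; the point is
# that equalising one metric does not equalise the other when base rates differ.
def metrics(tp, fp, fn, tn):
    """Compute two common fairness metrics from a group's confusion matrix."""
    n = tp + fp + fn + tn
    return {
        "selection_rate": (tp + fp) / n,  # target of demographic parity
        "tpr": tp / (tp + fn),            # target of equal opportunity
    }

# Group A has a 50% base rate of true positives; Group B has 20%.
# Both groups are selected at the same 40% rate.
a = metrics(tp=35, fp=5, fn=15, tn=45)
b = metrics(tp=16, fp=24, fn=4, tn=56)

print(a["selection_rate"], b["selection_rate"])  # 0.4 vs 0.4 — parity holds
print(a["tpr"], b["tpr"])                        # 0.7 vs 0.8 — equal opportunity fails
```

Rebalancing the thresholds to equalise TPR would break the selection-rate parity instead, which is the trade-off practitioners have to navigate explicitly.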

How do you test for AI bias?

Several approaches: counterfactual testing (changing protected attributes like race or gender and observing if outputs change), disparity analysis (comparing error rates or outcome rates across demographic groups), audit studies (sending matched applications through AI hiring systems), and red-teaming (adversarially testing for biased outputs). The EU AI Act requires bias testing for high-risk AI systems. Third-party auditing firms like Parity AI and Fairly AI have emerged to provide independent bias assessment.
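Counterfactual testing, the first approach above, can be sketched in a few lines. `score_resume` below is a deliberately biased stand-in for whatever model is under audit — the harness, not the model, is the point:

```python
# Minimal counterfactual-testing sketch: swap a protected attribute term in the
# input and check whether the model's output changes. `score_resume` is a
# hypothetical, deliberately biased toy model used only for demonstration.
def score_resume(text):
    score = 50
    if "women's" in text.lower():  # the injected bias we want the test to catch
        score -= 10
    return score

def counterfactual_gap(template, attr_pairs, score_fn):
    """Score the same template with each attribute term swapped in;
    return the per-pair score differences."""
    gaps = {}
    for a, b in attr_pairs:
        gaps[(a, b)] = score_fn(template.format(attr=a)) - score_fn(template.format(attr=b))
    return gaps

template = "Captain of the {attr} chess club; 5 years of engineering experience."
gaps = counterfactual_gap(template, [("men's", "women's")], score_resume)
print(gaps)  # a nonzero gap flags attribute-sensitive behaviour
```

In practice the same harness is run over many templates and attribute pairs (names, genders, ethnicities), and any systematic nonzero gap is evidence of bias worth investigating.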

Sources
NIST — Face Recognition Vendor Test, 2019 (nist.gov)
AMZN — Reuters, Amazon AI Hiring Bias, 2018 (reuters.com)
PROP — ProPublica, Machine Bias, 2016 (propublica.org)
OBER — Obermeyer et al., Dissecting racial bias in healthcare algorithms, Science, 2019 (science.org)
MITCH — Mitchell et al., Model Cards for Model Reporting, FAccT 2019 (arxiv.org/abs/1810.03993)
AI bias is a reflection of human bias at scale.
The solution is measurement, not denial.

Every organisation deploying AI in consequential decisions should be able to answer three questions: What bias testing was done before deployment? What are the known error rate disparities across demographic groups? What is the process for identifying and correcting bias post-deployment? The organisations that can't answer these questions are deploying blind.

As AI moves deeper into healthcare, hiring, lending, and criminal justice, the stakes of getting this wrong increase. The good news is that the tooling for bias measurement has improved significantly. The question is whether the organisations deploying AI choose to use it.


Veltrix Collective · Sources: NIST, Reuters, ProPublica, Science (Obermeyer), arXiv (Mitchell). Published April 2026.

Written by Luke Madden, founder of Veltrix Collective. Data synthesis and analysis by Vel.