Results 41 to 47 of 47

Thread: Problems with AI, 2025 and Beyond

  1. Link to Post #41
    UK Moderator/Librarian/Administrator Tintin's Avatar
    Join Date
    3rd June 2017
    Location
    Project Avalon library
    Language
    English
    Age
    56
    Posts
    8,090
    Thanks
    89,508
    Thanked 70,288 times in 8,057 posts

    Default Re: Problems with AI, 2025 and Beyond

    This post deals with AI's inability to determine what's fake from real, in what could best be described as really very ironic indeed

    It couldn't tell a joke either, I suspect.
    ________________________________________

    Source: https://x.com/HedgieMarkets/status/2042430442448548273
    🦔A researcher invented a fake eye condition called bixonimania, uploaded two obviously fraudulent papers about it to an academic server, and watched major AI systems present it as real medicine within weeks.
    The fake papers thanked Starfleet Academy, cited funding from the Professor Sideshow Bob Foundation and the University of Fellowship of the Ring, and stated mid-paper that the entire thing was made up. Google's Gemini told users it was caused by blue light. Perplexity cited its prevalence at one in 90,000 people.

    ChatGPT advised users whether their symptoms matched. The fake research was then cited in a peer-reviewed journal that only retracted it after Nature contacted the publisher.

    My Take
    The researcher made the papers as obviously fake as possible on purpose. The AI systems didn't catch it. Neither did the human researchers who cited it in real journals, which means people are feeding AI-generated references into their work without reading what they're actually citing.

    I've covered the FDA using AI for drug review, the NYC hospital CEO ready to replace radiologists, and ChatGPT Health launching this year. All of that is happening in the same environment where a condition funded by a Simpsons character and endorsed by the crew of the Enterprise was being presented as emerging medical consensus. The people making these deployment decisions seem to believe the pipeline from research to AI to patient is more supervised than it actually is. This experiment suggests it isn't supervised much at all.

    Hedgie🤗
    _____________________________

    The original source is Nature magazine: https://www.nature.com/articles/d41586-026-01100-y
    Scientists invented a fake disease. AI told people it was real
    “If a man does not keep pace with [fall into line with] his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away.” - Thoreau

  2. The Following 9 Users Say Thank You to Tintin For This Post:

    Bill Ryan (10th April 2026), Chip (10th April 2026), Ewan (10th April 2026), Harmony (10th April 2026), Johnnycomelately (10th April 2026), Matthew (11th April 2026), ronny (Yesterday), sdv (10th April 2026), Yoda (13th April 2026)

  3. Link to Post #42
    Avalon Member sdv's Avatar
    Join Date
    5th March 2012
    Location
    On a farm in the Klein Karoo
    Posts
    1,312
    Thanks
    5,285
    Thanked 5,844 times in 1,185 posts

    Default Re: Problems with AI, 2025 and Beyond

    Quote Posted by Tintin (here)
    This post deals with AI's inability to determine what's fake from real, in what could best be described as really very ironic indeed

    It couldn't tell a joke either, I suspect.
    [...]
    I used to think that the most dangerous outcome for AI would be that most people would stop developing and using critical thinking skills. Now I am not that sure and see that the dangers go way beyond what I could imagine.

    How can we get AI under control and still keep it as a useful tool?
    Sandie
    Somewhere, something incredible is waiting to be known. (Carl Sagan)

  4. The Following 9 Users Say Thank You to sdv For This Post:

    Bill Ryan (10th April 2026), Chip (10th April 2026), Ewan (10th April 2026), ExomatrixTV (Today), Harmony (11th April 2026), Johnnycomelately (10th April 2026), Matthew (10th April 2026), Tintin (10th April 2026), Yoda (13th April 2026)

  5. Link to Post #43
    Estonia Avalon Member
    Join Date
    20th February 2023
    Language
    Estonian
    Age
    38
    Posts
    1,075
    Thanks
    2,799
    Thanked 7,994 times in 1,066 posts

    Default Re: Problems with AI, 2025 and Beyond


  6. The Following 9 Users Say Thank You to Jaak For This Post:

    Bill Ryan (Yesterday), Chip (10th April 2026), ExomatrixTV (13th April 2026), Harmony (11th April 2026), Johnnycomelately (10th April 2026), Matthew (10th April 2026), sdv (11th April 2026), Tintin (Yesterday), Yoda (13th April 2026)

  7. Link to Post #44
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Exclamation Re: Problems with AI, 2025 and Beyond

    • New Battles RAGE Across The Country Over Data Center Construction!:

    • We need to talk:
    @DaveShap quote:

    "This video has been up for 16 minutes, and I've already had to ban a few people for various levels of inciting violence or cheerleading violence. I am going to keep the comments open, but I encourage everyone to report violent comments, or comments that advocate or cheerlead more violence".
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  8. The Following 5 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Harmony (13th April 2026), Johnnycomelately (13th April 2026), Tintin (Yesterday), Yoda (13th April 2026)

  9. Link to Post #45
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Default Re: Problems with AI, 2025 and Beyond

    • The AI Expert Who Thinks We've Already Lost — Dr Roman Yampolskiy

    00:00 Trailer
    01:11 Why AI Safety Matters
    05:11 Early Warnings & Risks
    08:20 Exponential AI Progress
    10:29 AI Survival Instincts
    14:32 Can Nations Stop AI?
    17:26 Ad: Quo
    18:46 Why Safety May Be Impossible
    25:19 Jobs, Meaning & Society
    32:17 Best-Case AI Future
    35:08 Ad: Qualia
    36:47 AI Bias and Existential Risks
    46:28 AI Warfare & Deepfakes
    53:41 Ad: Hillsdale College
    55:02 What Should We Do?
    01:09:31 What's The One Thing We're Not Talking About?
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  10. The Following 6 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Ewan (17th April 2026), Harmony (19th April 2026), Johnnycomelately (17th April 2026), Tintin (Yesterday), Yoda (18th April 2026)

  11. Link to Post #46
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Default Re: Problems with AI, 2025 and Beyond

    • The AI Backlash Has Reached a Tipping Point:

    A data center pays no property taxes, but if you want to build a barn on your own land, you now have to pay more!
    Last edited by ExomatrixTV; 19th April 2026 at 11:24.
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  12. The Following 7 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Ewan (Yesterday), Harmony (19th April 2026), Johnnycomelately (19th April 2026), Sunny (18th April 2026), Tintin (19th April 2026), Yoda (18th April 2026)

  13. Link to Post #47
    UK Moderator/Librarian/Administrator Tintin's Avatar
    Join Date
    3rd June 2017
    Location
    Project Avalon library
    Language
    English
    Age
    56
    Posts
    8,090
    Thanks
    89,508
    Thanked 70,288 times in 8,057 posts

    Default HALLUHARD: A Hard Multi-Turn Hallucination Benchmark | Fan et al., February 2026

    A helpful summary from Nav Toor follows the abstract presented below:

    ______________________________

    HALLUHARD: A Hard Multi-Turn Hallucination Benchmark
    Authors: Dongyang Fan, Sebastien Delsad, Nicolas Flammarion, Maksym Andriushchenko
    Source: https://arxiv.org/abs/2602.01031
    PDF view/download: https://arxiv.org/pdf/2602.01031

    ABSTRACT
    Large language models (LLMs) still produce plausible-sounding but ungrounded factual claims, a problem that worsens in multi-turn dialogue as context grows and early errors cascade. We introduce HALLUHARD, a challenging multi-turn hallucination benchmark with 950 seed questions spanning four high-stakes domains: legal cases, research questions, medical guidelines, and coding. We operationalize groundedness by requiring inline citations for factual assertions. To support reliable evaluation in open-ended settings, we propose a judging pipeline that iteratively retrieves evidence via web search. It can fetch, filter, and parse full-text sources (including PDFs) to assess whether cited material actually supports the generated content. Across a diverse set of frontier proprietary and open-weight models, hallucinations remain substantial even with web search (over 30% for the strongest configuration, Opus-4.5 with web search), with content-grounding errors persisting at high rates. Finally, we show that hallucination behavior is shaped by model capacity, turn position, effective reasoning, and the type of knowledge required.
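    For readers curious what a citation-grounding judge of this kind might look like, here is a minimal sketch. Everything in it is an illustrative assumption, not the paper's actual implementation: the bracketed-URL citation format, the function names, and the crude word-overlap heuristic standing in for the paper's real evidence check (which iteratively searches the web and parses full-text sources, including PDFs).

    ```python
    # Minimal sketch of a citation-grounding judge, in the spirit of the
    # HALLUHARD pipeline described in the abstract. All names and the
    # overlap heuristic are illustrative assumptions, not the paper's code.

    import re

    def extract_claims(answer: str):
        """Split an answer into (claim, cited_url) pairs.
        Assumes inline citations of the form 'claim text [https://...]'."""
        pattern = re.compile(r"(.+?)\s*\[(https?://\S+?)\]")
        return [(claim.strip(), url) for claim, url in pattern.findall(answer)]

    def supported(claim: str, source_text: str, threshold: float = 0.5) -> bool:
        """Crude stand-in for the paper's evidence check: does the cited
        source contain enough of the claim's content words?"""
        words = {w for w in re.findall(r"[a-z]+", claim.lower()) if len(w) > 3}
        if not words:
            return False
        hits = sum(1 for w in words if w in source_text.lower())
        return hits / len(words) >= threshold

    def judge(answer: str, fetch) -> dict:
        """Return a per-claim grounding verdict. `fetch(url)` is injected
        so the retrieval backend (web search, PDF parsing) can vary."""
        return {claim: supported(claim, fetch(url))
                for claim, url in extract_claims(answer)}
    ```

    With a toy in-memory "web" (a dict mapping URLs to page text), a well-grounded claim passes and an ungrounded one, such as anything about bixonimania, fails, which is the basic shape of the check the benchmark automates at scale.
    
    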
    _____________________________

    Summary via Nav Toor on X:
    Researchers at EPFL proved your AI is lying to you.

    Not sometimes. Most of the time. They built one of the hardest hallucination tests ever made with Max Planck Institute. 950 questions. Four domains where being wrong actually hurts. Legal. Medical. Research. Coding.

    Then they ran every top model on it.

    The results.
    - GPT-5. Wrong 71.8% of the time.

    - Claude Opus 4.5. Wrong 60% of the time.

    - Gemini 3 Pro. Wrong 61.9% of the time.

    - DeepSeek Reasoner. Wrong 76.8% of the time.
    These are the smartest AI models on Earth. The ones you trust with your career. Your health. Your money. You think turning on web search fixes it.

    It doesn't.

    Claude Opus 4.5 with web search. Still wrong 30.2% of the time.

    GPT-5.2 thinking with web search. Still wrong 38.2% of the time.

    The internet attached. Still lying to you in 1 out of every 3 answers.

    Now the part that should scare you.

    Medical questions. The one place being wrong can kill you.
    - GPT-5 hallucinated 92.8% of the time on medical guidelines.

    - Claude Haiku 4.5 hallucinated 95.7% of the time.

    - Gemini 3 Flash hallucinated 89% of the time.
    Nine out of ten medical answers from popular AI models. Wrong.

    It gets worse.

    The longer you talk to it, the more it lies. Early mistakes cascade. The model starts citing its own earlier hallucinations as facts. Your third message is more wrong than your first.

    The paper, in its own words: "hallucinations remain substantial even with web search." This is what hundreds of millions of people are doing right now. Asking software that lies in the majority of its answers. About their health. About their job. About their legal case. About their code.

    Most are not checking.

    Most never will.

    But please. Keep using ChatGPT for medical advice.

    The doctors need a break.

    http://arxiv.org/abs/2602.01031
    ____________________________

    It's artificial, for sure, but not intelligent....
    “If a man does not keep pace with [fall into line with] his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away.” - Thoreau

  14. The Following 6 Users Say Thank You to Tintin For This Post:

    Bill Ryan (Yesterday), Ewan (Yesterday), ExomatrixTV (Today), Harmony (Yesterday), ronny (Yesterday), Yoda (Yesterday)

