Results 41 to 47 of 47

Thread: Problems with AI, 2025 and Beyond

  1. Link to Post #41
    UK Moderator/Librarian/Administrator Tintin's Avatar
    Join Date
    3rd June 2017
    Location
    Project Avalon library
    Language
    English
    Age
    56
    Posts
    8,090
    Thanks
    89,508
    Thanked 70,288 times in 8,057 posts

    Default Re: Problems with AI, 2025 and Beyond

    This post deals with AI's inability to determine what's fake from real, in what could best be described as really very ironic indeed

    It couldn't tell a joke either, I suspect.
    ________________________________________

    Source: https://x.com/HedgieMarkets/status/2042430442448548273
    🦔A researcher invented a fake eye condition called bixonimania, uploaded two obviously fraudulent papers about it to an academic server, and watched major AI systems present it as real medicine within weeks.
    The fake papers thanked Starfleet Academy, cited funding from the Professor Sideshow Bob Foundation and the University of Fellowship of the Ring, and stated mid-paper that the entire thing was made up. Google's Gemini told users it was caused by blue light. Perplexity cited its prevalence at one in 90,000 people.

    ChatGPT advised users whether their symptoms matched. The fake research was then cited in a peer-reviewed journal that only retracted it after Nature contacted the publisher.

    My Take
    The researcher made the papers as obviously fake as possible on purpose. The AI systems didn't catch it. Neither did the human researchers who cited it in real journals, which means people are feeding AI-generated references into their work without reading what they're actually citing.

    I've covered the FDA using AI for drug review, the NYC hospital CEO ready to replace radiologists, and ChatGPT Health launching this year. All of that is happening in the same environment where a condition funded by a Simpsons character and endorsed by the crew of the Enterprise was being presented as emerging medical consensus. The people making these deployment decisions seem to believe the pipeline from research to AI to patient is more supervised than it actually is. This experiment suggests it isn't supervised much at all.

    Hedgie🤗
    _____________________________

    The original source is Nature magazine: https://www.nature.com/articles/d41586-026-01100-y
    Scientists invented a fake disease. AI told people it was real
    “If a man does not keep pace with [fall into line with] his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away.” - Thoreau

  2. The Following 9 Users Say Thank You to Tintin For This Post:

    Bill Ryan (10th April 2026), Chip (10th April 2026), Ewan (10th April 2026), Harmony (10th April 2026), Johnnycomelately (10th April 2026), Matthew (11th April 2026), ronny (Yesterday), sdv (10th April 2026), Yoda (13th April 2026)

  3. Link to Post #42
    Avalon Member sdv's Avatar
    Join Date
    5th March 2012
    Location
    On a farm in the Klein Karoo
    Posts
    1,312
    Thanks
    5,285
    Thanked 5,844 times in 1,185 posts

    Default Re: Problems with AI, 2025 and Beyond

    Quote Posted by Tintin (here)
    This post deals with AI's inability to determine what's fake from real, in what could best be described as really very ironic indeed

    It couldn't tell a joke either, I suspect.
    [...]
    I used to think that the most dangerous outcome for AI would be that most people would stop developing and using critical thinking skills. Now I am not that sure and see that the dangers go way beyond what I could imagine.

    How can we get AI under control and still keep it as a useful tool?
    Sandie
    Somewhere, something incredible is waiting to be known. (Carl Sagan)

  4. The Following 9 Users Say Thank You to sdv For This Post:

    Bill Ryan (10th April 2026), Chip (10th April 2026), Ewan (10th April 2026), ExomatrixTV (Today), Harmony (11th April 2026), Johnnycomelately (10th April 2026), Matthew (10th April 2026), Tintin (10th April 2026), Yoda (13th April 2026)

  5. Link to Post #43
    Estonia Avalon Member
    Join Date
    20th February 2023
    Language
    Estonian
    Age
    38
    Posts
    1,075
    Thanks
    2,799
    Thanked 7,994 times in 1,066 posts

    Default Re: Problems with AI, 2025 and Beyond


  6. The Following 9 Users Say Thank You to Jaak For This Post:

    Bill Ryan (Yesterday), Chip (10th April 2026), ExomatrixTV (13th April 2026), Harmony (11th April 2026), Johnnycomelately (10th April 2026), Matthew (10th April 2026), sdv (11th April 2026), Tintin (Yesterday), Yoda (13th April 2026)

  7. Link to Post #44
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Exclamation Re: Problems with AI, 2025 and Beyond

    • New Battles RAGE Across The Country Over Data Center Construction!:

    • We need to talk:
    @DaveShap quote:

    "This video has been up for 16 minutes, and I've already had to ban a few people for various levels of inciting violence or cheerleading violence. I am going to keep the comments open, but I encourage everyone to report violent comments, or comments that advocate or cheerlead more violence".
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  8. The Following 5 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Harmony (13th April 2026), Johnnycomelately (13th April 2026), Tintin (Yesterday), Yoda (13th April 2026)

  9. Link to Post #45
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Default Re: Problems with AI, 2025 and Beyond

    • The AI Expert Who Thinks We've Already Lost — Dr Roman Yampolskiy

    00:00 Trailer
    01:11 Why AI Safety Matters
    05:11 Early Warnings & Risks
    08:20 Exponential AI Progress
    10:29 AI Survival Instincts
    14:32 Can Nations Stop AI?
    17:26 Ad: Quo
    18:46 Why Safety May Be Impossible
    25:19 Jobs, Meaning & Society
    32:17 Best-Case AI Future
    35:08 Ad: Qualia
    36:47 AI Bias and Existential Risks
    46:28 AI Warfare & Deepfakes
    53:41 Ad: Hillsdale College
    55:02 What Should We Do?
    01:09:31 What's The One Thing We're Not Talking About?
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  10. The Following 6 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Ewan (17th April 2026), Harmony (19th April 2026), Johnnycomelately (17th April 2026), Tintin (Yesterday), Yoda (18th April 2026)

  11. Link to Post #46
    Netherlands Avalon Member ExomatrixTV's Avatar
    Join Date
    23rd September 2011
    Location
    Netherlands
    Language
    English, Dutch, German, Limburgs
    Age
    60
    Posts
    29,691
    Thanks
    44,523
    Thanked 166,320 times in 27,713 posts

    Default Re: Problems with AI, 2025 and Beyond

    • The AI Backlash Has Reached a Tipping Point:

    A data center pays no property taxes, but if you want to build a barn on your own land, you now have to pay more!
    Last edited by ExomatrixTV; 19th April 2026 at 11:24.
    No need to follow anyone, only consider broadening (y)our horizon of possibilities ...

  12. The Following 7 Users Say Thank You to ExomatrixTV For This Post:

    Bill Ryan (Yesterday), Ewan (Yesterday), Harmony (19th April 2026), Johnnycomelately (19th April 2026), Sunny (18th April 2026), Tintin (19th April 2026), Yoda (18th April 2026)

  13. Link to Post #47
    UK Moderator/Librarian/Administrator Tintin's Avatar
    Join Date
    3rd June 2017
    Location
    Project Avalon library
    Language
    English
    Age
    56
    Posts
    8,090
    Thanks
    89,508
    Thanked 70,288 times in 8,057 posts

    Default HALLUHARD: A Hard Multi-Turn Hallucination Benchmark | Fan et al., February 2026

    A helpful summary from Nav Toor follows the abstract presented below:

    ______________________________

    HALLUHARD: A Hard Multi-Turn Hallucination Benchmark
    Authors: Dongyang Fan, Sebastien Delsad, Nicolas Flammarion, Maksym Andriushchenko
    Source: https://arxiv.org/abs/2602.01031
    PDF view/download: https://arxiv.org/pdf/2602.01031

    ABSTRACT
    Large language models (LLMs) still produce plausible-sounding but ungrounded factual claims, a problem that worsens in multi-turn dialogue as context grows and early errors cascade. We introduce HALLUHARD, a challenging multi-turn hallucination benchmark with 950 seed questions spanning four high-stakes domains: legal cases, research questions, medical guidelines, and coding. We operationalize groundedness by requiring inline citations for factual assertions. To support reliable evaluation in open-ended settings, we propose a judging pipeline that iteratively retrieves evidence via web search. It can fetch, filter, and parse full-text sources (including PDFs) to assess whether cited material actually supports the generated content. Across a diverse set of frontier proprietary and open-weight models, hallucinations remain substantial even with web search (over 30% for the strongest configuration, Opus-4.5 with web search), with content-grounding errors persisting at high rates. Finally, we show that hallucination behavior is shaped by model capacity, turn position, effective reasoning, and the type of knowledge required.
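    For readers curious what a citation-grounding judge of this kind might look like, here is a minimal sketch. Everything in it is an illustrative assumption, not the paper's actual implementation: the bracketed-URL citation format, the function names, and the crude word-overlap heuristic standing in for the paper's real evidence check (which iteratively searches the web and parses full-text sources, including PDFs).

    ```python
    # Minimal sketch of a citation-grounding judge, in the spirit of the
    # HALLUHARD pipeline described in the abstract. All names and the
    # overlap heuristic are illustrative assumptions, not the paper's code.

    import re

    def extract_claims(answer: str):
        """Split an answer into (claim, cited_url) pairs.
        Assumes inline citations of the form 'claim text [https://...]'."""
        pattern = re.compile(r"(.+?)\s*\[(https?://\S+?)\]")
        return [(claim.strip(), url) for claim, url in pattern.findall(answer)]

    def supported(claim: str, source_text: str, threshold: float = 0.5) -> bool:
        """Crude stand-in for the paper's evidence check: does the cited
        source contain enough of the claim's content words?"""
        words = {w for w in re.findall(r"[a-z]+", claim.lower()) if len(w) > 3}
        if not words:
            return False
        hits = sum(1 for w in words if w in source_text.lower())
        return hits / len(words) >= threshold

    def judge(answer: str, fetch) -> dict:
        """Return a per-claim grounding verdict. `fetch(url)` is injected
        so the retrieval backend (web search, PDF parsing) can vary."""
        return {claim: supported(claim, fetch(url))
                for claim, url in extract_claims(answer)}
    ```

    With a toy in-memory "web" (a dict mapping URLs to page text), a well-grounded claim passes and an ungrounded one, such as anything about bixonimania, fails, which is the basic shape of the check the benchmark automates at scale.
    
    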
    _____________________________

    Summary via Nav Toor on X:
    Researchers at EPFL proved your AI is lying to you.

    Not sometimes. Most of the time. They built one of the hardest hallucination tests ever made with Max Planck Institute. 950 questions. Four domains where being wrong actually hurts. Legal. Medical. Research. Coding.

    Then they ran every top model on it.

    The results.
    - GPT-5. Wrong 71.8% of the time.

    - Claude Opus 4.5. Wrong 60% of the time.

    - Gemini 3 Pro. Wrong 61.9% of the time.

    - DeepSeek Reasoner. Wrong 76.8% of the time.
    These are the smartest AI models on Earth. The ones you trust with your career. Your health. Your money. You think turning on web search fixes it.

    It doesn't.

    Claude Opus 4.5 with web search. Still wrong 30.2% of the time.

    GPT-5.2 thinking with web search. Still wrong 38.2% of the time.

    The internet attached. Still lying to you in 1 out of every 3 answers.

    Now the part that should scare you.

    Medical questions. The one place being wrong can kill you.
    - GPT-5 hallucinated 92.8% of the time on medical guidelines.

    - Claude Haiku 4.5 hallucinated 95.7% of the time.

    - Gemini 3 Flash hallucinated 89% of the time.
    Nine out of ten medical answers from popular AI models. Wrong.

    It gets worse.

    The longer you talk to it, the more it lies. Early mistakes cascade. The model starts citing its own earlier hallucinations as facts. Your third message is more wrong than your first.

    The paper, in its own words: "hallucinations remain substantial even with web search." This is what hundreds of millions of people are doing right now. Asking software that lies in the majority of its answers. About their health. About their job. About their legal case. About their code.

    Most are not checking.

    Most never will.

    But please. Keep using ChatGPT for medical advice.

    The doctors need a break.

    http://arxiv.org/abs/2602.01031
    ____________________________

    It's artificial, for sure, but not intelligent....
    “If a man does not keep pace with [fall into line with] his companions, perhaps it is because he hears a different drummer. Let him step to the music which he hears, however measured or far away.” - Thoreau

  14. The Following 6 Users Say Thank You to Tintin For This Post:

    Bill Ryan (Yesterday), Ewan (Yesterday), ExomatrixTV (Today), Harmony (Yesterday), ronny (Yesterday), Yoda (Yesterday)

