Recent testing shows DeepSeek hallucinates much more than competing models

A new AI reasoning model from DeepSeek produces significantly more false or hallucinated responses than comparable models, according to testing by enterprise AI startup Vectara.

Key findings: Vectara’s testing revealed that DeepSeek’s R1 model demonstrates notably higher rates of hallucination compared to other reasoning and open-source AI models.

  • OpenAI and Google’s closed reasoning models showed the lowest rates of hallucination in the tests
  • Alibaba’s Qwen model performed best among models with partially public code
  • DeepSeek’s earlier V3 model, which served as the foundation for R1, was roughly three times more accurate than its successor

Technical context: AI hallucination occurs when an AI model generates false or fabricated information while appearing to respond accurately.

  • The issue stems from problems in the fine-tuning process rather than the reasoning capabilities themselves
  • Fine-tuning requires careful balance to maintain multiple capabilities while enhancing specific features
  • According to Vectara’s head of developer relations Ofer Mendelevitch, DeepSeek will likely address these issues in future updates
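To make the idea of a "hallucination rate" concrete, here is a minimal toy sketch of how such a rate could be tallied over source/summary pairs. This uses a naive lexical-grounding heuristic invented for illustration; Vectara's actual evaluation relies on a trained factual-consistency model, not word overlap.

```python
def hallucination_rate(pairs, threshold=0.5):
    """Toy heuristic: flag a summary as hallucinated when fewer than
    `threshold` of its word tokens appear in the source document.
    Returns the fraction of flagged pairs. Real benchmarks score
    factual consistency with a trained model instead."""
    flagged = 0
    for source, summary in pairs:
        src_words = set(source.lower().split())
        sum_words = summary.lower().split()
        if not sum_words:
            continue
        grounded = sum(w in src_words for w in sum_words) / len(sum_words)
        if grounded < threshold:
            flagged += 1
    return flagged / len(pairs)

# One grounded summary, one fabricated one -> rate of 0.5
pairs = [
    ("The cat sat on the mat.", "The cat sat on the mat."),
    ("The cat sat on the mat.", "Dogs love chasing mailmen."),
]
rate = hallucination_rate(pairs)  # -> 0.5
```

A production evaluation would replace the overlap check with a consistency classifier, but the bookkeeping — score each response against its source, count failures, divide by the total — is the same.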

Independent verification: Recent testing by Wired writer Reece Rogers corroborates Vectara’s findings about DeepSeek’s accuracy issues.

  • Rogers identified both hallucination and moderation problems during his evaluation
  • Questions remain about the training data used to develop the model
  • Despite these issues, Rogers suggested DeepSeek could be a significant competitor to U.S.-based AI companies

Looking ahead: While DeepSeek’s current performance raises concerns about reliability, the broader trend suggests that reasoning models will continue to improve through iterative development and refined training methods. The challenge lies in maintaining multiple capabilities while enhancing specific features like reasoning, highlighting the complexity of developing advanced AI systems.
