China-based DeepSeek just released a powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that selectively activates only 37B of its 671B parameters for each token it processes, enabling efficient processing while maintaining high performance (a minimal routing sketch follows the list below).

  • The model introduces an auxiliary-loss-free load-balancing strategy that keeps expert utilization even without the performance penalty of an auxiliary loss
  • A new multi-token prediction feature allows the model to generate 60 tokens per second, three times faster than previous versions
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
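
For intuition about how such a layer activates only a fraction of its parameters, here is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and k are illustrative placeholders rather than DeepSeek-V3's actual configuration, which also adds shared experts and the auxiliary-loss-free balancing described above.

```python
# Minimal top-k mixture-of-experts routing sketch (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # each token picks k experts
        weights = F.softmax(weights, dim=-1)         # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

Because only k experts run for each token, compute scales with the active parameters rather than the total parameter count, which is how a 671B-parameter model can activate just 37B per token.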

Technical specifications: The model underwent extensive training on 14.8T high-quality tokens and supports a context window of up to 128K tokens.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various training optimizations, including FP8 mixed precision training (see the toy sketch after this list) and the DualPipe algorithm for pipeline parallelism
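
FP8 mixed precision is the most self-contained of these optimizations to illustrate. The toy sketch below assumes PyTorch 2.1 or later, which ships the float8_e4m3fn dtype; it shows only the quantize/dequantize round trip, whereas production FP8 training of the kind described for DeepSeek-V3 also relies on scaled matmul kernels and fine-grained scaling factors.

```python
# Toy FP8 round trip: store values in 1 byte, accumulate in higher precision.
import torch

x = torch.randn(4, 4)
scale = x.abs().max() / 448.0                  # 448 is the max value of float8_e4m3fn
x_fp8 = (x / scale).to(torch.float8_e4m3fn)    # quantize to 8-bit floats
x_back = x_fp8.to(torch.float32) * scale       # dequantize before accumulation
print((x - x_back).abs().max())                # small but nonzero rounding error
```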

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training run required approximately 2.788 million H800 GPU hours, costing about $5.57 million
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
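
As a quick sanity check on those two figures, they imply a rental rate of roughly $2 per H800 GPU-hour:

```python
# Implied GPU rental rate from the reported training figures.
gpu_hours = 2_788_000    # ~2.788M H800 GPU hours
total_cost = 5_570_000   # ~$5.57M
print(f"${total_cost / gpu_hours:.2f} per GPU-hour")  # ~$2.00
```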

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen2.5-72B
  • It shows particular strength in Chinese-language and mathematical tasks, scoring 90.2 on the MATH-500 benchmark
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via API for commercial applications
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8
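
For API access, DeepSeek documents an OpenAI-compatible endpoint. The sketch below assumes the base URL and model name from DeepSeek's public documentation at the time of writing; verify both before relying on them.

```python
# Minimal sketch of calling DeepSeek-V3 via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued on the DeepSeek platform
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                # the model name DeepSeek maps to V3
    messages=[{"role": "user", "content": "Summarize DeepSeek-V3 in one sentence."}],
)
print(response.choices[0].message.content)
```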

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.
