China-based DeepSeek just released a powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance (a minimal routing sketch follows the list below).

  • The model introduces an auxiliary loss-free load-balancing strategy that optimizes expert utilization without compromising performance
  • A new multi-token prediction feature lets the model generate 60 tokens per second, three times faster than its predecessor
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
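
To make the routing idea concrete, here is a minimal top-k mixture-of-experts sketch in PyTorch. The dimensions, expert count, and k value are toy numbers chosen for illustration; DeepSeek-V3's production design (shared experts plus the bias-based, auxiliary-loss-free balancing mentioned above) is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative top-k MoE layer. Sizes, expert count, and k are toy values,
# not DeepSeek-V3's; V3 also uses shared experts and an auxiliary-loss-free
# load-balancing scheme not shown here.
class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():  # run each selected expert once per batch
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

x = torch.randn(10, 64)
print(TopKMoE()(x).shape)  # torch.Size([10, 64])
```

Because each token passes through only k experts, per-token compute scales with k rather than with the total expert count, which is how a 671B-parameter model can run with only 37B parameters active per token.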

Technical specifications: The model was pretrained on 14.8T high-quality tokens and supports a long context window.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various training optimizations, including FP8 mixed-precision training and the DualPipe algorithm (a simplified FP8 sketch follows this list)
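
As a rough illustration of what FP8 training builds on, the sketch below quantizes a tensor to the E4M3 format with a single per-tensor scale and dequantizes it back. This is only one ingredient; DeepSeek's reported recipe uses finer-grained block-wise scaling and performs matrix multiplies in FP8 with higher-precision accumulation.

```python
import torch

# Per-tensor FP8 (E4M3) quantize/dequantize round trip. Requires a recent
# PyTorch build with float8 dtypes; the scaling scheme here is deliberately
# simplified relative to DeepSeek-V3's block-wise approach.
FP8 = torch.float8_e4m3fn

def to_fp8(x: torch.Tensor):
    # Scale so the tensor's largest magnitude maps near the FP8 max (~448).
    scale = torch.finfo(FP8).max / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(FP8), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4, 4)
w_fp8, s = to_fp8(w)
print((w - from_fp8(w_fp8, s)).abs().max())  # small quantization error
```

Halving storage relative to BF16 cuts memory traffic and speeds up matrix multiplies on H800-class hardware, which accounts for part of the training-cost savings described below.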

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training process required approximately 2.788 million H800 GPU hours, costing about $5.57 million
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
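
The arithmetic behind the headline figure is straightforward: DeepSeek's technical report assumes a rental price of about $2 per H800 GPU hour, and 2,788,000 GPU hours × $2/hour = $5,576,000, or roughly the $5.57 million cited above.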

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen 2.5-72B
  • It shows particular strength in Chinese-language and mathematical tasks, scoring 90.2 on the MATH-500 benchmark
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via the API for commercial applications (a minimal usage sketch follows this list)
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8
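
DeepSeek documents an OpenAI-compatible endpoint, so a minimal usage sketch with the official openai Python client looks like the following; the API key is a placeholder, and `deepseek-chat` is the model alias DeepSeek's docs point at V3.

```python
from openai import OpenAI

# Minimal sketch: DeepSeek exposes an OpenAI-compatible API, so the standard
# openai client works with a swapped base_url. The key below is a placeholder.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias served by DeepSeek-V3 at the time of writing
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```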

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.
