China-based DeepSeek just released a powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance (a minimal routing sketch follows the list below).

  • The model introduces an auxiliary loss-free load-balancing strategy that optimizes expert utilization without compromising performance
  • A new multi-token prediction feature lets the model generate 60 tokens per second, three times faster than its predecessor
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
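
To make the routing idea concrete, here is a minimal top-k mixture-of-experts sketch in PyTorch. The dimensions, expert count, and k value are toy numbers chosen for illustration; DeepSeek-V3's production design (shared experts plus the bias-based, auxiliary-loss-free balancing mentioned above) is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative top-k MoE layer. Sizes, expert count, and k are toy values,
# not DeepSeek-V3's; V3 also uses shared experts and an auxiliary-loss-free
# load-balancing scheme not shown here.
class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():  # run each selected expert once per batch
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

x = torch.randn(10, 64)
print(TopKMoE()(x).shape)  # torch.Size([10, 64])
```

Because each token passes through only k experts, per-token compute scales with k rather than with the total expert count, which is how a 671B-parameter model can run with only 37B parameters active per token.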

Technical specifications: The model was pretrained on 14.8T high-quality tokens and supports a long context window.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various training optimizations, including FP8 mixed-precision training and the DualPipe algorithm (a simplified FP8 sketch follows this list)
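
As a rough illustration of what FP8 training builds on, the sketch below quantizes a tensor to the E4M3 format with a single per-tensor scale and dequantizes it back. This is only one ingredient; DeepSeek's reported recipe uses finer-grained block-wise scaling and performs matrix multiplies in FP8 with higher-precision accumulation.

```python
import torch

# Per-tensor FP8 (E4M3) quantize/dequantize round trip. Requires a recent
# PyTorch build with float8 dtypes; the scaling scheme here is deliberately
# simplified relative to DeepSeek-V3's block-wise approach.
FP8 = torch.float8_e4m3fn

def to_fp8(x: torch.Tensor):
    # Scale so the tensor's largest magnitude maps near the FP8 max (~448).
    scale = torch.finfo(FP8).max / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(FP8), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4, 4)
w_fp8, s = to_fp8(w)
print((w - from_fp8(w_fp8, s)).abs().max())  # small quantization error
```

Halving storage relative to BF16 cuts memory traffic and speeds up matrix multiplies on H800-class hardware, which accounts for part of the training-cost savings described below.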

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training process required approximately 2.788 million H800 GPU hours, costing about $5.57 million
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
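
The arithmetic behind the headline figure is straightforward: DeepSeek's technical report assumes a rental price of about $2 per H800 GPU hour, and 2,788,000 GPU hours × $2/hour = $5,576,000, or roughly the $5.57 million cited above.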

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen 2.5-72B
  • It shows particular strength in Chinese-language and mathematical tasks, scoring 90.2 on the MATH-500 benchmark
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via the API for commercial applications (a minimal usage sketch follows this list)
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8
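
DeepSeek documents an OpenAI-compatible endpoint, so a minimal usage sketch with the official openai Python client looks like the following; the API key is a placeholder, and `deepseek-chat` is the model alias DeepSeek's docs point at V3.

```python
from openai import OpenAI

# Minimal sketch: DeepSeek exposes an OpenAI-compatible API, so the standard
# openai client works with a swapped base_url. The key below is a placeholder.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias served by DeepSeek-V3 at the time of writing
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```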

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.
