deepseek: 2024 DeepSeek-V3 Technical Report (English Edition).pdf
Resource Overview
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.
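The auxiliary-loss-free load-balancing strategy mentioned in the abstract can be pictured with a short sketch. The snippet below is a simplified, framework-free illustration of the general idea (a per-expert bias that steers top-k expert selection and is nudged between steps instead of adding a balance loss), not the paper's implementation; the expert count, top-k value, bias step size, and the helper names `route` and `update_bias` are all assumed for illustration.

```python
# Minimal sketch (assumptions noted above) of bias-adjusted top-k routing:
# the bias is added to an expert's score only when choosing the top-k experts,
# while the gating weights still come from the raw scores, so no auxiliary
# balance loss enters the training objective.
import random

NUM_EXPERTS = 8    # hypothetical, far smaller than the real model
TOP_K = 2          # experts activated per token in this toy example
BIAS_STEP = 0.001  # assumed update speed for the per-expert bias

bias = [0.0] * NUM_EXPERTS  # routing bias, adjusted heuristically, not by gradients


def route(token_scores):
    """Select TOP_K experts using biased scores; gate with the raw scores."""
    biased = [s + b for s, b in zip(token_scores, bias)]
    chosen = sorted(range(NUM_EXPERTS), key=lambda e: biased[e], reverse=True)[:TOP_K]
    total = sum(token_scores[e] for e in chosen)
    return {e: token_scores[e] / total for e in chosen}  # unbiased gating weights


def update_bias(expert_load):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(expert_load) / NUM_EXPERTS
    for e in range(NUM_EXPERTS):
        if expert_load[e] > mean_load:
            bias[e] -= BIAS_STEP
        else:
            bias[e] += BIAS_STEP


# Toy usage: route a batch of random token scores, then rebalance the biases.
load = [0] * NUM_EXPERTS
for _ in range(1024):
    scores = [random.random() for _ in range(NUM_EXPERTS)]
    for e in route(scores):
        load[e] += 1
update_bias(load)
```

Because the bias only reshapes which experts get picked, the sketch keeps the training loss untouched, which is the point of calling the strategy auxiliary-loss-free; the actual report should be consulted for the precise update rule and scale.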