
DeepSeek: DeepSeek-V3 Technical Report, 2024 (English version)


We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens.
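The auxiliary-loss-free load-balancing strategy is the abstract's most distinctive training idea, and it is simple enough to sketch. The Python snippet below is a minimal illustration under our own assumptions (toy sizes, an assumed step size GAMMA, and hypothetical helper names route_tokens and update_bias), not the report's implementation: a per-expert bias is added to the routing scores only when selecting the top-k experts, and after each step the bias is nudged up for underloaded experts and down for overloaded ones, so balance is maintained without an auxiliary loss term.

import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # toy size; the actual model routes over far more experts
TOP_K = 2
GAMMA = 1e-3      # bias update step size (an assumed value)

def route_tokens(affinity, bias, k=TOP_K):
    # Selection uses the biased scores; gating weights use the raw scores,
    # so the bias steers load without distorting the output mixture.
    biased = affinity + bias                        # (tokens, experts)
    chosen = np.argsort(-biased, axis=1)[:, :k]     # expert ids per token
    raw = np.take_along_axis(affinity, chosen, axis=1)
    gates = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
    return chosen, gates

def update_bias(bias, chosen, num_experts=NUM_EXPERTS, gamma=GAMMA):
    # Nudge the bias toward underloaded experts, away from overloaded ones.
    load = np.bincount(chosen.ravel(), minlength=num_experts)
    return bias - gamma * np.sign(load - load.mean())

# One simulated step: 1024 tokens' affinities over the experts.
affinity = rng.normal(size=(1024, NUM_EXPERTS))
bias = np.zeros(NUM_EXPERTS)
chosen, gates = route_tokens(affinity, bias)
bias = update_bias(bias, chosen)

Because the bias never enters the gating weights, the expert mixture a token actually receives stays unbiased; only which experts are consulted shifts, which appears to be why the strategy can drop the auxiliary loss and the gradient interference it brings.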

