DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (English version)

Published by: wx****26
2025-04-10
1 MB, 22 pages
File list:
DeepSeek-R1-通过以下方式激励LLMs中的推理能力强化学习(英文版).pdf

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enha

