File list:
FAO: 2024 Reinforcement Learning from Experience Feedback: Application to Economic Policy (report, English edition).pdf
Resource summary
Learning from the past is critical for shaping the future, especially in economic policymaking. Building on current methods for applying Reinforcement Learning (RL) to large language models (LLMs), this paper introduces Reinforcement Learning from Experience Feedback (RLXF), a procedure that tunes LLMs based on lessons from past experiences. RLXF integrates historical experiences into LLM training in two key ways: by training reward models on historical data, a