Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning (English version)

Uploaded by: wx****13

2023-04-22

730 KB, 8 pages

Artificial Intelligence (AI)

File list:

Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning [English version].pdf


Description

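English title: Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning

Chinese abstract (translated): The paper frames the problem of incorporating external advice into the learning of a reinforcement learning (RL) agent as a multi-armed bandit problem called Shaping-Bandits. It proposes three shaping algorithms designed to account for the long-term consequences of following either the expert policy or the default RL algorithm. Experiments in four different settings show that these algorithms achieve the stated goals.

English abstract: A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert advice in its learning. The desired goals of an algorithm that can shape the learning of an RL agent with external advice include (a) maintaining policy invariance; (…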
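To make the bandit framing concrete, here is a minimal sketch under simplified assumptions: at the start of each episode, a two-armed UCB1 bandit decides whether the agent follows the expert's advice (arm 0) or its own RL policy (arm 1), and the bandit's reward is the episode return scaled to [0, 1]. This is not one of the paper's three algorithms. UCB1, the class name `TwoArmUCB`, the `simulate` helper and the per-arm success probabilities are all hypothetical choices for illustration. The sketch also treats both arms as stationary, whereas the returns of a learning RL agent change over time.

```python
import math
import random


class TwoArmUCB:
    """UCB1 over two arms: 0 = follow expert advice, 1 = follow the RL agent.

    Hypothetical illustration: rewards are assumed to be episode returns
    already scaled to [0, 1].
    """

    ARMS = ("expert", "rl")

    def __init__(self) -> None:
        self.counts = [0, 0]    # episodes in which each arm was chosen
        self.means = [0.0, 0.0]  # running mean of returns per arm

    def select(self) -> int:
        # Try each arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.means[a] + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(2)
        ]
        return max(range(2), key=lambda a: scores[a])

    def update(self, arm: int, episode_return: float) -> None:
        # Incremental update of the mean return observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]


def simulate(episodes: int = 2000, seed: int = 0) -> list[int]:
    """Run the bandit on a toy problem and return how often each arm was chosen."""
    rng = random.Random(seed)
    # Assumed probability that an episode "succeeds" (return 1) under each arm.
    success_prob = {0: 0.3, 1: 0.7}
    bandit = TwoArmUCB()
    for _ in range(episodes):
        arm = bandit.select()
        episode_return = 1.0 if rng.random() < success_prob[arm] else 0.0
        bandit.update(arm, episode_return)
    return bandit.counts


if __name__ == "__main__":
    counts = simulate()
    print(dict(zip(TwoArmUCB.ARMS, counts)))
```

With these assumed probabilities the bandit ends up choosing the RL arm far more often than the expert arm, illustrating how a bandit-style controller can stop following advice that does not pay off over the long run.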
