
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward (English version)

Uploaded by: wx****22
Date: 2023-05-06
216 KB, 18 pages
Category: Artificial Intelligence (AI)
English title: Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Chinese abstract: This paper studies an infinite-horizon average-reward Markov decision process with delayed, composite, and partially anonymous reward feedback. It proposes an algorithm named DUCRL2 to obtain a near-optimal policy and proves that the algorithm achieves a regret bound similar to ODS.

English abstract: We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentia
