Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward (English version)

Uploaded by: wx****22

2023-05-06

216 KB, 18 pages

Artificial Intelligence (AI)

File list:

延迟、组合和部分匿名回报的强化学习【英文版】.pdf


Resource description

English title: Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Summary: This paper studies an infinite-horizon average reward Markov decision process with delayed, composite, and partially anonymous reward feedback. It proposes an algorithm named DUCRL2 to obtain a near-optimal policy and proves that it achieves a regret bound whose leading term is of order Õ(DS√(AT)).

English abstract: We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentia
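The reward feedback described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's DUCRL2 algorithm: the function name `observed_rewards`, the fixed `split` weights, and the toy trajectory are all hypothetical. It also assumes that "partially anonymous" means the learner sees, at each time step, only the per-state sum of the reward components realized at that step, with no action label attached.

```python
from collections import defaultdict


def observed_rewards(trajectory, split, horizon):
    """Aggregate delayed reward components per (time, state).

    trajectory: list of (state, action, reward) tuples, one per time step.
    split: hypothetical weights that fragment each reward over delays 0, 1, 2, ...
    horizon: number of time steps for which observations are reported.
    """
    observed = defaultdict(float)  # (time, state) -> aggregate realized reward
    for t, (state, _action, reward) in enumerate(trajectory):
        for delay, weight in enumerate(split):
            if t + delay < horizon:
                # Assumption: the action label is dropped, so components from
                # different actions taken at the same state are summed together.
                observed[(t + delay, state)] += weight * reward
    return dict(observed)


# Toy trajectory: two different actions at state "s0", then one at "s1".
trajectory = [("s0", "a0", 1.0), ("s0", "a1", 2.0), ("s1", "a0", 4.0)]
obs = observed_rewards(trajectory, split=[0.5, 0.5], horizon=4)
```

In this toy run, the observation at time 1 for state `s0` mixes the delayed half of the first reward with the immediate half of the second, so the learner cannot attribute it to a single action.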
