BranchNorm: Robustly Scaling Extremely Deep Transformers (English version)

Uploaded by: wx****b5
2023-05-05
3 MB, 12 pages
Artificial Intelligence (AI)
File list:
BranchNorm: 极深 Transformer 网络的鲁棒缩放【英文版】.pdf
Title: BranchNorm: Robustly Scaling Extremely Deep Transformers

Summary: This paper proposes BranchNorm, a method that dynamically rescales the branches of the Transformer to better balance training stability and convergence.

Abstract: Recently, DeepNorm scales Transformers into extremely deep (i.e., 1000 layers) and reveals the promising potential of deep scaling. To stabilize the training of deep models, DeepNorm (Wang et al., 2022) attempts to constrain the model update to a constant value. Although applying such a constraint can benefit the early stage of model training, it may lead to unde
