File list:
BranchNorm: Robustly Scaling Extremely Deep Transformers (English version).pdf
Resource overview
English title: BranchNorm: Robustly Scaling Extremely Deep Transformers

Chinese abstract (translated): This paper proposes a method named BranchNorm, which dynamically rescales the branches of the Transformer to achieve a better trade-off between training stability and convergence.

English abstract: Recently, DeepNorm scales Transformers into extremely deep (i.e., 1000 layers) and reveals the promising potential of deep scaling. To stabilize the training of deep models, DeepNorm (Wang et al., 2022) attempts to constrain the model update to a constant value. Although applying such a constraint can benefit the early stage of model training, it may lead to unde...
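The abstract describes BranchNorm as dynamically rescaling the non-residual branch of the Transformer over the course of training, in contrast to DeepNorm's fixed constraint on the model update. The sketch below is a rough illustration of that idea only, not the paper's exact formulation: the class and function names (BranchScaledResidual, branch_scale_schedule), the linear warm-up schedule, and the warmup_steps value are all assumptions made for demonstration.

```python
# Illustrative sketch: a post-LN residual block whose non-residual
# (sublayer) branch is rescaled by a training-step-dependent factor.
# All names and the schedule are hypothetical, not the paper's method.
import torch
import torch.nn as nn


class BranchScaledResidual(nn.Module):
    """Residual block with a schedule-dependent scale on the sublayer branch."""

    def __init__(self, sublayer: nn.Module, d_model: int):
        super().__init__()
        self.sublayer = sublayer          # e.g. self-attention or feed-forward
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, branch_scale: float) -> torch.Tensor:
        # Only the non-residual branch is rescaled; the identity path is untouched.
        return self.norm(x + branch_scale * self.sublayer(x))


def branch_scale_schedule(step: int, warmup_steps: int = 4000) -> float:
    # Assumed schedule: keep the branch contribution small early in training
    # (bounded updates), then relax toward 1.0 so later training is not
    # over-constrained the way a fixed constant would be.
    return min(1.0, step / warmup_steps)
```

As a usage example, a training loop would compute `scale = branch_scale_schedule(step)` each step and pass it to every block's forward call; how the real method parameterizes and schedules this factor is specified in the paper itself.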