File list:
关于 LayerNorm 在 Transformer 注意力机制中表现力的作用【英文版】.pdf
Resource description
English title: On the Expressivity Role of LayerNorm in Transformers' Attention

Chinese abstract (translated): This paper shows that LayerNorm is an essential contributor to the expressivity of the multi-head attention layer in Transformer models, and that both of its steps, projection and scaling, are crucial to the attention mechanism.

English abstract: Layer Normalization (LayerNorm) is an inherent component in all Transformer-based models. In this paper, we show that LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it. This is in contrast to the common belief that LayerNorm's only role is to normalize the activations during the forward pass, and their gradients during the backward pass.
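The "projection and scaling" decomposition referred to in the abstract is geometric: subtracting the mean projects the input vector onto the hyperplane orthogonal to the all-ones vector [1, 1, ..., 1], and dividing by the standard deviation scales the result to a fixed norm. A minimal NumPy sketch of this view (function names are illustrative, not taken from the paper's code):

```python
import numpy as np

def project(x):
    # Mean subtraction == orthogonal projection of x onto the hyperplane
    # orthogonal to the all-ones vector [1, 1, ..., 1].
    return x - x.mean()

def scale(x, eps=1e-5):
    # Divide by the root-mean-square; for an already mean-centered vector
    # this is division by the standard deviation, fixing the vector's norm.
    return x / np.sqrt((x ** 2).mean() + eps)

def layernorm(x, eps=1e-5):
    # LayerNorm (without the learnable gain/bias) as projection then scaling.
    return scale(project(x), eps)

x = np.array([2.0, -1.0, 0.5, 3.5])
y = layernorm(x)
print(y.mean())                  # ~0: y lies on the hyperplane
print(np.sqrt((y ** 2).mean()))  # ~1: y has a fixed norm after scaling
```

Under this decomposition, the paper's claim is that each step independently matters for the attention layer that consumes the normalized vectors, not merely for numerical stability of training.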