File list:
2x Faster Language Model Pre-training via Masked Structural Growth [English version].pdf

Resource summary
English title: 2x Faster Language Model Pre-training via Masked Structural Growth

Chinese abstract (translated): This paper introduces a technique called Masked Structural Growth (MSG). It proposes a new growth schedule that covers all possible growth dimensions, together with growth operators that are strictly function-preserving and independent of how the new weights are initialized. Experiments show that, compared with related work, MSG speeds up pre-training by 80% for BERT-base and by 120% for BERT-large, while matching or improving fine-tuning performance.

English abstract: Acceleration of large language model pre-training is a critical issue in present NLP research. In this paper, we focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one. There are two main research problems related to progres… [abstract truncated in source]
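The key property claimed in the abstract is that the growth operator preserves the network's function exactly, no matter how the new weights are initialized, because newly added units are masked to zero. The following is a minimal NumPy sketch of that masking idea (an illustration, not the authors' implementation): a small MLP is widened, the new hidden units are zero-masked, and the output is verified to be unchanged.

```python
# Minimal sketch of masked, function-preserving structural growth
# (illustrative only; not the MSG authors' code).
import numpy as np

rng = np.random.default_rng(0)

# Small two-layer MLP: input 4 -> hidden 3 -> output 2.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

def forward(x, W1, W2, mask):
    # The mask zeroes out newly added hidden units.
    h = np.tanh(x @ W1) * mask
    return h @ W2

x = rng.normal(size=(5, 4))
y_old = forward(x, W1, W2, np.ones(3))

# Grow hidden width 3 -> 5; the new weights may be initialized arbitrarily.
W1_big = np.concatenate([W1, rng.normal(size=(4, 2))], axis=1)
W2_big = np.concatenate([W2, rng.normal(size=(2, 2))], axis=0)
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])  # mask the two new units

y_new = forward(x, W1_big, W2_big, mask)
assert np.allclose(y_old, y_new)  # output is preserved exactly
```

During continued training, the mask values on the new units would be gradually raised toward 1, letting the grown network depart from the original function smoothly.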