English title: Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

Abstract (translated from Chinese): This paper proposes the Stratified Mixture-of-Experts (SMoE) model, which uses a stratified structure to assign dynamic capacity to different tokens. It can be used to improve machine translation performance and alleviate the parameter-deficiency problem. SMoE performs well on two multilingual machine translation benchmarks, outperforming several state-of-the-art MoE models.

English abstract: Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently
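To make the sparse-activation mechanism the abstract describes concrete, here is a minimal sketch of top-1 mixture-of-experts routing. This is not the paper's SMoE architecture; the layer sizes, gating scheme, and expert definitions below are illustrative assumptions only.

```python
import numpy as np

# Illustrative sketch of sparse top-1 MoE routing (assumed setup, not the
# paper's SMoE): each token is sent to exactly one expert, so compute per
# token stays low while total parameters grow with the number of experts.
rng = np.random.default_rng(0)

n_tokens, d_model, n_experts = 8, 4, 3

# Each "expert" is just a dense weight matrix in this sketch.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Route each token to its top-1 expert and apply only that expert."""
    logits = x @ gate_w                                  # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)           # softmax gate
    top1 = probs.argmax(axis=-1)                         # chosen expert per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = top1 == e
        if mask.any():
            # Scale each token's expert output by its gate probability,
            # as in standard sparsely activated MoE layers.
            out[mask] = (x[mask] @ experts[e]) * probs[mask, e:e + 1]
    return out, top1

x = rng.standard_normal((n_tokens, d_model))
y, assignment = moe_layer(x)
```

Each token runs through a single expert, so the per-token cost matches a dense layer of the same width even though three experts' worth of parameters exist.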