File list:
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales (English version).pdf
Resource summary
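English title: Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales

Chinese abstract (translated): This study proposes a new paradigm for the high cost of validating research ideas on large language models. It finds that the Maximal Update Parametrization (muP) allows scaling laws over hyperparameters to be fit accurately, so that different models can be compared directly through loss prediction before training begins.

English abstract: As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that directly predicts some metrics for large model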
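To illustrate what "loss prediction across scales" means in practice, here is a minimal sketch of fitting a scaling law on small runs and extrapolating it to a larger model. It is not the paper's method: the abstract does not give the fitted functional form, so this sketch assumes a pure power law, loss ≈ a · N^(-b) in model size N, fit by least squares in log-log space. The function names `fit_power_law` and `predict_loss` and all numbers are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by ordinary least squares on (log size, log loss).

    Assumption: a pure power law with no irreducible-loss term.
    Returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Slope and intercept of the regression line log(loss) = log(a) - b * log(size)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Extrapolate the fitted power law to a (larger) model size."""
    return a * size ** (-b)


if __name__ == "__main__":
    # Hypothetical small-model results, generated from a = 20, b = 0.1
    small_sizes = [1e6, 1e7, 1e8]
    small_losses = [20 * n ** -0.1 for n in small_sizes]
    a, b = fit_power_law(small_sizes, small_losses)
    print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e10 params: {predict_loss(a, b, 1e10):.3f}")
```

The idea the abstract attributes to muP is that, with hyperparameters transferred across widths, such fits on small models become reliable enough to compare candidate designs by their predicted large-model loss before any large run is trained; real fits would use measured losses rather than the synthetic values above.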