DeepSeek, 2025: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (English version)
Resource Summary
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving efficiency while maintaining model capabilities. We present NSA, a Natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. NSA employs a dynamic hierarchical sparse strategy, combining coarse-grained token compression with fine-grained token selection to preserve both global context awareness and local precision.
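To make the coarse-to-fine idea concrete, below is a minimal NumPy sketch of hierarchical sparse attention for a single query: keys are pooled into per-block summaries (compression), the query scores the summaries to pick the most relevant blocks (selection), and exact attention runs only over tokens in the kept blocks. The block size, top-k value, and mean-pooling compressor here are illustrative assumptions, not the paper's method; NSA itself uses learned compression, a gated combination with a sliding-window branch, and hardware-aligned kernels, none of which are modeled in this sketch.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sparse_attention_sketch(q, K, V, block=4, top_k=2):
    """Coarse-to-fine sparse attention for one query vector.

    1. Compress: pool each block of keys into one summary key.
    2. Select: score the query against summaries, keep top_k blocks.
    3. Attend: run exact attention only over tokens in kept blocks.
    """
    d = q.shape[0]
    n_blocks = K.shape[0] // block
    Kb = K[: n_blocks * block].reshape(n_blocks, block, d)
    Vb = V[: n_blocks * block].reshape(n_blocks, block, d)

    # Coarse-grained compression: mean-pool each block
    # (a stand-in for the paper's learned compression).
    summaries = Kb.mean(axis=1)                  # (n_blocks, d)

    # Fine-grained selection: one importance score per block.
    block_scores = summaries @ q / np.sqrt(d)    # (n_blocks,)
    keep = np.argsort(block_scores)[-top_k:]     # top-k block indices

    # Exact attention restricted to tokens in the selected blocks.
    K_sel = Kb[keep].reshape(-1, d)
    V_sel = Vb[keep].reshape(-1, d)
    w = softmax(K_sel @ q / np.sqrt(d))
    return w @ V_sel

# Toy usage: 16 tokens, model dimension 8.
rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
print(sparse_attention_sketch(q, K, V).shape)    # (8,)
```

Restricting the final softmax to the selected blocks is what replaces the quadratic cost of full attention with a cost proportional to the number of kept tokens, while the block-summary scoring pass keeps selection itself cheap.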