Red Hat: 2025 AI Inference Practical Guide: Accelerating the Path to Efficiency (English Edition).pdf
Resource Overview
Quantization reduces the size and resource requirements of AI models by storing their parameters (weights) and intermediate data (activations) in lower-precision formats, using fewer bits per value. The technique helps manage resources efficiently, much like compressing files on a computer. Done correctly, it does not significantly degrade the model's accuracy.
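To illustrate the idea, here is a minimal sketch of symmetric per-tensor int8 weight quantization in Python with NumPy. The specific scheme, function names, and the use of NumPy are illustrative assumptions, not something specified by the guide.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8 (illustrative)."""
    # The scale maps the largest absolute weight onto the int8 range [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

# Example: a small weight matrix shrinks from 4 bytes per value to 1 byte per value,
# while the reconstruction error stays small.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```

Storing the int8 values plus a single float scale is what yields the roughly 4x memory reduction relative to float32 in this simple scheme.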