Moshi: Speech-Text Foundation Model Technology for Real-Time Dialogue, 2024 (English Edition)

Uploaded by: wx****76

2024-10-14

5 MB, 67 pages

File list:

Moshi：2024实时对话的语音-文本基础模型技术（英文版）.pdf

Resource summary

We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text-to-speech. Such frameworks cannot emulate the experience of real conversations. First, their complexity induces a latency of several seconds between interactions. Second, text being the intermediate modality for dialogue, non-linguistic inform
