
Moshi: A Speech-Text Foundation Model for Real-Time Dialogue, 2024 (English Edition)

Published by: wx****76
2024-10-14
5 MB, 67 pages
File list:
Moshi:2024实时对话的语音-文本基础模型技术(英文版).pdf

We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text-to-speech. Such frameworks cannot emulate the experience of real conversations. First, their complexity induces a latency of several seconds between interactions. Second, text being the intermediate modality for dialogue, non-linguistic inform

