File list:
Interactive Image Description: Diverse Multimodal Controls (English version).pdf
Resource summary
English title: Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Chinese abstract (translated): This paper introduces Caption AnyThing (CAT), an image captioning framework that supports a wide range of multimodal controls. Built on the Segment Anything Model (SAM) and ChatGPT, it unifies visual and language prompts in a modular framework that lets different controls be combined flexibly, giving better modeling of user interaction.
English abstract: Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on an
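The summary describes a modular design in which visual prompts (which region to look at) and language prompts (how to phrase the description) are handled by separate, composable components. The following is a minimal sketch of that idea under stated assumptions, not the paper's code: it assumes a three-stage pipeline (segment, caption, refine), and the stub functions only stand in for SAM, a region captioner, and ChatGPT. Every name here (`VisualControl`, `LanguageControl`, `stub_segment`, `stub_caption`, `stub_refine`, `caption_anything`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical toy types; not CAT's actual interfaces.
Image = List[List[int]]  # toy grayscale image, rows of pixel values
Mask = List[List[bool]]  # same shape as the image

@dataclass
class VisualControl:
    """Hypothetical visual prompt: clicked points and/or a box."""
    points: List[Tuple[int, int]] = field(default_factory=list)  # (x, y) pixels
    box: Optional[Tuple[int, int, int, int]] = None  # (x0, y0, x1, y1), end-exclusive

@dataclass
class LanguageControl:
    """Hypothetical language prompt: tone and length of the final caption."""
    sentiment: str = "neutral"
    max_words: int = 10

def stub_segment(image: Image, control: VisualControl) -> Mask:
    """Stand-in for a promptable segmenter such as SAM (NOT the real SAM API)."""
    h, w = len(image), len(image[0])
    if control.box is not None:
        x0, y0, x1, y1 = control.box
        return [[x0 <= x < x1 and y0 <= y < y1 for x in range(w)] for y in range(h)]
    if control.points:
        pts = set(control.points)
        return [[(x, y) in pts for x in range(w)] for y in range(h)]
    return [[True] * w for _ in range(h)]  # no visual prompt: whole image

def stub_caption(image: Image, mask: Mask) -> str:
    """Stand-in for a region captioner; here it only reports the region size."""
    area = sum(sum(row) for row in mask)
    return f"an object covering {area} pixels"

def stub_refine(raw: str, control: LanguageControl) -> str:
    """Stand-in for an LLM such as ChatGPT rewriting the caption per language controls."""
    words = f"a {control.sentiment} description: {raw}".split()
    return " ".join(words[: control.max_words])  # enforce the length control

def caption_anything(
    image: Image,
    visual: VisualControl,
    language: LanguageControl,
    segment: Callable[[Image, VisualControl], Mask] = stub_segment,
    caption: Callable[[Image, Mask], str] = stub_caption,
    refine: Callable[[str, LanguageControl], str] = stub_refine,
) -> str:
    """Compose the three modules; each stage can be swapped independently."""
    mask = segment(image, visual)
    raw = caption(image, mask)
    return refine(raw, language)

if __name__ == "__main__":
    img = [[0] * 4 for _ in range(4)]
    # prints: a positive description: an object covering 4 pixels
    print(caption_anything(img, VisualControl(box=(0, 0, 2, 2)), LanguageControl(sentiment="positive")))
```

Because each stage is passed in as a plain callable, a visual control and a language control can be varied independently of each other, which is one simple way to get the "flexible combination of controls" the summary describes.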