Interactive Image Description: Diverse Multimodal Controls (English Version)

Uploaded by: wx****bf

2023-05-05

5 MB, 11 pages

Artificial Intelligence (AI)

File list:

交互式图像描述：多样的多模态控制【英文版】.pdf


Resource Overview

English title: Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Summary: This paper introduces Caption AnyThing (CAT), an image captioning framework that supports a wide range of multimodal controls. It uses the Segment Anything Model (SAM) and ChatGPT to unify visual and language prompts into a modular framework, and it supports flexible combinations of different controls to better model user interaction.

English abstract: Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on an
