Multimodal Data Augmentation for Image Captioning Using Diffusion Models (English Version)

Publisher: wx****69

2023-05-06

2 MB, 16 pages

Artificial Intelligence (AI)

File list:

Multimodal Data Augmentation for Image Captioning Using Diffusion Models [English Version].pdf


Resource summary

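English title: Multimodal Data Augmentation for Image Captioning using Diffusion Models

Chinese summary (translated): This study proposes an image captioning method based on multimodal data augmentation, aimed at the difficulty of learning the alignment between images and captions. Experiments show that the method can expand the training dataset with high-quality generated image-caption pairs, thereby improving the model's training efficiency and prediction accuracy.

English abstract: Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we propose a multimodal data augmentation method, leveraging a recent text-to-image model called Stable Diffusion, …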
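The abstract is cut off before the method details, so the following is only a minimal sketch of the general idea under stated assumptions: each existing caption is fed to a text-to-image model (the paper names Stable Diffusion) to synthesize extra images, and each synthetic image is paired with the caption that produced it. The `Pair` container, the `generate(caption, seed)` callable, and the `images_per_caption` parameter are hypothetical names for illustration; in practice `generate` might wrap a Stable Diffusion pipeline (for example from the Hugging Face `diffusers` library), but a stub is used here so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Pair:
    """One image-caption training example (hypothetical container)."""
    image: object
    caption: str
    synthetic: bool = False


def augment_with_text_to_image(
    pairs: Sequence[Pair],
    generate: Callable[[str, int], object],
    images_per_caption: int = 1,
) -> List[Pair]:
    """Return the original pairs plus synthetic pairs built from their captions.

    Assumption: `generate(caption, seed)` is a hypothetical text-to-image call,
    e.g. a wrapper around a Stable Diffusion pipeline; the seed varies the output.
    """
    augmented = list(pairs)  # keep every real pair unchanged
    for pair in pairs:
        for seed in range(images_per_caption):
            image = generate(pair.caption, seed)
            # The caption that conditioned the generation becomes the new image's label.
            augmented.append(Pair(image=image, caption=pair.caption, synthetic=True))
    return augmented


if __name__ == "__main__":
    # Stub generator standing in for a real diffusion model.
    def fake_generate(caption: str, seed: int) -> str:
        return f"<synthetic image: {caption!r}, seed={seed}>"

    data = [Pair("img_0.jpg", "a dog running on a beach"),
            Pair("img_1.jpg", "two cats sleeping on a sofa")]
    augmented = augment_with_text_to_image(data, fake_generate)
    print(len(augmented))  # 4
```

Marking synthetic pairs with a flag makes it easy to weight or filter them separately from the real data during training; this flag is an illustrative design choice, not something the abstract describes.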
