Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Authors: Hantao Yao, Rui Zhang, Changsheng Xu


Prompt tuning is an effective way to adapt a pre-trained visual-language model (VLM) to downstream tasks using task-related textual tokens. Representative CoOp-based work combines learnable textual tokens with the class tokens to obtain specific textual knowledge. However, this specific textual knowledge generalizes worse to unseen classes because it forgets the essential general textual knowledge, which has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt on unseen classes. The key insight of KgCoOp is that forgetting of essential general knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by the learned prompts and those generated by the hand-crafted prompts. Finally, adding the KgCoOp term to the contrastive loss produces a prompt that is discriminative for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.
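To make the idea concrete, below is a minimal PyTorch-style sketch of the training objective described in the abstract: a standard contrastive/cross-entropy term over class logits plus a knowledge-guided term that pulls the learned text embeddings toward the hand-crafted ones. The function name `kgcoop_loss`, the hyperparameter `lambda_kg`, and the normalization details are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a KgCoOp-style objective (assumed names and details).
import torch
import torch.nn.functional as F

def kgcoop_loss(learned_text_emb, handcrafted_text_emb,
                image_features, labels, logit_scale, lambda_kg=1.0):
    """learned_text_emb:     (C, D) class embeddings from the learnable prompt
       handcrafted_text_emb: (C, D) class embeddings from a fixed prompt,
                             e.g. "a photo of a {class}"
       image_features:       (B, D) image embeddings; labels: (B,) class ids"""
    # Normalize embeddings, as is standard for CLIP-like models.
    learned = F.normalize(learned_text_emb, dim=-1)
    fixed   = F.normalize(handcrafted_text_emb, dim=-1)
    img     = F.normalize(image_features, dim=-1)

    # Contrastive / cross-entropy term over image-to-class similarities.
    logits = logit_scale * img @ learned.t()          # (B, C)
    ce = F.cross_entropy(logits, labels)

    # Knowledge-guided term: keep learned text embeddings close to the
    # hand-crafted ones (mean squared Euclidean distance over classes).
    kg = (learned - fixed).pow(2).sum(dim=-1).mean()

    return ce + lambda_kg * kg
```

In this sketch, `lambda_kg` weights how strongly the learned prompt is anchored to the general knowledge encoded by the hand-crafted prompt; a larger value trades seen-class fitting for unseen-class generalization.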

Paper link: http://arxiv.org/pdf/2303.13283v1

More computer science papers: http://cspaper.cn/
