Exploring foci of:
arXiv (Cornell University)
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
May 2024 • Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai‐Wei Chang, Wei Wang
Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have been effective in single-modal settings to alleviate the need for labeled data by leveraging the model's own generation. How…
Computer Science
Artificial Intelligence
Computer Vision
Geography
Programming Language
Meteorology