Marco Donzella
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
State-of-the-art (SoTA) image captioning models are often trained on the Microsoft Common Objects in Context (MS-COCO) dataset, which contains human-annotated captions with an average length of approximately ten tokens. Although effective …