Léo Tronchon
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights
We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLM) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and informatio…
Building and better understanding vision-language models: insights and future directions
The field of vision-language models (VLMs), which take images and texts as inputs and output texts, is rapidly evolving and has yet to reach consensus on several key aspects of the development pipeline, including data, architecture, and tr…
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the d…
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: by providing a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for in…
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks. However, the datasets used to train these models have not been released…