Bill Psomas
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency Open
As fine-tuning becomes increasingly impractical at scale, probing is emerging as the preferred evaluation protocol. Yet, the standard linear probing fails to adequately reflect the potential of models whose pre-training optimizes represent…
Composed Image Retrieval for Training-Free Domain Conversion Open
This work addresses composed image retrieval in the context of domain conversion, where the content of a query image is retrieved in the domain specified by the query text. We show that a strong vision-language model provides sufficient de…
Evaluation of Resource-Efficient Crater Detectors on Embedded Systems Open
Real-time analysis of Martian craters is crucial for mission-critical operations, including safe landings and geological exploration. This work leverages the latest breakthroughs for on-the-edge crater detection aboard spacecraft. We rigor…
Composed Image Retrieval for Remote Sensing Open
This work introduces composed image retrieval to remote sensing. It allows querying a large image archive by image examples alternated by a textual description, enriching the descriptive power over unimodal queries, either visual or textua…
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? Open
Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers and pooling at the end of the network. Does the latter really need to be different? As a by-product of pooling, vision tran…
OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters Open
Augmented Reality or AR filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics. Given the wide adoption of AR face filters and the importance o…
What to Hide from Your Students: Attention-Guided Masked Image Modeling Open
Transformers and masked language modeling are quickly being adopted and explored in computer vision as vision transformers and masked image modeling (MIM). In this work, we argue that image token masking differs from token masking in text,…
It Takes Two to Tango: Mixup for Deep Metric Learning Open
Metric learning involves learning a discriminative representation such that embeddings of similar classes are encouraged to be close, while embeddings of dissimilar classes are pushed far apart. State-of-the-art methods focus mostly on sop…