Andreas Veit
LatentCRF: Continuous CRF for Efficient Latent Diffusion
Latent Diffusion Models (LDMs) produce high-quality, photo-realistic images; however, the latency incurred by multiple costly inference iterations can restrict their applicability. We introduce LatentCRF, a continuous Conditional Random Fi…
Efficient Document Ranking with Learnable Late Interactions
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized …
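As a rough illustration of the two scoring patterns contrasted above (not the paper's learnable late-interaction model), the following sketch assumes hypothetical `encode` and `joint_encode` helpers standing in for real transformer encoders:

```python
import numpy as np

def dual_encoder_score(query_text, doc_text, encode):
    """DE: query and document are embedded independently; relevance is a single
    dot product, so document embeddings can be precomputed and indexed."""
    q = encode(query_text)   # vector of shape (d,)
    d = encode(doc_text)     # vector of shape (d,)
    return float(np.dot(q, d))

def cross_encoder_score(query_text, doc_text, joint_encode):
    """CE: the query-document pair is encoded jointly (more accurate, but it must
    be recomputed for every candidate document)."""
    return float(joint_encode(query_text, doc_text))
```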
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Incep…
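For reference, the standard FID formula the abstract alludes to: with (μ_r, Σ_r) and (μ_g, Σ_g) the mean and covariance of Inception embeddings of real and generated images, the Fréchet distance between the two fitted Gaussians is

```latex
\mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
  \;+\; \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```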
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Modern text-to-image generation models produce high-quality images that are both photorealistic and faithful to the text prompts. However, this quality comes at significant computational cost: nearly all of these models are iterative and r…
Large Language Models with Controllable Working Memory
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), partly owing to the massive amounts of world knowledge they memorize during pretraining. While many downstream applications provide the…
Large Language Models with Controllable Working Memory
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amoun…
When does mixup promote local linearity in learned representations?
Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-super…
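A minimal sketch of the standard mixup step described above (the convex-combination construction only, not the paper's analysis of local linearity):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Form new samples as convex combinations of random pairs of training points.

    x: inputs with shape (batch, ...); y: one-hot labels with shape (batch, classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing coefficient from Beta(alpha, alpha)
    perm = rng.permutation(len(x))         # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```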
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployme…
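The abstract is truncated before the method, so the sketch below shows only generic knowledge distillation with temperature-scaled soft targets as background for the knowledge-transfer setting; the loss weighting and helper names are illustrative assumptions, not the paper's framework:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend cross-entropy on hard labels with a soft-target term that matches
    the teacher's temperature-scaled predictions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1.0 - alpha) * hard
```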
Leveraging redundancy in attention with Reuse Transformers
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model…
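A minimal sketch of the reuse idea the title suggests, under the assumption that "reuse" means applying attention weights cached from an earlier layer to a later layer's value projections; the function names and shapes are illustrative, not the paper's exact formulation:

```python
import numpy as np

def attention_weights(q, k):
    """Standard scaled dot-product attention weights (n_tokens x n_tokens)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def reuse_layer(x, w_v, cached_attn):
    """Skip the quadratic score computation by applying attention weights
    cached from an earlier layer (e.g. the output of attention_weights)
    to this layer's value projections."""
    return cached_attn @ (x @ w_v)
```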
Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed …
Understanding Robustness of Transformers for Image Classification
Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classif…
On the Reproducibility of Neural Network Predictions
Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, …
Improving Calibration in Deep Metric Learning With Cross-Example Softmax
Modern image retrieval systems increasingly rely on the use of deep neural networks to learn embedding spaces in which distance encodes the relevance between a given query and image. In this setting, existing approaches tend to emphasize o…
Coping with Label Shift via Distributionally Robust Optimisation
The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work addressing label shift usually assumes access to an unlabelled test sample. This sample may b…
Long-tail learning via logit adjustment
Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïv…
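As context for the title, a minimal sketch of post-hoc logit adjustment, assuming adjustment by the log of empirical class priors with a scaling parameter tau; treat this as an illustration rather than the paper's full method:

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Post-hoc adjustment: subtract scaled log class priors from the logits
    before the argmax, which counteracts the bias toward frequent classes."""
    return np.argmax(logits - tau * np.log(class_priors), axis=-1)
```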
Doubly-stochastic mining for heterogeneous retrieval
Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which…
Why are Adaptive Methods Good for Attention Models?
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models. The settings und…
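For concreteness, a sketch of one step of SGD with global-norm gradient clipping (the "Clipped SGD" the abstract mentions); the learning rate and clipping threshold are arbitrary illustrative values:

```python
import numpy as np

def clipped_sgd_step(params, grads, lr=0.1, clip=1.0):
    """One SGD step with global-norm gradient clipping: rescale all gradients
    so their combined norm never exceeds `clip`, then take a plain SGD step."""
    norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, clip / (norm + 1e-12))
    return [p - lr * scale * g for p, g in zip(params, grads)]
```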
How To Backdoor Federated Learning
Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards wi…
Semantic Segmentation with Scarce Data
Semantic segmentation is a challenging vision problem that usually necessitates the collection of large amounts of finely annotated data, which is often quite expensive to obtain. Coarsely annotated data provides an interesting alternative…
Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE and BLEU often do not correlate well with human judgments. Secondly, each metric has well known blind spots to patholo…
Separating Self-Expression and Visual Content in Hashtag Supervision
The variety, abundance, and structured nature of hashtags make them an interesting data source for training vision models. For instance, hashtags have the potential to significantly reduce the problem of manual supervision and annotation w…
Convolutional Networks with Adaptive Computation Graphs.
Do convolutional networks really need a fixed feed-forward structure? Often, a neural network is already confident after a few layers about the high-level concept shown in the image. However, due to the fixed network structure, all remaini…
Conditional Similarity Networks
What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space in which their distances preserve the relative dissimilarity. However, when learning such similarity embeddings the …
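A minimal sketch of measuring similarity under a specific condition, assuming the masked-embedding formulation suggested by the title (a learned non-negative mask selects the dimensions relevant to one notion of similarity); the names and shapes are illustrative:

```python
import numpy as np

def conditional_distance(x, y, mask):
    """Distance under one notion of similarity: a learned non-negative mask
    selects the embedding dimensions relevant to that condition."""
    diff = (x - y) * mask
    return float(np.sqrt((diff ** 2).sum()))
```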
Deep Learning is Robust to Massive Label Noise
Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks. However, well-annotated datasets can be time-consuming and expensive to collect, lending increased interest t…
Learning From Noisy Large-Scale Datasets With Minimal Supervision
We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data…
Residual Networks Behave Like Ensembles of Relatively Shallow Networks
In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short…
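To make the "collection of paths" reading concrete, unrolling two residual blocks gives

```latex
y_1 = x + f_1(x), \qquad
y_2 = y_1 + f_2(y_1) = x + f_1(x) + f_2\bigl(x + f_1(x)\bigr)
```

so along any input-output path each of the n blocks is either skipped or entered, exposing 2^n implicit paths of varying length.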
Residual Networks are Exponential Ensembles of Relatively Shallow Networks.
In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subs…
Disentangling Nonlinear Perceptual Embeddings With Multi-Query Triplet Networks.
In typical perceptual tasks, higher-order concepts are inferred from visual features to assist with perceptual decision making. However, there is a multitude of visual concepts which can be inferred from a single stimulus. When learning no…
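As background for the triplet networks in the title, a sketch of the standard triplet margin loss (the paper's multi-query extension itself is not shown):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: the positive should sit closer to the
    anchor than the negative by at least `margin` in embedding space."""
    d_pos = ((anchor - positive) ** 2).sum(axis=-1)
    d_neg = ((anchor - negative) ** 2).sum(axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())
```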
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
This paper describes the COCO-Text dataset. In recent years large-scale datasets like SUN and Imagenet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance state-of-the-art in text det…