Basil Mustafa
A Study to Assess the Effectiveness of a Tailored Exercise Program for Improving Physical and Mental Well-being among Staff Nurses in Kuwait Hospital, Sharjah
Exercise improves health and well-being for nurses, but work schedules and stressors can hinder self-care, leading to physical and mental health issues and increased nursing shortages. This study aims to evaluate the effectiveness of a tai…
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training objective with several prior, independently…
Triaging mammography with artificial intelligence: an implementation study
Recent developments in Artificial Neural Network (ANN), steady-state and transient modeling of gas-phase biofiltration process
Biofilter technology has played a significant role over several decades in providing clean air by removing Volatile Organic Compounds (VOCs) and odor-causing chemicals such as hydrogen sulfide from polluted industrial airstreams. Bi…
Capabilities of Gemini Models in Medicine
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong genera…
Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-wo…
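The snippet above cuts off before the method, but a standard way to adjust a classifier for differing condition distributions across clinical settings is a label-shift correction: re-weight the predicted probabilities by the ratio of deployment-site to training-site prevalence and re-normalize. The sketch below is only illustrative; the class priors and function name are made up, and the paper's actual adjustment may differ.

```python
import numpy as np

def adjust_for_prevalence(probs, train_prior, target_prior):
    """Re-weight predicted class probabilities for a shift in condition prevalence.

    probs:        (n_samples, n_classes) model outputs under the training distribution
    train_prior:  (n_classes,) class frequencies in the training setting
    target_prior: (n_classes,) class frequencies expected in the deployment clinic
    Returns re-normalized probabilities under the target prior (standard label-shift
    correction; illustrative, not necessarily the paper's exact procedure).
    """
    w = np.asarray(target_prior) / np.asarray(train_prior)   # per-class prior ratio
    adjusted = probs * w                                      # Bayes-rule re-weighting
    return adjusted / adjusted.sum(axis=1, keepdims=True)     # re-normalize per sample

# Hypothetical example: three conditions whose prevalence differs across clinics.
probs = np.array([[0.7, 0.2, 0.1]])
train_prior = np.array([0.5, 0.3, 0.2])    # made-up training prevalence
target_prior = np.array([0.2, 0.3, 0.5])   # made-up deployment prevalence
print(adjust_for_prevalence(probs, train_prior, target_prior))
```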
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) mode…
From Sparse to Soft Mixtures of Experts
Sparse mixture of expert architectures (MoEs) scale model capacity without significant increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to…
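The soft alternative referenced in the title replaces hard token-to-expert assignment with learned, fully differentiable mixing: every expert slot processes a weighted average of all tokens, and every token output is a weighted average of all slot outputs, which sidesteps token dropping and discrete routing instability. Below is a minimal NumPy sketch of that routing math, with placeholder dimensions and toy stand-in experts rather than the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(X, Phi, experts):
    """Soft mixture-of-experts routing for one sequence of tokens.

    X:       (m, d) input tokens
    Phi:     (d, s) learnable slot parameters, s = num_experts * slots_per_expert
    experts: list of callables, each mapping (slots_per_expert, d) -> (slots_per_expert, d)
    """
    logits = X @ Phi                         # (m, s) token-slot affinities
    D = softmax(logits, axis=0)              # dispatch: each slot is a weighted mix of tokens
    slots = D.T @ X                          # (s, d) slot inputs
    p = slots.shape[0] // len(experts)
    outs = np.concatenate(
        [f(slots[i * p:(i + 1) * p]) for i, f in enumerate(experts)], axis=0
    )                                        # (s, d) slot outputs, one chunk per expert
    C = softmax(logits, axis=1)              # combine: each token is a weighted mix of slots
    return C @ outs                          # (m, d) output tokens

# Toy usage with placeholder "experts" (identity and scaling functions).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                  # 6 tokens, width 4
Phi = rng.normal(size=(4, 4))                # 2 experts x 2 slots
experts = [lambda s: s, lambda s: 2.0 * s]
print(soft_moe_layer(X, Phi, experts).shape) # (6, 4)
```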
Towards Generalist Biomedical AI
Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can poten…
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) of…
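NaViT's alternative, "Patch n' Pack", is sequence packing: patches from images of arbitrary resolutions are flattened and concatenated into one token sequence, with a per-token example id so attention can be restricted to tokens from the same image. The sketch below illustrates only the packing and masking step, with a made-up patch size and toy inputs.

```python
import numpy as np

def pack_images(images, patch=4):
    """Pack variable-resolution images into one patch sequence plus an example-id mask.

    images: list of (H, W, C) arrays whose H and W are multiples of `patch`
    Returns (tokens, example_ids): tokens is (total_patches, patch*patch*C),
    example_ids marks which image each token came from, so a transformer can
    build a block-diagonal attention mask and avoid cross-image attention.
    """
    tokens, example_ids = [], []
    for idx, img in enumerate(images):
        h, w, c = img.shape
        # Split into non-overlapping patch*patch tiles and flatten each tile.
        tiles = img.reshape(h // patch, patch, w // patch, patch, c)
        tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
        tokens.append(tiles)
        example_ids.append(np.full(len(tiles), idx))
    return np.concatenate(tokens), np.concatenate(example_ids)

# Two toy "images" at different resolutions packed into a single sequence.
imgs = [np.ones((8, 8, 3)), np.ones((4, 12, 3))]
toks, ids = pack_images(imgs)
attn_mask = ids[:, None] == ids[None, :]   # block-diagonal self-attention mask
print(toks.shape, attn_mask.shape)         # (7, 48) (7, 7)
```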
PaLI-X: On Scaling up a Multilingual Vision and Language Model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance o…
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 202…
Sigmoid Loss for Language Image Pre-Training
We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of…
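Concretely, the sigmoid loss turns every image-text combination in a batch into an independent binary classification problem, positive on the diagonal and negative elsewhere, so no batch-wide normalization is needed. A small NumPy sketch of the loss follows; the temperature and bias are arbitrary placeholder constants rather than the learned values used in the paper.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss for image-text pretraining (illustrative sketch).

    img_emb, txt_emb: (n, d) L2-normalized embeddings for n aligned pairs
    t, b: temperature and bias (learnable in practice, placeholder constants here)
    Every (image, text) combination is a binary example: label +1 on the
    diagonal (true pairs), -1 elsewhere; no batch-wide softmax is needed.
    """
    logits = img_emb @ txt_emb.T * t + b            # (n, n) pairwise similarities
    labels = 2.0 * np.eye(len(img_emb)) - 1.0       # +1 for matches, -1 for non-matches
    # -log sigmoid(labels * logits), computed stably via logaddexp.
    loss = np.logaddexp(0.0, -labels * logits)
    return loss.mean()

# Toy usage with random unit-norm embeddings.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 8)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(siglip_loss(img, txt))
```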
Scaling Vision Transformers to 22 Billion Parameters
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture …
Massively Scaling Heteroscedastic Classifiers
Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes. However, compared to standard cla…
CLIPPO: Image-and-Language Understanding from Pixels Only
Multimodal models are becoming increasingly effective, in part due to unified components, such as the Transformer architecture. However, multimodal models still often consist of many task- and modality-specific pieces and training procedur…
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models,…
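Upcycling, as described, warm-starts a sparse MoE from such a dense checkpoint: each expert in an MoE layer begins as a copy of the dense MLP's weights, a small router is initialized from scratch, and everything else is reused. The sketch below shows just that initialization step, with hypothetical parameter names and shapes.

```python
import copy
import numpy as np

def upcycle_mlp(dense_mlp_params, num_experts, d_model, seed=0):
    """Build MoE-layer parameters from a dense MLP checkpoint (illustrative sketch).

    dense_mlp_params: dict of weight arrays for one dense MLP block
    Returns a dict with `num_experts` identical copies of those weights plus a
    freshly initialized router; all other (attention, embedding) parameters of
    the dense checkpoint would be reused unchanged.
    """
    rng = np.random.default_rng(seed)
    return {
        "experts": [copy.deepcopy(dense_mlp_params) for _ in range(num_experts)],
        # Small random router so initial routing is near-uniform and the
        # upcycled model starts close to the dense model's behavior.
        "router": rng.normal(scale=0.01, size=(d_model, num_experts)),
    }

# Toy dense MLP checkpoint (hypothetical shapes).
dense = {"w_in": np.ones((16, 64)), "w_out": np.ones((64, 16))}
moe = upcycle_mlp(dense, num_experts=4, d_model=16)
print(len(moe["experts"]), moe["router"].shape)   # 4 (16, 4)
```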
Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians (CoDoC)
Diagnostic AI systems trained using deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings [1,2]. However, such systems are not always reliable and can fail in cases di…
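The abstract is truncated before the mechanism, so the snippet below is only a schematic stand-in for the idea of complementarity-driven deferral: accept the AI's output where it tends to be reliable and hand ambiguous cases to the clinician. CoDoC learns its deferral rule from data; the fixed thresholds here are invented for illustration.

```python
def decide(ai_score, clinician_opinion, defer_low=0.35, defer_high=0.65):
    """Toy complementarity-style deferral rule (illustrative only).

    ai_score: AI-predicted probability of disease in [0, 1]
    Confident AI scores are accepted; ambiguous scores (inside the deferral
    band, thresholds here are arbitrary) are handed to the clinician. CoDoC
    learns where to defer from data rather than using fixed thresholds.
    """
    if defer_low <= ai_score <= defer_high:
        return clinician_opinion            # defer: AI is in its unreliable band
    return ai_score >= 0.5                  # accept the AI's own decision

print(decide(0.9, clinician_opinion=False))   # True  (AI confident, not deferred)
print(decide(0.5, clinician_opinion=True))    # True  (deferred to clinician)
```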
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI…
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Large sparsely-activated models have obtained excellent performance in multiple domains. However, such models are typically trained on a single modality at a time. We present the Language-Image MoE, LIMoE, a sparse mixture of experts model…
Robust and Efficient Medical Imaging with Self-Supervision
Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clini…
Learning to Merge Tokens in Vision Transformers
Transformers are widely applied to solve natural language understanding and computer vision tasks. While scaling up these architectures leads to improved performance, it often comes at the expense of much higher computational costs. In ord…
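One remedy studied in this line of work is a small learned module that merges a long sequence of tokens into a much shorter one, so the remaining transformer blocks run on fewer tokens. The NumPy sketch below shows a merger of that flavor, where each output token is a softmax-weighted combination of the inputs; the shapes and exact formulation are illustrative rather than the paper's precise block.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def merge_tokens(X, W):
    """Merge N tokens into M tokens with learned soft assignments (sketch).

    X: (N, d) input tokens; W: (M, d) learned merger weights.
    Each of the M outputs is a convex combination of the N inputs, so later
    transformer blocks run on a much shorter sequence.
    """
    scores = W @ X.T                 # (M, N) affinity of each output slot to each token
    A = softmax(scores, axis=-1)     # normalize over input tokens
    return A @ X                     # (M, d) merged tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(196, 32))       # e.g. 196 patch tokens of width 32
W = rng.normal(size=(8, 32))         # merge down to 8 tokens
print(merge_tokens(X, W).shape)      # (8, 32)
```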
LiT: Zero-Shot Transfer with Locked-image text Tuning
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models w…
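In locked-image tuning, the pretrained image tower is frozen and only the text tower is trained with the usual image-text contrastive objective, so text embeddings learn to align with an already strong image representation. The following sketch shows the loss and the frozen/trainable split schematically; the encoders here are stand-in random projections, not real towers.

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard softmax (InfoNCE-style) image-text contrastive loss."""
    logits = l2norm(img_emb) @ l2norm(txt_emb).T / temperature   # (n, n)
    log_probs_i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_probs_t2i = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    diag = np.arange(len(logits))
    return -(log_probs_i2t[diag, diag].mean() + log_probs_t2i[diag, diag].mean()) / 2

# Locked-image tuning, schematically: the image tower is a frozen, pretrained
# encoder (a fixed random projection here as a stand-in), and only the text
# tower's parameters would receive gradients from the loss.
rng = np.random.default_rng(0)
frozen_img_proj = rng.normal(size=(64, 16))          # pretend pretrained image tower
text_params = rng.normal(size=(32, 16))              # trainable text tower
images, texts = rng.normal(size=(4, 64)), rng.normal(size=(4, 32))
loss = contrastive_loss(images @ frozen_img_proj, texts @ text_params)
print(loss)
```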
Does your dermatology classifier know what it doesn’t know? Detecting the long-tail of unseen conditions
Sparse MoEs meet Efficient Ensembles
Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models. We study the interplay of two popular classes of such mode…
Scaling Vision with Sparse Mixture of Experts
Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense", that is, every input is processed by every p…
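In this sparse vision MoE setting, a lightweight router sends each patch token to only its top-k experts, so parameter count grows with the number of experts while per-token compute stays roughly flat. The sketch below shows plain top-k gated routing; the expert capacity limits and auxiliary load-balancing losses that practical models rely on are omitted, and the toy experts are placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top_k_moe(X, W_router, experts, k=2):
    """Sparse MoE layer sketch: each token is processed by only its top-k experts.

    X: (n, d) tokens; W_router: (d, E) gating weights; experts: list of E callables.
    Omits expert capacity limits and load-balancing losses for brevity.
    """
    gates = softmax(X @ W_router)                    # (n, E) routing probabilities
    out = np.zeros_like(X)
    for i, token in enumerate(X):
        top = np.argsort(gates[i])[-k:]              # indices of this token's top-k experts
        weights = gates[i, top] / gates[i, top].sum()
        for e, w in zip(top, weights):
            out[i] += w * experts[e](token)          # weighted sum of chosen experts
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_router = rng.normal(size=(8, 4))
experts = [lambda t, s=s: s * t for s in (1.0, 0.5, 2.0, -1.0)]   # toy experts
print(top_k_moe(X, W_router, experts).shape)          # (5, 8)
```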
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
Large scale image classification datasets often contain noisy labels. We take a principled probabilistic approach to modelling input-dependent, also known as heteroscedastic, label noise in these datasets. We place a multivariate Normal di…
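In this heteroscedastic setup, the head predicts an input-dependent Gaussian over the logits (typically with a low-rank plus diagonal covariance), and class probabilities are obtained by averaging the softmax over Monte Carlo samples of those logits. The sketch below follows that recipe for a single input; the temperature parameter and exact parameterization used in the paper are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def heteroscedastic_predict(mu, factor, diag_std, n_samples=100, seed=0):
    """Monte Carlo prediction with a Gaussian distribution over logits (sketch).

    mu:       (n_classes,) predicted mean logits for one input
    factor:   (n_classes, r) low-rank factor of the covariance (input-dependent)
    diag_std: (n_classes,) diagonal standard deviations
    The covariance factor @ factor.T + diag(diag_std**2) captures correlated,
    input-dependent label noise; predictions average softmax over samples.
    """
    rng = np.random.default_rng(seed)
    eps_r = rng.normal(size=(n_samples, factor.shape[1]))
    eps_d = rng.normal(size=(n_samples, len(mu)))
    logits = mu + eps_r @ factor.T + eps_d * diag_std    # (n_samples, n_classes)
    return softmax(logits, axis=-1).mean(axis=0)         # averaged class probabilities

# Toy example: 3 classes, rank-1 noise correlating the first two classes.
mu = np.array([2.0, 1.5, -1.0])
factor = np.array([[1.0], [1.0], [0.0]])
diag_std = np.array([0.1, 0.1, 0.1])
print(heteroscedastic_predict(mu, factor, diag_std))
```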
Supervised Transfer Learning at Scale for Medical Imaging
Transfer learning is a standard technique to improve performance on tasks with limited data. However, for medical imaging, the value of transfer learning is less clear. This is likely due to the large domain mismatch between the usual natu…