Explanipedia

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Ronghang Hu, Didac Suris, et al. · 2025

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars,…

SAM 3D: 3Dfy Anything in Images

SAM 3D Team, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander F. Sax, et al. · 2025

We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual re…

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya K. Ryali, et al. · 2024

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video s…

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, et al. · 2024

This repository contains all the collected and aligned data for RU-AI dataset. It is constructed based on three large publicly available datasets: Flickr8K, COCO, and Places205, by adding their corresponding machine-generated pairs from fi…

LVIS: A dataset for large vocabulary instance segmentation

Agrim Gupta, Piotr Dollár, Ross Girshick · 2024

Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we …

Segment Anything

Alexander M. Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, et al. · 2023

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion mas…

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, et al. · 2023

This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large scale (weakly) supervised datasets with billion…

Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Buğra Gedik, et al. · 2022

Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-traini…

Benchmarking Detection Transfer Learning with Vision Transformers

Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, et al. · 2021

Object detection is a central downstream task used to test if pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial wh…

Masked Autoencoders Are Scalable Vision Learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, et al. · 2021

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core de…

Early Convolutions Help Transformers See Better

Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, et al. · 2021

Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional …

Fast and Accurate Model Scaling

Piotr Dollár, Mannat Singh, Ross Girshick · 2021

In this work we analyze strategies for convolutional neural network scaling; that is, the process of scaling a base convolutional network to endow it with greater computational complexity and consequently representational power. Example sc…

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov · 2021

We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantl…

Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick · 2021

By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On one hand, this is desirable as it treats all classes equally. On the other hand, it …

Designing Network Design Spaces

Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár · 2020

In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network ins…

Are Labels Necessary for Neural Architecture Search?

Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, et al. · 2020

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels. In this paper, we ask the question: can we find high-quality neura…

On Network Design Spaces for Visual Recognition

Ilija Radosavovic, Justin Johnson, Saining Xie, Wan‐Yen Lo, Piotr Dollár · 2019

Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing ne…

Panoptic Segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár · 2019

We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each obj…

TensorMask: A Foundation for Dense Object Segmentation

Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár · 2019

Sliding-window object detectors that generate bounding-box object predictions over a dense, regular grid have advanced rapidly and proven popular. In contrast, modern instance segmentation approaches are dominated by methods that first det…

Panoptic Feature Pyramid Networks

Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár · 2019

The recently introduced panoptic segmentation task has renewed our community's interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art met…

Rethinking ImageNet Pre-training

Kaiming He, Ross Girshick, Piotr Dollár · 2018

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when usin…

Data Distillation: Towards Omni-Supervised Learning

Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He · 2017

We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lower-bounded by perf…

Learning to Segment Every Thing

Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick · 2017

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-…

Focal Loss for Dense Object Detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár · 2017

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a reg…

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, et al. · 2017

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential…

Detecting and Recognizing Human-Object Interactions

Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He · 2017

To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical…

Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick · 2017

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. Th…

Piotr Dollár