Exploring foci of
Mamba-Reg: Vision Mamba Also Needs Registers
2025-06-10 • Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
As with Vision Transformers, this paper identifies artifacts in the feature maps of Vision Mamba. These artifacts, which correspond to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba: they are prevalent even in the tiny-sized model and activate extensively across background regions. To mitigate this issue, we follow the prior solution of introducing register tokens into Vision Mamba. To better cope with Mamba blocks' uni-directi…
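The artifact and the register fix described above can be illustrated with a minimal pure-Python sketch. It assumes token embeddings are plain lists of floats. The median-relative threshold (`factor=3.0`) and the even spacing of registers through the scan order are illustrative assumptions, and the helper names `token_norms`, `flag_high_norm`, and `insert_registers` are hypothetical; this is not the paper's code.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def token_norms(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Return the L2 norm of each token embedding."""
    return [math.sqrt(sum(x * x for x in t)) for t in tokens]


def flag_high_norm(tokens: Sequence[Sequence[float]], factor: float = 3.0) -> List[int]:
    """Return indices of tokens whose norm exceeds `factor` times the median norm.

    The median-relative threshold is a hypothetical heuristic for spotting
    artifact tokens, not a rule taken from the paper.
    """
    norms = token_norms(tokens)
    if not norms:
        return []
    ordered = sorted(norms)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return [i for i, n in enumerate(norms) if n > factor * median]


def insert_registers(tokens: Sequence[T], registers: Sequence[T]) -> List[T]:
    """Spread register tokens evenly through a 1-D token sequence.

    Even spacing is an assumption for illustration, chosen so that registers
    appear throughout the scan order rather than only at one end.
    """
    n, k = len(tokens), len(registers)
    # Register j goes just before patch index (j + 1) * n // (k + 1).
    positions = [(j + 1) * n // (k + 1) for j in range(k)]
    out: List[T] = []
    r = 0
    for i, tok in enumerate(tokens):
        while r < k and positions[r] == i:
            out.append(registers[r])
            r += 1
        out.append(tok)
    # Registers not yet placed (only possible when there are no patches) go last.
    out.extend(registers[r:])
    return out


patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
print(insert_registers(patches, ["r0", "r1"]))
# ['p0', 'p1', 'r0', 'p2', 'p3', 'r1', 'p4', 'p5']
print(flag_high_norm([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 0.0]]))
# [3]
```

In a real model the registers would be learnable embeddings rather than strings; strings are used here only to make the placement visible.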
RadGPT: Constructing 3D Image-Text Tumor Datasets
2025-01-08 • Pedro R. A. S. Bassi, Mehmet Can Yavuz, Kang Wang, Xiaoxi Chen, Wenxuan Li, Sergio Decherchi, Andrea Cavalli, Yang Yang, Alan Yuille, Zongwei Zhou
With over 85 million CT scans performed annually in the United States, writing tumor-related reports is a challenging and time-consuming task for radiologists. To address this need, we present RadGPT, an Anatomy-Aware Vision-Language AI Agent for generating detailed reports from CT scans. RadGPT first segments tumors, including benign cysts and malignant tumors, along with their surrounding anatomical structures, and then transforms this information into both structured and narrative reports. These reports provide t…
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
2025-01-20 • Wenxuan Li, Alan Yuille, Zongwei Zhou
The pre-training and fine-tuning paradigm has become prominent in transfer learning. For example, a model pre-trained on ImageNet and then fine-tuned on PASCAL can significantly outperform one trained on PASCAL from scratch. While ImageNet pre-training has shown enormous success, it is 2D and its learned features target classification; when transferring to more diverse tasks, such as 3D image segmentation, its performance is inevitably compromised due to the deviation from the original …
Mixture of Contexts for Long Video Generation
2025-08-28 • Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yang Guo, Junfei Xiao, Ziyan Yang, Yuyun Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala,...
Long video generation is fundamentally a long-context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context videos is severely limited by the quadratic cost of self-attention, which makes memory and computation intractable and optimization difficult for long sequences. We recast long-context video generation as an internal information retrieval task and propose a simple, learnable spars…
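The abstract is cut off before it describes the method. As a hedged illustration of the general idea of sparse attention over a long context, the sketch below lets a single query attend only to its top-`k` chunks of keys, where each chunk is scored by the dot product between the query and the chunk's mean key. This generic chunk-routing scheme is an assumption, not necessarily the paper's mechanism; `topk_chunk_attention` is a hypothetical name, and in a trained model the query and key vectors would come from learned projections.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vs: Sequence[Vec]) -> List[float]:
    return [sum(col) / len(vs) for col in zip(*vs)]


def topk_chunk_attention(
    q: Vec, keys: Sequence[Vec], values: Sequence[Vec], chunk_size: int, k: int
) -> Tuple[List[float], List[int]]:
    """Softmax attention for one query, restricted to its top-k key chunks.

    Returns the attended output vector and the token indices that were used.
    Assumes k >= 1 and at least one key.
    """
    # 1. Split the token positions into contiguous chunks.
    chunks = [
        list(range(s, min(s + chunk_size, len(keys))))
        for s in range(0, len(keys), chunk_size)
    ]
    # 2. Score each chunk by the query's dot product with the chunk's mean key.
    scores = [dot(q, mean_vec([keys[i] for i in c])) for c in chunks]
    # 3. Keep the k best-scoring chunks, restored to temporal order.
    best = sorted(range(len(chunks)), key=lambda c: scores[c], reverse=True)[:k]
    idx = [i for c in sorted(best) for i in chunks[c]]
    # 4. Scaled dot-product softmax over the selected tokens only.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    m = max(logits)
    weights = [math.exp(logit - m) for logit in logits]
    z = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / z
        for d in range(len(values[0]))
    ]
    return out, idx


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
out, used = topk_chunk_attention([1.0, 0.0], keys, values, chunk_size=2, k=1)
print(used, out)  # [0, 1] [1.0]
```

Per query, the softmax covers at most `k * chunk_size` tokens instead of the whole sequence, which is where the savings over dense self-attention come from; chunk means can be computed once and reused across queries.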
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
2025-06-29 • He Zhang, Jinbo Xing, Yiwei Hu, Zhifei Zhang, Soo Ye Kim, Tianyu Wang, Shuicheng Yan, Zhang Lin, Alan Yuille
Existing feedforward subject-driven video customization methods mainly study single-subject scenarios due to the difficulty of constructing multi-subject training data pairs. Another challenging and less-explored problem is how to use signals such as depth, mask, camera, and text prompts to control and edit the subject in the customized video. In this paper, we first propose a data construction pipeline, VideoCus-Factory, to produce training data pairs for multi-subject customization from raw videos wit…