Jinwei Gu
World Simulation with Video Foundation Models for Physical AI
We introduce Cosmos-Predict2.5, the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, Cosmos-Predict2.5 unifies Text2World, Image2World, and Video2World generation in a single …
4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
Reconstructing fast-dynamic scenes from multi-view videos is crucial for high-speed motion analysis and realistic 4D reconstruction. However, the majority of 4D capture systems are limited to frame rates below 30 FPS (frames per second), a…
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Designing 3D scenes is traditionally a challenging task that demands both artistic expertise and proficiency with complex software. Recent advances in text-to-3D generation have greatly simplified this process by letting users create scene…
Parallel Sequence Modeling via Generalized Spatial Propagation Network
We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures. Existing attention models, including transformers, linear attention, and st…
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data. However, existing methods depend on external multi-view alignment processes, such as explicit pose estimation or pre-reco…
From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization
Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly en…
AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection
Image Signal Processors (ISPs) convert raw sensor signals into digital images, which significantly influence the image quality and the performance of downstream computer vision tasks. Designing the ISP pipeline and tuning ISP parameters are tw…
DualDn: Dual-domain Denoising via Differentiable ISP
Image denoising is a critical component in a camera's Image Signal Processing (ISP) pipeline. There are two typical ways to inject a denoiser into the ISP pipeline: applying a denoiser directly to captured raw frames (raw domain) or to the…
PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
Lensless cameras offer significant advantages in size, weight, and cost compared to traditional lens-based systems. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scenes from multiplexed measureme…
System Structural Error Analysis in Binocular Vision Measurement Systems
A binocular stereo vision measurement system is widely used in fields such as industrial inspection and marine engineering due to its high accuracy, low cost, and ease of deployment. An unreasonable structural design can lead to difficulti…
Compact Nd: YVO₄ laser system based on Vapor chamber passive cooling techniques
A compact Nd:YVO₄ laser system based on vapor chamber passive cooling technique has been developed and explored for the first time to the best of our knowledge. An average power of 8.84 W with beam quality of M² < 2.2 and slope efficiency of 44%…
Matting by Generation
This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with exte…
LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification
Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solu…
Learning-based lens wavefront aberration recovery
Wavefront aberration describes the deviation of a wavefront in an imaging system from a desired perfect shape, such as a plane or a sphere, which may be caused by a variety of factors, such as imperfections in optical equipment, atmospheri…
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Reconstructing High Dynamic Range (HDR) video from image sequences captured with alternating exposures is challenging, especially in the presence of large camera or object motion. Existing methods typically align low dynamic range sequence…
Event-Based Motion Magnification
Detecting and magnifying imperceptible high-frequency motions in real-world scenarios has substantial implications for industrial and medical applications. These motions are characterized by small amplitudes and high frequencies. Tradition…
Cached Transformers: Improving Transformers with Differentiable Memory Cache
This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens. GRC attention enables attending …
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
We present AutoDIR, an innovative all-in-one image restoration system incorporating latent diffusion. AutoDIR excels in its ability to automatically identify and restore images suffering from a range of unknown degradations. AutoDIR offers…
Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising
Image denoising is a fundamental and challenging task in the field of computer vision. Most supervised denoising methods learn to reconstruct clean images from noisy inputs, which have intrinsic spectral bias and tend to produce over-smoot…
Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration
Recent work on discrete generative priors, in the form of codebooks, has shown exciting performance for image reconstruction and restoration, as the discrete prior space spanned by the codebooks increases the robustness against diverse ima…
MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results
Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for re…
MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results
Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniqu…
MIPI 2023 Challenge on RGBW Fusion: Methods and Results
Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for re…
MIPI 2023 Challenge on RGBW Remosaic: Methods and Results
Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for re…
Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera
Due to the difficulty in collecting large-scale and perfectly aligned paired training data for Under-Display Camera (UDC) image restoration, previous methods resort to monitor-based image systems or simulation-based methods, sacrificing th…
Random Weights Networks Work as Loss Prior Constraint for Image Restoration
In this paper, orthogonal to the existing data and model studies, we instead resort our efforts to investigate the potential of loss function in a new perspective and present our belief "Random Weights Networks can Be Acted as Loss Prior …
Real-time Controllable Denoising for Image and Video
Controllable image denoising aims to generate clean samples with human perceptual priors and balance sharpness and smoothness. In traditional filter-based denoising methods, this can be easily achieved by adjusting the filtering strength. …
Overexposure Mask Fusion: Generalizable Reverse ISP Multi-Step Refinement
With the advent of deep learning methods replacing the ISP in transforming sensor RAW readings into RGB images, numerous methodologies solidified into real-life applications. Equally potent is the task of inverting this process which will …