Yuning Chai
Generative Data Mining with Longtail-Guided Diffusion
It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive …
DriveGPT: Scaling Autoregressive Behavior Models for Driving
We present DriveGPT, a scalable behavior model for autonomous driving. We model driving as a sequential decision-making task, and learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up …
PROFIT: A Specialized Optimizer for Deep Fine Tuning
The fine-tuning of pre-trained models has become ubiquitous in generative AI, computer vision, and robotics. Although much attention has been paid to improving the efficiency of fine-tuning models, there has been less scholarship around fin…
VLMine: Long-Tail Data Mining with Vision Language Models
Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of un…
Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving
Due to the lack of depth cues in images, multi-frame inputs are important for the success of vision-based perception, prediction, and planning in autonomous driving. Observations from different angles enable the recovery of 3D object state…
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail…
SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors
We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. In safety-critical applications like autonomous driving, discovering such novel challenging obje…
NOVA: NOvel View Augmentation for Neural Composition of Dynamic Objects
We propose a novel-view augmentation (NOVA) strategy to train NeRFs for photo-realistic 3D composition of dynamic objects in a static scene. Compared to prior work, our framework significantly reduces blending artifacts when inserting mult…
Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
Balancing efficiency and accuracy is a long-standing problem for deploying deep learning models. The trade-off is even more important for real-time safety-critical systems like autonomous vehicles. In this paper, we propose an effective ap…
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
High Definition (HD) maps are maps with precise definitions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an autonomous driving system, including motion forecasting and planning. Howeve…
Occupancy Flow Fields for Motion Forecasting in Autonomous Driving
We propose Occupancy Flow Fields, a new representation for motion forecasting of multiple agents, an important task in autonomous driving. Our representation is a spatio-temporal grid with each grid cell containing both the probability of …
To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels
3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional netw…
RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection
The detection of 3D objects from LiDAR data is a critical component in most autonomous driving systems. Safe, high speed driving needs larger detection ranges, which are enabled by new LiDARs. These larger detection ranges require more eff…
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations such as merges, unprotected turns, etc., where predictin…
Pseudo-labeling for Scalable 3D Object Detection
To safely deploy autonomous vehicles, onboard perception systems must work reliably at high accuracy across a diverse set of environments and geographies. One of the most common techniques to improve the efficacy of such systems in new dom…
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling th…
TNT: Target-driveN Trajectory Prediction
Predicting the future behavior of moving agents is essential for real-world applications. It is challenging because the intent of the agent and the corresponding behavior are unknown and intrinsically multimodal. Our key insight is that for pred…
SoDA: Multi-Object Tracking with Soft Data Association
Robust multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars. Tracking objects, however, remains a highly challenging problem, especially in cluttered autonomous driving scenes in which objects tend to int…
SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
Autonomous driving system development is critically dependent on the ability to replay complex and diverse traffic scenarios in simulation. In such scenarios, the ability to accurately simulate the vehicle sensors such as cameras, lidar or…
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environ…
Scalability in Perception for Autonomous Driving: An Open Dataset Benchmark
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environ…
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
Predicting human behavior is a difficult and crucial task required for motion planning. It is challenging in large part due to the highly uncertain and multi-modal set of possible outcomes in real-world domains such as autonomous driving. …
StarNet: Targeted Computation for Object Detection in Point Clouds
Detecting objects from LiDAR point clouds is an important component of self-driving car technology as LiDAR provides high resolution spatial information. Previous work on point-cloud 3D object detection has re-purposed convolutional approa…
Patchwork: A Patch-wise Attention Network for Efficient Object Detection and Segmentation in Video Streams
Recent advances in single-frame object detection and segmentation techniques have motivated a wide range of works to extend these methods to process video streams. In this paper, we explore the idea of hard attention aimed for latency-sens…
FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation
Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as …
Advances in fine-grained visual categorization
The objective of this work is to improve performance in fine-grained visual categorization (FGVC). In particular, we are interested in large-scale classification among hundreds of different flower, bird, and dog species. FGVC is challeng…