Baoru Huang
Learning Human Motion with Temporally Conditional Mamba
Learning human motion based on a time-dependent input signal presents a challenging yet impactful task with various applications. The goal of this task is to generate or estimate human movement that consistently reflects the temporal patte…
I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation
Vast and high-quality data are essential for end-to-end autonomous driving systems. However, current driving data is mainly collected by vehicles, which is expensive and inefficient. A potential solution lies in synthesizing data from real…
StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies
Stereo disparity estimation is crucial for obtaining depth information in robot-assisted minimally invasive surgery (RAMIS). While current deep learning methods have made significant advancements, challenges remain in achieving an optimal …
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation
Humanoid loco-manipulation, which integrates whole-body locomotion with dexterous manipulation, remains a fundamental challenge in robotics. Beyond whole-body coordination and balance, a central difficulty lies in understanding human instr…
EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting
Complete reconstruction of surgical scenes is crucial for robot-assisted surgery (RAS). Deep depth estimation is promising but existing works struggle with depth discontinuities, resulting in noisy predictions at object boundaries and do n…
A Terrain Classification Method for Quadruped Robots with Proprioception
Acquiring terrain information during robot locomotion is essential for autonomous navigation, gait selection, and trajectory planning. Quadruped robots, due to their biomimetic structures, demonstrate enhanced traversability over complex t…
Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery
Radioguided surgery, such as sentinel lymph node biopsy, relies on the precise localization of radioactive targets by non-imaging gamma/beta detectors. Manual radioactive target detection based on visual display or audible indication of ga…
FedEFM: Federated Endovascular Foundation Model with Unseen Data
In endovascular surgery, the precise identification of catheters and guidewires in X-ray images is essential for reducing intervention risks. However, accurately segmenting catheter and guidewire structures is challenging due to the limite…
SplineFormer: An Explainable Transformer-Based Approach for Autonomous Endovascular Navigation
Endovascular navigation is a crucial aspect of minimally invasive procedures, where precise control of curvilinear instruments like guidewires is critical for successful interventions. A key challenge in this task is accurately predicting …
Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery
Cancer remains a significant health challenge worldwide, with a new diagnosis occurring every two minutes in the UK. Surgery is one of the main treatment options for cancer. However, surgeons rely on the sense of touch and naked eye with l…
SegCol Challenge: Semantic Segmentation for Tools and Fold Edges in Colonoscopy data
Colorectal cancer (CRC) remains a leading cause of cancer-related deaths worldwide, with polyp removal being an effective early screening method. However, navigating the colon for thorough polyp detection poses significant challenges. To a…
Nested ResNet: A Vision-Based Method for Detecting the Sensing Area of a Drop-in Gamma Probe
Purpose: Drop-in gamma probes are widely used in robotic-assisted minimally invasive surgery (RAMIS) for lymph node detection. However, these devices only provide audio feedback on signal intensity, lacking the visual feedback necessary fo…
Guide3D: A Bi-planar X-ray Dataset for 3D Shape Reconstruction
Endovascular surgical tool reconstruction represents an important factor in advancing endovascular tool navigation, which is an important step in endovascular surgery. However, the lack of publicly available datasets significantly restrict…
SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction
Accurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. While recent 3D Gaussian Splatting methods have shown promise in achieving high-quality reconstructions with fast render…
Tracking Everything in Robotic-Assisted Surgery
Accurate tracking of tissues and instruments in videos is crucial for Robotic-Assisted Minimally Invasive Surgery (RAMIS), as it enables the robot to comprehend the surgical scene with precise locations and interactions of tissues and tool…
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natur…
CathAction: A Benchmark for Endovascular Intervention Understanding
Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the co…
Language-driven Grasp Detection with Mask-guided Attention
Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detectio…
Lightweight Language-driven Grasp Detection using Conditional Consistency Model
Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight d…
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering …
Language-driven Grasp Detection
Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural…
High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces
In surgical oncology, screening colonoscopy plays a pivotal role in providing diagnostic assistance, such as biopsy, and facilitating surgical navigation, particularly in polyp detection. Computer-assisted endoscopic surgery has recently g…
Autonomous Catheterization with Open-source Simulator and Expert Trajectory
Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, th…
Residual Aligner-based Network (RAN): Motion-separable structure for coarse-to-fine discontinuous deformable registration
Deformable image registration, the estimation of the spatial transformation between different images, is an important task in medical imaging. Deep learning techniques have been shown to perform 3D image registration efficiently. However, …
Shape-Sensitive Loss for Catheter and Guidewire Segmentation
We introduce a shape-sensitive loss function for catheter and guidewire segmentation and utilize it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray images dataset. We transform network-deri…
3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images
Endovascular navigation, essential for diagnosing and treating endovascular diseases, predominantly hinges on fluoroscopic images due to the constraints in sensory feedback. Current shape reconstruction techniques for endovascular interven…
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies…
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world obje…
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding afforda…
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in…