Bo Du
From pretraining to privacy: federated ultrasound foundation model with self-supervised learning Open
BiaMix Contrastive Learning and Memory Similarity Distillation in Class‐Incremental Learning Open
Class‐incremental learning studies the problem of continually learning new classes from data streams. However, networks suffer from catastrophic forgetting, losing past knowledge when acquiring new knowledge. Among different approa…
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction Open
Recent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. …
Computed tomography-based quantification of intra-tumoral heterogeneity for predicting treatment response to concurrent chemoradiotherapy in non-small cell lung cancer: a multicenter retrospective study Open
The CT‑based ITH model exhibited robust predictive accuracy, surpassing standalone clinical and conventional radiomic models. A unified model combining clinical, radiomic, and ITH features further enhanced prognostic precision for CCRT res…
Remote Sensing Tuning: A Survey Open
A novel radiomics model combining GTVp, GTVnd, and clinical data for chemoradiotherapy response prediction in patients with advanced NSCLC Open
Background Numerous radiomic models have been developed to predict treatment outcomes in patients with NSCLC receiving chemotherapy and radiation therapy. However, computed tomography (CT) radiomic models that integrate the Gross Tumour Vo…
Coarse-to-fine crack cue for robust crack detection Open
Crack detection is an important task in computer vision. Despite impressive in-dataset performance, deep learning-based methods still struggle in generalizing to unseen domains. The thin structure property of cracks is usually overlooked b…
Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment Open
Large Vision-Language Models (LVLMs) encode visual inputs as dense sequences of patch-level tokens to capture fine-grained semantics. These visual tokens often outnumber their textual counterparts by a large margin, leading to substantial …
Rethink Sparse Signals for Pose-guided Text-to-image Generation Open
Recent works favored dense signals (e.g., depth, DensePose), as an alternative to sparse signals (e.g., OpenPose), to provide detailed spatial guidance for pose-guided text-to-image generation. However, dense representations raised new cha…
Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning Open
Domain-specific instruction-tuning has become the de facto standard for improving the performance of large language models (LLMs) in specialized applications, e.g., medical question answering. Since the instruction-tuning dataset might cont…
Benchmarking Endoscopic Surgical Image Restoration and Beyond Open
In endoscopic surgery, a clear and high-quality visual field is critical for surgeons to make accurate intraoperative decisions. However, persistent visual degradation, including smoke generated by energy devices, lens fogging from thermal…
KaFT: Knowledge-aware Fine-tuning for Boosting LLMs' Domain-specific Question-Answering Performance Open
Supervised fine-tuning (SFT) is a common approach to improve the domain-specific question-answering (QA) performance of large language models (LLMs). However, recent literature reveals that due to the conflicts between LLMs' internal knowl…
WaterDiffusion: Learning a Prior-involved Unrolling Diffusion for Joint Underwater Saliency Detection and Visual Restoration Open
Underwater salient object detection (USOD) plays a pivotal role in various vision-based marine exploration tasks. However, existing USOD techniques face the dilemma of object mislocalization and imprecise boundaries due to the complex unde…
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning Open
Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However…
Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo-Labeling Open
Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that facilitates the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in …
Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization Open
The mesoscopic level serves as a bridge between the macroscopic and microscopic worlds, addressing gaps overlooked by both. Image manipulation localization (IML), a crucial technique to pursue truth from fake images, has long relied on low…
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions Open
Mobile phone agents, which can assist people in automating daily tasks on their phones, have emerged as a pivotal research focus. However, existing procedure-oriented agents struggle with cross-app instructions, due to the following cha…
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Open
Multi-modal Large Language Models (MLLMs) integrate visual and linguistic reasoning to address complex tasks such as image captioning and visual question answering. While MLLMs demonstrate remarkable versatility, MLLMs appear limited perf…
Development and validation of a prediction model based on two-dimensional dose distribution maps fused with computed tomography images for noninvasive prediction of radiochemotherapy resistance in non-small cell lung cancer Open
Compared to the traditional radiomics model, the 2D dosiomics model demonstrates superior predictive performance. The combined model based on clinical data, radiomics, and dosiomics has improved the prediction of radiochemotherapy resistan…
Knowledge-aware contrastive heterogeneous molecular graph learning Open
Molecular representation learning is pivotal in predicting molecular properties and advancing drug design. Traditional methodologies, which predominantly rely on homogeneous graph encoding, are limited by their inability to integrate exter…
Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration Open
Underwater imaging often suffers from significant visual degradation, which limits its suitability for subsequent applications. While recent underwater image enhancement (UIE) methods rely on the current advances in deep neural network arc…
Robust Asymmetric Heterogeneous Federated Learning With Corrupted Clients Open
This paper studies a challenging robust federated learning task with model-heterogeneous and data-corrupted clients, where the clients have different local model structures. Data corruption is unavoidable due to factors such as random noi…
TrafficDepth: Road-side Keypoint Based Monocular Depth Estimation Open
CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V Open
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient Open
Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning Open
Text-to-image person retrieval (TIPR) aims to identify the target person using textual descriptions, facing the challenge of modality heterogeneity. Prior works have attempted to address it by developing cross-modal global or local alignment s…
SAM-Powered Building Footprint Updating for Various Cities: Sparse Labels meet Historical Data Repurposing in Urban Monitoring Open
Sam-Powered Building Footprint Updating for Global Cities: Sparse Labels Meet Historical Data Repurposing in Urban Monitoring Open
Efficient Relational Context Perception for Knowledge Graph Completion Open
Knowledge Graphs (KGs) provide a structured representation of knowledge but often suffer from challenges of incompleteness. To address this, link prediction or knowledge graph completion (KGC) aims to infer missing new facts based on exist…