He Zhang
MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation
Foundation models have become a promising paradigm for advancing medical image analysis, particularly for segmentation tasks where downstream applications often emerge sequentially. Existing fine-tuning strategies, however, remain limited:…
LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition
Emotion recognition in conversations (ERC) aims to predict the emotional state of each utterance by using multiple input types, such as text and audio. While Transformer-based models have shown strong performance in this task, they often f…
TOPSIS-driven Comprehensive Evaluation Model of Olympic Sports Selection
This paper presents an effective objective evaluation model for Olympic sports event selection, considering factors like popularity, sustainability, and gender equality. It breaks six main criteria into quantifiable sub-criteria, such as…
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. A drafter-based specula…
Text2Relight: Creative Portrait Relighting with Text Guidance
We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided…
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting from an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model i…
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
We introduce UniReal, a unified framework designed to address various image generation and editing tasks. Existing solutions often vary by tasks, yet share fundamental principles: preserving consistency between inputs and outputs while cap…
Research on ECG Signal Classification Based on Hybrid Residual Network
Arrhythmia detection in electrocardiogram (ECG) signals is essential for monitoring cardiovascular health. Current automated arrhythmia classification methods frequently encounter difficulties in detecting multiple cardiac abnormalities, p…
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Existing feedforward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when changing the prompt view direction and mainly handle object-centric cases. In t…
Autonomous Character-Scene Interaction Synthesis from Text Instruction
Synthesizing human motions in 3D environments, particularly those with complex activities such as locomotion, hand-reaching, and human-object interaction, presents substantial demands for user-defined waypoints and stage transitions. These…
GroundingBooth: Grounding Text-to-Image Customization
Recent approaches in text-to-image customization have primarily focused on preserving the identity of the input subject, but often fail to control the spatial location and size of objects. We introduce GroundingBooth, which achieves zero-s…
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Effective editing of personal content holds a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content.…
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
At the core of portrait photography is the search for ideal lighting and viewpoint. The process often requires advanced knowledge in photography and an elaborate studio setup. In this work, we propose Holo-Relighting, a volumetric relighti…
Diffusion Model-Based Image Editing: A Survey
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning …
Coastline stability analysis of Zhoushan-Liuheng LNG terminal project based on remote sensing
Seabed evolution research around the Zhoushan Liuheng LNG receiving station project, which mainly includes collecting, analyzing, and sorting environmental investigation data, was conducted by the Second Institute of Oceanography, MMR. Based…
Learning Highly Dynamic Behaviors for Quadrupedal Robots
Learning highly dynamic behaviors for robots has been a longstanding challenge. Traditional approaches have demonstrated robust locomotion, but the exhibited behaviors lack diversity and agility. They employ approximate models, which lead …
Perceptual Artifacts Localization for Image Synthesis Tasks
Recent advancements in deep generative models have facilitated the creation of photo-realistic images across various tasks. However, these generated images often exhibit perceptual artifacts in specific regions, necessitating manual correc…
Neural Categorical Priors for Physics-Based Character Control
Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters wit…
Neural Quantile Optimization for Edge-Cloud Networking
We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing. First, for a fixed network topology, we formulate a family of integer programming…
Semi-supervised Parametric Real-world Image Harmonization
Learning-based image harmonization techniques are usually trained to undo synthetic random global transformations applied to a masked foreground in a single ground truth photo. This simulated data does not model many of the important appea…
Deep learning for the ovarian lesion localization and discrimination between borderline and malignant ovarian tumors based on routine MR imaging
To establish a deep learning (DL) model for differentiating borderline ovarian tumor (BOT) from epithelial ovarian cancer (EOC) on conventional MR imaging. We retrospectively enrolled 201 patients with 102 pathologically proven BOTs and 99 EO…
Rethinking Portrait Matting with Privacy Preserving
Recently, there has been an increasing concern about the privacy issue raised by identifiable information in machine learning. However, previous portrait matting methods were all based on identifiable images. To fill the gap, we present P3…
Interactive Portrait Harmonization
Current image harmonization methods consider the entire background as the guidance for harmonization. However, this may limit the user's ability to choose a specific object or person in the background to guide the harmonization. To ena…
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com). However, the robustness, including robustness to trimaps and generalization to images from different domains, is st…
Enhanced densely dehazing network for single image haze removal under railway scenes
Purpose: This paper aims to propose an enhanced densely dehazing network to suit railway scenes’ features and improve the visual quality degraded by haze and fog. Design/methodology/approach: It is an end-to-end network based on DenseNet. Th…