Yong Dou
Process-Oriented Modeling and Performance Optimization of Intelligent Traffic Systems Using Stochastic Petri Nets
To address the dynamic characteristics of data collection, risk assessment, and response execution in intelligent traffic warning systems, this study proposes a modeling and performance analysis framework based on Stochastic Petri Nets (SP…
AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation
AudioSet is a widely used benchmark in the audio research community and has significantly advanced various audio-related tasks. However, persistent issues with label accuracy and completeness remain critical bottlenecks that limit performa…
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
Modeling multi-interests has arisen as a core problem in real-world RS. Current multi-interest retrieval methods pose three major challenges: 1) Interests, typically extracted from predefined external knowledge, are invariant. Failed to dy…
Regist3R: Incremental Registration with Stereo Foundation Model
Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision. While DUSt3R and its successors have achieved breakthroughs in 3D reconstruction from unposed images, these methods exhibit sig…
Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning
Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, the strict match of logit values between student and teacher mode…
A study on the relationship between spore count and color difference values during the mildewing process of paper wine boxes
Changes in the number of mold spores and the color difference values of cardboard were evaluated during the molding process of paper wine boxes. The experiment utilized three types of cardboard: single white industrial paperboard (Q), grey…
Robust self supervised symmetric nonnegative matrix factorization to the graph clustering
Graph clustering is a fundamental task in network analysis, aimed at uncovering meaningful groups of nodes based on structural and attribute-based similarities. Traditional Nonnegative Matrix Factorization (NMF) methods have shown promise …
Audio-Language Models for Audio-Centric Tasks: A survey
Audio-Language Models (ALMs), which are trained on audio-text data, focus on the processing, understanding, and reasoning of sounds. Unlike traditional supervised learning approaches learning from predefined labels, ALMs utilize natural la…
Prediction of compressive strength of concrete based on artificial neural network and sensitivity analysis of combination factors
To investigate factors affecting the compressive strength of concrete in pumped-storage facilities, this study developed a predictive analytical model based on artificial neural networks. A dataset of experimental concrete mixtures was emp…
Large Pretrained Foundation Model for Key Performance Indicator Multivariate Time Series Anomaly Detection
In the realm of Key Performance Indicator (KPI) anomaly detection, deep learning has emerged as a pivotal technology. Yet, the development of effective deep learning models is hindered by several challenges: scarce and complex labeled data…
AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes
Deep learning, with its robust automatic feature extraction capabilities, has demonstrated significant success in audio signal processing. Typically, these methods rely on static, pre-collected large-scale datasets for training, performing…
RuleRAG: Rule-Guided Retrieval-Augmented Generation with Language Models for Question Answering
Retrieval-augmented generation (RAG) has shown promising potential in knowledge intensive question answering (QA). However, existing approaches only consider the query itself, neither specifying the retrieval preferences for the retrievers…
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
The innate correlation between a person's face and voice has recently emerged as a compelling area of study, especially within the context of multilingual environments. This paper introduces our novel solution to the Face-Voice Association…
PointBLIP: Zero-Training Point Cloud Classification Network Based on BLIP-2 Model
Leveraging the open-world understanding capacity of large-scale visual-language pre-trained models has become a hot spot in point cloud classification. Recent approaches rely on transferable visual-language pre-trained models, classifying …
VoxNeuS: Enhancing Voxel-Based Neural Surface Reconstruction via Gradient Interpolation
Neural Surface Reconstruction learns a Signed Distance Field (SDF) to reconstruct the 3D model from multi-view images. Previous works adopt voxel-based explicit representation to improve efficiency. However, they ignored the gradient insta…
VoiceStyle: Voice-Based Face Generation via Cross-Modal Prototype Contrastive Learning
Can we predict a person’s appearance solely based on their voice? This article explores this question by focusing on generating a face from an unheard voice segment. Our proposed method, VoiceStyle, combines cross-modal representation lear…
DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid
Neural Radiance Field (NeRF) achieves extremely high quality in object-scaled and indoor scene reconstruction. However, there exist some challenges when reconstructing large-scale scenes. MLP-based NeRFs suffer from limited network capacit…
Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, th…
Hierarchical Shared Encoder With Task-Specific Transformer Layer Selection for Emotion-Cause Pair Extraction
Emotion Cause Pair Extraction (ECPE) aims to extract emotions and their causes from a document. Powerful emotion and cause extraction abilities have proven essential in achieving accurate ECPE. However, most existing methods employ shared …
AbsGS: Recovering Fine Details for 3D Gaussian Splatting
3D Gaussian Splatting (3D-GS) technique couples 3D Gaussian primitives with differentiable rasterization to achieve high-quality novel view synthesis results while providing advanced real-time rendering performance. However, due to the fla…
Effects of Anesthetics on Cardiac Repolarization in Adults: A Network Meta-Analysis of Randomized Clinical Trials
Objectives: Prolongation of cardiac repolarization, especially the heart rate-corrected QT (QTc) interval, is associated with life-threatening dysrhythmias. This study aimed to identify the anesthetic with the lowest risk of prolonging car…
Self-Supervised Learning-For Underwater Acoustic Signal Classification With Mixup
Underwater acoustic signal classification is a critical task that involves identifying different types of signals in a complex and dynamic underwater environment, which is often contaminated by strong ambient noise. Recent studies have dem…
Incorporating Structured Sentences with Time-enhanced BERT for Fully-inductive Temporal Relation Prediction
Temporal relation prediction in incomplete temporal knowledge graphs (TKGs) is a popular temporal knowledge graph completion (TKGC) problem in both transductive and inductive settings. Traditional embedding-based TKGC models (TKGE) rely on…
Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study on Image Restoration
The great success of Deep Neural Networks (DNNs) has inspired the algorithmic development of DNN-based Fixed-Point (DNN-FP) for computer vision tasks. DNN-FP methods, trained by Back-Propagation Through Time or computing the inaccurate inv…
Temporal Extrapolation and Knowledge Transfer for Lifelong Temporal Knowledge Graph Reasoning
Real-world Temporal Knowledge Graphs keep growing with time and new entities and facts emerge continually, necessitating a model that can extrapolate to future timestamps and transfer knowledge for new components. Therefore, our work first…
A Snapshot Assist scanner: Assessment of Radiation Dose and Image Quality in retrospective ECG-gated Coronary CT Angiography
Objective: This study sought to compare the radiation dose and image quality between a snapshot assist (SSA) scanner and a conventional scanner in retrospective electrocardiographically (ECG)-gated coronary computed tomographic (CT) angiog…