Jiajun Bu
Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
The past decade has witnessed rapid advancements in cross-modal retrieval, with significant progress made in accurately measuring the similarity between cross-modal pairs. However, the persistent hubness problem, a phenomenon where a small…
Development and Validation of a Brain Aging Biomarker in Middle-Aged and Older Adults: Deep Learning Approach
Background: Precise assessment of brain aging is crucial for early detection of neurodegenerative disorders and aiding clinical practice. Existing magnetic resonance imaging (MRI)–based methods excel in this task, but they still have room f…
OpenGT: A Comprehensive Benchmark For Graph Transformers
Graph Transformers (GTs) have recently demonstrated remarkable performance across diverse domains. By leveraging attention mechanisms, GTs are capable of modeling long-range dependencies and complex structural relationships beyond local ne…
FocusedAD: Character-centric Movie Audio Description
Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with…
MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance
Neural Representations for Videos (NeRV) has emerged as a promising implicit neural representation (INR) approach for video analysis, which represents videos as neural networks with frame indexes as inputs. However, NeRV-based methods are …
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on the document visual question answering (VQA) task, particularly after training on document instruction datasets. An effe…
One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning
Recent advancements in fine-tuning Vision-Language Foundation Models (VLMs) have garnered significant attention for their effectiveness in downstream few-shot learning tasks. While these recent approaches exhibit some performance improveme…
Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap
The cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms…
Precision Adverse Drug Reactions Prediction with Heterogeneous Graph Neural Network
Accurate prediction of Adverse Drug Reactions (ADRs) at the patient level is essential for ensuring patient safety and optimizing healthcare outcomes. Traditional machine learning‐based methods primarily focus on predicting potential ADRs …
Correlation-Aware Graph Convolutional Networks for Multi-Label Node Classification
Multi-label node classification is an important yet under-explored domain in graph mining as many real-world nodes belong to multiple categories rather than just a single one. Although a few efforts have been made by utilizing Graph Convol…
FGP: Feature-Gradient-Prune for Efficient Convolutional Layer Pruning
To reduce computational overhead while maintaining model performance, model pruning techniques have been proposed. Among these, structured pruning, which removes entire convolutional channels or layers, significantly enhances computational…
TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection
Time series anomaly detection aims to identify unusual patterns in data or deviations from systems' expected behavior. The reconstruction-based methods are the mainstream in this task, which learn point-wise representation via unsupervised…
Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
Multimodal large language models (MLLMs) have shown impressive capabilities in document understanding, a rapidly growing research area with significant industrial demand. As a multimodal task, document understanding requires models to poss…
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions
Clustering is a fundamental machine learning task, which aims to assign instances to groups so that similar samples belong to the same cluster while dissimilar samples belong to different clusters. Shallow clustering methods usually as…
DeepASD: a deep adversarial-regularized graph learning method for ASD diagnosis with multimodal data
Autism Spectrum Disorder (ASD) is a prevalent neurological condition with multiple co-occurring comorbidities that seriously affect mental health. Precise diagnosis of ASD is crucial for intervention and rehabilitation. A single modality …
Neural Memory State Space Models for Medical Image Segmentation
With the rapid advancement of deep learning, computer-aided diagnosis and treatment have become crucial in medicine. UNet is a widely used architecture for medical image segmentation, and various methods for improving UNet have been extens…
Attention Beats Linear for Fast Implicit Neural Representation Generation
Implicit Neural Representation (INR) has gained increasing popularity as a data representation method, serving as a prerequisite for innovative generation models. Unlike gradient-based methods, which exhibit lower efficiency in inference, …
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
In the era of content creation revolution propelled by advancements in generative models, the field of web design remains unexplored despite its critical role in modern digital communication. The web design process is complex and often tim…
Revisiting, Benchmarking and Understanding Unsupervised Graph Domain Adaptation
Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of knowledge from a label-rich source graph to an unlabeled target graph under domain discrepancies. Despite the proliferation of methods designed for this emerging task, th…
NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise
Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scena…
Better Late Than Never: Formulating and Benchmarking Recommendation Editing
Recommendation systems play a pivotal role in suggesting items to users based on their preferences. However, in online platforms, these systems inevitably offer unsuitable recommendations due to limited model capacity, poor data quality, o…
Towards a Unified Framework of Clustering-based Anomaly Detection
Unsupervised Anomaly Detection (UAD) plays a crucial role in identifying abnormal patterns within data without labeled examples, holding significant practical implications across various domains. Although the individual contributions of re…