Zhouchen Lin
PAM: a propagation-based model for segmenting any 3D objects across multi-modal medical images
Volumetric segmentation is a major challenge in medical imaging, as current methods require extensive annotations and retraining, limiting transferability across objects. We present PAM, a propagation-based framework that generates 3D segm…
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evol…
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
The design of training objective is central to training time-series forecasting models. Existing training objectives such as mean squared error mostly treat each future step as an independent, equally weighted task, which we found leading …
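For context, the equally weighted per-step objective this abstract critiques can be written as a single mean squared error over the forecast horizon. A minimal PyTorch sketch follows; tensor shapes and the function name are illustrative assumptions, and the paper's proposed quadratic objective is not reproduced here.

```python
# Standard direct multi-step forecasting loss: each of the H future steps is
# treated as an independent, equally weighted squared-error term.
# Shapes are assumed to be (batch, horizon); this is generic background,
# not the paper's proposed objective.
import torch

def per_step_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Mean over batch and horizon, i.e. uniform weight 1/H on every step.
    return ((pred - target) ** 2).mean()
```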
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
Despite the rapid progress of neural networks, they remain highly vulnerable to adversarial examples, for which adversarial training (AT) is currently the most effective defense. While AT has been extensively studied, its practical applica…
On the Limitations and Capabilities of Position Embeddings for Length Generalization
In Transformers, Position Embeddings (PEs) significantly influence Length Generalization (LG) performance, yet their fundamental role remains unclear. In this work, we investigate the limitations and capabilities of PEs in achieving LG. We…
Explicit Discovery of Nonlinear Symmetries from Dynamic Data
Symmetry is widely applied in problems such as the design of equivariant networks and the discovery of governing equations, but in complex scenarios, it is not known in advance. Most previous symmetry discovery methods are limited to linea…
AI Pangaea: Unifying Intelligence Islands for Adapting Myriad Tasks
The pursuit of artificial general intelligence continuously demands generalization in one model across myriad tasks, even those not seen before. However, current AI models are isolated from one another, as each is limited to specific tasks, n…
A Self-Ensemble Inspired Approach for Effective Training of Binary-Weight Spiking Neural Networks
Spiking Neural Networks (SNNs) are a promising approach to low-power applications on neuromorphic hardware due to their energy efficiency. However, training SNNs is challenging because of the non-differentiable spike generation function. T…
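As background on the non-differentiable spike generation mentioned above, the common workaround is a surrogate gradient: a hard threshold in the forward pass and a smooth stand-in for its derivative in the backward pass. The PyTorch sketch below uses a rectangular surrogate; it is generic background, not the paper's self-ensemble method.

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, rectangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        # Non-differentiable step: fire a spike wherever the potential crosses the threshold.
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Pass gradients only in a window around the threshold (the window width is an arbitrary choice).
        surrogate = (torch.abs(u - ctx.threshold) < 0.5).float()
        return grad_output * surrogate, None  # no gradient w.r.t. the threshold
```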
Proximity Matters: Local Proximity Enhanced Balancing for Treatment Effect Estimation
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Transformers have become the de facto backbone of modern deep learning, yet their training typically demands an advanced optimizer with an adaptive learning rate, such as AdamW, rather than momentum SGD with decoupled weight decay (mSGDW). Previous works show that it is …
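For reference, one standard way to write a momentum SGDW step (the mSGDW optimizer referred to above) is the following; the symbols $\mu$, $\eta$, and $\lambda$ are generic momentum, learning-rate, and decoupled weight-decay coefficients, not notation taken from the paper:

$$ v_t = \mu\, v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta\, v_t - \eta\, \lambda\, \theta_{t-1}, $$

where $g_t$ is the stochastic gradient at step $t$ and the weight decay acts directly on the parameters rather than being folded into the gradient.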
AV-NAS: Audio-Visual Multi-Level Semantic Neural Architecture Search for Video Hashing
Simple Convergence Proof of Adam From a Sign-like Descent Perspective
Adam is widely recognized as one of the most effective optimizers for training deep neural networks (DNNs). Despite its remarkable empirical success, its theoretical convergence analysis remains unsatisfactory. Existing works predominantly…
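To make the sign-like perspective concrete: with first- and second-moment estimates $m_k$ and $v_k$, the Adam step is

$$ x_{k+1} = x_k - \eta\, \frac{m_k}{\sqrt{v_k} + \epsilon}, $$

and since $\sqrt{v_k}$ is positive and of roughly the same order as $|m_k|$ when gradients are stable, each coordinate of the update carries the sign of $m_k$ with bounded magnitude, i.e. it behaves like a soft sign-descent step $x_{k+1} \approx x_k - \eta\,\operatorname{sign}(m_k)$. This rewriting only illustrates the perspective in the title; it is not the paper's exact analysis.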
Machine Learning Models to Predict Individual Cognitive Load in Collaborative Learning: Combining fNIRS and Eye-Tracking Data
Effectively leveraging cognitive load predictions helps optimize collaborative learning design and implementation. This study explored the feasibility of predicting individual learners’ cognitive load during collaborative learning using a …
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Multimodal Large Language Models (MLLMs) have achieved considerable accuracy in Optical Character Recognition (OCR) from static images. However, their efficacy in video OCR is significantly diminished due to factors such as motion blur, te…
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment
Training time-series forecast models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelat…
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm
As the default optimizer for training large language models, AdamW has achieved remarkable success in deep learning. However, its convergence behavior is not theoretically well-understood. This paper establishes the convergence rate $\frac…
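The abstract is cut off mid-formula; one natural reading of the rate stated in the title, with $d$ the parameter dimension and $K$ the number of iterations, is a bound of the shape

$$ \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}\, \bigl\| \nabla f(x_k) \bigr\|_1 \;=\; O\!\left( \frac{\sqrt{d}}{K^{1/4}} \right), $$

where the precise constants and assumptions are those of the paper and are not reproduced here.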
A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis
Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existin…
Empowering LLMs with Logical Reasoning: A Comprehensive Survey
Large language models (LLMs) have achieved remarkable successes on various tasks. However, recent studies have found that there are still significant challenges to the logical reasoning abilities of LLMs, which can be categorized into the …
Optimization design of cross border intelligent marketing management model based on multi layer perceptron-grey wolf optimization convolutional neural network
Cross-border intelligent marketing algorithms based on traditional linear models extract only limited information features, making it difficult to effectively handle complex scenarios containing a large amount of implicit i…
High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces
Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and chemical physics. Meanwhile, the design space of available linear operations on tensors th…
An Integrated Algorithm with Feature Selection, Data Augmentation, and XGBoost for Ovarian Cancer
Ovarian cancer is one of the most aggressive gynecological cancers due to its high invasiveness and chemoresistance. It not only has a high incidence rate but also one of the highest mortality rates. Its subtle early symptoms make subsequent dia…
GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model
Recent research on integrating Large Language Models (LLMs) with Graph Neural Networks (GNNs) typically follows two approaches: LLM-centered models, which convert graph data into tokens for LLM processing, and GNN-centered models, which us…
Convergence Rate Analysis of LION
The LION (evoLved sIgn mOmeNtum) optimizer for deep neural network training was discovered by Google via program search; despite its simple sign update, it shows impressive performance in training large-scale networks. Although previous studies …
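For reference, the sign update referred to above is LION's published rule, reproduced here from memory (so treat the exact form as an assumption), with interpolation coefficients $\beta_1, \beta_2$, learning rate $\eta$, and decoupled weight decay $\lambda$:

$$ c_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad \theta_t = \theta_{t-1} - \eta\,\bigl(\operatorname{sign}(c_t) + \lambda\, \theta_{t-1}\bigr), \qquad m_t = \beta_2 m_{t-1} + (1-\beta_2)\, g_t. $$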
Number Cookbook: Number Understanding of Language Models and How to Improve It
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing (such as 9.11 > 9.9). The latter ability is essential for tackling comp…
MixCon: A Hybrid Architecture for Efficient and Adaptive Sequence Modeling
Sequence modeling is a critical task in various domains such as natural language processing, speech recognition, and time series analysis. The existing models still face challenges in capturing long-range dependencies and efficiently model…
Symmetry Discovery for Different Data Types
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance. However, constructing equivariant neural networks typically requires prior knowledge of data types and symmetries, whi…
On the Adversarial Transferability of Generalized "Skip Connections"
Skip connections are an essential ingredient that allows modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify…
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Low-Dimension-to-High-Dimension (LDHD) generalization is a special case of Out-of-Distribution (OOD) generalization, where the training data are restricted to a low-dimensional subspace of the high-dimensional testing space. Assuming that …
Pyramidal Flow Matching for Efficient Video Generative Modeling
Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage. To reduce the complexity, the prevailing approaches employ a cascaded architecture to avoid direct training w…
Incorporating Arbitrary Matrix Group Equivariance into KANs
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains thanks to spline activation functions, becoming an alternative to Multi-Layer Perceptrons (MLPs). However, spline functions may not respect symmetry in tasks, …