LLM-Guided Multimodal Information Fusion With Hierarchical Spatio-Temporal Graph Network for Sentiment Analysis
2025 · Open Access · DOI: https://doi.org/10.4018/ijisss.388002 · OA: W4413975987
Multimodal sentiment analysis aims to achieve a precise understanding of emotion by integrating complementary textual, visual, and audio information. However, sentiment discrepancies between modalities, ineffective integration of multimodal information, and the intricacy of temporal order dependency significantly constrain model efficacy. The authors propose an LLM-guided Hierarchical Spatio-Temporal Graph Network (L-HSTGN) that addresses these problems through multimodal large-model feature enhancement, bidirectional spatio-temporal joint modeling, and a dynamic gated fusion mechanism. First, they generate cross-modal emotion pseudo-labels with a multimodal large model and optimize the single-modal representations with adversarial regularization. Second, they develop a bidirectional spatio-temporal convolution module that concurrently extracts local-global temporal characteristics and dynamic spatial correlations.
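To make the dynamic gated fusion idea concrete, the following is a minimal PyTorch sketch, not the authors' published implementation: the module name `GatedFusion`, the three-modality signature, and the softmax gating scheme are all assumptions about what such a mechanism might look like.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Softmax-gated fusion of three modality vectors (hypothetical sketch,
    not the L-HSTGN reference implementation)."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate network sees all three modalities and emits one weight per modality.
        self.gate = nn.Linear(3 * dim, 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, text: torch.Tensor, audio: torch.Tensor,
                visual: torch.Tensor) -> torch.Tensor:
        # Each input: (batch, dim) utterance-level representation.
        weights = torch.softmax(
            self.gate(torch.cat([text, audio, visual], dim=-1)), dim=-1)
        stacked = torch.stack([text, audio, visual], dim=1)   # (batch, 3, dim)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)  # convex combination
        return self.proj(fused)


# Usage: fuse 128-dim features for a batch of 4 utterances.
fusion = GatedFusion(dim=128)
t, a, v = (torch.randn(4, 128) for _ in range(3))
out = fusion(t, a, v)  # shape: (4, 128)
```

The gate conditions on all modalities jointly and produces a convex combination, so each utterance can weight text, audio, and visual cues differently, which is one plausible way to handle the sentiment discrepancies between modalities that the abstract highlights.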