Explanipedia

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Gaowen Liu, Chongyu Wang, Rongjie Li, Yingchen Yu, Xuming He, et al. · 2025

While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that system…

Physics-informed machine learning-based real-time long-horizon temperature fields prediction in metallic additive manufacturing

Miao Tian, Haochen Mu, Gaowen Liu, Mengjiao Li, Donghong Ding, et al. · 2025

Real-time long-horizon temperature prediction in wire arc additive manufacturing is critical for process control and quality assurance. However, finite element methods are computationally expensive, and the existing data-driven models suff…

Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos

Junyi Wu, Jiachen Tao, Haoxuan Wang, Gaowen Liu, Ramana Rao Kompella, et al. · 2025

We present Orientation-anchored Gaussian Splatting (OriGS), a novel framework for high-quality 4D reconstruction from casually captured monocular videos. While recent advances extend 3D Gaussian Splatting to dynamic scenes via various moti…

Efficient Multimodal Dataset Distillation via Generative Models

Zhenghao Zhao, Haoxuan Wang, Junyi Wu, Yuzhang Shang, Gaowen Liu, et al. · 2025

Dataset distillation aims to synthesize a small dataset from a large dataset, enabling the model trained on it to perform well on the original dataset. With the blooming of large language models and multimodal large language models, the im…

A Content-dependent Watermark for Safeguarding Image Attribution

Tong Zhou, Ruyi Ding, Gaowen Liu, Charles B. Fleming, Ramana Rao Kompella, et al. · 2025

The rapid growth of digital and AI-generated images has amplified the need for secure and verifiable methods of image attribution. While digital watermarking offers more robust protection than metadata-based approaches--which can be easily…

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Xin Lai, Jun‐Yi Li, Wei Li, Gaowen Liu, Tianjian Li, et al. · 2025

Recent advances in large multimodal models have leveraged image-based tools with reinforcement learning to tackle visual problems. However, existing open-source approaches often exhibit monotonous reasoning patterns and allow only a limite…

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench

Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Jayanth Srinivasa, et al. · 2025

Recent advances in reasoning and planning capabilities of large language models (LLMs) have enabled their potential as autonomous agents capable of tool use in dynamic environments. However, in multi-turn conversational environments like $…

A multi-model management approach for power system transient stability assessment based on multi-moment feature clustering

Xiaoyu Han, Gaowen Liu, Defu Cai, Runhuai Chen, Erxi Wang, et al. · 2025

Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners

Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, et al. · 2025

Large language models (LLMs) have demonstrated strong performance in various robot control tasks. However, their deployment in real-world applications remains constrained. Even state-of-the-art LLMs, such as GPT-o4mini, frequently produce i…

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

Gaowen Liu, Rongjie Li, Chongyu Wang, Xuming He · 2025

Open-vocabulary Scene Graph Generation (OV-SGG) overcomes the limitations of the closed-set assumption by aligning visual relationship representations with open-vocabulary textual representations. This enables the identification of novel v…

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Gaowen Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, et al. · 2025

We present VQTalker, a Vector Quantization-based framework for multilingual talking head generation that addresses the challenges of lip synchronization and natural motion across diverse languages. Our approach is grounded in the phonetic …

UniMuMo: Unified Text, Music, and Motion Generation

Yang Han, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, et al. · 2025

We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired…

Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model

Changchang Sun, Gaowen Liu, Charles B. Fleming, Yan Yan · 2025

Conditional diffusion models have gained increasing attention since their impressive results for cross-modal synthesis, where the strong alignment between conditioning input and generated output can be achieved by training a time-condition…

Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture

Shijin Duan, Yejia Liu, Gaowen Liu, Ramana Rao Kompella, Shaolei Ren, et al. · 2025

Vector Symbolic Architecture (VSA) is emerging in machine learning due to its efficiency, but it is hindered by issues of hyperdimensionality and accuracy. As a promising mitigation, the Low-Dimensional Computing (LDC) method significan…

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning

Venkatesh Mishra, Bimsara Pathiraja, Mihir Parmar, Sat Chidananda, Jayanth Srinivasa, et al. · 2025

Reasoning abilities of LLMs have been a key focus in recent years. One challenging reasoning domain with interesting nuances is legal reasoning, which requires careful application of rules and precedents while balancing deductive and anal…

Discrete Element-Based Design of a High-Speed Rotary Tiller for Saline-Alkali Land and Verification of Optimal Tillage Parameters

Shuai Zheng, Tongqing Lu, Jie Liu, Yu Tian, Miaomiao Han, et al. · 2025

Aiming at the saline soil in Binhai New Area, which is solid and sclerotic, and addressing the problem of poor quality and low efficiency of traditional rotary tillage, this research designed a high-speed rotary tiller that can realize the…

Deep learning-based novel aluminum furniture design style recognition and key technology research

Gaowen Liu · 2025

This study explores a pioneering research effort focusing on the use of deep learning techniques to achieve high-precision automatic recognition of aluminum furniture design styles, and proposes an innovative convolutional neural network (…

Gaze-Based Map Interaction Method Driven by Generative Large Models

Tianxin Wang, Ping Du, Pei Dang, Gaowen Liu, Pengpeng Li · 2025

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on tau-bench

Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Gaowen Liu, et al. · 2025

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

Gaowen Liu, Rongjie Li, Chongyu Wang, Xuming He · 2024

Open-vocabulary Scene Graph Generation (OV-SGG) overcomes the limitations of the closed-set assumption by aligning visual relationship representations with open-vocabulary textual representations. This enables the identification of novel v…

Safeguarding Text-to-Image Generation via Inference-Time Prompt-Noise Optimization

Jiangweizhi Peng, Zhiwei Tang, Gaowen Liu, Charles B. Fleming, Mingyi Hong · 2024

Text-to-Image (T2I) diffusion models are widely recognized for their ability to generate high-quality and diverse images based on text prompts. However, despite recent advances, these models are still prone to generating unsafe images cont…

A kind of efficient drilling holes technology modularistically for aircraft beam products

Gaowen Liu, Yuanyuan Zhang, Caihong Chen · 2024

UniMuMo: Unified Text, Music and Motion Generation

Yang Han, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, et al. · 2024

We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired…

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check

Sheng-Yao Kuan, Jen‐Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, et al. · 2024

In the domain of autonomous driving, the integration of multi-modal perception techniques based on data from diverse sensors has demonstrated substantial progress. Effectively surpassing the capabilities of state-of-the-art single-modality…

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan · 2024

Different from Object Detection, Visual Grounding deals with detecting a bounding box for each text-image pair. This one box for each text-image data provides sparse supervision signals. Although previous works achieve impressive results, …

Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge

Nicholas Eliopoulos, Purvish Jajal, James C. Davis, Gaowen Liu, George K. Thiravathukal, et al. · 2024

This paper investigates how to efficiently deploy vision transformers on edge devices for small workloads. Recent methods reduce the latency of transformer neural networks by removing or merging tokens, with small accuracy degradation. How…

Riemannian Multinomial Logistics Regression for SPD Neural Networks

Ziheng Chen, Yue Song, Gaowen Liu, Ramana Rao Kompella, Xiao‐Jun Wu, et al. · 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, et al. · 2024

As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM…

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization

Yuchi Liu, Jaskirat Singh, Gaowen Liu, Ali Payani, Zheng Liang · 2024

Large language models (LLMs) have shown great progress in responding to user questions, allowing for a multitude of diverse applications. Yet, the quality of LLM outputs heavily depends on the prompt design, where a good prompt might enabl…
