Explanipedia

Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond

Fan Zhang, Haoxuan Li, Shengju Qian, Xin Wang, Zheng Lian, et al. · 2025

Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evol…

DiTAC: Discrete Teamwork Abstraction for Ad Hoc Collaboration

Jing Wang, Pengjie Gu, Mengchen Zhao, Guangyong Chen, Furui Liu, et al. · 2025

Training autonomous agents to collaborate with unknown teammates in cooperative multi-agent environments remains a fundamental challenge in ad hoc teamwork research. Conventional approaches rely heavily on online interactions with arbitrar…

InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions

Jing-Yue Xu, Yong Shi, Lisa Lang, Tianlun Cui, Zhiming Zhang, et al. · 2025

Multimodal protein language models deliver strong performance on mutation-effect prediction, but training such models from scratch demands substantial computational resources. In this paper, we propose a fine-tuning framework called Instru…

From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?

Hanqun Cao, Hongrui Zhang, Jiaping Xu, Zhang Zhou, Linlin Shen, et al. · 2025

Protein language models (PLMs) have advanced computational protein science through large-scale pretraining and scalable architectures. In parallel, reinforcement learning (RL) has broadened exploration and enabled precise multi-objective o…

Unified and explainable molecular representation learning for imperfectly annotated data from the hypergraph view

Bowen Wang, Junyou Li, Donghao Zhou, Lanqing Li, Jinpeng Li, et al. · 2025

Molecular representation learning (MRL) has shown promise in accelerating drug development by predicting chemical properties. However, imperfect annotations among datasets pose challenges in model design and explainability. In this work, …

Interpretable PROTAC Degradation Prediction With Structure‐Informed Deep Ternary Attention Framework

Zhuo Chen, Chunbin Gu, Shuoyan Tan, Xiaorui Wang, Yuquan Li, et al. · 2025

Proteolysis Targeting Chimeras (PROTACs) are heterobifunctional ligands bridging Proteins‐Of‐Interest (POIs) and E3 ligases for ubiquitin‐proteasome degradation, promising to target the ‘undruggable’. While PROTAC research primarily relies…

A deep reinforcement learning platform for antibiotic discovery

Hanqun Cao, Marcelo D. T. Torres, Jingjie Zhang, Zijun Gao, Fang Wu, et al. · 2025

Antimicrobial resistance (AMR) is projected to cause up to 10 million deaths annually by 2050, underscoring the urgent need for new antibiotics. Here we present ApexAmphion, a deep-learning framework for de novo design of antibiotics that …

MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization

Yiyi Zhang, Yuchen Yuan, Ying Zheng, Jialun Pei, Jiazheng Li, et al. · 2025

Surgical triplet recognition, which involves identifying instrument, verb, target, and their combinations, is a complex surgical scene understanding challenge plagued by long-tailed data distribution. The mainstream multi-task learning par…

Hand-Shadow Poser

Hao Xu, Yinqiao Wang, Niloy J. Mitra, Pheng‐Ann Heng, Chi‐Wing Fu · 2025

Hand shadow art is a captivating art form, creatively using hand shadows to reproduce expressive shapes on the wall. In this work, we study an inverse problem: given a target shape, find the poses of left and right hands that together best…

PhyloTune: An efficient method to accelerate phylogenetic updates using a pretrained DNA language model

Danruo Deng, Wuqin Xu, Bian Wu, Hans Peter Comes, Yu Feng, et al. · 2025

Understanding the phylogenetic relationships among species is crucial for comprehending major evolutionary transitions. Despite the ever-growing volume of sequence data, constructing reliable phylogenetic trees effectively becomes more cha…

Large Language Model‐Embedded Intelligent Robotic Scrub Nurse with Multimodal Input for Enhancing Surgeon–Robot Interaction

Wing Yin Ng, Wanyu Ma, Pheng‐Ann Heng, Philip Wai Yan Chiu, Zheng Li · 2025

Scrub nurses have crucial responsibilities, particularly in handling instrument‐related tasks. However, significant mental burdens and unfamiliarity with instruments can lead to various human errors. Consequently, the research community ha…

ClipGS: Clippable Gaussian Splatting for Interactive Cinematic Visualization of Volumetric Medical Data

Chengkun Li, Yuqi Tong, Kai Chen, Zhenya Yang, Ruiyang Li, et al. · 2025

The visualization of volumetric medical data is crucial for enhancing diagnostic accuracy and improving surgical planning and education. Cinematic rendering techniques significantly enrich this process by providing high-quality visualizati…

Towards fair decentralized benchmarking of healthcare AI algorithms with the Federated Tumor Segmentation (FeTS) challenge

Maximilian Zenk, Ujjwal Baid, Sarthak Pati, Akis Linardos, Brandon Edwards, et al. · 2025

Computational competitions are the standard for benchmarking medical image analysis algorithms, but they typically use small curated test datasets acquired at a few centers, leaving a gap to the reality of diverse multicentric patient data…

DivPro: diverse protein sequence design with direct structure recovery guidance

Xinyi Zhou, Guibao Shen, Ying-Cong Chen, Guangyong Chen, Pheng‐Ann Heng · 2025

Motivation: Structure-based protein design is crucial for designing proteins with novel structures and functions, which aims to generate sequences that fold into desired structures. Current deep learning-based methods primarily focus on tra…

Generalist medical foundation model improves prostate cancer segmentation from multimodal MRI images

Yuhan Zhang, Xiao Ma, Mingchao Li, Kun Huang, Jie Zhu, et al. · 2025

Prostate cancer (PCa) is one of the most common types of cancer, seriously affecting adult male health. Accurate and automated PCa segmentation is essential for radiologists to confirm the location of cancer, evaluate its severity, and des…

Protein Inverse Folding From Structure Feedback

Jiaping Xu, Zijun Gao, Xinyi Zhou, Jie Hu, Xingyi Cheng, et al. · 2025

The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference …

Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning

Hao Chen, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Renrui Zhang, et al. · 2025

Generalized policy and execution efficiency constitute the two critical challenges in robotic manipulation. While recent foundation policies benefit from the common-sense reasoning capabilities of internet-scale pretrained vision-language …

Benchmarking Endoscopic Surgical Image Restoration and Beyond

Jialun Pei, Diandian Guo, Donghui Yang, Zhixi Li, Yuxin Feng, et al. · 2025

In endoscopic surgery, a clear and high-quality visual field is critical for surgeons to make accurate intraoperative decisions. However, persistent visual degradation, including smoke generated by energy devices, lens fogging from thermal…

Medical Large Vision Language Models with Multi-Image Visual Ability

Xikai Yang, Juzheng Miao, Yuchen Yuan, Jiaze Wang, Qi Dou, et al. · 2025

Medical large vision-language models (LVLMs) have demonstrated promising performance across various single-image question answering (QA) benchmarks, yet their capability in processing multi-image clinical scenarios remains underexplored. U…

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

Zhenghao Xing, Xiaowei Hu, Chi‐Wing Fu, Wenhai Wang, Jifeng Dai, et al. · 2025

Multimodal large language models (MLLMs) have advanced perception across text, vision, and audio, yet they often struggle with structured cross-modal reasoning, particularly when integrating audio and visual signals. We introduce EchoInk-R…

Learning-based early detection of post-hepatectomy liver failure using temporal perioperative data: a nationwide multicenter retrospective study in China

Kai Wang, Qian Yang, Kang Li, Shanhua Tang, Baoluhe Zhang, et al. · 2025

Gated-GPS: enhancing protein–protein interaction site prediction with scalable learning and imbalance-aware optimization

Xin Gao, Hanqun Cao, Jinpeng Li, Jiezhong Qiu, Guangyong Chen, et al. · 2025

In protein–protein interaction site (PPIS) prediction, existing machine learning models struggle with small datasets, limiting their predictive accuracy for unseen proteins. Additionally, class imbalance in protein complexes, where binding…

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding

Jiaze Wang, Yi Wang, Z. J. Guo, Renrui Zhang, Donghao Zhou, et al. · 2025

We introduce MM-Mixing, a multi-modal mixing alignment framework for 3D understanding. MM-Mixing applies mixing-based methods to multi-modal data, preserving and optimizing cross-modal connections while enhancing diversity and improving al…

Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resection with Pringle Maneuver

Diandian Guo, Weixin Si, Zhixi Li, Jialun Pei, Pheng‐Ann Heng · 2025

Pringle maneuver (PM) in laparoscopic liver resection aims to reduce blood loss and provide a clear surgical view by intermittently blocking blood inflow of the liver, whereas prolonged PM may cause ischemic injury. To comprehensively moni…

A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook

Junkui Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, et al. · 2025

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artifi…

Temporal‐multimodal consistency alignment for Alzheimer's cognitive assessment prediction

Xikai Yang, Xilin Dang, Jinyue Cai, Jinpeng Li, Xi Wang, et al. · 2025

Background: As one of the most prevalent neurodegenerative disorders, Alzheimer's disease (AD) severely impacts human thinking and behavior. Early and accurate prediction of cognitive decline is crucial for timely AD intervention. However, …

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems

Z. J. Guo, Ray Zhang, Hao Chen, Jialin Gao, Dongzhi Jiang, et al. · 2025

The rapid advancement of Large Multi-modal Models (LMMs) has enabled their application in scientific problem-solving, yet their fine-grained capabilities remain under-explored. In this paper, we introduce SciVerse, a multi-modal scientific…

scHeteroNet: A Heterophily‐Aware Graph Neural Network for Accurate Cell Type Annotation and Novel Cell Detection

Jiacheng Liu, Xingyu Fan, Chunbin Gu, Yaodong Yang, Bian Wu, et al. · 2025

Single‐cell RNA sequencing (scRNA‐seq) has unveiled extensive cellular heterogeneity, yet precise cell type annotation and the identification of novel cell populations remain significant challenges. scHeteroNet, a novel graph neural networ…

The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?

Yiyi Zhang, Xingyu Chen, Kexin Chen, Yiping P. Du, Xilin Dang, et al. · 2025

Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must ba…

Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

Yaodong Yang, Guangyong Chen, Hongyao Tang, Fei‐Fei Liu, Danruo Deng, et al. · 2025

Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention although it increases with the number of agents and leads…

Pheng‐Ann Heng