Pheng‐Ann Heng
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evol…
DiTAC: Discrete Teamwork Abstraction for Ad Hoc Collaboration
Training autonomous agents to collaborate with unknown teammates in cooperative multi-agent environments remains a fundamental challenge in ad hoc teamwork research. Conventional approaches rely heavily on online interactions with arbitrar…
InstructPLM-mu: 1-Hour Fine-Tuning of ESM2 Beats ESM3 in Protein Mutation Predictions
Multimodal protein language models deliver strong performance on mutation-effect prediction, but training such models from scratch demands substantial computational resources. In this paper, we propose a fine-tuning framework called Instru…
From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?
Protein language models (PLMs) have advanced computational protein science through large-scale pretraining and scalable architectures. In parallel, reinforcement learning (RL) has broadened exploration and enabled precise multi-objective o…
Unified and explainable molecular representation learning for imperfectly annotated data from the hypergraph view
Molecular representation learning (MRL) has shown promise in accelerating drug development by predicting chemical properties. However, imperfect annotation among datasets poses challenges in model design and explainability. In this work, …
Interpretable PROTAC Degradation Prediction With Structure‐Informed Deep Ternary Attention Framework
Proteolysis Targeting Chimeras (PROTACs) are heterobifunctional ligands bridging Proteins‐Of‐Interest (POIs) and E3 ligases for ubiquitin‐proteasome degradation, promising to target the ‘undruggable’. While PROTAC research primarily relies…
A deep reinforcement learning platform for antibiotic discovery
Antimicrobial resistance (AMR) is projected to cause up to 10 million deaths annually by 2050, underscoring the urgent need for new antibiotics. Here we present ApexAmphion, a deep-learning framework for de novo design of antibiotics that …
MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization
Surgical triplet recognition, which involves identifying instrument, verb, target, and their combinations, is a complex surgical scene understanding challenge plagued by long-tailed data distribution. The mainstream multi-task learning par…
Hand-Shadow Poser
Hand shadow art is a captivating art form, creatively using hand shadows to reproduce expressive shapes on the wall. In this work, we study an inverse problem: given a target shape, find the poses of left and right hands that together best…
PhyloTune: An efficient method to accelerate phylogenetic updates using a pretrained DNA language model
Understanding the phylogenetic relationships among species is crucial for comprehending major evolutionary transitions. Despite the ever-growing volume of sequence data, constructing reliable phylogenetic trees effectively becomes more cha…
Large Language Model‐Embedded Intelligent Robotic Scrub Nurse with Multimodal Input for Enhancing Surgeon–Robot Interaction
Scrub nurses have crucial responsibilities, particularly in handling instrument‐related tasks. However, significant mental burdens and unfamiliarity with instruments can lead to various human errors. Consequently, the research community ha…
ClipGS: Clippable Gaussian Splatting for Interactive Cinematic Visualization of Volumetric Medical Data
The visualization of volumetric medical data is crucial for enhancing diagnostic accuracy and improving surgical planning and education. Cinematic rendering techniques significantly enrich this process by providing high-quality visualizati…
Towards fair decentralized benchmarking of healthcare AI algorithms with the Federated Tumor Segmentation (FeTS) challenge
Computational competitions are the standard for benchmarking medical image analysis algorithms, but they typically use small curated test datasets acquired at a few centers, leaving a gap to the reality of diverse multicentric patient data…
DivPro: diverse protein sequence design with direct structure recovery guidance
Motivation: Structure-based protein design is crucial for designing proteins with novel structures and functions, which aims to generate sequences that fold into desired structures. Current deep learning-based methods primarily focus on tra…
Generalist medical foundation model improves prostate cancer segmentation from multimodal MRI images
Prostate cancer (PCa) is one of the most common types of cancer, seriously affecting adult male health. Accurate and automated PCa segmentation is essential for radiologists to confirm the location of cancer, evaluate its severity, and des…
Protein Inverse Folding From Structure Feedback
The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference …
Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
Generalized policy and execution efficiency constitute the two critical challenges in robotic manipulation. While recent foundation policies benefit from the common-sense reasoning capabilities of internet-scale pretrained vision-language …
Benchmarking Endoscopic Surgical Image Restoration and Beyond
In endoscopic surgery, a clear and high-quality visual field is critical for surgeons to make accurate intraoperative decisions. However, persistent visual degradation, including smoke generated by energy devices, lens fogging from thermal…
Medical Large Vision Language Models with Multi-Image Visual Ability
Medical large vision-language models (LVLMs) have demonstrated promising performance across various single-image question answering (QA) benchmarks, yet their capability in processing multi-image clinical scenarios remains underexplored. U…
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Multimodal large language models (MLLMs) have advanced perception across text, vision, and audio, yet they often struggle with structured cross-modal reasoning, particularly when integrating audio and visual signals. We introduce EchoInk-R…
Learning-based early detection of post-hepatectomy liver failure using temporal perioperative data: a nationwide multicenter retrospective study in China
Gated-GPS: enhancing protein–protein interaction site prediction with scalable learning and imbalance-aware optimization
In protein–protein interaction site (PPIS) prediction, existing machine learning models struggle with small datasets, limiting their predictive accuracy for unseen proteins. Additionally, class imbalance in protein complexes, where binding…
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
We introduce MM-Mixing, a multi-modal mixing alignment framework for 3D understanding. MM-Mixing applies mixing-based methods to multi-modal data, preserving and optimizing cross-modal connections while enhancing diversity and improving al…
Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resection with Pringle Maneuver
Pringle maneuver (PM) in laparoscopic liver resection aims to reduce blood loss and provide a clear surgical view by intermittently blocking blood inflow of the liver, whereas prolonged PM may cause ischemic injury. To comprehensively moni…
A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artifi…
Temporal‐multimodal consistency alignment for Alzheimer's cognitive assessment prediction
Background: As one of the most prevalent neurodegenerative disorders, Alzheimer's disease (AD) severely impacts human thinking and behavior. Early and accurate prediction of cognitive decline is crucial for timely AD intervention. However, …
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
The rapid advancement of Large Multi-modal Models (LMMs) has enabled their application in scientific problem-solving, yet their fine-grained capabilities remain under-explored. In this paper, we introduce SciVerse, a multi-modal scientific…
scHeteroNet: A Heterophily‐Aware Graph Neural Network for Accurate Cell Type Annotation and Novel Cell Detection
Single‐cell RNA sequencing (scRNA‐seq) has unveiled extensive cellular heterogeneity, yet precise cell type annotation and the identification of novel cell populations remain significant challenges. scHeteroNet, a novel graph neural networ…
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?
Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must ba…
Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer
Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention although it increases with the number of agents and leads…