Peihao Chen
Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance
Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has gre…
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Research on 3D Vision-Language Models (3D-VLMs) is gaining increasing attention, which is crucial for developing embodied AI within 3D scenes, such as visual navigation and embodied question answering. Due to the high density of visual fea…
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
Constructing compact and informative 3D scene representations is essential for effective embodied exploration and reasoning, especially in complex environments over extended periods. Existing representations, such as object-centric 3D scen…
FlexAttention for Efficient High-Resolution Vision-Language Models
Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we pro…
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, wher…
MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling
Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response in arbitrary locations with few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a map-…
3D-VLA: A 3D Vision-Language-Action Generative World Model
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world. Furthermore, they perform action prediction by learning a direct mapping from perception to action, neglecti…
Total syntheses of Tetrodotoxin and 9-epiTetrodotoxin
Tetrodotoxin and congeners are specific voltage-gated sodium channel blockers that exhibit remarkable anesthetic and analgesic effects. Here, we present scalable asymmetric syntheses of Tetrodotoxin and 9-epiTetrodotoxin from the abund…
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Human beings possess the capability to multiply a melange of multisensory cues while actively exploring and interacting with the 3D world. Current multi-modal large language models, however, passively absorb sensory data as inputs, lacking…
Total Synthesis of Tetrodotoxin and 9-epiTetrodotoxin
The original dataset of "Total Synthesis of Tetrodotoxin and 9-epiTetrodotoxin", manuscript#: NCOMMS-23-01160D
SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector
In this paper, we attempt to specialize the VLM model for OWOD tasks by distilling its open-world knowledge into a language-agnostic detector. Surprisingly, we observe that the combination of a simple knowledge distillation approa…
DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning
Learning optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the challenge of addressing the dynamics of whether two agents s…
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
A remarkable ability of human beings resides in compositional reasoning, i.e., the capacity to make "infinite use of finite means". However, current large vision-language foundation models (VLMs) fall short of such compositional abilities …
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
Learning to navigate to an image-specified goal is an important but challenging task for autonomous systems. The agent is required to reason about the goal location from where the picture was shot. Existing methods try to solve this problem by lear…
$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction ann…
3D-LLM: Injecting the 3D World into Large Language Models
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves rich…
Learning Vision-and-Language Navigation from YouTube Videos
Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic 3D environments using natural language instructions. Existing VLN methods suffer from training on small-scale environments or unreasonable path-instru…
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to the speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, a…
Detecting the open-world objects with the help of the Brain
Open World Object Detection (OWOD) is a novel computer vision task with a considerable challenge, bridging the gap between classic object detection (OD) benchmarks and real-world object detection. In addition to detecting and classifying s…
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions. The instructions often contain descriptions of objects in the environment. To a…
Learning Active Camera for Multi-Object Navigation
Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications. One of the key challenges is how to explore environments efficiently with camera sensors only. Existing navigation methods mainly…
Masked Motion Encoding for Self-Supervised Video Representation Learning
How to learn discriminative video representation from unlabeled videos is challenging but crucial for video analysis. The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions. How…
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition. This task, however, is extremely ch…
Discovery of a molecular glue promoting CDK12-DDB1 interaction to trigger cyclin K degradation
Molecular-glue degraders mediate interactions between target proteins and components of the ubiquitin-proteasome system to cause selective protein degradation. Here, we report a new molecular glue HQ461 discovered by high-throughput screen…
Author response: Discovery of a molecular glue promoting CDK12-DDB1 interaction to trigger cyclin K degradation
Molecular-glue degra…
Location-aware Graph Convolutional Networks for Video Question Answering
We addressed the challenging task of video question answering, which requires machines to answer questions about videos in a natural language form. Previous state-of-the-art methods attempt to apply spatio-temporal attention mechanism on v…
Foley Music: Learning to Generate Music from Videos
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video to music …