Bingyi Kang
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Foundation Vision Language Models (VLMs) exhibit strong capabilities in multi-modal representation learning, comprehension, and reasoning. By injecting action components into the VLMs, Vision-Language-Action models (VLAs) can be naturally …
Uncovering Untapped Potential in Sample-Efficient World Model Agents
World model (WM) agents enable sample-efficient reinforcement learning by learning policies entirely from simulated experience. However, existing token-based world models (TBWMs) are limited to visual inputs and discrete actions, restricti…
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Depth Anything has achieved remarkable success in monocular depth estimation with strong generalization ability. However, it suffers from temporal inconsistency in videos, hindering its practical applications. Various methods have been pro…
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs). We develop VideoWorld, an auto-regressive v…
Classification Done Right for Vision-Language Pre-Training
We introduce SuperClass, a super simple classification method for vision-language pre-training on image-text data. Unlike its contrastive counterpart CLIP, which contrasts against a text encoder, SuperClass directly utilizes tokenized raw text as…
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
MLLMs have demonstrated remarkable comprehension and reasoning capabilities with complex language and visual data. These advances have spurred the vision of establishing a generalist robotic MLLM proficient in understanding complex human i…
How Far is Video Generation from World Model: A Physical Law Perspective
OpenAI's Sora highlights the potential of video generation for developing world models that adhere to fundamental physical laws. However, the ability of video generation models to discover such laws purely from visual data without human pr…
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
It is desirable but challenging to generate content-rich long videos in the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of natu…
Depth Anything V2
This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much …
Improving Token-Based World Models with Parallel Observation Prediction
Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-…
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circum…
Research on the double-layer clustering method of residential energy use characteristics under the background of energy system energy savings and carbon reduction
Accurately differentiating the energy consumption profiles of residential users is of great significance for load planning, scheduling, operation, and management of the power system, and is a basic prerequisite for realizing intelligent perception…
Harnessing Diffusion Models for Visual Perception with Meta Prompts
The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual…
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Semantic segmentation has witnessed tremendous progress due to the proposal of various advanced network architectures. However, these architectures are extremely hungry for fine-grained annotations to train, whose acquisition is laborious and unaffordable.…
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
The divergence of the Q-value estimation has been a prominent issue in offline RL, where the agent has no access to real dynamics. Traditional beliefs attribute this instability to querying out-of-distribution actions when bootstrapping va…
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
LLMs have demonstrated remarkable abilities at interacting with humans through language, especially with the usage of instruction-following data. Recent advancements in LLMs, such as MiniGPT-4, LLaVA, and X-LLM, further enlarge their abili…
Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning
Deep reinforcement learning (RL) algorithms suffer severe performance degradation when the interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and p…
Decoupled Prioritized Resampling for Offline RL
Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. H…
Improving and Benchmarking Offline Reinforcement Learning Algorithms
Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the emergence of various algorithms and datasets. However, these methods usually focus on algorithmic advancements, ignoring that many low-level implementa…
Efficient Diffusion Policies for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL significantly boosts the performance of offline RL by …
MADiff: Offline Multi-agent Learning with Diffusion Models
Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervise…
Bag of Tricks for Training Data Extraction from Language Models
With the advance of language models, privacy protection is receiving more attention. Training data extraction is therefore of great importance, as it can serve as a potential tool to assess privacy leakage. However, due to the difficulty o…
Phylogenomics, plastome structure and species identification in Mahonia (Berberidaceae)
Background: Elucidating the phylogenetic relationships within species-rich genera is essential but challenging, especially when lineages are assumed to have been going through radiation events. Mahonia Nutt. (Berberidaceae) is a genus with …
Boosting Offline Reinforcement Learning via Data Rebalancing
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. To address this problem, existing works mainly focus on designing sophisticated algorithms to explicitly or implicitly co…