Explanipedia

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Celso M. de Melo, Jieneng Chen, Alan Yuille · 2025

Humans naturally understand 3D spatial relationships, enabling complex reasoning like predicting collisions of vehicles from different directions. Current large multimodal models (LMMs), however, lack this capability of 3D spatial reaso…

GenEx: Generating an Explorable World

Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, et al. · 2024

Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of pl…

Efficient Large Multi-modal Models via Visual Context Compression

Jieneng Chen, Luoxin Ye, Ju He, Zhaoyang Wang, Daniel Khashabi, et al. · 2024

While significant advancements have been made in compressed representations for text embeddings in large language models (LLMs), the compression of visual tokens in multi-modal LLMs (MLLMs) has remained a largely overlooked area. In this w…

Luoxin Ye