Heptapod: Language Modeling on Visual Signals

arXiv (Cornell University), October 2025
Yongxin Zhu, Jiawei Chen, Yuanzhe Chen, Zhuo Chen, Dongya Jia, Jian Cong, Xiaobin Zhuang, Yu‐Ping Wang, Yuxuan Wang

We introduce Heptapod, an image autoregressive model that adheres to the foundational principles of language modeling. Heptapod employs causal attention, eliminates reliance on CFG, and eschews the trend of semantic tokenizers. Our key innovation is next 2D distribution prediction: a causal Transformer with a reconstruction-focused visual tokenizer learns to predict the distribution over the entire 2D spatial grid of images at each timestep. This learning objective unifies the se…
