Exploring foci of:
arXiv (Cornell University)
Heptapod: Language Modeling on Visual Signals
October 2025 • Yongxin Zhu, Jiawei Chen, Yuanzhe Chen, Zhuo Chen, Dongya Jia, Jian Cong, Xiaobin Zhuang, Yu‐Ping Wang, Yuxuan Wang
We introduce Heptapod, an image autoregressive model that adheres to the foundational principles of language modeling. Heptapod employs causal attention, eliminates reliance on classifier-free guidance (CFG), and eschews the trend of semantic tokenizers. Our key innovation is next 2D distribution prediction: a causal Transformer with a reconstruction-focused visual tokenizer learns to predict the distribution over the entire 2D spatial grid of images at each timestep. This learning objective unifies the se…
C (Programming Language)
Style (Visual Arts)
Being Funny In A Foreign Language
French Language
Russian Language
Hebrew Language
Claude (Language Model)
Language
Visual Arts In Israel
Language Family
Java (Programming Language)
Proto-Indo-European Language
Swahili Language
Rust (Programming Language)
Welsh Language
Scratch (Programming Language)
Akkadian Language
Maltese Language
Vietnamese Language
Romanian Language
Romansh Language
Visual Studio Code
Egyptian Language
Assembly Language