Essential AI
Essential-Web v1.0: 24T tokens of organized web data
Data plays the most prominent role in how language models acquire skills and knowledge. The lack of massive, well-organized pre-training datasets results in costly and inaccessible data pipelines. We present Essential-Web v1.0, a 24-trilli…
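The "organized" part refers to per-document metadata that makes domain-specific subsets cheap to carve out. A minimal sketch of that workflow, assuming a Hugging Face release; the dataset id and the taxonomy field names below are illustrative, so check the actual release card for the real schema:

```python
from datasets import load_dataset

# Assumptions: the dataset id "EssentialAI/essential-web-v1.0" and the
# "eai_taxonomy"/"subject" fields are hypothetical placeholders for
# whatever schema the actual release uses.
ds = load_dataset("EssentialAI/essential-web-v1.0", split="train", streaming=True)

def is_math(example):
    # Hypothetical taxonomy field: a top-level subject label per document.
    return example.get("eai_taxonomy", {}).get("subject") == "mathematics"

# Stream the 24T-token corpus and keep only math-labeled documents,
# without downloading the full dataset.
math_subset = filter(is_math, ds)
for doc in math_subset:
    print(doc["text"][:200])
    break
```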
Practical Efficiency of Muon for Pretraining
We demonstrate that Muon, the simplest instantiation of a second-order optimizer, explicitly expands the Pareto frontier over AdamW on the compute-time tradeoff. We find that Muon is more effective than AdamW in retaining data efficiency a…
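For context, Muon takes a heavy-ball momentum step and then approximately orthogonalizes each 2D weight update with a few Newton-Schulz iterations. A minimal PyTorch sketch following the widely circulated open-source Muon recipe; the quintic coefficients, momentum flavor, and absence of per-layer scaling here are assumptions and may differ from the paper's exact variant:

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5,
                                eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz coefficients from the open-source Muon recipe.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)            # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(param: torch.Tensor, momentum_buf: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    # Plain heavy-ball momentum (some implementations use Nesterov instead),
    # then replace the update with its orthogonalized counterpart.
    momentum_buf.mul_(beta).add_(param.grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    param.add_(update, alpha=-lr)
```

In the common recipe, Muon is applied only to the 2D hidden-layer weight matrices, while AdamW continues to handle embeddings, gains, and biases.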
Rethinking Reflection in Pre-Training
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually b…
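One way to make this claim concrete is to probe a base checkpoint with a deliberately flawed chain of thought and look for explicit self-correction in the continuation. A toy sketch that assumes nothing about the paper's actual task suite; the prompt construction and reflection cues below are illustrative, and `gpt2` stands in for any base checkpoint:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Deliberately wrong partial reasoning (17 + 28 is 45, not 35).
prompt = (
    "Q: What is 17 + 28?\n"
    "A: 17 + 28 = 35. Let me double-check: "
)
continuation = generator(prompt, max_new_tokens=40,
                         do_sample=False)[0]["generated_text"]

# Count a completion as reflective if it contains an explicit
# self-correction cue (an illustrative, not exhaustive, list).
cues = ["wait", "actually", "that's wrong", "let me reconsider"]
reflects = any(cue in continuation.lower() for cue in cues)
print(continuation)
print("explicit reflection detected:", reflects)
```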