Michael Pust
Essential-Web v1.0: 24T tokens of organized web data
Data plays the most prominent role in how language models acquire skills and knowledge. The lack of massive, well-organized pre-training datasets results in costly and inaccessible data pipelines. We present Essential-Web v1.0, a 24-trilli…
Practical Efficiency of Muon for Pretraining
We demonstrate that Muon, the simplest instantiation of a second-order optimizer, explicitly expands the Pareto frontier over AdamW on the compute-time tradeoff. We find that Muon is more effective than AdamW in retaining data efficiency a…
Rethinking Reflection in Pre-Training
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually b…
SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage
With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, …
Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words
Most statistical machine translation systems cannot translate words that are unseen in the training data. However, humans can translate many classes of out-of-vocabulary (OOV) words (e.g., novel morphological variants, misspellings, and co…
Translating a Language You Don’t Know In the Chinese Room
In a corruption of John Searle’s famous AI thought experiment, the Chinese Room (Searle, 1980), we twist its original intent by enabling humans to translate text, e.g. from Uyghur to English, even if they don’t have any prior knowledge of …
Design of a pressure sensitive matrix for analyzing direct haptic patient-therapist interaction in motor rehabilitation after stroke
Robot-based therapy is one of the prevalent therapeutic approaches in motor stroke rehabilitation. It is often used in hospitals in combination with conventional therapy. In order to optimize human-robot interaction, we aim to investigate …