MizAR 60 for Mizar 50
As a present to Mizar on its 50th anniversary, we develop an AI/TP system that automatically proves about 60% of the Mizar theorems in the hammer setting. We also automatically prove 75% of the Mizar theorems when the automated provers are…
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. …
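As a concrete illustration of the attention mechanism the abstract refers to, here is a minimal NumPy sketch of scaled dot-product attention over a single head; the shapes and names are illustrative and not taken from the paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention distribution per query
    return weights @ V                              # weighted sum of value vectors

# toy example: 3 queries attending over 4 key/value positions of dimension 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 8)
```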
Cross-lingual Language Model Pretraining
Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We p…
Understanding Back-Translation at Scale
An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and in…
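A minimal sketch of the back-translation recipe described above, assuming a hypothetical `backward_model.translate` (any target-to-source MT system) and plain Python lists of sentences; this illustrates the general idea rather than the paper's implementation.

```python
def back_translate(monolingual_target, backward_model):
    """Turn target-side monolingual sentences into synthetic (source, target) training pairs."""
    synthetic_pairs = []
    for tgt in monolingual_target:
        # backward_model is a hypothetical target->source translator; the paper compares
        # ways of generating this output, such as sampling versus (noised) beam search
        synthetic_src = backward_model.translate(tgt)
        synthetic_pairs.append((synthetic_src, tgt))  # the target side stays human-written
    return synthetic_pairs

# The forward (source->target) model is then trained on real_pairs + synthetic_pairs.
```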
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads to the overall performance of the…
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on …
Phrase-Based & Neural Unsupervised Machine Translation
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of la…
Non-Autoregressive Neural Machine Translation
Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitud…
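To make the contrast concrete, here is a hypothetical decoding sketch (the `model` methods are invented for illustration): an autoregressive decoder emits one token per forward pass, while a non-autoregressive decoder predicts a target length and then emits every position in a single parallel pass.

```python
def autoregressive_decode(model, src, max_len=100):
    tokens = ["<bos>"]
    for _ in range(max_len):                        # one model call per output token
        nxt = model.predict_next(src, tokens)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]

def non_autoregressive_decode(model, src):
    length = model.predict_length(src)              # target length chosen up front
    return model.predict_all_positions(src, length)  # all tokens emitted in one parallel pass
```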
A Structured Review of the Validity of BLEU
The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation t…
Unsupervised Statistical Machine Translation
Meta-Learning for Low-Resource Neural Machine Translation
In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML; Finn et al., 2017) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem w…
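Roughly, MAML learns an initialization from which a few gradient steps on a new (low-resource) task already give a good model. Below is a first-order toy sketch in NumPy, with a synthetic quadratic loss standing in for per-language translation losses; everything here is illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
tasks = [rng.normal(size=2) for _ in range(5)]        # each task: a 2-d target vector

def loss_and_grad(theta, target):
    diff = theta - target                             # toy per-task loss ||theta - target||^2
    return diff @ diff, 2 * diff

theta = np.zeros(2)                                   # meta-parameters (shared initialization)
alpha, beta = 0.1, 0.05                               # inner- and outer-loop learning rates
for _ in range(200):
    meta_grad = np.zeros_like(theta)
    for target in tasks:
        _, g = loss_and_grad(theta, target)
        adapted = theta - alpha * g                   # inner loop: quick adaptation to one task
        _, g_adapted = loss_and_grad(adapted, target)
        meta_grad += g_adapted                        # first-order MAML meta-gradient
    theta -= beta * meta_grad / len(tasks)            # outer loop: update the shared initialization
```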
Facebook FAIR’s WMT19 News Translation Task Submission
This paper describes Facebook FAIR’s submission to the WMT19 shared news translation task. We participate in four language directions: English-German and English-Russian, in both directions. Following our submission from last year, our base…
Self-Attention with Relative Position Representations
Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly m…
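The mechanism behind the title, as it is usually presented, adds learned relative-position embeddings to the keys and values when position i attends to position j; the following is a hedged reconstruction of that standard formulation, not quoted from the paper.

```latex
e_{ij} = \frac{x_i W^Q \,\bigl(x_j W^K + a^K_{ij}\bigr)^{\top}}{\sqrt{d_z}},
\qquad
\alpha_{ij} = \operatorname{softmax}_j(e_{ij}),
\qquad
z_i = \sum_{j} \alpha_{ij}\,\bigl(x_j W^V + a^V_{ij}\bigr)
```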
Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder
In this paper, we present our first attempts at building a multilingual Neural Machine Translation framework under a unified approach. We are then able to employ attention-based NMT for many-to-many multilingual translation tasks. Our appr…
Unsupervised Pretraining for Sequence to Sequence Learning
This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weight…
Iterative Back-Translation for Neural Machine Translation
We present iterative back-translation, a method for generating increasingly better synthetic parallel data from monolingual data to train neural machine translation systems. Our proposed method is very simple yet effective and highly appli…
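A hypothetical sketch of the iterative loop the abstract describes, in which the two translation directions alternately generate synthetic data for each other; `train_nmt` and `.translate` are placeholders, not an API from the paper.

```python
def iterative_back_translation(real_pairs, mono_src, mono_tgt, train_nmt, rounds=3):
    """real_pairs: list of (src, tgt); mono_src / mono_tgt: monolingual sentence lists."""
    fwd = train_nmt(real_pairs)                                # source -> target model
    bwd = train_nmt([(t, s) for s, t in real_pairs])           # target -> source model
    for _ in range(rounds):
        synth_fwd = [(bwd.translate(t), t) for t in mono_tgt]  # synthetic source, real target
        fwd = train_nmt(real_pairs + synth_fwd)                # retrain the forward model
        synth_bwd = [(fwd.translate(s), s) for s in mono_src]  # synthetic target, real source
        bwd = train_nmt([(t, s) for s, t in real_pairs] + synth_bwd)
    return fwd, bwd
```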
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
Neural machine translation (NMT) aims at solving machine translation (MT) problems using neural networks and has exhibited promising results in recent years. However, most of the existing NMT models are shallow and there is still a perform…
Rapid Adaptation of Neural Machine Translation to New Languages
This paper examines the problem of adapting neural machine translation systems to new, low-resourced languages (LRLs) as effectively and rapidly as possible. We propose methods based on starting with massively multilingual "seed models", w…
Graph Transformer for Graph-to-Sequence Learning
The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks that restrict …
Improving Neural Machine Translation Models with Monolingual Data
Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based s…
Instance Weighting for Neural Machine Translation Domain Adaptation
Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to apply it to Neural Machine Translation (NMT) directly, because NMT is not a linear model. In this paper, two …
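In loss terms, instance weighting for NMT is commonly realized by scaling each sentence pair's negative log-likelihood with a domain-relevance weight; here is a minimal NumPy sketch of that generic idea (not necessarily the specific methods proposed in the paper).

```python
import numpy as np

def weighted_nll(sentence_log_probs, weights):
    """Instance-weighted objective: each pair's negative log-likelihood is scaled by its weight."""
    return -np.sum(weights * sentence_log_probs) / np.sum(weights)

log_probs = np.array([-12.3, -8.1, -20.5])  # summed token log-likelihoods of three sentence pairs
weights = np.array([1.0, 0.2, 0.7])         # e.g. in-domain pairs weighted higher than out-of-domain
print(weighted_nll(log_probs, weights))
```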
Transfer Learning for Low-Resource Neural Machine Translation
The encoder-decoder framework for neural machine translation (NMT) has been shown effective in large data scenarios, but is much less effective for low-resource languages. We present a transfer learning method that significantly improves B…
BLEU is Not Suitable for the Evaluation of Text Simplification
BLEU is widely considered to be an informative metric for text-to-text generation, including Text Simplification (TS). TS includes both lexical and structural aspects. In this paper we show that BLEU is not suitable for the evaluation of s…
Beyond BLEU: Training Neural Machine Translation with Semantic Similarity
While most neural machine translation (NMT) systems are still trained using maximum likelihood estimation, recent work has demonstrated that optimizing systems to directly improve evaluation metrics such as BLEU can significantly improve fi…
Generating Chinese Classical Poems with Statistical Machine Translation Models
This paper describes a statistical approach to generation of Chinese classical poetry and proposes a novel method to automatically evaluate poems. The system accepts a set of keywords representing the writing intents from a writer and gene…
An Effective Approach to Unsupervised Machine Translation
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual …
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive tran…