MizAR 60 for Mizar 50
As a present to Mizar on its 50th anniversary, we develop an AI/TP system that automatically proves about 60% of the Mizar theorems in the hammer setting. We also automatically prove 75% of the Mizar theorems when the automated provers are…
Transformer-Based Feature Learning for Algorithm Selection in Combinatorial Optimisation
Given a combinatorial optimisation problem, there are typically multiple ways of modelling it for presentation to an automated solver. Choosing the right combination of model and target solver can have a significant impact on the effective…
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)
This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and th…
Enriching Word Vectors with Subword Information
Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to eac…
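Illustrative sketch (not code from the paper above): the subword idea can be seen by decomposing a word into character n-grams and averaging their vectors, so morphologically related words share representation. The n-gram range, dimensionality, and random vectors below are assumptions chosen only for the demonstration.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers, as in subword models."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

# Toy n-gram embedding table: in a real model these vectors are learned, not random.
rng = np.random.default_rng(0)
dim = 8
table = {}

def ngram_vector(ng):
    if ng not in table:
        table[ng] = rng.normal(size=dim)
    return table[ng]

def word_vector(word):
    """A word vector is built from the vectors of its subword n-grams."""
    grams = char_ngrams(word)
    return np.mean([ngram_vector(g) for g in grams], axis=0)

print(char_ngrams("where")[:5])    # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("where").shape)  # (8,)
```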
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has…
A Simple Framework for Contrastive Learning of Visual Representations
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. …
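Illustrative sketch of the NT-Xent contrastive objective that SimCLR-style methods optimize, not code from the paper itself; it assumes two augmented views per image are already encoded, and the batch size, embedding dimension, and temperature are arbitrary.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired views z1, z2 with shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit length
    sim = z @ z.t() / temperature                        # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # The positive for example i is its other augmented view: i + N (or i - N).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(4, 16), torch.randn(4, 16)  # toy encoder outputs for 4 images
print(nt_xent_loss(z1, z2).item())
```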
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. …
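The central operation in this architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch with toy shapes chosen only for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)                 # (..., L_q, L_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                             # (..., L_q, d_v)

Q = np.random.randn(2, 5, 64)   # toy batch: 2 sequences, 5 query positions, d_k = 64
K = np.random.randn(2, 7, 64)   # 7 key/value positions
V = np.random.randn(2, 7, 64)
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 5, 64)
```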
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Motivation Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature ha…
SQuAD: 100,000+ Questions for Machine Comprehension of Text
We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text f…
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to …
Learning Transferable Visual Models From Natural Language Supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify an…
Kaldi Speech Recognition Toolkit
We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed docume…
Hierarchical Attention Networks for Document Classification
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
Reading digits in natural images with unsupervised feature learning
Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machi…
LLM-Supported Manufacturing Mapping Generation
In large manufacturing companies, such as Bosch, that operate thousands of production lines with each comprising up to dozens of production machines and other equipment, even simple inventory questions such as of location and quantities of…
Representation Learning with Contrastive Predictive Coding
While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose…
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases, and corresponding identification of chance or base case levels of the s…
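For reference, the basic measures the paper analyzes follow directly from confusion-matrix counts; a small worked sketch, with counts made up purely for the example:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Worked example: 40 true positives, 10 false positives, 20 false negatives.
p, r, f = precision_recall_f1(tp=40, fp=10, fn=20)
print(p, r, f)  # 0.8, 0.666..., 0.727...
```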
Neural Architectures for Named Entity Recognition
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine tran…
Training language models to follow instructions with human feedback
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these m…
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerg…
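A chain-of-thought prompt simply includes a worked, step-by-step rationale in the few-shot exemplar so the model continues in the same style. The sketch below is illustrative: the wording is not taken verbatim from the paper, and the `llm.generate` call is a hypothetical client, not a specific API.

```python
# Illustrative chain-of-thought prompt: the exemplar answer shows intermediate
# reasoning steps before the final answer, nudging the model to do the same.
cot_prompt = """Q: A cafeteria had 23 apples. It used 20 and bought 6 more. How many apples are there now?
A: The cafeteria started with 23 apples. After using 20 it had 23 - 20 = 3.
After buying 6 more it had 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. How many balls does he have now?
A:"""

# The prompt would then be sent to a language model (hypothetical client code):
# answer = llm.generate(cot_prompt)
print(cot_prompt)
```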
Get To The Point: Summarization with Pointer-Generator Networks
Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two…
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understandi…
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
This article surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning.” Unlike traditional supervised learning, which trains a model to take in an input x and predict an out…
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond …
Language Models are Few-Shot Learners
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires…
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowl…