Simeng Sun
An empirical study on the limitation of Transformers in program trace generation
We study Transformers on the task "program trace generation" (PTG), where models produce step-by-step execution traces for synthetic programs. Unlike existing algorithmic problems, PTG externalizes reasoning through long traces where …
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normali…
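The truncated abstract names the core mechanism: keeping embeddings, weight rows, and hidden states at unit norm so they live on the hypersphere. A minimal PyTorch sketch of that idea follows; the dimensions, step size, and update form are illustrative assumptions, not the paper's reference implementation.

import torch

def unit_norm(x, dim=-1, eps=1e-8):
    # Project vectors onto the unit hypersphere along the embedding dimension.
    return x / (x.norm(dim=dim, keepdim=True) + eps)

# Illustrative nGPT-style residual step: the hidden state stays on the sphere,
# and each block's output nudges it along the sphere before re-normalizing.
d_model = 8
h = unit_norm(torch.randn(2, 5, d_model))          # hidden states, unit norm
block_out = unit_norm(torch.randn(2, 5, d_model))  # attention/MLP output, unit norm
alpha = 0.5                                        # step size (learnable in the paper)
h = unit_norm(h + alpha * (block_out - h))         # retract back onto the sphere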
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with…
TopicGPT: A Prompt-based Topic Modeling Framework
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users min…
PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods t…
GraphIQA: Learning Distortion Graph Representations for Blind Image Quality Assessment
A good distortion representation is crucial for the success of deep blind image quality assessment (BIQA). However, most previous methods do not effectively model the relationship between distortions or the distribution of samples with the…
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzmán. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Do Long-Range Language Models Actually Use Long-Range Context?
Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency o…
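The abstract is cut off before the methodology, but the question it poses is commonly tested by ablating distant context and checking whether predictions degrade. A hedged sketch of such a probe, assuming a Hugging Face-style causal LM interface, is below; it illustrates the general technique, not necessarily the paper's exact protocol.

import torch

def last_token_nll(model, ids, ctx_len):
    # NLL of the final token when conditioning on only the previous
    # ctx_len tokens; shrinking ctx_len ablates long-range context.
    window = ids[:, -(ctx_len + 1):]
    with torch.no_grad():
        logits = model(window).logits        # HF-style causal LM (assumed)
    logp = logits[:, -2].log_softmax(-1)     # distribution over the final token
    return -logp.gather(-1, window[:, -1:]).mean().item()

# If last_token_nll(model, ids, 128) is close to last_token_nll(model, ids, 4096),
# the extra long-range context is not actually improving predictions.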
Learning Omni-Frequency Region-adaptive Representations for Real Image Super-Resolution
Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i.e., bicubic down-sampling) typically suffer from poor performance when applied to real-world low-resolution (LR) images due…
Revisiting Simple Neural Probabilistic Language Models
Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM) of Bengio …
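For reference, the NPLM of Bengio et al. (2003) that this paper revisits concatenates the embeddings of a fixed window of preceding tokens and feeds them through a feedforward network that predicts the next token. A minimal PyTorch sketch (sizes are hypothetical):

import torch
import torch.nn as nn

class NPLM(nn.Module):
    # Minimal Bengio-style neural probabilistic language model (sketch).
    def __init__(self, vocab_size, d_emb=128, window=5, d_hid=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.ff = nn.Sequential(
            nn.Linear(window * d_emb, d_hid),
            nn.Tanh(),
            nn.Linear(d_hid, vocab_size),
        )

    def forward(self, prev_tokens):   # prev_tokens: (batch, window)
        x = self.emb(prev_tokens)     # (batch, window, d_emb)
        x = x.flatten(1)              # concatenate the window of embeddings
        return self.ff(x)             # logits over the next token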
Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al.…
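The title describes the mechanism: score an n-best list of translations with an energy model and keep the lowest-energy hypothesis. A toy sketch of that reranking loop follows; the ToyEnergy network and the use of precomputed sentence features are assumptions for illustration, not the paper's trained model.

import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    # Hypothetical stand-in for a trained energy network: features -> scalar.
    def __init__(self, d=16):
        super().__init__()
        self.proj = nn.Linear(d, 1)
    def forward(self, x):              # x: (n_hypotheses, d)
        return self.proj(x).squeeze(-1)

def rerank(hyp_features, energy):
    # Pick the hypothesis with the lowest energy from an n-best list.
    with torch.no_grad():
        return int(energy(hyp_features).argmin())

energy = ToyEnergy()
nbest = torch.randn(5, 16)             # features for 5 beam-search hypotheses
print("selected hypothesis:", rerank(nbest, energy))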
IGA: An Intent-Guided Authoring Assistant
While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language mo…
Latent-Separated Global Prediction for Learned Image Compression
Over the past several years, we have witnessed the impressive progress of learned image compression. Recent learned image codecs are based on auto-encoders that first encode an image into low-dimensional latent representations and then de…
Multi-scale Grouped Dense Network for VVC Intra Coding
The Versatile Video Coding (H.266/VVC) standard achieves better image quality at the same bitrate than any other conventional image codec, such as BPG or JPEG. However, it is still attractive and challenging to improve the image …
Hard-Coded Gaussian Attention for Neural Machine Translation
Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. …
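A "hard-coded" head of this kind can be written down directly: attention weights follow a fixed Gaussian over key positions centered near the query position, with nothing learned. The offset and standard deviation below are illustrative choices, not necessarily the paper's exact settings.

import numpy as np

def hardcoded_gaussian_attention(seq_len, center_offset=0, sigma=1.0):
    # Each query position i attends with a fixed Gaussian centered at
    # i + center_offset; no parameters are learned.
    pos = np.arange(seq_len)
    dist = pos[None, :] - (pos[:, None] + center_offset)  # (query, key) offsets
    scores = np.exp(-dist.astype(float) ** 2 / (2 * sigma ** 2))
    return scores / scores.sum(axis=-1, keepdims=True)    # rows sum to 1

A = hardcoded_gaussian_attention(6, center_offset=-1)     # head that looks one token back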
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
Simeng Sun, Ani Nenkova. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce outputs of different lengths. This is due to a non-linear pattern between ROUGE F1 and summary length. To alleviate the effect of length du…
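The length confound is easy to see with unigram overlap: lengthening a candidate tends to raise recall and lower precision, so F1 shifts non-linearly with length. A toy illustration (simplified unigram F1, not a full ROUGE implementation):

from collections import Counter

def unigram_f1(candidate, reference):
    # Simplified ROUGE-1-style F1 over whitespace tokens (illustration only).
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

ref = "the cat sat on the mat"
print(unigram_f1("the cat sat", ref))                         # short: P=1.00, R=0.50, F1=0.67
print(unigram_f1("the cat sat on the mat near a door", ref))  # longer: P=0.67, R=1.00, F1=0.80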