Simeng Sun
An empirical study on the limitation of Transformers in program trace generation
We study Transformers on the task "program trace generation" (PTG), where models produce step-by-step execution traces for synthetic programs. Unlike existing algorithmic problems, PTG externalizes reasoning through long traces where …
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normali…
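The truncated abstract names the core mechanism: keeping embeddings, weight rows, and hidden states at unit norm so they live on the hypersphere. A minimal PyTorch sketch of that idea follows; the dimensions, step size, and update form are illustrative assumptions, not the paper's reference implementation.

import torch

def unit_norm(x, dim=-1, eps=1e-8):
    # Project vectors onto the unit hypersphere along the embedding dimension.
    return x / (x.norm(dim=dim, keepdim=True) + eps)

# Illustrative nGPT-style residual step: the hidden state stays on the sphere,
# and each block's output nudges it along the sphere before re-normalizing.
d_model = 8
h = unit_norm(torch.randn(2, 5, d_model))          # hidden states, unit norm
block_out = unit_norm(torch.randn(2, 5, d_model))  # attention/MLP output, unit norm
alpha = 0.5                                        # step size (learnable in the paper)
h = unit_norm(h + alpha * (block_out - h))         # retract back onto the sphere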
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with…
TopicGPT: A Prompt-based Topic Modeling Framework
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users min…
PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods t…
GraphIQA: Learning Distortion Graph Representations for Blind Image Quality Assessment
A good distortion representation is crucial for the success of deep blind image quality assessment (BIQA). However, most previous methods do not effectively model the relationship between distortions or the distribution of samples with the…
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzmán. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Do Long-Range Language Models Actually Use Long-Range Context?
Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency o…
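The abstract is cut off before the methodology, but the question it poses is commonly tested by ablating distant context and checking whether predictions degrade. A hedged sketch of such a probe, assuming a Hugging Face-style causal LM interface, is below; it illustrates the general technique, not necessarily the paper's exact protocol.

import torch

def last_token_nll(model, ids, ctx_len):
    # NLL of the final token when conditioning on only the previous
    # ctx_len tokens; shrinking ctx_len ablates long-range context.
    window = ids[:, -(ctx_len + 1):]
    with torch.no_grad():
        logits = model(window).logits        # HF-style causal LM (assumed)
    logp = logits[:, -2].log_softmax(-1)     # distribution over the final token
    return -logp.gather(-1, window[:, -1:]).mean().item()

# If last_token_nll(model, ids, 128) is close to last_token_nll(model, ids, 4096),
# the extra long-range context is not actually improving predictions.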
Learning Omni-Frequency Region-adaptive Representations for Real Image Super-Resolution
Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i.e., bicubic down-sampling) typically suffer from poor performance when applied to real-world low-resolution (LR) images due…
Revisiting Simple Neural Probabilistic Language Models
Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM) of Bengio …
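For reference, the NPLM of Bengio et al. (2003) that this paper revisits concatenates the embeddings of a fixed window of preceding tokens and feeds them through a feedforward network that predicts the next token. A minimal PyTorch sketch (sizes are hypothetical):

import torch
import torch.nn as nn

class NPLM(nn.Module):
    # Minimal Bengio-style neural probabilistic language model (sketch).
    def __init__(self, vocab_size, d_emb=128, window=5, d_hid=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.ff = nn.Sequential(
            nn.Linear(window * d_emb, d_hid),
            nn.Tanh(),
            nn.Linear(d_hid, vocab_size),
        )

    def forward(self, prev_tokens):   # prev_tokens: (batch, window)
        x = self.emb(prev_tokens)     # (batch, window, d_emb)
        x = x.flatten(1)              # concatenate the window of embeddings
        return self.ff(x)             # logits over the next token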
Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al.…
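The title describes the mechanism: score an n-best list of translations with an energy model and keep the lowest-energy hypothesis. A toy sketch of that reranking loop follows; the ToyEnergy network and the use of precomputed sentence features are assumptions for illustration, not the paper's trained model.

import torch
import torch.nn as nn

class ToyEnergy(nn.Module):
    # Hypothetical stand-in for a trained energy network: features -> scalar.
    def __init__(self, d=16):
        super().__init__()
        self.proj = nn.Linear(d, 1)
    def forward(self, x):              # x: (n_hypotheses, d)
        return self.proj(x).squeeze(-1)

def rerank(hyp_features, energy):
    # Pick the hypothesis with the lowest energy from an n-best list.
    with torch.no_grad():
        return int(energy(hyp_features).argmin())

energy = ToyEnergy()
nbest = torch.randn(5, 16)             # features for 5 beam-search hypotheses
print("selected hypothesis:", rerank(nbest, energy))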
IGA: An Intent-Guided Authoring Assistant
While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language mo…
Latent-Separated Global Prediction for Learned Image Compression
Over the past several years, we have witnessed the impressive progress of learned image compression. Recent learned image codecs are based on auto-encoders that first encode an image into low-dimensional latent representations and then de…
Multi-scale Grouped Dense Network for VVC Intra Coding
The Versatile Video Coding (H.266/VVC) standard achieves better image quality at the same bitrate than any other conventional image codec, such as BPG or JPEG. However, it is still attractive and challenging to improve the image …
Hard-Coded Gaussian Attention for Neural Machine Translation
Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. …
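A "hard-coded" head of this kind can be written down directly: attention weights follow a fixed Gaussian over key positions centered near the query position, with nothing learned. The offset and standard deviation below are illustrative choices, not necessarily the paper's exact settings.

import numpy as np

def hardcoded_gaussian_attention(seq_len, center_offset=0, sigma=1.0):
    # Each query position i attends with a fixed Gaussian centered at
    # i + center_offset; no parameters are learned.
    pos = np.arange(seq_len)
    dist = pos[None, :] - (pos[:, None] + center_offset)  # (query, key) offsets
    scores = np.exp(-dist.astype(float) ** 2 / (2 * sigma ** 2))
    return scores / scores.sum(axis=-1, keepdims=True)    # rows sum to 1

A = hardcoded_gaussian_attention(6, center_offset=-1)     # head that looks one token back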
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
Simeng Sun, Ani Nenkova. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce outputs of different lengths. This is due to a non-linear pattern between ROUGE F1 and summary length. To alleviate the effect of length du…
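The length confound is easy to see with unigram overlap: lengthening a candidate tends to raise recall and lower precision, so F1 shifts non-linearly with length. A toy illustration (simplified unigram F1, not a full ROUGE implementation):

from collections import Counter

def unigram_f1(candidate, reference):
    # Simplified ROUGE-1-style F1 over whitespace tokens (illustration only).
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

ref = "the cat sat on the mat"
print(unigram_f1("the cat sat", ref))                         # short: P=1.00, R=0.50, F1=0.67
print(unigram_f1("the cat sat on the mat near a door", ref))  # longer: P=0.67, R=1.00, F1=0.80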