Doug Downey
Intent-Aware Schema Generation And Refinement For Literature Review Tables
The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by generating schemas defining shared aspect…
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question a…
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers s…
The Semantic Reader Project
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, so does the need for new technology to support s…
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span five core scientific literature understan…
TOPICAL: TOPIC Pages AutomagicaLly
Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to tr…
MARG: Multi-Agent Review Generation for Scientific Papers
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By distributing paper text across agents, MARG can co…
CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies
Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent …
CARE: Extracting Experimental Findings From Clinical Literature
Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the re…
A Computational Inflection for Scientific Discovery
Enabling researchers to leverage systems to overcome the limits of human cognitive capacity.
ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
We introduce the task of automatically revising scientific papers based on peer feedback and release ARIES, a dataset of review comments and their corresponding paper edits. The data is drawn from real reviewer-author interactions from com…
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g…
SciMON: Scientific Inspiration Machines Optimized for Novelty
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction--severely limit…
S2abEL: A Dataset for Entity Linking from Scientific Tables
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward la…
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize p…
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered du…
Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks
Large language models have introduced exciting new opportunities and challenges in designing and developing new AI-assisted writing support tools. Recent work has shown that leveraging this new technology can transform writing in many scen…
LIMEADE: From AI Explanations to Advice Taking
Research in human-centered AI has shown the benefits of systems that can explain their predictions. Methods that allow AI to take advice from humans in response to explanations are similarly useful. While both capabilities are well develop…
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading p…
The Semantic Scholar Open Data Platform
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping…
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo, Zejiang Shen, Benjamin Newman, Joseph Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel Weld, Doug Downey…
Embedding Recycling for Language Models
Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings pr…
CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies
Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023.