Sergey Feldman
The Alongside Digital Wellness Program for Youth: Longitudinal Pre-Post Outcomes Study
Background: Youth are increasingly experiencing psychological distress. Schools are ideal settings for disseminating mental health support, but they are often insufficiently resourced to do so. Digital mental health tools represent a unique…
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question a…
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers s…
TOPICAL: TOPIC Pages AutomagicaLly
Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to tr…
On-the-fly Definition Augmentation of LLMs for Biomedical NER
Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical…
RCT Rejection Sampling for Causal Estimation Evaluation
Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have pro…
S2abEL: A Dataset for Entity Linking from Scientific Tables
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward la…
The Semantic Scholar Open Data Platform
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping…
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relev…
S2AMP
Mentorship is a critical component of academia, but is not as visible as publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, there are few large representative mentorship da…
Infrastructure for Rapid Open Knowledge Network Development
The past decade has witnessed a growth in the use of knowledge graph technologies for advanced data search, data integration, and query-answering applications. The leading example of a public, general-purpose open knowledge network (aka kn…
Literature-Augmented Clinical Outcome Prediction
We present BEEP (Biomedical Evidence-Enhanced Predictions), a novel approach for clinical outcome prediction that retrieves patient-specific medical literature and incorporates it into predictive models. Based on each individual patient’s …
ABNIRML: Analyzing the Behavior of Neural IR Models
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search. However, it is not yet well understood why these methods are so effective, what makes some variants more effective tha…
Overview of the TREC 2020 Fair Ranking Track
This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine…
S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analy…
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level …
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
Automated extraction of the number of participants in clinical reports provides an effective alternative to manual analysis of demographic bias. Despite legal and policy initiatives to increase female representation, sex bias against femal…
Citation Count Analysis for Papers with Preprints
We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. A paper's citation count is estimated using a negative-binomial genera…
Construction of the Literature Graph in Semantic Scholar
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, represe…
Content-Based Citation Recommendation
We present a content-based method for recommending citations in an academic paper draft. We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank the candidates using a discriminative mo…