Simon Razniewski
Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness
Large Language Models (LLMs) encode substantial factual knowledge, yet measuring and systematizing this knowledge remains challenging. Converting it into a structured format, for example through recursive extraction approaches such as the GP…
Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
LLMs are remarkable artifacts that have revolutionized a range of NLP and AI tasks. A significant contributor is their factual knowledge, which to date remains poorly understood and is usually analyzed from biased samples. In this paper…
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework
Rare, yet critical, scenarios pose a significant challenge in testing and evaluating autonomous driving planners. Relying solely on real-world driving scenes requires collecting massive datasets to capture these scenarios. While automatic …
GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge
Language models are powerful tools, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely interlinked 100-million-…
PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims
Patent claims define the scope of protection for an invention. If there are ambiguities in a claim, it is rejected by the patent office. In the US, this is referred to as indefiniteness (35 U.S.C. § 112(b)) and is among the most frequent re…
Special issue on Wikidata construction, evaluation and applications
Wikidata (Communications of the ACM 57 (2014), 78–85), the open knowledge graph maintained by the Wikimedia Foundation, continues to expand its role as a central hub of structured data for Wikipedia and its sister projects, as well as an i…
Enabling LLM Knowledge Analysis via Extensive Materialization
Large language models (LLMs) have greatly advanced NLP and AI, and alongside their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since Petroni et al. (2019), analyzing th…
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
Reasoning is key to many decision-making processes. It requires consolidating a set of rule-like premises, often associated with degrees of uncertainty, with observations to draw conclusions. In this work, we address both the case wh…
Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs
Dealing with long and highly complex technical text is a challenge for Large Language Models (LLMs), which still have to unfold their potential in supporting expensive and time-intensive processes like patent drafting. Within patents, the d…
FASETS: Discovering Faceted Sets of Entities
Computing related entities for a given seed entity is an important task in exploratory search and comparative data analysis. Prior works, using the seed-based set expansion paradigm, have focused on the single aspect of identifying homogene…
CardiO: Predicting Cardinality from Online Sources
Count questions are an important type of information need, though often present in noisy, contradictory, or semantically not fully aligned form on the Web. In this work, we propose CardiO, a lightweight and modular framework for searching …
Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues…
Cultural Commonsense Knowledge for Intercultural Dialogues
Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents MANGO, a methodology for distilling high-accuracy, high-rec…
Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey
General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from web sources and are thus far from complete. This poses challenges for the consumption as well as the curation …
BoschAI @ Causal News Corpus 2023: Robust Cause-Effect Span Extraction using Multi-Layer Sequence Tagging and Data Augmentation
Understanding causality is a core aspect of intelligence. The Event Causality Identification with Causal News Corpus Shared Task addresses two aspects of this challenge: Subtask 1 aims at detecting causal relationships in texts, and Subtas…
BoschAI @ PLABA 2023: Leveraging Edit Operations in End-to-End Neural Sentence Simplification
Automatic simplification can help laypeople to comprehend complex scientific text. Language models are frequently applied to this task by translating from complex to simple language. In this paper, we describe our system based on Llama 2, …
Evaluating the Knowledge Base Completion Potential of GPT
Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet their ability to do thi…
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit know…
Extracting Multi-valued Relations from Language Models
The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation p…
Knowledge Base Completion for Long-Tail Entities
Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities w…
Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation
Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce…
Wiki-Based Communities of Interest: Demographics and Outliers
In this paper, we release data about demographic information and outliers of communities of interest. Identified from Wiki-based sources, mainly Wikidata, the data covers 7.5k communities, e.g., members of the White House Coronavirus Task …
Can large language models generate salient negative statements?
We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstraine…