Samuel Broscheit
Improving Wikipedia verifiability with AI
Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist hum…
Improving Wikipedia Verifiability with AI
Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this re…
Distributionally Robust Finetuning BERT for Covariate Drift in Spoken Language Understanding
In this study, we investigate robustness against covariate drift in spoken language understanding (SLU). Covariate drift can occur in SLU when there is a drift between training and testing regarding what users request or how they request it…
The Web Is Your Oyster - Knowledge-Intensive NLP against a Very Large Web Corpus
In order to address increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inc…
Unsupervised Multi-View Post-OCR Error Correction With Language Models
We investigate post-OCR correction in a setting where we have access to different OCR views of the same document. The goal of this study is to understand if a pretrained language model (LM) can be used in an unsupervised way to reconcile t…
You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings
Knowledge graph embedding (KGE) models learn algebraic representations of the entities and relations in a knowledge graph. A vast number of KGE techniques for multi-relational link prediction have been proposed in the recent literature, of…
Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction
Open Information Extraction systems extract (“subject text”, “relation text”, “object text”) triples from raw text. Some triples are textual versions of facts, i.e., non-canonicalized mentions of entities and relations. In this paper, we i…
LibKGE - A knowledge graph embedding library for reproducible research
LibKGE (https://github.com/uma-pi1/kge) is an open-source PyTorch-based library for training, hyperparameter optimization, and evaluation of knowledge graph embedding models for link prediction. The key goals of LibKGE are to enable reprod…
PRoFET: Predicting the Risk of Firms from Event Transcripts
Financial risk, defined as the chance to deviate from return expectations, is most commonly measured with volatility. Due to its value for investment decision making, volatility prediction is probably among the most important tasks in fina…
OPIEC: An Open Information Extraction Corpus
Open information extraction (OIE) systems extract relations and their arguments from natural language text in an unsupervised manner. The resulting extractions are a valuable resource for downstream tasks such as knowledge base constructio…
A Relational Tucker Decomposition for Multi-Relational Link Prediction
We propose the Relational Tucker3 (RT) decomposition for multi-relational link prediction in knowledge graphs. We show that many existing knowledge graph embedding models are special cases of the RT decomposition with certain predefined sp…
Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking
A typical architecture for end-to-end entity linking systems consists of three steps: mention detection, candidate generation and entity disambiguation. In this study we investigate the following questions: (a) Can all those steps be learn…
On Evaluating Embedding Models for Knowledge Base Completion
Knowledge graph embedding models have recently received significant attention in the literature. These models learn latent semantic representations for the entities and relations in a given knowledge base; the representations can be used t…
Do Embedding Models Perform Well for Knowledge Base Completion
In this work, we put into question the effectiveness of the evaluation methods currently used to measure the performance of latent factor models for the task of knowledge base completion. We argue that by focusing on a small subset of poss…
On Evaluating Embedding Models for Knowledge Base Completion
Knowledge bases contribute to many web search and mining tasks, yet they are often incomplete. To add missing facts to a given knowledge base, various embedding models have been proposed in the recent literature. Perhaps surprisingly, rela…
Learning Distributional Token Representations from Visual Features
In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight about the challenges of encoding a text representation from low-level featur…
A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval
In this study, we investigate learning-to-rank and query refinement approaches for information retrieval in the pharmacogenomic domain. The goal is to improve the information retrieval process of biomedical curators, who manually build kno…
SUMMA at TAC Knowledge Base Population Task 2016
Our submission to the NIST TAC-KBP-2016 is an initial attempt to apply our ongoing research on text analysis within the SUMMA project to TAC shared tasks. The goal of SUMMA is to develop a scalable and extensible media monitoring platform wit…