Julian Risch
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Most NLP tasks are modeled as supervised learning and thus require labeled training data to train effective models. However, manually producing such data at sufficient quality and quantity is known to be costly and time-intensive. Current …
Pseudo-Labels Are All You Need
Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts with an appropriate complexity level to language learners or supporting the evaluation of text simplification approaches…
Semantic Answer Similarity for Evaluating Question Answering Models
The evaluation of question answering models compares ground-truth annotations with model predictions. However, as of today, this comparison is mostly lexical-based and therefore misses out on answers that have no lexical overlap but are st…
Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models
Open-domain extractive question answering works well on textual data by first retrieving candidate texts and then extracting the answer from those candidates. However, some questions cannot be answered by text alone but require information…
GermanQuAD and GermanDPR: Improving Non-English Question Answering and Passage Retrieval
A major challenge of research on non-English machine reading for question answering (QA) is the lack of annotated datasets. In this paper, we present GermanQuAD, a dataset of 13,722 extractive question/answer pairs. To improve the reproduc…
Multifaceted Domain-Specific Document Embeddings
Julian Risch, Philipp Hager, Ralf Krestel. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. 2021.
Reader comment analysis on online news platforms
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. However, the misuse by spammers, haters, and trolls raises doubts about whether the benefits justify the costs of the time-c…
Toxic Comment Collection: Making More Than 30 Datasets Easily Accessible in One Unified Format
With the rise of research on toxic comment classification, more and more annotated datasets have been released. The wide variety of the task (different languages, different labeling processes and schemes) has led to a large amount of heter…
PatentMatch: A Dataset for Matching Patent Claims & Prior Art.
Patent examiners need to solve a complex information retrieval task when they assess the novelty and inventive step of claims made in a patent application. Given a claim, they search for prior art, which comprises all relevant publicly ava…
Explaining Offensive Language Detection
Machine learning approaches have proven to be on or even above human-level accuracy for the task of offensive language detection. In contrast to human experts, however, they often lack the capability of giving explanations for their decisio…
Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions
Comment sections below online news articles enjoy growing popularity among readers. However, the overwhelming number of comments makes it infeasible for the average news consumer to read all of them and hinders engaging discussions. Most p…
Toxic Comment Detection in Online Discussions
Comment sections of online news platforms are an essential space to express opinions and discuss political topics. In contrast to other online posts, news discussions are related to particular news articles, comments refer to each other, a…
My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cro…
My Approach = Your Apparatus?
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use …
Prediction for the Newsroom: Which Articles Will Get the Most Comments?
Carl Ambroselli, Julian Risch, Ralf Krestel, Andreas Loos. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers). 2018.
Challenges for Toxic Comment Classification: An In-Depth Error Analysis
Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges, others still remain unsolved and directions for further research…
Real or Fake? Large-Scale Validation of Identity Leaks
On the Internet, criminal hackers frequently leak identity data on a massive scale. Subsequent criminal activities, such as identity theft and misuse, put Internet users at risk. Leak checker services enable users to check whether their pe…