Noémi Ligeti-Nagy
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials sourced from multiple origins. In the construction, we…
ProkBERT PhaStyle: accurate phage lifestyle prediction with pretrained genomic language models
Motivation Phage lifestyle prediction, i.e. classifying phage sequences as virulent or temperate, is crucial in biomedical and ecological applications. Phage sequences from metagenome or virome assemblies are often fragmented, and the dive…
ProkBERT PhaStyle: Accurate Phage Lifestyle Prediction with Pretrained Genomic Language Models
Background Phage lifestyle prediction, i.e. classifying phage sequences as virulent or temperate, is crucial in biomedical and ecological applications. Phage sequences from metagenome or metavirome assemblies are often fragmented, and the …
ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe
The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 b…
ProkBERT family: genomic language models for microbiome applications
Background In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions, and predicting and recognizing novel functionalities within exte…
OCR Cleaning of Scientific Texts with LLMs
Correcting Optical Character Recognition (OCR) errors is a major challenge in preprocessing datasets consisting of legacy PDF files. In this study, we develop Large Language Models specially finetuned to correct OCR errors. We experimented…
ProkBERT Family: Genomic Language Models for Microbiome Applications
Machine learning offers transformative capabilities in microbiology and microbiome analysis, deciphering intricate microbial interactions, predicting functionalities, and unveiling novel patterns in vast datasets. This enriches our compreh…
Building machine reading comprehension model from scratch
In this paper, we introduce a machine reading comprehension model and how we built this model from scratch. Reading comprehension is a crucial requisite for artificial intelligence applications, such as Question-Answering systems, chat…
Improve Performance of Fine-tuning Language Models with Prompting
This paper explores the effectiveness of prompt programming in the fine-tuning process of a Hungarian language model. The study builds on the prior success of prompt engineering in natural language processing tasks and employs the promptin…
A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach
The Word-in-Context corpus, which forms part of the SuperGLUE benchmark dataset, focuses on a specific sense disambiguation task: it has to be decided whether two occurrences of a given target word in two different contexts convey the same…
Winograd schemata and other datasets for anaphora resolution in Hungarian
The Winograd Schema Challenge (WSC, proposed by Levesque, Davis & Morgenstern 2012) is considered to be the novel Turing Test to examine machine intelligence. Winograd schema questions require the resolution of anaphora with the help of wo…
Creation of a corpus with semantic role labels for Hungarian
In this article, ongoing research is presented, the immediate goal of which is to create a corpus annotated with semantic role labels for Hungarian that can be used to train a parser-based system capable of formulating relevant question…
What does the Nom say? An algorithm for case disambiguation in Hungarian
In this paper, we present our algorithm called nom-or-not designed for dissolving case-disambiguation in Hungarian. By case, we mean an abstract syntactic case, a kind of syntactic role of the given token. Nouns and proper names, adjectives,…