Explanipedia

OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

Haote Yang, Xingjian Wei, Jiang Wu, Noémi Ligeti-Nagy, Jiaxing Sun, et al. · 2025

We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials sourced from multiple origins. In the construction, we…

ProkBERT PhaStyle: accurate phage lifestyle prediction with pretrained genomic language models

Julianna Juhász, Noémi Ligeti-Nagy, Babett Bodnár, János Juhász, Sándor Pongor, et al. · 2024

Motivation: Phage lifestyle prediction, i.e. classifying phage sequences as virulent or temperate, is crucial in biomedical and ecological applications. Phage sequences from metagenome or virome assemblies are often fragmented, and the dive…

ProkBERT PhaStyle: Accurate Phage Lifestyle Prediction with Pretrained Genomic Language Models

Julianna Juhász, Babett Bodnár, János Juhász, Noémi Ligeti-Nagy, Sándor Pongor, et al. · 2024

Computer science · Biology

Background: Phage lifestyle prediction, i.e. classifying phage sequences as virulent or temperate, is crucial in biomedical and ecological applications. Phage sequences from metagenome or metavirome assemblies are often fragmented, and the …

ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe

Tomaž Erjavec, Matyáš Kopp, Nikola Ljubešić, Taja Kuzman, Paul Rayson, et al. · 2024

Political science

The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 b…

ProkBERT family: genomic language models for microbiome applications

Balázs Ligeti, István Szepesi-Nagy, Babett Bodnár, Noémi Ligeti-Nagy, János Juhász · 2024

Computer science · Biology

Background: In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions, and predicting and recognizing novel functionalities within exte…

OCR Cleaning of Scientific Texts with LLMs

Gábor Madarász, Noémi Ligeti-Nagy, András Holl, Tamás Váradi · 2024

Computer science

Correcting Optical Character Recognition (OCR) errors is a major challenge in preprocessing datasets consisting of legacy PDF files. In this study, we develop Large Language Models specially finetuned to correct OCR errors. We experimented…

ProkBERT Family: Genomic Language Models for Microbiome Applications

Balázs Ligeti, István Szepesi-Nagy, Babett Bodnár, Noémi Ligeti-Nagy, János Juhász · 2023

Computer science · Biology

Machine learning offers transformative capabilities in microbiology and microbiome analysis, deciphering intricate microbial interactions, predicting functionalities, and unveiling novel patterns in vast datasets. This enriches our compreh…

Building machine reading comprehension model from scratch

Zijian Győző Yang, Noémi Ligeti-Nagy · 2023

Computer science · Philosophy

In this paper, we introduce a machine reading comprehension model and how we built this model from scratch. Reading comprehension is a crucial requisite for artificial intelligence applications, such as Question-Answering systems, chat…

Improve Performance of Fine-tuning Language Models with Prompting

Zijian Győző Yang, Noémi Ligeti-Nagy · 2023

Computer science · Geography · Physics

This paper explores the effectiveness of prompt programming in the fine-tuning process of a Hungarian language model. The study builds on the prior success of prompt engineering in natural language processing tasks and employs the promptin…

A proof-of-concept meaning discrimination experiment to compile a word-in-context dataset for adjectives – A graph-based distributional approach

Enikő Héja, Noémi Ligeti-Nagy · 2022

Computer science · Psychology · Biology

The Word-in-Context corpus, which forms part of the SuperGLUE benchmark dataset, focuses on a specific sense disambiguation task: it has to be decided whether two occurrences of a given target word in two different contexts convey the same…

Winograd schemata and other datasets for anaphora resolution in Hungarian

Noémi Vadász, Noémi Ligeti-Nagy · 2022

Computer science · Philosophy

The Winograd Schema Challenge (WSC, proposed by Levesque, Davis & Morgenstern 2012) is considered to be the novel Turing Test to examine machine intelligence. Winograd schema questions require the resolution of anaphora with the help of wo…

Creation of a corpus with semantic role labels for Hungarian

Attila Novák, László János Laki, Borbála Novák, Andrea Dömötör, Noémi Ligeti-Nagy, et al. · 2019

Computer science · Chemistry · Philosophy

In this article, an ongoing research is presented, the immediate goal of which is to create a corpus annotated with semantic role labels for Hungarian that can be used to train a parser-based system capable of formulating relevant question…

What does the Nom say? An algorithm for case disambiguation in Hungarian

Noémi Ligeti-Nagy, Andrea Dömötör, Noémi Vadász · 2019

Computer science

In this paper, we present our algorithm called nom-or-not designed for dissolving case-disambiguation in Hungarian. By case, we mean an abstract syntactic case, a kind of syntactic role of the given token. Nouns and proper names, adjectives,…

Noémi Ligeti-Nagy