Explanipedia

Quality and Efficiency of Manual Annotation: Pre-annotation Bias Open

Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková, Jan Hajič · 2023

This paper presents an analysis of annotation using an automatic pre-annotation for a mid-level annotation complexity task -- dependency syntax annotation. It compares the annotation efforts made by annotators using a pre-annotated version…

Expressing Measure in Czech (A Corpus-Based Study) Open

Marie Mikulová · 2023

Computer science Philosophy

In the contribution, we provide a theory-based and corpus-verified description of expressions for measure in Czech. We demonstrate that the measure expressions may modify quantity of entities (approximately ten boys), internal characteri…

Capturing Numerals and Pronouns at the Morphological Layer in the Prague Dependency Treebanks of Czech Open

Barbora Štěpánková, Marie Mikulová · 2021

Computer science Philosophy

The paper presents a novel and unified morphological description of numerals and pronouns, as compiled for the newest edition of the Prague Dependency Treebank (Prague Dependency Treebank – Consolidated 1.0) and its integral part the morph…

Consistency of morphological dictionary MorfFlex Open

Jaroslava Hlaváčová, Marie Mikulová, Barbora Štěpánková · 2021

Computer science Biology Philosophy

Language corpora usually contain, in addition to their own texts, various types of annotations. The most common one is a morphological annotation, which consists in assigning a lemma and a morphological tag to each wordform. For morphologi…

PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0) Open

Zdeňka Urešová, Alevtina Bémová, Eva Fučíková, Jan Hajič, Veronika Kolářová, et al. · 2021

Computer science Philosophy

The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken la…

Prague Dependency Treebank -- Consolidated 1.0 Open

Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, et al. · 2020

Computer science Business Mathematics

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as it has always been the case for the family of the Prague Dependency Treebanks - to…

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0) Open

Jan Hajič, Eduard Bejček, Alevtina Bémová, Eva Buráňová, Eva Fučíková, et al. · 2020

Computer science Philosophy

A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a consolidated release of the existing PDT-corpora of Czech data, uniformly annot…

Modifications of the Czech Morphological Dictionary for Consistent Corpus Annotation Open

Jaroslava Hlaváčová, Marie Mikulová, Barbora Štěpánková, Jan Hajič · 2019

Computer science Philosophy

We describe systematic changes that have been made to the Czech morphological dictionary related to annotating new data within the project of Prague Dependency Treebank (PDT). We bring new solutions to several complicated morphological fea…

Search for the Relation of Form and Function Using the ForFun Database Open

Marie Mikulová, Eduard Bejček, Eva Hajičová, Jarmila Panevová · 2018

Computer science Philosophy Biology

The aim of the contribution is to introduce a database of linguistic forms and their functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms a…

Prague Dependency Treebank 3.5 Open

Jan Hajič, Eduard Bejček, Alevtina Bémová, Eva Buráňová, Eva Hajičová, et al. · 2018

Computer science Philosophy

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the…

Subcategorization of Adverbial Meanings Based on Corpus Data Open

Marie Mikulová, Eduard Bejček, Veronika Kolářová, Jarmila Panevová · 2017

Computer science Philosophy

We introduce a corpus-based description of selected adverbial meanings in Czech sentences. Its basic repertory has a long-lasting tradition in both scientific and school grammars. However, before the corpus era, researchers had to re…

Prague DaTabase of Spoken Czech 1.0 Open

Jan Hajič, Petr Pajas, Pavel Ircing, Jan Romportl, Nino Peterek, et al. · 2017

Computer science Philosophy

PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic a…

Difference between Written and Spoken Czech: The Case of Verbal Nouns Denoting an Action Open

Veronika Kolářová, Jan Kolář, Marie Mikulová · 2017

Computer science Philosophy

The present paper extends understanding of differences in expressing actions by verbal nouns in corpora of written vs. spoken Czech, namely in the Czech part of the Prague Czech-English Dependency Treebank and in the Prague Dependency Tree…

Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) Open

Marie Mikulová, Alevtina Bémová, Jan Hajič, Eva Hajičová, Pavel Ircing, et al. · 2017

Computer science Philosophy

The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes (over 120 hours) of spontaneous dialogs. The dialogs have been rec…

Prague Czech-English Dependency Treebank 2.0 Coref Open

Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, Jiří Mírovský · 2016

Computer science Philosophy

The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved a…

Marie Mikulová