Marie Mikulová
Quality and Efficiency of Manual Annotation: Pre-annotation Bias
This paper presents an analysis of annotation using an automatic pre-annotation for a mid-level annotation complexity task -- dependency syntax annotation. It compares the annotation efforts made by annotators using a pre-annotated version…
Expressing Measure in Czech (A Corpus-Based Study)
In the contribution, we provide a theory-based and corpus-verified description of expressions for measure in Czech. We demonstrate that the measure expressions may modify the quantity of entities (approximately ten boys), internal characteri…
Capturing Numerals and Pronouns at the Morphological Layer in the Prague Dependency Treebanks of Czech
The paper presents a novel and unified morphological description of numerals and pronouns, as compiled for the newest edition of the Prague Dependency Treebank (Prague Dependency Treebank – Consolidated 1.0) and its integral part the morph…
Consistency of morphological dictionary MorfFlex
Language corpora usually contain, in addition to their own texts, various types of annotations. The most common one is a morphological annotation, which consists in assigning a lemma and a morphological tag to each wordform. For morphologi…
PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)
The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken la…
Prague Dependency Treebank -- Consolidated 1.0
We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as it has always been the case for the family of the Prague Dependency Treebanks - to…
Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated release of the existing PDT-corpora of Czech data, uniformly annot…
Modifications of the Czech Morphological Dictionary for Consistent Corpus Annotation
We describe systematic changes that have been made to the Czech morphological dictionary related to annotating new data within the project of Prague Dependency Treebank (PDT). We bring new solutions to several complicated morphological fea…
Search for the Relation of Form and Function Using the ForFun Database
The aim of the contribution is to introduce a database of linguistic forms and their functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms a…
Prague Dependency Treebank 3.5
The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the…
Subcategorization of Adverbial Meanings Based on Corpus Data
We introduce a corpus-based description of selected adverbial meanings in Czech sentences. Its basic repertory is one of a long-lasting tradition in both scientific and school grammars. However, before the corpus era, researchers had to re…
Prague DaTabase of Spoken Czech 1.0
PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic a…
Difference between Written and Spoken Czech: The Case of Verbal Nouns Denoting an Action
The present paper extends understanding of differences in expressing actions by verbal nouns in corpora of written vs. spoken Czech, namely in the Czech part of the Prague Czech-English Dependency Treebank and in the Prague Dependency Tree…
Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)
The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes (over 120 hours) of spontaneous dialogs. The dialogs have been rec…
Prague Czech-English Dependency Treebank 2.0 Coref
The Prague Czech-English Dependency Treebank 2.0 Coref (PCEDT 2.0 Coref) is a parallel treebank building upon the original PCEDT 2.0 release and enriching it with the extended manual annotation of coreference, as well as with an improved a…