Alejandro H. Toselli
The PARES Database: Information Extraction over Historical Parish Records
Historical census records convey information that is key to performing genealogical research and demographic studies. Given the large number of documents of this type that exist, it is crucial to research methods that allow the automatic extr…
What distinguishes conspiracy from critical narratives? A computational analysis of oppositional discourse
The current prevalence of conspiracy theories on the internet is a significant issue, tackled by many computational approaches. However, these approaches fail to recognize the relevance of distinguishing between texts which contain a consp…
Segmenting large historical notarial manuscripts into multi-page deeds
Data Augmentation for Offline Handwritten Text Recognition: A Systematic Literature Review
Open set classification of untranscribed handwritten text image documents
End-to-End page-Level assessment of handwritten text recognition
The evaluation of Handwritten Text Recognition (HTR) systems has traditionally used metrics based on the edit distance between HTR and ground truth (GT) transcripts, at both the character and word levels. This is very adequate when th…
Lexicon-based probabilistic indexing of handwritten text images
End-to-End Page-Level Assessment of Handwritten Text Recognition
The evaluation of Handwritten Text Recognition (HTR) systems has traditionally used metrics based on the edit distance between HTR and ground truth (GT) transcripts, at both the character and word levels. This is very adequate when the exp…
Revisiting Bag-of-Word Metrics to Assess End-To-End Text Image Recognition Results
End-to-End Page-Level Assessment Of Handwritten Text Recognition
Fake News and Hate Speech: Language in Common
In this paper we raise the research question of whether fake news and hate speech spreaders share common patterns in language. We compute a novel index, the ingroup vs outgroup index, in three different datasets and we show that both pheno…
Open Set Classification of Untranscribed Handwritten Documents
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide. The amounts are so large that it is generally unfeasible for archivists to adequately tag most of the documents with the required metadata so…
PLANTAS Dataset
The "PLANTAS" dataset (“Historia de las plantas”, Vol. 1) was written with a quill pen by Bernardo de Cienfuegos, one of the most outstanding Spanish botanists of the 17th century. The book was written mainly in Spanish, but a significant…
A robust handwritten recognition system for learning on different data restriction scenarios
ICDAR 2021 Competition on Components Segmentation Task of Document Photos
This paper describes the short-term competition on the Components Segmentation Task of Document Photos that was prepared in the context of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). This competitio…
A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images
Query by String Keyword Spotting (KWS) is here considered as a key technology for indexing large collections of handwritten text images to allow fast textual access to the contents of these collections. Under this perspective, a probabilis…
Digital Editions as Distant Supervision for Layout Analysis of Printed Books
HU‐PageScan: a fully convolutional neural network for document page crop
The offer of online, automated, and impersonal services demands that users upload scanned copies of their documents to the organisations. As a consequence of this decentralisation, the documents present more challenges to the already…
Towards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems
The increasing portability of physical manuscripts to the digital environment makes it common for systems to offer automatic mechanisms for offline Handwritten Text Recognition (HTR). However, several scenarios and writing variations bring…
HDSR-Flor: A Robust End-to-End System to Solve the Handwritten Digit String Recognition Problem in Real Complex Scenarios
Automatic handwriting recognition systems are of interest for academic research fields and for commercial applications. Recent advances in deep learning techniques have shown dramatic improvement in relation to classic computer vision prob…
Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks
Transforming scholarship in the archives through handwritten text recognition
Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020-funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases,…
A set of benchmarks for Handwritten Text Recognition on historical documents
Multi-task Layout Analysis of Handwritten Musical Scores
Probabilistic multi-word spotting in handwritten text images
Oficio de Hipotecas de Girona. A Dataset of Spanish Notarial Deeds (18th Century) for Handwritten Text Recognition and Layout Analysis of Historical Documents.
This dataset is a subset of 596 documents from the Registre d'Hipoteques de Girona of 1769 collection, held by the Arxiu Històric de Girona. This collection is composed of hundreds of thousands of notarial deeds from …
ICDAR 2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset Rerelease
A new release of the dataset used in the ICDAR 2015 HTR competition in which all Page XML files are based on the same 2013-07-15 schema. It only contains page level images, Page XML files for train and test (including the ground truth tran…