SentAlign: Accurate and Scalable Sentence Alignment
Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
2023 · Open Access · DOI: https://doi.org/10.18653/v1/2023.emnlp-demo.22 · OA: W4389523635
We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of thousands of sentences and uses a divide-and-conquer approach to align documents containing tens of thousands of sentences. The scoring function is based on LaBSE bilingual sentence representations. SentAlign outperforms five other sentence alignment tools when evaluated on two different evaluation sets, German-French and English-Icelandic, and on a downstream machine translation task.
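The idea of scoring candidate alignment paths by embedding similarity can be illustrated with a small dynamic-programming sketch. This is not SentAlign's actual implementation: the function names `cosine` and `align`, the `skip_penalty` parameter, and the toy vectors standing in for LaBSE sentence embeddings are all assumptions made for illustration. The sketch only handles 1-1 alignments and skipped sentences, and it omits the divide-and-conquer step used for very large documents.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(src_vecs, tgt_vecs, skip_penalty=0.3):
    """Find the monotonic path with the highest total score.

    A 1-1 match scores the cosine similarity of the two vectors;
    skipping a sentence on either side costs `skip_penalty`
    (a hypothetical parameter, not taken from the paper).
    Returns a list of (source_index, target_index) pairs.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            here = best[i][j]
            # Possible moves: match i with j, skip source i, skip target j.
            moves = []
            if i < n and j < m:
                moves.append((i + 1, j + 1, cosine(src_vecs[i], tgt_vecs[j])))
            if i < n:
                moves.append((i + 1, j, -skip_penalty))
            if j < m:
                moves.append((i, j + 1, -skip_penalty))
            for ni, nj, gain in moves:
                if here + gain > best[ni][nj]:
                    best[ni][nj] = here + gain
                    back[ni][nj] = (i, j)

    # Walk back from the end, keeping only the diagonal (1-1 match) steps.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs


# Toy vectors in place of real sentence embeddings.
source = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
target = [[1, 0, 0], [0, 0, 1]]
print(align(source, target))  # [(0, 0), (2, 1)]
```

In this toy run the second source sentence has no counterpart in the target, so the best path skips it and pairs the remaining sentences. The table has one cell per source-target position pair, so the cost of filling it grows with the product of the two document lengths, which is consistent with the abstract's motivation for splitting very long documents before aligning them.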