R. Teixeira De Lima
Advanced Layout Analysis Models for Docling
This technical report documents the development of novel Layout Analysis models integrated into the Docling document-conversion pipeline. We trained several state-of-the-art object detectors based on the RT-DETR, RT-DETRv2 and DFINE archit…
Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series
Anomaly detection in multivariate time series is an important problem across various fields such as healthcare, financial services, manufacturing or physics detector monitoring. Accurately identifying when unexpected errors or faults occur…
ChemQuery: A Natural Language Query‐Driven Service for Comprehensive Exploration of Chemistry Patent Literature
Patents are integral to our shared scientific knowledge, requiring companies and inventors to stay informed about them to conduct research, find licensing opportunities, and manage legal risks. However, the rising rate of filings has made …
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page element…
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive instru…
Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation. It is powered by st…
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Retrieval Augmented Generation (RAG) systems are a widespread application of Large Language Models (LLMs) in the industry. While many tools exist empowering developers to build their own systems, measuring their performance locally, with d…
Docling Technical Report
This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table struc…
INDUS: Effective and Efficient Language Models for Scientific Applications
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specia…
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficult…
Sequence-based Machine Learning Models in Jet Physics
Sequence-based modeling broadly refers to algorithms that act on data that is represented as an ordered set of input elements. In particular, Machine Learning algorithms with sequences as inputs have seen successful applications to import…
Accessing HH at ATLAS and CMS with the HL-LHC
Directly measuring the Higgs self-coupling through Higgs pair production is an active area of interest for the LHC experiments, with a current limit around 10 times the Standard Model cross section per experiment. These results, exploiting …
Deep Sets for Flavor Tagging on the ATLAS Experiment
Flavour Tagging is a major client for tracking in particle physics experiments at high energy colliders, where it is used to identify the experimental signatures of heavy flavor production. Among other features, charm and beauty hadron dec…
Higgs Boson Pair Production at Colliders: Status and Perspectives
This document summarises the current theoretical and experimental status of the di-Higgs boson production searches, and of the direct and indirect constraints on the Higgs boson self-coupling, with the wish to serve as a useful guide for t…
Overview of Energy Reconstruction, and Electron and Photon Performances with the CMS ECAL in Run II
The electromagnetic calorimeter (ECAL) of the Compact Muon Solenoid (CMS) Experiment is crucial for achieving high resolution measurements of electrons and photons. Maintaining and possibly improving the excellent performance achieved in R…
Connectivity Maintenance of a Set of Agents through MST-based Algorithm
This paper proposes a solution to the problem of positioning a set of agents that play the role of pursuing a set of moving targets, while the global connectivity among such agents is maintained throughout positioning a second set…
Optimizing Image Steganography using Particle Swarm Optimization Algorithm
Image Steganography is the computing field of hiding information from a source in a target image in such a way that it becomes almost imperceptible to the eye. Despite the high capacity of hiding information, the usual Least Significant B…
Beyond the standard model Higgs physics with photons with the CMS detector
The experimental discovery of the Higgs boson is one of the latest successes of the Standard Model of particle physics. Although all measurements have confirmed that this newly discovered particle is the Higgs boson predicted by the Standa…