Davide Buscaldi
Leveraging knowledge graphs and LLMs for content-based reviewer assignment
The growing volume of academic submissions in recent years has highlighted the need for scalable and accurate reviewer assignment systems that go beyond manual processes and basic keyword matching. We propose a novel pi…
Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders
Sparse Autoencoders (SAEs) have been successfully used to probe Large Language Models (LLMs) and extract interpretable concepts from their internal representations. These concepts are linear combinations of neuron activations that correspo…
CS-KG 2.0: A Large-scale Knowledge Graph of Computer Science
The rapid evolution of AI and the increased accessibility of scientific articles through open access mark a pivotal moment in research. AI-driven tools are reshaping how scientists explore, interpret, and contribute to the body of scienti…
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Visual Language Models require substantial computational resources for inference due to the additional input tokens needed to represent visual information. However, these visual tokens often contain redundant and unimportant information, r…
Research hypothesis generation over scientific knowledge graphs
Generating research hypotheses is a crucial step in scientific investigation that involves the creation of precise, verifiable, and logically valid statements that can be empirically examined. Therefore, many efforts have been made to auto…
Rewiring Techniques to Mitigate Oversquashing and Oversmoothing in GNNs: A Survey
Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their effectiveness is often constrained by two critical challenges: oversquashing, where the excessive compression of information from distant no…
Predicting memorization within Large Language Models fine-tuned for classification
Large Language Models have received significant attention due to their ability to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious threat when disclosed …
Workshop on Deep Learning and Large Language Models for Knowledge Graphs (DL4KG)
The use of Knowledge Graphs (KGs), which constitute large networks of real-world entities and their interrelationships, has grown rapidly. A substantial body of research has emerged, exploring the integration of deep learning (DL) and large…
Triplétoile: Extraction of knowledge from microblogging text
Numerous methods and pipelines have recently emerged for the automatic extraction of knowledge graphs from documents such as scientific publications and patents. However, adapting these methods to incorporate alternative text sources like …
An Ensemble Method Based on the Combination of Transformers with Convolutional Neural Networks to Detect Artificially Generated Text
Thanks to state-of-the-art Large Language Models (LLMs), language generation has reached outstanding levels. These models are capable of generating high-quality content, thus making it a challenging task to detect generated text from h…
A Knowledge Graph-Based Method for the Geolocation of Tweets
Twitter geolocation is useful for various purposes, including tracking COVID-19 perceptions, analyzing political trends, and managing natural disasters. However, accurately predicting geolocations based on tweet content remains a challenge…
Data produced in the context of RCLN participation to the Visual WSD task at SemEval 2023
The data contain generated captions from train, trial and test images, and generated images from the diffusion model. Refer to https://github.com/dbuscaldi/VisualWSD23 for code.
RCLN at SemEval-2023 Task 1: Leveraging Stable Diffusion and Image Captions for Visual WSD
This paper describes the participation of the RCLN team at the Visual Word Sense Disambiguation task at SemEval 2023. The participation was focused on the use of CLIP as a base model for the matching between text and images with additional…
ArXiV-Entity/Relation annotated dataset
This dataset is a collection of abstracts from the CS section of ArXiV, each annotated with DyGIE++ (SciERC model). The dataset can be used to train triple extractors or to cluster triples (in the Computer Science and AI domains). Supersede…
SciCheck
This archive contains AI-KG with additional 300K triples used in the paper "Completing Scientific Facts in Knowledge Graphs of Research Concepts", accepted in IEEE Access.
Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization
Word sense induction (WSI) is a difficult problem in natural language processing that involves the unsupervised automatic detection of a word's senses (i.e. meanings). Recent work achieves significant results on the WSI task by pre-trainin…
ArXiV-AIKG dataset
This dataset is a collection of abstracts from the CS section of ArXiV, each paired with triples from the Artificial Intelligence Knowledge Graph (AIKG) (https://scholkg.kmi.open.ac.uk/). The pairing is determined by the fact that one or more…
Editorial of the Special Issue on Deep Learning and Knowledge Graphs
This special issue aims to reinforce the relationships between these communities and foster interdisciplinary research in the areas of KG, Deep Learning, and Natural Language Processing. The works that we have requested from authors should …
A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications
Automatic text generation based on neural language models has achieved performance levels that make the generated text almost indistinguishable from text written by humans. Despite the value that text generation can have in various applic…
Completing Scientific Facts in Knowledge Graphs of Research Concepts
In the last few years, we have witnessed the emergence of several knowledge graphs that explicitly describe research knowledge with the aim of enabling intelligent systems for supporting and accelerating the scientific process. These resou…