Prasenjit Mitra
Beyond Borders: Exploring Data Embassies as a Strategy for Digital Sovereignty in Africa Open
Data sovereignty in Africa is challenged by fragmented data protection laws, limited cross-border frameworks, and infrastructural deficits. While over 40 countries have enacted data protection laws, inconsistent implementation and over ten…
How to Backdoor the Knowledge Distillation Open
Knowledge distillation has become a cornerstone in modern machine learning systems, celebrated for its ability to transfer knowledge from a large, complex teacher model to a more efficient student model. Traditionally, this process is rega…
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models Open
Compression methods, including quantization, distillation, and pruning, improve the computational efficiency of large reasoning models (LRMs). However, existing studies either fail to sufficiently compare all three compression methods on L…
Graph-based Molecular In-context Learning Grounded on Morgan Fingerprints Open
In-context learning (ICL) effectively conditions large language models (LLMs) for molecular tasks, such as property prediction and molecule captioning, by embedding carefully selected demonstration examples into the input prompt. This appr…
Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text Open
Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks, including semantic parsing, which translates natural language into formal code representations. However, the reverse process, translating code into…
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning Open
Indexing is an important step towards strong performance in retrieval-augmented generation (RAG) systems. However, existing methods organize data based on either semantic similarity (similarity) or related information (relatedness), but do…
Transfer Learning and Double U-Net Empowered Wave Propagation Model in Complex Indoor Environment Open
A Machine Learning (ML) network based on transfer learning and transformer networks is applied to wave propagation models for complex indoor settings. This network is designed to predict signal propagation in environments with a variety of…
Clock against Chaos: Dynamic Assessment and Temporal Intervention in Reducing Misinformation Propagation Open
As social networks become the primary sources of information, the rise of misinformation poses a significant threat to the information ecosystem. Here, we address this challenge by proposing a dynamic system for real-time evaluation and as…
Towards Precision Healthcare: Robust Fusion of Time Series and Image Data Open
With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation …
Noninvasive Risk Prediction Models for Heart Failure Using Proportional Jaccard Indices and Comorbidity Patterns Open
Background: In the post-coronavirus disease 2019 (COVID-19) era, remote diagnosis and precision preventive medicine have emerged as pivotal clinical medicine applications. This study aims to develop a digital health-monitoring tool that ut…
Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals Open
This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the groups. Through two studies, we find that both groups focus on specific human traits in audio–such…
ClassInSight: Designing Conversation Support Tools to Visualize Classroom Discussion for Personalized Teacher Professional Development Open
Teaching is one of many professions for which personalized feedback and reflection can help improve dialogue and discussion between the professional and those they serve. However, professional development (PD) is often impersonal as human …
Pruning as a Domain-specific LLM Extractor Open
Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning technique…
WildGraph: Realistic Graph-based Trajectory Generation for Wildlife Open
Trajectory generation is an important task in movement studies; it circumvents the privacy, ethical, and technical challenges of collecting real trajectories from the target population. In particular, real trajectories in the wildlife doma…
Data Disparity and Temporal Unavailability Aware Asynchronous Federated Learning for Predictive Maintenance on Transportation Fleets Open
Predictive maintenance has emerged as a critical application in modern transportation, leveraging sensor data to forecast potential damages proactively using machine learning. However, privacy concerns limit data sharing, making Federated …
PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents Open
Optical Character Recognition (OCR) is an established task with the objective of identifying the text present in an image. While many off-the-shelf OCR models exist, they are often trained for either scientific (e.g., formulae) or generic …
Applications of Representation Learning Methods in Professional Baseball Open
In the realm of sports analytics, players, teams, and managers have primarily been evaluated through a set of summary counting statistics. Typically, these statistics describe the number of times various events occurred on the field of pla…
Automated Multi-Task Learning for Joint Disease Prediction on Electronic Health Records Open
In the realm of big data and digital healthcare, Electronic Health Records (EHR) have become a rich source of information with the potential to improve patient care and medical research. In recent years, machine learning models have prolif…
Milestones in Bengali Sentiment Analysis leveraging Transformer-models: Fundamentals, Challenges and Future Directions Open
Sentiment Analysis (SA) refers to the task of associating a view polarity (usually, positive, negative, or neutral; or even fine-grained such as slightly angry, sad, etc.) to a given text, essentially breaking it down to a supervised (sinc…
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings Open
In Natural Language Processing (NLP), Machine Reading Comprehension (MRC) is the task of answering a question based on a given context. To handle questions in the medical domain, modern language models such as BioBERT, SciBERT and even Cha…
Embedding and Clustering Multi-Entity Sequences Open
Core to much of modern deep learning is the notion of representation learning, learning representations of things that are useful for performing some task(s) related to those things. Encoder-only language models, for example, learn represe…
Robust Fusion of Time Series and Image Data for Improved Multimodal Clinical Prediction Open
With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation …
WildGEN: Long-horizon Trajectory Generation for Wildlife Open
Trajectory generation is an important concern in pedestrian, vehicle, and wildlife movement studies. Generated trajectories help enrich the training corpus in relation to deep learning applications, and may be used to facilitate simulation…
Tweeted Fact vs Fiction: Identifying Vaccine Misinformation and Analyzing Dissent Open
In this paper, we develop an end-to-end knowledge extraction and management framework for COVID-19 vaccination misinformation. This framework automatically extracts information consistent and inconsistent with scientific evidence regarding…