Bengali
Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects
Bangla Sign Language Translation (BdSLT) has been severely constrained so far as the language itself is very low resource. Standard sentence level dataset creation for BdSLT is of immense importance for developing AI based assistive tools …
Bangladesh Labor Act, 2006 Bangla Dataset For NLP
This dataset consists of a collection of question-and-answer pairs based on the Bangladesh Labor Act, 2006, designed for Bangla Natural Language Processing (NLP) tasks. It serves as a resource for training and evaluating machine learning m…
BanglaASTE: A Novel Framework for Aspect-Sentiment-Opinion Extraction in Bangla E-commerce Reviews Using Ensemble Deep Learning
Aspect-Based Sentiment Analysis (ABSA) has emerged as a critical tool for extracting fine-grained sentiment insights from user-generated content, particularly in e-commerce and social media domains. However, research on Bangla ABSA remains…
BanglaMM-Disaster: A Multimodal Transformer-Based Deep Learning Framework for Multiclass Disaster Classification in Bangla
Natural disasters remain a major challenge for Bangladesh, so real-time monitoring and quick response systems are essential. In this study, we present BanglaMM-Disaster, an end-to-end deep learning-based multimodal framework for disaster c…
BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Large language models excel on broad multilingual benchmarks but remain to be evaluated extensively in figurative and culturally grounded reasoning, especially in low-resource contexts. We present BengaliFig, a compact yet richly annotated…
A Task-Oriented Evaluation Framework for Text Normalization in Modern NLP Pipelines
Text normalization is an essential preprocessing step in many natural language processing (NLP) tasks, and stemming is one such normalization technique that reduces words to their base or root form. However, evaluating stemming methods is …
Pre-registered confirmatory test of the ψ-GIE Universal Law of Structural Evolution across biological, linguistic, cultural, and artificial systems
This preregistration is frozen. No hypotheses, metrics, datasets, thresholds, or analysis steps will be modified after publication of this record. 1. Hypotheses (Locked) H1 Decrystallization regime (systems under increased subsidy) Primary…
"When Data is Scarce, Prompt Smarter"... Approaches to Grammatical Error Correction in Low-Resource Settings
Grammatical error correction (GEC) is an important task in Natural Language Processing that aims to automatically detect and correct grammatical mistakes in text. While recent advances in transformer-based models and large annotated datase…
Adaptation and Validation of the Bullying and Cyberbullying Scale for Adolescents in Bangla
Introduction: Adolescent bullying is a pressing global public health and educational issue, including in Bangladesh. The lack of valid, reliable assessment tools in clinical and school settings impedes the identification of victims and indi…
A computer graphics-based model to generate dynamic 3D animations for corresponding Bangla sign language gestures using HamNoSys to SiGML conversion
A speech dataset of three ethnic languages of Bangladesh: Chakma, Marma and Garo.
This dataset provides a curated collection of speech recordings from three ethnic languages of Bangladesh: Chakma, Marma, and Garo. It contains a total of 2321 WAV audio recordings, each ranging from 1 to 7 seconds in duration. All recordi…
MultiBanAbs: A Comprehensive Multi-Domain Bangla Abstractive Text Summarization Dataset
This study developed a new Bangla abstractive summarization dataset to generate concise summaries of Bangla articles from diverse sources. Most existing studies in this field have concentrated on news articles, where journalists usually fo…
Bangla Sign Language Word Dataset Image based Sign Recognition
This dataset was developed to facilitate the study of automatic identification of Bangla Sign Language (BdSL) in response to the lack of publicly accessible materials for the Bangladeshi Deaf and hard-of-hearing community. It is a co…
A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News
In our daily lives, newspapers are an essential information source that impacts how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portal…
Gradient Masters at BLP-2025 Task 1: Advancing Low-Resource NLP for Bengali using Ensemble-Based Adversarial Training for Hate Speech Detection
This paper introduces the approach of "Gradient Masters" for BLP-2025 Task 1: "Bangla Multitask Hate Speech Identification Shared Task". We present an ensemble-based fine-tuning strategy for addressing subtasks 1A (hate-type classification…
IsharaKotha: A Comprehensive Avatar-based Bangla Sign Language Corpus
Sign language is a vital communication medium for the hearing-impaired community, enabling effective interaction and self-expression. To help bridge the communication gap between hearing and hearing-impaired individuals, a text-to-sign tra…
NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation agent produces an initial solution from the input in…
Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis
Automatic Speech Recognition (ASR) transcripts, especially in low-resource languages like Bangla, contain a critical ambiguity: word-word repetitions can be either Repetition Disfluency (unintentional ASR error/hesitation) or Morphological…