Explanipedia

Wildlife Action Recognition using Deep Learning

Weining Li, S Swetha, Mubarak Shah · 2025

StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales

Nyle Siddiqui, Rohit Gupta, S Swetha, Mubarak Shah · 2025

State space models (SSMs) have emerged as a competitive alternative to transformers in various tasks. Their linear complexity and hidden-state recurrence make them particularly attractive for modeling long sequences, whereas attention beco…

Exploring Multi-Agent Reinforcement Learning for Cell Mechanics

Muhammad Waris, A. Cutolo, Musarat Abbas, Mubarak Shah · 2025

Cross-View Open-Vocabulary Object Detection in Aerial Imagery

Jyoti Kini, Rohit Gupta, Mubarak Shah · 2025

Traditional object detection models are typically trained on a fixed set of classes, limiting their flexibility and making it costly to incorporate new categories. Open-vocabulary object detection addresses this limitation by enabling mode…

Agentic Large-Language-Model Systems in Medicine: A Systematic Review and Taxonomy

Abdul Mohaimen Al Radi, Xu Cao, Fanyang Yu, Yuyuan Liu, Fengkai Liu, et al. · 2025

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, Rao Muhammad Anwer, Mubarak Shah, et al. · 2025

Composed video retrieval is a challenging task that strives to retrieve a target video based on a query video and a textual description detailing specific modifications. Standard retrieval frameworks typically struggle to handle the comple…

GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space

David G. Shatwell, Ishan Rajendrakumar Dave, S Swetha, Mubarak Shah · 2025

Timestamp prediction aims to determine when an image was captured using only visual information, supporting applications such as metadata correction, retrieval, and digital forensics. In outdoor scenarios, hourly estimates rely on cues lik…

From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos

Animesh Gupta, Jasneet Parmar, Ishan R. Dave, Mubarak Shah · 2025

Composed Video Retrieval (CoVR) retrieves a target video given a query video and a modification text describing the intended change. Existing CoVR benchmarks emphasize appearance shifts or coarse event changes and therefore do not test the…

On Transfer-based Universal Attacks in Pure Black-box Setting

Mohammad A. A. K. Jalwana, Naveed Akhtar, Ajmal Mian, Nazanin Rahnavard, Mubarak Shah · 2025

Despite their impressive performance, deep visual models are susceptible to transferable black-box adversarial attacks. Principally, these attacks craft perturbations in a target model-agnostic manner. However, surprisingly, we find that e…

VLDBench Evaluating Multimodal Disinformation with Regulatory Alignment

Shaina Raza, Ashmal Vayani, Aditya Jain, A. G. Hari Narayanan, Vahid Reza Khazaie, et al. · 2025

Detecting disinformation that blends manipulated text and images has become increasingly challenging, as AI tools make synthetic content easy to generate and disseminate. While most existing AI safety benchmarks focus on single modality mi…

SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models

Vishal Narnaware, Ashmal Vayani, Rohit Gupta, Swetha Sirnam, Mubarak Shah · 2025

Stereotype biases in Large Multimodal Models (LMMs) perpetuate harmful societal prejudices, undermining the fairness and equity of AI applications. As LMMs grow increasingly influential, addressing and mitigating inherent biases related to…

Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects

Muhammad Usman Hadi, Qasem Al Tashi, Rizwan Qureshi, Abbas Shah, Amgad Muneer, et al. · 2025

A Note on Exact State Visit Probabilities in Two-State Markov Chains

Mubarak Shah · 2025

In this note we derive the exact probability that a specific state in a two-state Markov chain is visited exactly $k$ times after $N$ transitions. We provide a closed-form solution for $\mathbb{P}(N_l = k \mid N)$, considering initial stat…

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition

Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah · 2025

Bias in machine learning models can lead to unfair decision making, and while it has been well-studied in the image and text domains, it remains underexplored in action recognition. Action recognition models often suffer from background bi…

A guided approach for cross-view geolocalization estimation with land cover semantic segmentation

Nathan A. Z. Xavier, Elcio Hideiti Shiguemori, Marcos R. O. A. Máximo, Mubarak Shah · 2025

Geolocalization is a crucial process that leverages environmental information and contextual data to accurately identify a position. In particular, cross-view geolocalization utilizes images from various perspectives, such as satellite and…

Emotional intelligence and its impact on postgraduate students: A study at the University of Kashmir

Yasir Mursaleen Ayub, Hind Baba, Aneesa Bashee, Mubarak Shah · 2025

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Bhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz, Hanoona Rasheed, Dinura Dissanayake, et al. · 2025

LIAR: Leveraging Inference Time Alignment (Best-of-N) to Jailbreak LLMs in Seconds

James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, et al. · 2024

Traditional jailbreaks have successfully exposed vulnerabilities in LLMs, primarily relying on discrete combinatorial optimization, while more recent methods focus on training LLMs to generate adversarial prompts. However, both approaches …

CityGuessr: City-Level Video Geo-Localization on a Global Scale

Parth Parag Kulkarni, Guruprasad Nayak, Mubarak Shah · 2024

Video geolocalization is a crucial problem in current times. Given just a video, ascertaining where it was captured from can have a plethora of advantages. The problem of worldwide geolocalization has been tackled before, but only using th…

Investigating Memorization in Video Diffusion Models

Chen Chen, Erjia Liu, Daochang Liu, Mubarak Shah, Chang Xu · 2024

Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content. While prior resear…

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yongjia Shang, Mubarak Shah, Yan Yan · 2024

Recent advancements in 3D Large Language Models (3DLLMs) have highlighted their potential in building general-purpose agents in the 3D real world, yet challenges remain due to the lack of high-quality robust instruction-following data, lea…

Perceptions of Subject Specialists Regarding the Relationship between Principal Leadership Skills and School Effectiveness

Mubarak Shah, Muhammad Niqab, Mubarak Zaib Khan · 2024

The purpose of this current study is to analyze the relationship between leadership skills and school effectiveness. Employing a quantitative approach, a co-relational research design has been used for the instant study. The target populat…

Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects

Muhammad Usman Hadi, Qasem Al Tashi, Rizwan Qureshi, Abbas Shah, Amgad Muneer, et al. · 2024

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition

Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Mubarak Shah · 2024

Real-life applications of action recognition often require a fine-grained understanding of subtle movements, e.g., in sports analytics, user interactions in AR/VR, and surgical videos. Although fine-grained actions are more costly to annot…

Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets

Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni · 2024

Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approa…

Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects

Muhammad Usman Hadi, Qasem Al Tashi, Abbas Shah, Rizwan Qureshi, Amgad Muneer, et al. · 2024

Within the vast expanse of computerized language processing, a revolutionary entity known as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to comprehend intricate linguistic patterns and conjure coherent …

GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers

Manu S. Pillai, Mamshad Nayeem Rizve, Mubarak Shah · 2024

Cross-view video geo-localization (CVGL) aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images. Despite their promising performance, current CVGL methods face significant challenges. These methods…

Unifying Video Self-Supervised Learning across Families of Tasks: A Survey

Ishan R. Dave, Malitha Gunawardhana, Limalka Sadith, Honglu Zhou, Liel David, et al. · 2024

Video self-supervised learning (VideoSSL) offers significant potential for reducing annotation costs and enhancing a wide range of downstream tasks in video understanding. The ultimate goal of VideoSSL is to achieve human-level video intel…

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

S Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Tran The Son, et al. · 2024

Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this fi…
