Aidong Zhang
Atacformer: A transformer-based foundation model for analysis and interpretation of ATAC-seq data Open
Introduction Chromatin accessibility profiling is an important tool for understanding gene regulation and cellular function. While public repositories house nearly 10,000 scATAC-seq experiments, unifying this data for meaningful analysis r…
ConceptDrift: leveraging spatial, temporal and semantic evolution of biomedical concepts for hypothesis generation Open
Motivation Hypothesis generation is a fundamental problem in biomedical text mining that aims to generate ideas that are new, interesting, and plausible by discovering unexplored links between biomedical concepts. Despite significant advan…
KDD 2025 Panel on AI for Science Open
AI and Science Day Open
IdeaBench: Benchmarking Large Language Models for Research Idea Generation Open
Improving Group Robustness on Spurious Correlation via Evidential Alignment Open
Deep neural networks often learn and rely on spurious correlations, i.e., superficial associations between non-causal features and the targets. For instance, an image classifier may identify camels based on the desert backgrounds. While it…
NeuronTune: Towards Self-Guided Spurious Bias Mitigation Open
Deep neural networks often develop spurious bias, reliance on correlations between non-essential features and classes for predictions. For example, a model may identify objects based on frequently co-occurring backgrounds rather than intri…
ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models Open
Deep learning models often achieve high performance by inadvertently learning spurious correlations between targets and non-essential features. For example, an image classifier may identify an object via its background that spuriously corr…
Client-Centric Federated Adaptive Optimization Open
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. With an increasing scale of clients and models, FL encounters two key challenges, client drift due…
ASCENT-ViT: Attention-based Scale-aware Concept Learning Framework for Enhanced Alignment in Vision Transformers Open
As Vision Transformers (ViTs) are increasingly adopted in sensitive vision applications, there is a growing demand for improved interpretability. This has led to efforts to forward-align these models with carefully annotated abstract, huma…
Determining the Importance of Clinical Modalities for NeuroDegenerative Disorders and Risk of Patient Injury Using Machine Learning and Survival Analysis. Open
Falls among the elderly, especially those with NeuroDegenerative Disorders (NDD), reduce life expectancy. The purpose of this study is to explore the role of Machine Learning on Electronic Health Records (EHR) data for time-to-event sur…
COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models Open
InfAL: Inference Time Adversarial Learning for Improving Research Ideation Open
Embracing Foundation Models for Advancing Scientific Discovery Open
Machine learning foundation models, particularly large language models (LLMs) such as GPT-4o, have revolutionized traditional applications in computer vision and natural language processing, marking a significant shift in recent years. Bui…
Uncovering Important Diagnostic Features for Alzheimer’s, Parkinson’s and Other Dementias Using Interpretable Association Mining Methods Open
Alzheimer's Disease and Related Dementias (ADRD) afflict almost 7 million people in the USA alone. The majority of research in ADRD is conducted using post-mortem samples of brain tissue or carefully recruited clinical trial patients. Whil…
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine Open
The remarkable capabilities of Large Language Models (LLMs) make them increasingly compelling for adoption in real-world healthcare applications. However, the risks associated with using LLMs in medical applications have not been systemati…
Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models Open
Large language models (LLMs) have demonstrated remarkable capabilities in various scientific domains, from natural language processing to complex problem-solving tasks. Their ability to understand and generate human-like text has opened up…
IdeaBench: Benchmarking Large Language Models for Research Idea Generation Open
Large Language Models (LLMs) have transformed how people interact with artificial intelligence (AI) systems, achieving state-of-the-art results in various tasks, including scientific discovery and hypothesis generation. However, the lack o…
Demystifying Large Language Models for Medicine: A Primer Open
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instr…
Structural Causality-based Generalizable Concept Discovery Models Open
The rising need for explainable deep neural network architectures has utilized semantic concepts as explainable units. Several approaches utilizing disentangled representation learning estimate the generative factors and utilize them as co…
ProtoNAM: Prototypical Neural Additive Models for Interpretable Deep Tabular Learning Open
Generalized additive models (GAMs) have long been a powerful white-box tool for the intelligible analysis of tabular data, revealing the influence of each feature on the model predictions. Despite the success of neural networks (NNs) in va…
BEDMS: A metadata standardizer for genomic region attributes Open
High-throughput sequencing technologies have generated vast omics data annotating genomic regions. A challenge arises in integrating this data because the associated metadata does not follow a uniform schema. This hinders data management, …
Benchmarking Spurious Bias in Few-Shot Image Classifiers Open
Few-shot image classifiers are designed to recognize and classify new data with minimal supervision and limited data but often show reliance on spurious correlations between classes and spurious attributes, known as spurious bias. Spurious…
CoLiDR: Concept Learning using Aggregated Disentangled Representations Open
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts. A parallel line of research focuses on disentangling the data distribution into its…
Spuriousness-Aware Meta-Learning for Learning Robust Classifiers Open
Spurious correlations are brittle associations between certain attributes of inputs and target variables, such as the correlation between an image background and an object class. Deep image classifiers often leverage them for predictions, …
MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning Open
Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have…
WRKY transcription factor 40 from eggplant (Solanum melongena L.) regulates ABA and salt stress responses Open
Methods for constructing and evaluating consensus genomic interval sets Open
The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitio…
CoLiDR: Concept Learning using Aggregated Disentangled Representations Open
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts. A parallel line of research focuses on disentangling the data distribution into its…
Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings Open
Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically ad…