Xiangliang Zhang
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
High-quality Question-Answer (QA) datasets are foundational for reliable Large Language Model (LLM) evaluation, yet even expert-crafted datasets exhibit persistent gaps in domain coverage, misaligned difficulty distributions, and factual i…
Jailbreaking LLMs Through Alignment Vulnerabilities in Out-of-Distribution Settings
Think it Image by Image: Multi-Image Moral Reasoning of Large Vision-Language Models
Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions
Towards Few-shot Chemical Reaction Outcome Prediction
The Indian Ocean Dipole drives imported-dominated dengue outbreaks in China: Mechanisms and predictions
Dengue fever, influenced by climate dynamics and human mobility in nonendemic regions, remains poorly understood. We assessed the effects of large-scale climate features on domestic dengue outbreaks using data from China (2013–2021) and pr…
The Role of Computing Resources in Publishing Foundation Model Research
Cutting-edge research in Artificial Intelligence (AI) requires considerable resources, including Graphics Processing Units (GPUs), data, and human resources. In this paper, we evaluate the relationship between these resources and the sc…
Research on Multi-Agent Competition Based on Large Language Models
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training
Multi-turn Text-to-SQL aims to translate a user's conversational utterances into executable SQL while preserving dialogue coherence and grounding to the target schema. However, most existing systems only regard this task as a simple text t…
SenWave: A Fine-Grained Multi-Language Sentiment Analysis Dataset Sourced from COVID-19 Tweets
The global impact of the COVID-19 pandemic has highlighted the need for a comprehensive understanding of public sentiment and reactions. Despite the availability of numerous public datasets on COVID-19, some reaching volumes of up to 100 b…
Causally-Enhanced Reinforcement Policy Optimization
Large language models (LLMs) trained with reinforcement objectives often achieve superficially correct answers via shortcut strategies, pairing correct outputs with spurious or unfaithful reasoning and degrading under small causal perturba…
Autonomous Data Agents: A New Opportunity for Smart Data
As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between …
ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions
Empowering large language models (LLMs) with chemical intelligence remains a challenge due to the scarcity of high-quality, domain-specific instruction-response datasets and the misalignment of existing synthetic data generation pipelines …
My Favorite Streamer is an LLM: Discovering, Bonding, and Co-Creating in AI VTuber Fandom
AI VTubers, where the performer is not human but algorithmically generated, introduce a new context for fandom. While human VTubers have been substantially studied for their cultural appeal, parasocial dynamics, and community economies, li…
Machine learning for 2D material–based devices
AI4DE: The 1st International Workshop on AI for Data Editing
On The Design Choices of Next Level LLMs
Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment
Quasiparticle interference (QPI) imaging is a powerful tool for probing electronic structures in quantum materials, but extracting the single-scatterer QPI pattern (i.e., the kernel) from a multi-scatterer image remains a fundamentally ill…
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models
Large language models (LLMs) are increasingly applied to socially grounded tasks, such as online community moderation, media content analysis, and social reasoning games. Success in these contexts depends on a model's social reasoning abil…
Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
Large Language Models (LLMs) have achieved remarkable success in Natural Language Processing (NLP), yet their cross-lingual performance consistency remains a significant challenge. This paper introduces a novel methodology for efficiently …
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
Logit-based LLM watermarking traces and verifies AI-generated content by maintaining green and red token lists and increasing the likelihood of green tokens during generation. However, it fails in low-entropy scenarios, where predictable o…
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations
We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or convers…
New Paradigm for Evaluating Scholar Summaries: A Facet-aware Metric and A Meta-evaluation Benchmark
Evaluation of summary quality is particularly crucial within the scientific domain, because it facilitates efficient knowledge dissemination and automated scientific information retrieval. This paper presents conceptual and experimental an…
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Artificial intelligence (AI) systems, particularly those based on deep learning models, have increasingly achieved expert-level performance in medical applications. However, there is growing concern that such AI systems may reflect and amp…
Unlocking the Potential of Black-box Pre-trained GNNs for Graph Few-shot Learning
Few-shot learning has emerged as an important problem on graphs to combat label scarcity, which can be approached by current trends in pre-trained graph neural networks (GNNs) and meta-learning. Recent efforts integrate both paradigms in a…
Special topic on cloud-edge collaboration for on-device recommendation
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address t…
Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond
The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry, yet the application of these methods to spectroscopic and spectrometric data, referred to as Spectroscopy Machine …
Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles
The Helpful, Honest, and Harmless (HHH) principle is a foundational framework for aligning AI systems with human values. However, existing interpretations of the HHH principle often overlook contextual variability and conflicting requireme…