Zeyang Sha
Agent Safety Alignment via Reinforcement Learning
The emergence of autonomous Large Language Model (LLM) agents capable of tool usage has introduced new safety risks that go beyond traditional conversational misuse. These agents, empowered to execute external functions, are vulnerable to …
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
In recent years, Large-Language-Model-driven AI agents have exhibited unprecedented intelligence and adaptability. Agents are now undergoing a new round of evolution: they no longer act as isolated islands the way standalone LLMs do. Instead, they …
Can VLMs Detect and Localize Fine-Grained AI-Edited Images?
Fine-grained detection and localization of local image edits is crucial for assessing content authenticity, especially as modern diffusion models and image editors can produce highly realistic manipulations. However, this problem faces…
ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models
Text-to-image generation models have attracted significant interest from both academic and industrial communities. These models generate images from given prompt descriptions. Their potent capabilities, while beneficial, …
Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming
Esports, short for electronic sports, is a form of competition using video games and has attracted an audience of more than 530 million worldwide. To watch esports, people use online livestreaming platforms. Recently, a novel interaction me…
Prompt Stealing Attacks Against Large Language Models
The increasing reliance on large language models (LLMs) such as ChatGPT in various fields emphasizes the importance of "prompt engineering," a technique for improving the quality of model outputs. With companies investing significantly in …
Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
Significant advancements have recently been made in large language models represented by GPT models. Users frequently have multi-round private conversations with cloud-hosted GPT models for task optimization. Yet, this operational paradigm…
Comprehensive Assessment of Toxicity in ChatGPT
Moderating offensive, hateful, and toxic language has always been an important but challenging topic for the safe use of NLP. Emerging large language models (LLMs), such as ChatGPT, can potentially further accentuate this thre…
Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
Self-supervised representation learning techniques have been developing rapidly to make full use of unlabeled images. They encode images into rich features that are oblivious to downstream tasks. Behind their revolutionary representation p…
From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need
Visual prompt learning, as a newly emerged technique, leverages the knowledge learned by a large-scale pre-trained model and adapts it to downstream tasks through the use of prompts. While previous research has focused on designing effec…
Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resource…
DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
Text-to-image generation models, which generate images from prompt descriptions, have attracted increasing attention over the past few months. Despite their encouraging performance, these models raise concerns about the mis…