Aston Zhang
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
Autonomous graphical user interface (GUI) agents powered by multimodal large language models have shown great promise. However, a critical yet underexplored issue persists: over-execution, where the agent executes tasks in a fully autonomo…
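The "adaptive interaction" in the title suggests a simple mental model: gate each step on how confident the agent is, and hand control back to the user below a threshold instead of over-executing. A minimal sketch of that gating idea; the Action fields, score, and threshold are hypothetical illustrations, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "tap", "type", "scroll"
    target: str        # description of the UI element to act on
    confidence: float  # hypothetical step-level score in [0, 1]

def execute_or_ask(action: Action, threshold: float = 0.8) -> str:
    """Gate autonomous execution on a per-step confidence score.

    Below the (hypothetical) threshold, defer to the user rather than
    execute an uncertain step fully autonomously.
    """
    if action.confidence >= threshold:
        return f"EXECUTE {action.kind} on {action.target}"
    return f"ASK USER before {action.kind} on {action.target}"

print(execute_or_ask(Action("tap", "Pay Now button", 0.55)))
```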
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
Preference learning is a widely adopted post-training technique that aligns large language models (LLMs) to human preferences and improves specific downstream task capabilities. In this work we systematically investigate how specific attri…
Law of the Weakest Link: Cross Capabilities of Large Language Models
The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities. However, this overlooks the intersection of multiple abilities across different types of expertise that are often required for …
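Taken literally, a weakest-link effect means performance on a task mixing several capabilities tracks the minimum of the individual capability scores rather than their average. A toy illustration with made-up numbers (not results from the paper):

```python
# Hypothetical per-capability scores; not data from the paper.
scores = {"reasoning": 62.0, "coding": 71.0, "image_recognition": 48.0}

# Weakest-link reading: cross-capability performance is bounded by the
# weakest individual capability, not pulled up by the strong ones.
weakest = min(scores.values())
average = sum(scores.values()) / len(scores)
print(f"weakest link: {weakest:.1f} vs average: {average:.1f}")
# A task mixing all three would be expected to land near 48, not 60.
```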
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in a graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by enviro…
In-Context Learning with Iterative Demonstration Selection
Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-…
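A common baseline behind such selection methods is to retrieve the training examples closest to the test input in an embedding space; the paper's iterative procedure refines the choice further, but the retrieval core looks roughly like this sketch (embeddings and texts are placeholders):

```python
import numpy as np

def select_demonstrations(test_vec: np.ndarray,
                          pool_vecs: np.ndarray,
                          pool_texts: list[str],
                          k: int = 4) -> list[str]:
    """Pick the k pool examples most cosine-similar to the test input.

    A generic similarity-based baseline; iterative selection methods
    repeat and refine choices like this rather than retrieving once.
    """
    pool_norm = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    test_norm = test_vec / np.linalg.norm(test_vec)
    sims = pool_norm @ test_norm          # cosine similarity to each example
    top = np.argsort(-sims)[:k]
    return [pool_texts[i] for i in top]
```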
You Only Look at Screens: Multimodal Chain-of-Action Agents
Autonomous graphical user interface (GUI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LL…
Automated Few-shot Classification with Instruction-Finetuned Language Models
A particularly successful class of approaches for few-shot learning combines language models with prompts -- hand-crafted task descriptions that complement data samples. However, designing prompts by hand for each task commonly requires do…
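For context, the hand-crafted prompts in question are just task descriptions wrapped around labeled examples; a manual baseline might be assembled as below (template wording and labels are illustrative, not the paper's automatically generated prompts):

```python
def build_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Wrap labeled examples and a query in a hand-written task description.

    Automating exactly this hand-crafting step is the paper's goal; the
    template here is the kind of manual prompt it aims to replace.
    """
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

print(build_prompt([("Great battery life.", "positive"),
                    ("Screen cracked in a week.", "negative")],
                   "Fast shipping and works as advertised."))
```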
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Transformers are central in modern natural language processing and computer vision applications. Despite recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra long seq…
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
This work proposes POMP, a prompt pre-training method for vision-language models. Being memory and computation efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts with over twenty-t…
AIM: Adapting Image Models for Efficient Video Action Recognition
Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks. However, fully finetuning such a video model could be computationall…
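Parameter-efficient methods in this family freeze the pre-trained image backbone and train only small inserted modules; a generic bottleneck adapter (illustrative, not AIM's exact design) looks like this in PyTorch:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project,
    residual add. Only these parameters train; the backbone stays frozen."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

x = torch.randn(2, 197, 768)  # (batch, tokens, dim) from a frozen ViT block
print(Adapter(768)(x).shape)  # torch.Size([2, 197, 768])
```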
Learning Multimodal Data Augmentation in Feature Space
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, th…
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effecti…
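Plain prompt tuning, the core that SPT builds on, prepends a handful of trainable embeddings to the frozen model's input embeddings. A minimal sketch of that shared mechanism; SPT's semi-parametric additions are not shown:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable prompt vectors prepended to frozen input embeddings."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

embeds = torch.randn(4, 32, 768)          # embeddings from a frozen model
print(SoftPrompt(20, 768)(embeds).shape)  # torch.Size([4, 52, 768])
```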
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved by i…
Automatic Chain of Thought Prompting in Large Language Models
Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. On…
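The two paradigms can be made concrete in a few lines: one appends a trigger phrase such as "Let's think step by step", the other prepends demonstrations that include reasoning chains. Auto-CoT's contribution is generating those demonstrations automatically; the sketch below only shows the two prompt shapes, with the model call left out:

```python
def zero_shot_cot(question: str) -> str:
    """Paradigm 1: append a trigger phrase and let the model reason."""
    return f"Q: {question}\nA: Let's think step by step."

def few_shot_cot(demos: list[tuple[str, str]], question: str) -> str:
    """Paradigm 2: prepend demonstrations that contain reasoning chains.

    Auto-CoT generates such demos with the zero-shot trigger instead of
    writing them by hand; `demos` would hold (question, generated chain)
    pairs sampled for diversity.
    """
    parts = [f"Q: {q}\nA: {chain}" for q, chain in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```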
Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition
Existing out-of-distribution (OOD) detection methods are typically benchmarked on training sets with balanced class distributions. However, in real-world applications, it is common for the training sets to have long-tailed distributions. I…
Removing Batch Normalization Boosts Adversarial Training
Adversarial training (AT) defends deep neural networks against adversarial attacks. One challenge that limits its practical application is the performance degradation on clean samples. A major bottleneck identified by previous works is the…
Lightweight Convolutional Neural Networks By Hypercomplex Parameterization
Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved by …
Dive into Deep Learning
This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition, figures, math, an…
Controllable and Diverse Text Generation in E-commerce
In E-commerce, a key challenge in text generation is to find a good trade-off between word diversity and accuracy (relevance) in order to make generated text appear more natural and human-like. In order to improve the relevance of generate…
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in full…
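Concretely, the parameterization replaces a dense weight $W \in \mathbb{R}^{d \times k}$ with a sum of Kronecker products, $W = \sum_{i=1}^{n} A_i \otimes S_i$, so parameters drop from $dk$ to roughly $dk/n$. A NumPy sketch with illustrative shapes:

```python
import numpy as np

def phm_weight(A: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Build W = sum_i kron(A_i, S_i) from n pairs of small factors.

    A: (n, n, n) "rule" matrices; S: (n, d//n, k//n) weight blocks.
    Parameter count is n^3 + d*k/n versus d*k for a dense layer,
    about a 1/n reduction when d*k dominates.
    """
    return sum(np.kron(A[i], S[i]) for i in range(A.shape[0]))

n, d, k = 4, 8, 12                 # n = 4 mirrors the quaternion case
A = np.random.randn(n, n, n)
S = np.random.randn(n, d // n, k // n)
print(phm_weight(A, S).shape)      # (8, 12): a full d x k weight matrix
```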
On Orthogonality Constraints for Transformers
Orthogonality constraints encourage matrices to be orthogonal for numerical stability. These plug-and-play constraints, which can be conveniently incorporated into model training, have been studied for popular architectures in natural lang…
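One standard soft version of such a constraint adds a penalty pushing $W^\top W$ toward the identity; a sketch of that regularizer (the exact constraints studied in the paper may differ):

```python
import torch

def orthogonality_penalty(W: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality: ||W^T W - I||_F^2, zero iff the columns of W
    are orthonormal. Added to the task loss with a small coefficient."""
    gram = W.T @ W
    eye = torch.eye(gram.size(0), device=W.device, dtype=W.dtype)
    return ((gram - eye) ** 2).sum()

W = torch.randn(512, 64, requires_grad=True)
loss = orthogonality_penalty(W)
loss.backward()  # gradients nudge W toward orthonormal columns
```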
Learning User Representations with Hypercuboids for Recommender Systems
Modeling user interests is crucial in real-world recommender systems. In this paper, we present a new user interest representation model for personalized recommendation. Specifically, the key novelty behind our model is that it explicitly …
ControlVAE: Tuning, Analytical Properties, and Performance Analysis
This paper reviews the novel concept of controllable variational autoencoder (ControlVAE), discusses its parameter tuning to meet application needs, derives its key analytic properties, and offers useful extensions and applications. Contro…
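The controllable part hinges on treating the KL term's weight as the output of a feedback controller that tracks a target KL value. A simplified proportional-integral sketch in that spirit; the paper's controller form, gains, and clamping differ:

```python
class PIBetaController:
    """Tune the KL weight beta each step so the observed KL tracks a
    set point: beta rises when KL overshoots the target and falls when
    KL undershoots it. Gains here are illustrative, not the paper's."""

    def __init__(self, kl_target: float, kp: float = 0.01, ki: float = 0.0001):
        self.kl_target, self.kp, self.ki = kl_target, kp, ki
        self.integral = 0.0

    def step(self, kl_observed: float) -> float:
        error = self.kl_target - kl_observed   # negative when KL too large
        self.integral += error
        beta = 1.0 - self.kp * error - self.ki * self.integral
        return min(max(beta, 0.0), 1.0)        # keep beta in [0, 1]

ctrl = PIBetaController(kl_target=3.0)
for kl in (6.0, 4.5, 3.4, 3.1):                # hypothetical KL readings
    print(round(ctrl.step(kl), 4))
```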
Text Style Transfer: A Review and Experimental Evaluation
The stylistic properties of text have intrigued computational linguistics researchers in recent years. Specifically, researchers have investigated the Text Style Transfer (TST) task, which aims to change the stylistic properties of the tex…