Eldon Schoop
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
Interface agents powered by generative AI models (referred to as "agents") can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to …
Athena: Intermediate Representations for Iterative Scaffolded App Generation with an LLM
It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex and their implementations often consist of multiple, inter-related files that together specify the content…
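A minimal sketch of what an intermediate representation for this kind of scaffolded, iterative generation might look like, assuming the app is decomposed into screens and views before any concrete code is emitted. The structure and field names below are illustrative assumptions, not the paper's actual representation.

```python
# Hypothetical intermediate representation for scaffolded app generation.
# An LLM would fill in or refine this structure over several iterations,
# and a separate step would translate it into the final UI code files.
from dataclasses import dataclass, field


@dataclass
class View:
    kind: str                      # e.g. "text", "button", "list"
    label: str = ""
    children: list["View"] = field(default_factory=list)


@dataclass
class Screen:
    name: str
    views: list[View] = field(default_factory=list)


@dataclass
class AppSpec:
    title: str
    screens: list[Screen] = field(default_factory=list)


# An iterative workflow might ask the model to add or revise one screen per
# step, validating the spec between steps before generating code from it.
spec = AppSpec(
    title="Recipe Browser",
    screens=[Screen("Home", [View("list", "All recipes")])],
)
print(spec)
```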
From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating Mobile UI Operation Impacts
With advances in generative AI, there is increasing work towards creating autonomous agents that can manage daily tasks by operating user interfaces (UIs). While prior research has studied the mechanics of how AI agents might navigate UIs …
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In t…
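The title points to a generate-filter-finetune loop driven by automated checks rather than human feedback. The sketch below shows the general shape of such a loop under that assumption; `generate_candidates`, `compiles`, and `finetune` are hypothetical placeholders, not the paper's actual pipeline.

```python
# Sketch of an automated-feedback loop: generate UI code with the current
# model, keep only samples that pass automated checks (e.g. compilation),
# and use the surviving samples as finetuning data for the next round.
# All functions below are hypothetical placeholders.

def generate_candidates(model, prompts, n_per_prompt=4):
    return [(p, model(p)) for p in prompts for _ in range(n_per_prompt)]

def compiles(code: str) -> bool:
    # Stand-in for invoking a real compiler or renderer on the generated code.
    return "TODO" not in code

def finetune(model, dataset):
    # Stand-in for an actual finetuning step; returns an updated model.
    return model

def improve(model, prompts, rounds=3):
    for _ in range(rounds):
        candidates = generate_candidates(model, prompts)
        kept = [(p, c) for p, c in candidates if compiles(c)]
        if kept:
            model = finetune(model, kept)
    return model

# Toy usage: a "model" that always emits the same snippet.
improved = improve(lambda p: 'Text("Hello")', ["a login screen"], rounds=1)
```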
AXNav: Replaying Accessibility Tests from Natural Language
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper,…
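Grounded UI understanding is commonly framed as instruction-response pairs that reference screen regions by coordinates. The record below is an illustrative example of that framing only; the field names and format are assumptions, not Ferret-UI's actual data schema.

```python
# Illustrative grounded-UI training record: the instruction refers to a
# region of the screenshot by a normalized bounding box, and the response
# is grounded in that region. All field names are assumptions.
sample = {
    "image": "screenshot_001.png",
    "instruction": "What does the element at the given region do?",
    "region": {"x_min": 0.72, "y_min": 0.05, "x_max": 0.95, "y_max": 0.11},
    "response": "It is a 'Share' button that opens the system share sheet.",
}
print(sample["response"])
```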
Never-ending Learning of User Interfaces
Machine learning models have been trained to predict semantic information about user interfaces (UIs) to make apps more accessible, easier to test, and to automate. Currently, most models rely on datasets of static screenshots that are lab…
ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations
Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for gene…
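The abstract mentions adapting a recipe for generating training data from machine conversations. A heavily simplified sketch of that idea, assuming a text-only LLM is prompted with a textual description of a UI screen to produce question-answer pairs; `ask_llm` and the prompt are hypothetical placeholders, not the paper's actual recipe.

```python
# Sketch: turn a textual description of a UI screen into instruction-
# following training pairs by prompting a language model. `ask_llm` is a
# hypothetical placeholder for a real LLM call.
def ask_llm(prompt: str) -> str:
    return "Q: What does the blue button do?\nA: It submits the login form."

def make_training_pairs(ui_description: str) -> list[dict]:
    prompt = (
        "You are shown a description of a mobile UI screen:\n"
        f"{ui_description}\n"
        "Write a question a user might ask about this screen and its answer."
    )
    raw = ask_llm(prompt)
    question, answer = raw.split("\nA: ", 1)
    return [{"question": question.removeprefix("Q: "), "answer": answer}]

pairs = make_training_pairs(
    "A login screen with username, password, and a blue 'Sign in' button."
)
print(pairs[0])
```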
Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
We use a deep learning based approach to predict whether a selected element in a mobile UI screenshot will be perceived by users as tappable, based on pixels only instead of view hierarchies required by previous work. To help designers bet…
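A minimal sketch of the pixels-only framing described above: a small convolutional network takes the screenshot plus a mask marking the selected element and outputs a tappability probability. The architecture is illustrative only and is not the model from the paper.

```python
# Illustrative pixels-only tappability predictor: the screenshot (3 channels)
# is concatenated with a binary mask of the selected element (1 channel),
# and a small CNN outputs the probability that users perceive it as tappable.
import torch
import torch.nn as nn

class TappabilityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, screenshot, element_mask):
        x = torch.cat([screenshot, element_mask], dim=1)  # (B, 4, H, W)
        x = self.features(x).flatten(1)
        return torch.sigmoid(self.head(x))                # tappability prob.

model = TappabilityNet()
prob = model(torch.rand(1, 3, 128, 128), torch.zeros(1, 1, 128, 128))
print(prob.shape)  # torch.Size([1, 1])
```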
IMACS: Image Model Attribution Comparison Summaries
Developing a suitable Deep Neural Network (DNN) often requires significant iteration, where different model versions are evaluated and compared. While metrics such as accuracy are a powerful means to succinctly describe a model's performan…
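The title suggests summarizing how attribution maps differ between model versions. A sketch of one simple way to do that, assuming per-pixel attribution maps are already available for each model; the normalization and the difference metric below are assumptions, not the paper's method.

```python
# Sketch: summarize how two models' attribution maps differ on one image.
# Attributions are normalized to sum to 1 so the comparison reflects where
# each model looks rather than the overall magnitude. Placeholder data only.
import numpy as np

def normalize(attr: np.ndarray) -> np.ndarray:
    attr = np.abs(attr)
    return attr / (attr.sum() + 1e-8)

def attribution_shift(attr_a: np.ndarray, attr_b: np.ndarray) -> float:
    # Total variation distance between the two normalized attribution maps:
    # 0 means identical focus, 1 means completely disjoint focus.
    return 0.5 * np.abs(normalize(attr_a) - normalize(attr_b)).sum()

rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
print(f"attribution shift: {attribution_shift(a, b):.3f}")
```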
UMLAUT: Debugging Deep Learning Programs using Program Structure and Model Behavior
Training deep neural networks can generate non-descriptive error messages or produce unusual output without any explicit errors at all. While experts rely on tacit knowledge to apply debugging strategies, non-experts lack the experience re…
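A simplified illustration of the kind of heuristic check this line of work applies: inspecting the training setup for common silent mistakes. The specific rule below (a softmax output paired with a loss configured for raw logits) and the config format are assumed examples, not checks drawn from the paper.

```python
# Sketch of a heuristic debugging check for deep learning programs: scan a
# simple description of the training setup for a common silent error.
def check_softmax_vs_logits(config: dict) -> list[str]:
    warnings = []
    if config.get("output_activation") == "softmax" and config.get("loss_from_logits"):
        warnings.append(
            "Output layer applies softmax, but the loss expects raw logits; "
            "probabilities will be squashed twice and training may stall."
        )
    return warnings

config = {"output_activation": "softmax", "loss_from_logits": True}
for message in check_softmax_vs_logits(config):
    print("WARNING:", message)
```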
HindSight
Our perception of our surrounding environment is limited by the constraints of human biology. The field of augmented perception asks how our sensory capabilities can be usefully extended through computational means. We argue that spatial a…
Drill Sergeant
Mapping techniques from software tutorials onto physical craft processes can assist novices in building multi-material assemblies. By providing in-situ step instructions and progress tracking, generating dynamic feedback on technique, and …