OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability

arXiv (Cornell University) • November 2025 • Karen Ullrich, Arjun Subramonian, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, Julia Kempe

Reliability is key to realizing the promise of autonomous UI-Agents, multimodal agents that directly interact with apps in the same manner as humans, as users must be able to trust an agent to complete a given task. Current evaluations rely on fixed environments, often clones of existing apps, which are limited in that they can only shed light on whether or how often an agent can complete a task within a specific environment. When deployed, however, agents are likely to encounter variations in app design and conten…
