UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Haoming Wang, Haoyang Zou, Hongmei Song, Jiazhan Feng, Jun‐Jie Fang, et al. · 2025

The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and …

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan, Zhirong Huang, Wei Liu, Hao Chen, Linhao Zhang, et al. · 2025

The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large Langua…

FullStack Bench: Evaluating LLMs as Full Stack Coders

Siyao Liu, Zhu He, Jerry Liu, Shijie Xin, Aoyan Li, et al. · 2024

As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To …

Aoyan Li