Aoyan Li
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and …
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
The task of issue resolving is to modify a codebase to generate a patch that addresses a given issue. However, existing benchmarks, such as SWE-bench, focus almost exclusively on Python, making them insufficient for evaluating Large Langua…
FullStack Bench: Evaluating LLMs as Full Stack Coders
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To …