X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2505.15372
Recently, large language model (LLM)-based agents have achieved significant success in interactive environments, attracting considerable academic and industrial attention. Despite these advancements, current research predominantly focuses on English scenarios. In reality, there are over 7,000 languages worldwide, all of which demand access to comparable agentic services. Nevertheless, the development of language agents remains inadequate for meeting the diverse requirements of multilingual agentic applications. To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment, which evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence. Additionally, we assess the performance of various LLMs and cross-lingual alignment methods, examining their effectiveness in enhancing agents. Our findings reveal that even advanced models like GPT-4o, when combined with cross-lingual techniques, fail to achieve satisfactory results. We hope that X-WebAgentBench can serve as a valuable benchmark for multilingual agent scenarios in real-world applications.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2505.15372
- https://arxiv.org/pdf/2505.15372
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4415328836
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex)
- https://openalex.org/W4415328836
- DOI (Digital Object Identifier)
- https://doi.org/10.48550/arxiv.2505.15372
- Title (work title)
- X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
- Type (OpenAlex work type)
- preprint
- Language (primary language)
- en
- Publication year (year of publication)
- 2025
- Publication date (full publication date if available)
- 2025-05-21
- Authors (list of authors in order)
- Peng Wang, Ran Tao, Qiguang Chen, Mengkang Hu, Libo Qin
- Landing page (publisher landing page)
- https://arxiv.org/abs/2505.15372
- PDF URL (direct link to full text PDF)
- https://arxiv.org/pdf/2505.15372
- Open access (whether a free full text is available)
- Yes
- OA status (open access status per OpenAlex)
- green
- OA URL (direct OA link when available)
- https://arxiv.org/pdf/2505.15372
- Cited by (total citation count in OpenAlex)
- 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415328836 |
| doi | https://doi.org/10.48550/arxiv.2505.15372 |
| ids.doi | https://doi.org/10.48550/arxiv.2505.15372 |
| ids.openalex | https://openalex.org/W4415328836 |
| fwci | |
| type | preprint |
| title | X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10456 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9448000192642212 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Multi-Agent Systems and Negotiation |
| topics[1].id | https://openalex.org/T10215 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9078999757766724 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Semantic Web and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2505.15372 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-sa |
| locations[0].pdf_url | https://arxiv.org/pdf/2505.15372 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-sa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2505.15372 |
| locations[1].id | doi:10.48550/arxiv.2505.15372 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2505.15372 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5058176560 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8782-857X |
| authorships[0].author.display_name | Peng Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Peng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5067803447 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5243-7189 |
| authorships[1].author.display_name | Ran Tao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tao, Ruihan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5103207823 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9154-7858 |
| authorships[2].author.display_name | Qiguang Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Qiguang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5081386315 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Mengkang Hu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hu, Mengkang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5029082837 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3619-675X |
| authorships[4].author.display_name | Libo Qin |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Qin, Libo |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2505.15372 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-19T00:00:00 |
| display_name | X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10456 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9448000192642212 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Multi-Agent Systems and Negotiation |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2505.15372 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-sa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2505.15372 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2505.15372 |
| primary_location.id | pmh:oai:arXiv.org:2505.15372 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-sa |
| primary_location.pdf_url | https://arxiv.org/pdf/2505.15372 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2505.15372 |
| publication_date | 2025-05-21 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 70, 146 |
| abstract_inverted_index.In | 29 |
| abstract_inverted_index.To | 63 |
| abstract_inverted_index.We | 139 |
| abstract_inverted_index.an | 76 |
| abstract_inverted_index.as | 145 |
| abstract_inverted_index.in | 10, 75, 117, 153 |
| abstract_inverted_index.of | 38, 49, 59, 87, 98, 107 |
| abstract_inverted_index.on | 26 |
| abstract_inverted_index.to | 42, 95, 135 |
| abstract_inverted_index.we | 67, 103 |
| abstract_inverted_index.Our | 120 |
| abstract_inverted_index.all | 37 |
| abstract_inverted_index.and | 16, 84, 110 |
| abstract_inverted_index.are | 32 |
| abstract_inverted_index.can | 143 |
| abstract_inverted_index.for | 54, 149 |
| abstract_inverted_index.the | 47, 56, 82, 96, 105 |
| abstract_inverted_index.web | 78 |
| abstract_inverted_index.LLMs | 109 |
| abstract_inverted_index.even | 124 |
| abstract_inverted_index.fail | 134 |
| abstract_inverted_index.fill | 64 |
| abstract_inverted_index.gap, | 66 |
| abstract_inverted_index.have | 6 |
| abstract_inverted_index.hope | 140 |
| abstract_inverted_index.like | 127 |
| abstract_inverted_index.over | 33 |
| abstract_inverted_index.that | 123, 141 |
| abstract_inverted_index.this | 65 |
| abstract_inverted_index.when | 129 |
| abstract_inverted_index.with | 131 |
| abstract_inverted_index.7,000 | 34 |
| abstract_inverted_index.agent | 73, 100, 151 |
| abstract_inverted_index.large | 1 |
| abstract_inverted_index.model | 3 |
| abstract_inverted_index.novel | 71 |
| abstract_inverted_index.serve | 144 |
| abstract_inverted_index.their | 115 |
| abstract_inverted_index.there | 31 |
| abstract_inverted_index.these | 20 |
| abstract_inverted_index.which | 39, 80 |
| abstract_inverted_index.access | 41 |
| abstract_inverted_index.across | 90 |
| abstract_inverted_index.agents | 5, 51, 89 |
| abstract_inverted_index.assess | 104 |
| abstract_inverted_index.demand | 40 |
| abstract_inverted_index.global | 99 |
| abstract_inverted_index.models | 126 |
| abstract_inverted_index.reveal | 122 |
| abstract_inverted_index.Despite | 19 |
| abstract_inverted_index.English | 27 |
| abstract_inverted_index.GPT-4o, | 128 |
| abstract_inverted_index.achieve | 136 |
| abstract_inverted_index.agentic | 44, 61 |
| abstract_inverted_index.agents. | 119 |
| abstract_inverted_index.current | 22 |
| abstract_inverted_index.diverse | 57 |
| abstract_inverted_index.focuses | 25 |
| abstract_inverted_index.meeting | 55 |
| abstract_inverted_index.remains | 52 |
| abstract_inverted_index.success | 9 |
| abstract_inverted_index.thereby | 93 |
| abstract_inverted_index.various | 108 |
| abstract_inverted_index.academic | 15 |
| abstract_inverted_index.achieved | 7 |
| abstract_inverted_index.advanced | 125 |
| abstract_inverted_index.combined | 130 |
| abstract_inverted_index.findings | 121 |
| abstract_inverted_index.language | 2, 50, 88 |
| abstract_inverted_index.methods, | 113 |
| abstract_inverted_index.multiple | 91 |
| abstract_inverted_index.planning | 83 |
| abstract_inverted_index.reality, | 30 |
| abstract_inverted_index.research | 23 |
| abstract_inverted_index.results. | 138 |
| abstract_inverted_index.scenario | 152 |
| abstract_inverted_index.valuable | 147 |
| abstract_inverted_index.Recently, | 0 |
| abstract_inverted_index.alignment | 112 |
| abstract_inverted_index.benchmark | 74, 148 |
| abstract_inverted_index.enhancing | 118 |
| abstract_inverted_index.evaluates | 81 |
| abstract_inverted_index.examining | 114 |
| abstract_inverted_index.introduce | 68 |
| abstract_inverted_index.languages | 35 |
| abstract_inverted_index.services. | 45 |
| abstract_inverted_index.attention. | 18 |
| abstract_inverted_index.attracting | 13 |
| abstract_inverted_index.comparable | 43 |
| abstract_inverted_index.inadequate | 53 |
| abstract_inverted_index.industrial | 17 |
| abstract_inverted_index.languages, | 92 |
| abstract_inverted_index.real-world | 154 |
| abstract_inverted_index.scenarios. | 28 |
| abstract_inverted_index.worldwide, | 36 |
| abstract_inverted_index.(LLM)-based | 4 |
| abstract_inverted_index.advancement | 97 |
| abstract_inverted_index.development | 48 |
| abstract_inverted_index.interaction | 85 |
| abstract_inverted_index.interactive | 11, 77 |
| abstract_inverted_index.performance | 86, 106 |
| abstract_inverted_index.significant | 8, 14 |
| abstract_inverted_index.techniques, | 133 |
| abstract_inverted_index.contributing | 94 |
| abstract_inverted_index.environment, | 79 |
| abstract_inverted_index.multilingual | 60, 72, 150 |
| abstract_inverted_index.requirements | 58 |
| abstract_inverted_index.satisfactory | 137 |
| abstract_inverted_index.Additionally, | 102 |
| abstract_inverted_index.Nevertheless, | 46 |
| abstract_inverted_index.advancements, | 21 |
| abstract_inverted_index.applications. | 62, 155 |
| abstract_inverted_index.cross-lingual | 111, 132 |
| abstract_inverted_index.effectiveness | 116 |
| abstract_inverted_index.environments, | 12 |
| abstract_inverted_index.intelligence. | 101 |
| abstract_inverted_index.predominantly | 24 |
| abstract_inverted_index.X-WebAgentBench | 142 |
| abstract_inverted_index.X-WebAgentBench, | 69 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
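
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch of how that mapping can be turned back into running text, the Python below assumes the payload has already been parsed into a dictionary of token to list of positions; the function name `rebuild_abstract` and the `sample` excerpt are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> word-positions mapping."""
    # Invert the mapping: each position points to exactly one token.
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit the tokens in ascending position order, separated by spaces.
    return " ".join(token_at[position] for position in sorted(token_at))


# Excerpt of the rows above covering positions 63-69 only.
# "we" also occurs at position 103 in the full index; it is trimmed here.
sample = {
    "To": [63],
    "fill": [64],
    "this": [65],
    "gap,": [66],
    "we": [67],
    "introduce": [68],
    "X-WebAgentBench,": [69],
}

print(rebuild_abstract(sample))
# prints: To fill this gap, we introduce X-WebAgentBench,
```

Sorting by position rather than by token is what restores the original word order; passing the full index instead of the excerpt would yield the complete abstract text.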