WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.05291
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities for autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of a high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges for such models to serve as useful assistants in the workplace. In addition to the benchmark, we provide a mechanism to effortlessly generate thousands of ground-truth observation/action traces, which can be used for fine-tuning existing models. Overall, we expect this work to serve as a useful resource to help the community progress toward capable autonomous agents. The benchmark can be found at https://github.com/ServiceNow/WorkArena.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.05291, https://arxiv.org/pdf/2407.05291
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400480865
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400480865 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.05291 (Digital Object Identifier)
- Title: WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-07 (full publication date if available)
- Authors: Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.05291 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.05291 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.05291 (direct OA link when available)
- Concepts: Computer science, Work (physics), Model-based reasoning, Knowledge management, Artificial intelligence, Knowledge representation and reasoning, Engineering, Mechanical engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4400480865 |
| doi | https://doi.org/10.48550/arxiv.2407.05291 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.05291 |
| ids.openalex | https://openalex.org/W4400480865 |
| fwci | |
| type | preprint |
| title | WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10703 |
| topics[0].field.id | https://openalex.org/fields/14 |
| topics[0].field.display_name | Business, Management and Accounting |
| topics[0].score | 0.9776999950408936 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1404 |
| topics[0].subfield.display_name | Management Information Systems |
| topics[0].display_name | Business Process Modeling and Analysis |
| topics[1].id | https://openalex.org/T10215 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9767000079154968 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Semantic Web and Ontologies |
| topics[2].id | https://openalex.org/T10679 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9284999966621399 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Service-Oriented Architecture and Web Services |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6118787527084351 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C18762648 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5505431890487671 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q42213 |
| concepts[1].display_name | Work (physics) |
| concepts[2].id | https://openalex.org/C37335422 |
| concepts[2].level | 3 |
| concepts[2].score | 0.4419613480567932 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q6888134 |
| concepts[2].display_name | Model-based reasoning |
| concepts[3].id | https://openalex.org/C56739046 |
| concepts[3].level | 1 |
| concepts[3].score | 0.41794389486312866 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q192060 |
| concepts[3].display_name | Knowledge management |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3383557200431824 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C161301231 |
| concepts[5].level | 2 |
| concepts[5].score | 0.25356775522232056 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3478658 |
| concepts[5].display_name | Knowledge representation and reasoning |
| concepts[6].id | https://openalex.org/C127413603 |
| concepts[6].level | 0 |
| concepts[6].score | 0.09707912802696228 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[6].display_name | Engineering |
| concepts[7].id | https://openalex.org/C78519656 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q101333 |
| concepts[7].display_name | Mechanical engineering |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6118787527084351 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/work |
| keywords[1].score | 0.5505431890487671 |
| keywords[1].display_name | Work (physics) |
| keywords[2].id | https://openalex.org/keywords/model-based-reasoning |
| keywords[2].score | 0.4419613480567932 |
| keywords[2].display_name | Model-based reasoning |
| keywords[3].id | https://openalex.org/keywords/knowledge-management |
| keywords[3].score | 0.41794389486312866 |
| keywords[3].display_name | Knowledge management |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.3383557200431824 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/knowledge-representation-and-reasoning |
| keywords[5].score | 0.25356775522232056 |
| keywords[5].display_name | Knowledge representation and reasoning |
| keywords[6].id | https://openalex.org/keywords/engineering |
| keywords[6].score | 0.09707912802696228 |
| keywords[6].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.05291 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.05291 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.05291 |
| locations[1].id | doi:10.48550/arxiv.2407.05291 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.05291 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5094134758 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Léo Boisvert |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Boisvert, Léo |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5073755881 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Megh Thakkar |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Thakkar, Megh |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5011602062 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-6982-062X |
| authorships[2].author.display_name | Maxime Gasse |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Gasse, Maxime |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5081810172 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-4482-4541 |
| authorships[3].author.display_name | Massimo Caccia |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Caccia, Massimo |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100493934 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Thibault Le Sellier De Chezelles |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | De Chezelles, Thibault Le Sellier |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5065781444 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-8742-0774 |
| authorships[5].author.display_name | Quentin Cappart |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Cappart, Quentin |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5039831068 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-0249-7607 |
| authorships[6].author.display_name | Nicolas Chapados |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Chapados, Nicolas |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5040248599 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Alexandre Lacoste |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Lacoste, Alexandre |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5028239106 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-7718-0319 |
| authorships[8].author.display_name | Alexandre Drouin |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Drouin, Alexandre |
| authorships[8].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.05291 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10703 |
| primary_topic.field.id | https://openalex.org/fields/14 |
| primary_topic.field.display_name | Business, Management and Accounting |
| primary_topic.score | 0.9776999950408936 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1404 |
| primary_topic.subfield.display_name | Management Information Systems |
| primary_topic.display_name | Business Process Modeling and Analysis |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.05291 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.05291 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.05291 |
| primary_location.id | pmh:oai:arXiv.org:2407.05291 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.05291 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.05291 |
| publication_date | 2024-07-07 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 14, 58, 68, 138, 164 |
| abstract_inverted_index.In | 131 |
| abstract_inverted_index.To | 61 |
| abstract_inverted_index.as | 112, 114, 125, 163 |
| abstract_inverted_index.at | 181 |
| abstract_inverted_index.be | 150, 179 |
| abstract_inverted_index.by | 81 |
| abstract_inverted_index.in | 16, 34, 48, 128 |
| abstract_inverted_index.is | 45, 85 |
| abstract_inverted_index.of | 2, 25, 57, 72, 99, 144 |
| abstract_inverted_index.to | 7, 13, 76, 87, 123, 133, 140, 161, 167 |
| abstract_inverted_index.we | 65, 136, 157 |
| abstract_inverted_index.682 | 73 |
| abstract_inverted_index.Our | 102 |
| abstract_inverted_index.The | 0, 176 |
| abstract_inverted_index.and | 27, 95, 108 |
| abstract_inverted_index.can | 149, 178 |
| abstract_inverted_index.for | 38, 120, 152 |
| abstract_inverted_index.has | 11 |
| abstract_inverted_index.led | 12 |
| abstract_inverted_index.the | 55, 89, 129, 134, 169 |
| abstract_inverted_index.web | 100 |
| abstract_inverted_index.LLMs | 22, 107 |
| abstract_inverted_index.This | 44 |
| abstract_inverted_index.fill | 62 |
| abstract_inverted_index.gap, | 64 |
| abstract_inverted_index.help | 168 |
| abstract_inverted_index.high | 59 |
| abstract_inverted_index.hold | 54 |
| abstract_inverted_index.seem | 23 |
| abstract_inverted_index.such | 121 |
| abstract_inverted_index.task | 40 |
| abstract_inverted_index.this | 63, 159 |
| abstract_inverted_index.true | 47 |
| abstract_inverted_index.used | 151 |
| abstract_inverted_index.user | 30 |
| abstract_inverted_index.well | 113 |
| abstract_inverted_index.work | 160 |
| abstract_inverted_index.found | 180 |
| abstract_inverted_index.given | 29 |
| abstract_inverted_index.human | 115 |
| abstract_inverted_index.large | 3 |
| abstract_inverted_index.mimic | 8 |
| abstract_inverted_index.novel | 69 |
| abstract_inverted_index.serve | 124, 162 |
| abstract_inverted_index.surge | 15 |
| abstract_inverted_index.tasks | 74 |
| abstract_inverted_index.their | 32 |
| abstract_inverted_index.these | 36 |
| abstract_inverted_index.where | 51 |
| abstract_inverted_index.which | 148 |
| abstract_inverted_index.(LLMs) | 6 |
| abstract_inverted_index.Though | 20 |
| abstract_inverted_index.across | 105 |
| abstract_inverted_index.agents | 53 |
| abstract_inverted_index.expect | 158 |
| abstract_inverted_index.models | 5, 110, 122 |
| abstract_inverted_index.recent | 21 |
| abstract_inverted_index.reveal | 117 |
| abstract_inverted_index.toward | 172 |
| abstract_inverted_index.useful | 126, 165 |
| abstract_inverted_index.(VLMs), | 111 |
| abstract_inverted_index.ability | 1 |
| abstract_inverted_index.agents. | 19, 101, 175 |
| abstract_inverted_index.capable | 24, 173 |
| abstract_inverted_index.impact. | 60 |
| abstract_inverted_index.models. | 155 |
| abstract_inverted_index.promise | 56 |
| abstract_inverted_index.propose | 66 |
| abstract_inverted_index.provide | 137 |
| abstract_inverted_index.remains | 42 |
| abstract_inverted_index.several | 118 |
| abstract_inverted_index.solving | 41 |
| abstract_inverted_index.studies | 104 |
| abstract_inverted_index.traces, | 147 |
| abstract_inverted_index.Overall, | 156 |
| abstract_inverted_index.addition | 132 |
| abstract_inverted_index.applying | 35 |
| abstract_inverted_index.designed | 86 |
| abstract_inverted_index.evaluate | 88 |
| abstract_inverted_index.existing | 154 |
| abstract_inverted_index.generate | 142 |
| abstract_inverted_index.language | 4 |
| abstract_inverted_index.planning | 26 |
| abstract_inverted_index.progress | 171 |
| abstract_inverted_index.resource | 166 |
| abstract_inverted_index.workers, | 116 |
| abstract_inverted_index.workers. | 83 |
| abstract_inverted_index.LLM-based | 17 |
| abstract_inverted_index.abilities | 98 |
| abstract_inverted_index.automated | 52 |
| abstract_inverted_index.benchmark | 70, 177 |
| abstract_inverted_index.community | 170 |
| abstract_inverted_index.empirical | 103 |
| abstract_inverted_index.knowledge | 82 |
| abstract_inverted_index.mechanism | 139 |
| abstract_inverted_index.performed | 80 |
| abstract_inverted_index.planning, | 90 |
| abstract_inverted_index.realistic | 77 |
| abstract_inverted_index.reasoning | 28 |
| abstract_inverted_index.routinely | 79 |
| abstract_inverted_index.settings, | 50 |
| abstract_inverted_index.thousands | 143 |
| abstract_inverted_index.workflows | 78 |
| abstract_inverted_index.assistants | 127 |
| abstract_inverted_index.autonomous | 18, 39, 174 |
| abstract_inverted_index.benchmark, | 135 |
| abstract_inverted_index.challenges | 119 |
| abstract_inverted_index.consisting | 71 |
| abstract_inverted_index.contextual | 96 |
| abstract_inverted_index.enterprise | 49 |
| abstract_inverted_index.especially | 46 |
| abstract_inverted_index.human-like | 9 |
| abstract_inverted_index.reasoning, | 93 |
| abstract_inverted_index.retrieval, | 94 |
| abstract_inverted_index.workplace. | 130 |
| abstract_inverted_index.WorkArena++ | 84 |
| abstract_inverted_index.fine-tuning | 153 |
| abstract_inverted_index.WorkArena++, | 67 |
| abstract_inverted_index.capabilities | 37 |
| abstract_inverted_index.effortlessly | 141 |
| abstract_inverted_index.ground-truth | 145 |
| abstract_inverted_index.intelligence | 10 |
| abstract_inverted_index.corresponding | 75 |
| abstract_inverted_index.effectiveness | 33 |
| abstract_inverted_index.instructions, | 31 |
| abstract_inverted_index.understanding | 97 |
| abstract_inverted_index.underexplored. | 43 |
| abstract_inverted_index.vision-language | 109 |
| abstract_inverted_index.problem-solving, | 91 |
| abstract_inverted_index.state-of-the-art | 106 |
| abstract_inverted_index.logical/arithmetic | 92 |
| abstract_inverted_index.observation/action | 146 |
| abstract_inverted_index.https://github.com/ServiceNow/WorkArena. | 182 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile |
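
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the zero-based positions where it occurs, rather than as running text. The sketch below shows one way to turn such a mapping back into a string. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions: the sample copies only positions 0 to 13 from the table, so it yields just the opening words of the abstract.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a token -> positions mapping.

    Hypothetical helper, not part of any OpenAlex client library.
    """
    # Invert the mapping: position -> token.
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated subset of the rows above (positions 0-13 only).
sample = {
    "The": [0],
    "ability": [1],
    "of": [2],
    "large": [3],
    "language": [4],
    "models": [5],
    "(LLMs)": [6],
    "to": [7, 13],  # a token that occurs at more than one position
    "mimic": [8],
    "human-like": [9],
    "intelligence": [10],
    "has": [11],
    "led": [12],
}

print(rebuild_abstract(sample))
```

Punctuation stays attached to its token in the index (for example `agents.` and `gap,` appear as separate keys), so joining on single spaces is enough to recover the original sentence layout.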