M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2511.17729
We present M^3-Bench, the first benchmark for evaluating multimodal tool use under the Model Context Protocol. The benchmark targets realistic multi-hop and multi-threaded workflows that require visual grounding, textual reasoning, cross-tool dependencies, and persistence of intermediate resources across steps. We introduce a similarity-driven alignment that serializes each tool call, embeds the resulting signatures with a sentence encoder, and performs similarity-bucketed Hungarian matching to obtain auditable one-to-one correspondences. On top of this alignment, we report interpretable metrics that decouple semantic fidelity from workflow consistency. The benchmark spans 28 servers with 231 tools and provides standardized trajectories curated through an Executor & Judge pipeline with human verification; an auxiliary ensemble of four large language model (LLM) judges reports end-task Task Completion and information grounding. Evaluations of representative state-of-the-art multimodal LLMs (MLLMs) reveal persistent gaps in multimodal MCP tool use, particularly in argument fidelity and structure consistency, underscoring the need for methods that jointly reason over images, text, and tool graphs. The benchmark's anonymous repository is at https://github.com/EtaYang10th/Open-M3-Bench
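
The alignment step described in the abstract (serialize each tool call, embed the signature, then match one-to-one) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the call format (`tool` plus `args`), the bag-of-tokens `embed` function standing in for a real sentence encoder, the single `threshold` standing in for similarity bucketing, and the brute-force search over assignments standing in for the Hungarian algorithm (both return the optimal one-to-one assignment; brute force is only viable for a handful of calls).

```python
import math
from collections import Counter
from itertools import permutations


def serialize(call):
    """Flatten a tool call into a canonical text signature (hypothetical format)."""
    args = " ".join(f"{k}={call['args'][k]}" for k in sorted(call["args"]))
    return f"{call['tool']} {args}"


def embed(text):
    """Toy stand-in for a sentence encoder: a bag of lowercase tokens."""
    return Counter(text.lower().replace("=", " ").replace("_", " ").split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align(pred, ref, threshold=0.5):
    """Return one-to-one pairs (pred_idx, ref_idx, sim) maximizing total similarity.

    Pairs below `threshold` are left unmatched. The exhaustive search replaces
    the Hungarian algorithm for readability only.
    """
    sim = [[cosine(embed(serialize(p)), embed(serialize(r))) for r in ref] for p in pred]
    n, m = len(pred), len(ref)
    if n <= m:
        candidates = ([(i, j) for i, j in enumerate(p)] for p in permutations(range(m), n))
    else:
        candidates = ([(i, j) for j, i in enumerate(p)] for p in permutations(range(n), m))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        kept = [(i, j) for i, j in pairs if sim[i][j] >= threshold]
        score = sum(sim[i][j] for i, j in kept)
        if score > best_score:
            best_pairs, best_score = kept, score
    return [(i, j, round(sim[i][j], 3)) for i, j in best_pairs]


# Hypothetical predicted and reference trajectories.
predicted = [
    {"tool": "image_crop", "args": {"path": "cat.png", "box": "0,0,64,64"}},
    {"tool": "web_search", "args": {"query": "tabby cat breeds"}},
]
reference = [
    {"tool": "web_search", "args": {"query": "cat breeds tabby"}},
    {"tool": "image_crop", "args": {"path": "cat.png", "box": "0,0,64,64"}},
    {"tool": "ocr", "args": {"path": "sign.png"}},
]
matches = align(predicted, reference)
print(matches)
```

Each returned triple names the matched pair and its similarity, which is what makes the correspondence auditable: unmatched reference calls (here the `ocr` call) simply do not appear in the result.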
- Type
- preprint
- Landing Page
- https://doi.org/10.48550/arxiv.2511.17729
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W7106712151
Raw OpenAlex JSON
- OpenAlex ID: canonical identifier for this work in OpenAlex
- https://openalex.org/W7106712151
- DOI: Digital Object Identifier
- https://doi.org/10.48550/arxiv.2511.17729
- Title: work title
- M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
- Type: OpenAlex work type
- preprint
- Publication year: year of publication
- 2025
- Publication date: full publication date if available
- 2025-11-21
- Authors: list of authors in order
- Zhou, Yang; Zhao, Mingyu; Wang, Zhenting; Gu, Difei; Guo, Bangwei; Ye, Ruosong; Han, Ligong; Jin, Can; Metaxas, Dimitris N.
- Landing page: publisher landing page
- https://doi.org/10.48550/arxiv.2511.17729
- Open access: whether a free full text is available
- Yes
- OA status: open access status per OpenAlex
- green
- OA URL: direct OA link when available
- https://doi.org/10.48550/arxiv.2511.17729
- Concepts: top concepts (fields/topics) attached by OpenAlex
- Benchmark (surveying), Computer science, Workflow, Context (archaeology), Pipeline (software), Task (project management), Artificial intelligence, Matching (statistics), Semantics (computer science), Sentence, Machine learning, Server, Parsing, Argument (complex analysis), Executor, Scheme (mathematics), Natural language processing, Coreference, Language model, Data mining, Information retrieval, Context model, Pattern matching, Process (computing), Benchmarking, Data modeling, Fidelity, Semantic heterogeneity, Human-in-the-loop, Oracle
- Cited by: total citation count in OpenAlex
- 0
Full payload
| id | https://openalex.org/W7106712151 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2511.17729 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.17729 |
| ids.openalex | https://openalex.org/W7106712151 |
| fwci | |
| type | preprint |
| title | M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C185798385 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8338862657546997 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[0].display_name | Benchmark (surveying) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7966123819351196 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C177212765 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6959211230278015 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q627335 |
| concepts[2].display_name | Workflow |
| concepts[3].id | https://openalex.org/C2779343474 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6401889324188232 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[3].display_name | Context (archaeology) |
| concepts[4].id | https://openalex.org/C43521106 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6215966939926147 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2165493 |
| concepts[4].display_name | Pipeline (software) |
| concepts[5].id | https://openalex.org/C2780451532 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5843798518180847 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[5].display_name | Task (project management) |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.5173117518424988 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C165064840 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4918988049030304 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1321061 |
| concepts[7].display_name | Matching (statistics) |
| concepts[8].id | https://openalex.org/C184337299 |
| concepts[8].level | 2 |
| concepts[8].score | 0.44271695613861084 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1437428 |
| concepts[8].display_name | Semantics (computer science) |
| concepts[9].id | https://openalex.org/C2777530160 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4233413636684418 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q41796 |
| concepts[9].display_name | Sentence |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.40053269267082214 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C93996380 |
| concepts[11].level | 2 |
| concepts[11].score | 0.39873790740966797 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q44127 |
| concepts[11].display_name | Server |
| concepts[12].id | https://openalex.org/C186644900 |
| concepts[12].level | 2 |
| concepts[12].score | 0.39798158407211304 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q194152 |
| concepts[12].display_name | Parsing |
| concepts[13].id | https://openalex.org/C98184364 |
| concepts[13].level | 2 |
| concepts[13].score | 0.39399585127830505 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q1780131 |
| concepts[13].display_name | Argument (complex analysis) |
| concepts[14].id | https://openalex.org/C180591056 |
| concepts[14].level | 2 |
| concepts[14].score | 0.38836103677749634 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q654437 |
| concepts[14].display_name | Executor |
| concepts[15].id | https://openalex.org/C77618280 |
| concepts[15].level | 2 |
| concepts[15].score | 0.3849237859249115 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q1155772 |
| concepts[15].display_name | Scheme (mathematics) |
| concepts[16].id | https://openalex.org/C204321447 |
| concepts[16].level | 1 |
| concepts[16].score | 0.36012765765190125 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[16].display_name | Natural language processing |
| concepts[17].id | https://openalex.org/C28076734 |
| concepts[17].level | 3 |
| concepts[17].score | 0.34633752703666687 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q63087 |
| concepts[17].display_name | Coreference |
| concepts[18].id | https://openalex.org/C137293760 |
| concepts[18].level | 2 |
| concepts[18].score | 0.34118345379829407 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[18].display_name | Language model |
| concepts[19].id | https://openalex.org/C124101348 |
| concepts[19].level | 1 |
| concepts[19].score | 0.3376232087612152 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[19].display_name | Data mining |
| concepts[20].id | https://openalex.org/C23123220 |
| concepts[20].level | 1 |
| concepts[20].score | 0.3306768834590912 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[20].display_name | Information retrieval |
| concepts[21].id | https://openalex.org/C183322885 |
| concepts[21].level | 3 |
| concepts[21].score | 0.29223552346229553 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q17007702 |
| concepts[21].display_name | Context model |
| concepts[22].id | https://openalex.org/C68859911 |
| concepts[22].level | 2 |
| concepts[22].score | 0.28330859541893005 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q1503724 |
| concepts[22].display_name | Pattern matching |
| concepts[23].id | https://openalex.org/C98045186 |
| concepts[23].level | 2 |
| concepts[23].score | 0.27824723720550537 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[23].display_name | Process (computing) |
| concepts[24].id | https://openalex.org/C86251818 |
| concepts[24].level | 2 |
| concepts[24].score | 0.27686211466789246 |
| concepts[24].wikidata | https://www.wikidata.org/wiki/Q816754 |
| concepts[24].display_name | Benchmarking |
| concepts[25].id | https://openalex.org/C67186912 |
| concepts[25].level | 2 |
| concepts[25].score | 0.2730334401130676 |
| concepts[25].wikidata | https://www.wikidata.org/wiki/Q367664 |
| concepts[25].display_name | Data modeling |
| concepts[26].id | https://openalex.org/C2776459999 |
| concepts[26].level | 2 |
| concepts[26].score | 0.26774126291275024 |
| concepts[26].wikidata | https://www.wikidata.org/wiki/Q2119376 |
| concepts[26].display_name | Fidelity |
| concepts[27].id | https://openalex.org/C2778180026 |
| concepts[27].level | 4 |
| concepts[27].score | 0.2602304220199585 |
| concepts[27].wikidata | https://www.wikidata.org/wiki/Q18378163 |
| concepts[27].display_name | Semantic heterogeneity |
| concepts[28].id | https://openalex.org/C2780626000 |
| concepts[28].level | 2 |
| concepts[28].score | 0.25922906398773193 |
| concepts[28].wikidata | https://www.wikidata.org/wiki/Q5936775 |
| concepts[28].display_name | Human-in-the-loop |
| concepts[29].id | https://openalex.org/C55166926 |
| concepts[29].level | 2 |
| concepts[29].score | 0.2537321150302887 |
| concepts[29].wikidata | https://www.wikidata.org/wiki/Q2892946 |
| concepts[29].display_name | Oracle |
| keywords[0].id | https://openalex.org/keywords/benchmark |
| keywords[0].score | 0.8338862657546997 |
| keywords[0].display_name | Benchmark (surveying) |
| keywords[1].id | https://openalex.org/keywords/workflow |
| keywords[1].score | 0.6959211230278015 |
| keywords[1].display_name | Workflow |
| keywords[2].id | https://openalex.org/keywords/context |
| keywords[2].score | 0.6401889324188232 |
| keywords[2].display_name | Context (archaeology) |
| keywords[3].id | https://openalex.org/keywords/pipeline |
| keywords[3].score | 0.6215966939926147 |
| keywords[3].display_name | Pipeline (software) |
| keywords[4].id | https://openalex.org/keywords/task |
| keywords[4].score | 0.5843798518180847 |
| keywords[4].display_name | Task (project management) |
| keywords[5].id | https://openalex.org/keywords/matching |
| keywords[5].score | 0.4918988049030304 |
| keywords[5].display_name | Matching (statistics) |
| keywords[6].id | https://openalex.org/keywords/semantics |
| keywords[6].score | 0.44271695613861084 |
| keywords[6].display_name | Semantics (computer science) |
| keywords[7].id | https://openalex.org/keywords/sentence |
| keywords[7].score | 0.4233413636684418 |
| keywords[7].display_name | Sentence |
| keywords[8].id | https://openalex.org/keywords/server |
| keywords[8].score | 0.39873790740966797 |
| keywords[8].display_name | Server |
| language | |
| locations[0].id | doi:10.48550/arxiv.2511.17729 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.48550/arxiv.2511.17729 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A2012237231 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-5203-8199 |
| authorships[0].author.display_name | Yang Zhou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhou, Yang |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | https://openalex.org/A2347531055 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Zhao Ming-yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhao, Mingyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A2315542665 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Wang Zhenting |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Zhenting |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Gu, Difei |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Gu, Difei |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A4281901083 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Guo, Bangwei |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Guo, Bangwei |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A2102770750 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Ye RuoSong |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ye, Ruosong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A4202193308 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Han, Ligong |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Han, Ligong |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A2155437886 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Jin Can |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Jin, Can |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A4222827944 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Metaxas, Dimitris N. |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Metaxas, Dimitris N. |
| authorships[8].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.48550/arxiv.2511.17729 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-27T00:00:00 |
| display_name | M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-03T23:09:05.601824 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.48550/arxiv.2511.17729 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.48550/arxiv.2511.17729 |
| primary_location.id | doi:10.48550/arxiv.2511.17729 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.48550/arxiv.2511.17729 |
| publication_date | 2025-11-21 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 42, 53 |
| abstract_inverted_index.28 | 85 |
| abstract_inverted_index.On | 66 |
| abstract_inverted_index.We | 0, 40 |
| abstract_inverted_index.an | 96, 104 |
| abstract_inverted_index.at | 161 |
| abstract_inverted_index.in | 130, 136 |
| abstract_inverted_index.is | 160 |
| abstract_inverted_index.of | 35, 68, 121 |
| abstract_inverted_index.to | 61 |
| abstract_inverted_index.we | 71 |
| abstract_inverted_index.231 | 88 |
| abstract_inverted_index.MCP | 132 |
| abstract_inverted_index.Our | 156 |
| abstract_inverted_index.The | 16, 82 |
| abstract_inverted_index.and | 21, 28, 33, 56, 90, 117, 139, 153 |
| abstract_inverted_index.for | 6, 145 |
| abstract_inverted_index.the | 3, 12, 143 |
| abstract_inverted_index.top | 67 |
| abstract_inverted_index.use | 10 |
| abstract_inverted_index.LLMs | 125 |
| abstract_inverted_index.Task | 115 |
| abstract_inverted_index.each | 47 |
| abstract_inverted_index.four | 106 |
| abstract_inverted_index.from | 79 |
| abstract_inverted_index.gaps | 129 |
| abstract_inverted_index.need | 144 |
| abstract_inverted_index.over | 150 |
| abstract_inverted_index.that | 24, 45, 75, 147 |
| abstract_inverted_index.this | 69 |
| abstract_inverted_index.tool | 9, 48, 133, 154 |
| abstract_inverted_index.use, | 134 |
| abstract_inverted_index.with | 52, 87, 101 |
| abstract_inverted_index.& | 98 |
| abstract_inverted_index.Judge | 99 |
| abstract_inverted_index.Model | 13 |
| abstract_inverted_index.call, | 49 |
| abstract_inverted_index.first | 4 |
| abstract_inverted_index.human | 102 |
| abstract_inverted_index.judge | 111 |
| abstract_inverted_index.large | 107 |
| abstract_inverted_index.spans | 84 |
| abstract_inverted_index.text, | 152 |
| abstract_inverted_index.under | 11 |
| abstract_inverted_index.(LLMs) | 110 |
| abstract_inverted_index.across | 38 |
| abstract_inverted_index.embeds | 50 |
| abstract_inverted_index.models | 109 |
| abstract_inverted_index.obtain | 62 |
| abstract_inverted_index.reason | 149 |
| abstract_inverted_index.report | 72 |
| abstract_inverted_index.reveal | 127 |
| abstract_inverted_index.steps. | 39 |
| abstract_inverted_index.tools, | 89 |
| abstract_inverted_index.visual | 26 |
| abstract_inverted_index.(MLLMs) | 126 |
| abstract_inverted_index.Context | 14 |
| abstract_inverted_index.curated | 94 |
| abstract_inverted_index.graphs. | 155 |
| abstract_inverted_index.images, | 151 |
| abstract_inverted_index.jointly | 148 |
| abstract_inverted_index.methods | 146 |
| abstract_inverted_index.metrics | 74 |
| abstract_inverted_index.present | 1 |
| abstract_inverted_index.reports | 113 |
| abstract_inverted_index.require | 25 |
| abstract_inverted_index.servers | 86 |
| abstract_inverted_index.targets | 18 |
| abstract_inverted_index.textual | 29 |
| abstract_inverted_index.through | 95 |
| abstract_inverted_index.Executor | 97 |
| abstract_inverted_index.argument | 137 |
| abstract_inverted_index.decouple | 76 |
| abstract_inverted_index.encoder, | 55 |
| abstract_inverted_index.end-task | 114 |
| abstract_inverted_index.ensemble | 112 |
| abstract_inverted_index.fidelity | 78, 138 |
| abstract_inverted_index.language | 108 |
| abstract_inverted_index.matching | 60 |
| abstract_inverted_index.performs | 57 |
| abstract_inverted_index.pipeline | 100 |
| abstract_inverted_index.provides | 91 |
| abstract_inverted_index.semantic | 77 |
| abstract_inverted_index.sentence | 54 |
| abstract_inverted_index.workflow | 80 |
| abstract_inverted_index.Hungarian | 59 |
| abstract_inverted_index.Protocol. | 15 |
| abstract_inverted_index.alignment | 44 |
| abstract_inverted_index.anonymous | 158 |
| abstract_inverted_index.auditable | 63 |
| abstract_inverted_index.auxiliary | 105 |
| abstract_inverted_index.benchmark | 5, 17, 83 |
| abstract_inverted_index.grounding | 27 |
| abstract_inverted_index.introduce | 41 |
| abstract_inverted_index.multi-hop | 20 |
| abstract_inverted_index.resources | 37 |
| abstract_inverted_index.structure | 140 |
| abstract_inverted_index.workflows | 23 |
| abstract_inverted_index.Completion | 116 |
| abstract_inverted_index.M^3-Bench, | 2 |
| abstract_inverted_index.Multimodal | 124 |
| abstract_inverted_index.alignment, | 70 |
| abstract_inverted_index.cross-tool | 31 |
| abstract_inverted_index.evaluating | 7 |
| abstract_inverted_index.grounding. | 119 |
| abstract_inverted_index.multimodal | 8, 131 |
| abstract_inverted_index.one-to-one | 64 |
| abstract_inverted_index.persistent | 128 |
| abstract_inverted_index.realistic, | 19 |
| abstract_inverted_index.reasoning, | 30 |
| abstract_inverted_index.repository | 159 |
| abstract_inverted_index.serializes | 46 |
| abstract_inverted_index.signatures | 51 |
| abstract_inverted_index.Benchmark's | 157 |
| abstract_inverted_index.Evaluations | 120 |
| abstract_inverted_index.information | 118 |
| abstract_inverted_index.persistence | 34 |
| abstract_inverted_index.consistency, | 141 |
| abstract_inverted_index.consistency. | 81 |
| abstract_inverted_index.intermediate | 36 |
| abstract_inverted_index.particularly | 135 |
| abstract_inverted_index.standardized | 92 |
| abstract_inverted_index.trajectories | 93 |
| abstract_inverted_index.underscoring | 142 |
| abstract_inverted_index.dependencies, | 32 |
| abstract_inverted_index.interpretable | 73 |
| abstract_inverted_index.verification; | 103 |
| abstract_inverted_index.multi-threaded | 22 |
| abstract_inverted_index.representative | 122 |
| abstract_inverted_index.correspondences. | 65 |
| abstract_inverted_index.state-of-the-art | 123 |
| abstract_inverted_index.similarity-driven | 43 |
| abstract_inverted_index.similarity-bucketed | 58 |
| abstract_inverted_index.https://github.com/EtaYang10th/Open-M3-Bench | 162 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| citation_normalized_percentile | |
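
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the `excerpt` dictionary is a hand-copied subset of the rows above, trimmed to positions 0 through 15 so that the example stays short.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    positions = {}
    for word, indexes in inverted_index.items():
        for i in indexes:
            positions[i] = word
    # Join the words in position order; gaps in the index are skipped.
    return " ".join(positions[i] for i in sorted(positions))


# Subset of the payload rows, restricted to positions 0-15.
excerpt = {
    "We": [0],
    "present": [1],
    "M^3-Bench,": [2],
    "the": [3, 12],
    "first": [4],
    "benchmark": [5],
    "for": [6],
    "evaluating": [7],
    "multimodal": [8],
    "tool": [9],
    "use": [10],
    "under": [11],
    "Model": [13],
    "Context": [14],
    "Protocol.": [15],
}
print(rebuild_abstract(excerpt))
```

Applied to the full set of rows, the same function should reproduce the whole abstract, since every position from 0 to 162 appears exactly once in the index.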