Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2306.05685
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Chatbot Arena, a crowdsourced battle platform. Our results reveal that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement between humans. Hence, LLM-as-a-judge is a scalable and explainable way to approximate human preferences, which are otherwise very expensive to obtain. Additionally, we show our benchmark and traditional benchmarks complement each other by evaluating several variants of LLaMA and Vicuna. The MT-bench questions, 3K expert votes, and 30K conversations with human preferences are publicly available at https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge.
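The agreement figure quoted in the abstract can be read as the fraction of comparisons on which two raters give the same verdict. The sketch below illustrates that calculation, together with one possible guard against position bias: query the judge twice with the answer order swapped and keep the verdict only if both calls agree. The abstract does not spell out which mitigations the authors propose, so the swap rule, the verdict labels `"A"`/`"B"`/`"tie"`, and the names `swap_consistent_verdict`, `agreement`, and `judge` are all illustrative assumptions rather than the paper's implementation.

```python
from typing import Callable, Sequence

# Hypothetical verdict labels: "A" (first answer wins), "B" (second wins), "tie".
Verdict = str
Judge = Callable[[str, str, str], Verdict]  # (question, first_answer, second_answer)


def swap_consistent_verdict(judge: Judge, question: str,
                            answer_a: str, answer_b: str) -> Verdict:
    """Ask the judge twice with the answers swapped; keep only consistent verdicts."""
    first = judge(question, answer_a, answer_b)
    second = judge(question, answer_b, answer_a)
    # Translate the second verdict back into the original A/B labelling.
    second_unswapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    # Inconsistent verdicts are treated as a tie (an assumption of this sketch).
    return first if first == second_unswapped else "tie"


def agreement(votes_x: Sequence[Verdict], votes_y: Sequence[Verdict]) -> float:
    """Fraction of comparisons on which two raters give the same verdict."""
    if len(votes_x) != len(votes_y) or not votes_x:
        raise ValueError("vote lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(votes_x, votes_y))
    return matches / len(votes_x)
```

A judge that always prefers whichever answer is shown first would be reduced to `"tie"` by `swap_consistent_verdict`, while a judge whose preference does not depend on order keeps its verdict. Whether ties count towards agreement is a design choice; this sketch counts them like any other label.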
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2306.05685, https://arxiv.org/pdf/2306.05685
- OA Status: green
- Cited By: 419
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4380353763
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4380353763 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2306.05685 (Digital Object Identifier)
- Title: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023
- Publication date: 2023-06-09
- Authors (in order): Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
- Landing page: https://arxiv.org/abs/2306.05685 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2306.05685 (direct link to the full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2306.05685 (direct OA link)
- Concepts: Computer science, Chatbot, Set (abstract data type), Complement (music), Benchmark (surveying), Crowdsourcing, Data science, Artificial intelligence, World Wide Web, Programming language, Complementation, Chemistry, Phenotype, Geography, Biochemistry, Geodesy, Gene (top concepts attached by OpenAlex)
- Cited by: 419 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 182, 2024: 177, 2023: 59, 2022: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4380353763 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2306.05685 |
| ids.doi | https://doi.org/10.48550/arxiv.2306.05685 |
| ids.openalex | https://openalex.org/W4380353763 |
| fwci | |
| type | preprint |
| title | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.998199999332428 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T12031 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9966999888420105 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Speech and dialogue systems |
| topics[2].id | https://openalex.org/T12128 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9829999804496765 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | AI in Service Interactions |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6617642641067505 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2779041454 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6229898929595947 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q870780 |
| concepts[1].display_name | Chatbot |
| concepts[2].id | https://openalex.org/C177264268 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5745055079460144 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[2].display_name | Set (abstract data type) |
| concepts[3].id | https://openalex.org/C112313634 |
| concepts[3].level | 5 |
| concepts[3].score | 0.5706790685653687 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7886648 |
| concepts[3].display_name | Complement (music) |
| concepts[4].id | https://openalex.org/C185798385 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5528709292411804 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[4].display_name | Benchmark (surveying) |
| concepts[5].id | https://openalex.org/C62230096 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5047496557235718 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q275969 |
| concepts[5].display_name | Crowdsourcing |
| concepts[6].id | https://openalex.org/C2522767166 |
| concepts[6].level | 1 |
| concepts[6].score | 0.41879650950431824 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[6].display_name | Data science |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3230215311050415 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C136764020 |
| concepts[8].level | 1 |
| concepts[8].score | 0.21174326539039612 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[8].display_name | World Wide Web |
| concepts[9].id | https://openalex.org/C199360897 |
| concepts[9].level | 1 |
| concepts[9].score | 0.08537787199020386 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[9].display_name | Programming language |
| concepts[10].id | https://openalex.org/C188082640 |
| concepts[10].level | 4 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q1780899 |
| concepts[10].display_name | Complementation |
| concepts[11].id | https://openalex.org/C185592680 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[11].display_name | Chemistry |
| concepts[12].id | https://openalex.org/C127716648 |
| concepts[12].level | 3 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q104053 |
| concepts[12].display_name | Phenotype |
| concepts[13].id | https://openalex.org/C205649164 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[13].display_name | Geography |
| concepts[14].id | https://openalex.org/C55493867 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[14].display_name | Biochemistry |
| concepts[15].id | https://openalex.org/C13280743 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[15].display_name | Geodesy |
| concepts[16].id | https://openalex.org/C104317684 |
| concepts[16].level | 2 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[16].display_name | Gene |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6617642641067505 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/chatbot |
| keywords[1].score | 0.6229898929595947 |
| keywords[1].display_name | Chatbot |
| keywords[2].id | https://openalex.org/keywords/set |
| keywords[2].score | 0.5745055079460144 |
| keywords[2].display_name | Set (abstract data type) |
| keywords[3].id | https://openalex.org/keywords/complement |
| keywords[3].score | 0.5706790685653687 |
| keywords[3].display_name | Complement (music) |
| keywords[4].id | https://openalex.org/keywords/benchmark |
| keywords[4].score | 0.5528709292411804 |
| keywords[4].display_name | Benchmark (surveying) |
| keywords[5].id | https://openalex.org/keywords/crowdsourcing |
| keywords[5].score | 0.5047496557235718 |
| keywords[5].display_name | Crowdsourcing |
| keywords[6].id | https://openalex.org/keywords/data-science |
| keywords[6].score | 0.41879650950431824 |
| keywords[6].display_name | Data science |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.3230215311050415 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/world-wide-web |
| keywords[8].score | 0.21174326539039612 |
| keywords[8].display_name | World Wide Web |
| keywords[9].id | https://openalex.org/keywords/programming-language |
| keywords[9].score | 0.08537787199020386 |
| keywords[9].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2306.05685 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2306.05685 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2306.05685 |
| locations[1].id | doi:10.48550/arxiv.2306.05685 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2306.05685 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5061339425 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-5812-731X |
| authorships[0].author.display_name | Lianmin Zheng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zheng, Lianmin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009640963 |
| authorships[1].author.orcid | https://orcid.org/0009-0009-0105-723X |
| authorships[1].author.display_name | Wei-Lin Chiang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Chiang, Wei-Lin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5058825905 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2311-3393 |
| authorships[2].author.display_name | Ying Sheng |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Sheng, Ying |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5083595182 |
| authorships[3].author.orcid | https://orcid.org/0009-0007-3787-0316 |
| authorships[3].author.display_name | Siyuan Zhuang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhuang, Siyuan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5082308834 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2352-4002 |
| authorships[4].author.display_name | Zhanghao Wu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Wu, Zhanghao |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5076407338 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Yonghao Zhuang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhuang, Yonghao |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5103777591 |
| authorships[6].author.orcid | https://orcid.org/0009-0008-0052-2817 |
| authorships[6].author.display_name | Lin Zi |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Lin, Zi |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5009362721 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-5372-9450 |
| authorships[7].author.display_name | Zhuohan Li |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Li, Zhuohan |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5082252423 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-0487-2581 |
| authorships[8].author.display_name | Dacheng Li |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Li, Dacheng |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5009547049 |
| authorships[9].author.orcid | https://orcid.org/0009-0005-9158-4201 |
| authorships[9].author.display_name | Eric P. Xing |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Xing, Eric. P |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5092145402 |
| authorships[10].author.orcid | https://orcid.org/0009-0003-8392-3977 |
| authorships[10].author.display_name | Hao Zhang |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Zhang, Hao |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5072427753 |
| authorships[11].author.orcid | https://orcid.org/0000-0003-2921-956X |
| authorships[11].author.display_name | Joseph E. Gonzalez |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Gonzalez, Joseph E. |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5041920173 |
| authorships[12].author.orcid | https://orcid.org/0000-0002-5373-0088 |
| authorships[12].author.display_name | Ion Stoica |
| authorships[12].author_position | last |
| authorships[12].raw_author_name | Stoica, Ion |
| authorships[12].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2306.05685 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.998199999332428 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W4383501580, https://openalex.org/W3032998312, https://openalex.org/W4214931137, https://openalex.org/W4387007686, https://openalex.org/W4313813117, https://openalex.org/W135177976, https://openalex.org/W4382052417, https://openalex.org/W3192088754, https://openalex.org/W4384486036, https://openalex.org/W1503094549 |
| cited_by_count | 419 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 182 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 177 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 59 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2306.05685 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2306.05685 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2306.05685 |
| primary_location.id | pmh:oai:arXiv.org:2306.05685 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2306.05685 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2306.05685 |
| publication_date | 2023-06-09 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 87, 94, 130 |
| abstract_inverted_index.3K | 168 |
| abstract_inverted_index.To | 25 |
| abstract_inverted_index.We | 43, 71 |
| abstract_inverted_index.as | 33, 57, 59 |
| abstract_inverted_index.at | 180 |
| abstract_inverted_index.by | 82, 157 |
| abstract_inverted_index.in | 21 |
| abstract_inverted_index.is | 8, 129 |
| abstract_inverted_index.of | 18, 49, 69, 123, 161 |
| abstract_inverted_index.on | 39 |
| abstract_inverted_index.to | 11, 35, 66, 135, 144 |
| abstract_inverted_index.we | 28, 147 |
| abstract_inverted_index.30K | 172 |
| abstract_inverted_index.80% | 118 |
| abstract_inverted_index.LLM | 77, 103 |
| abstract_inverted_index.Our | 98 |
| abstract_inverted_index.The | 165 |
| abstract_inverted_index.and | 15, 47, 54, 63, 79, 91, 111, 132, 151, 163, 171 |
| abstract_inverted_index.are | 140, 177 |
| abstract_inverted_index.can | 107 |
| abstract_inverted_index.due | 10 |
| abstract_inverted_index.our | 149 |
| abstract_inverted_index.the | 16, 45, 74, 120 |
| abstract_inverted_index.two | 84 |
| abstract_inverted_index.way | 134 |
| abstract_inverted_index.LLMs | 32 |
| abstract_inverted_index.both | 109 |
| abstract_inverted_index.chat | 6 |
| abstract_inverted_index.each | 155 |
| abstract_inverted_index.like | 105 |
| abstract_inverted_index.more | 40 |
| abstract_inverted_index.over | 117 |
| abstract_inverted_index.same | 121 |
| abstract_inverted_index.set; | 90 |
| abstract_inverted_index.show | 148 |
| abstract_inverted_index.some | 68 |
| abstract_inverted_index.that | 101 |
| abstract_inverted_index.then | 72 |
| abstract_inverted_index.very | 142 |
| abstract_inverted_index.well | 58 |
| abstract_inverted_index.with | 174 |
| abstract_inverted_index.(LLM) | 4 |
| abstract_inverted_index.GPT-4 | 106 |
| abstract_inverted_index.LLaMA | 162 |
| abstract_inverted_index.based | 5 |
| abstract_inverted_index.broad | 13 |
| abstract_inverted_index.human | 23, 80, 113, 137, 175 |
| abstract_inverted_index.large | 1 |
| abstract_inverted_index.level | 122 |
| abstract_inverted_index.match | 108 |
| abstract_inverted_index.model | 3 |
| abstract_inverted_index.other | 156 |
| abstract_inverted_index.their | 12 |
| abstract_inverted_index.them. | 70 |
| abstract_inverted_index.these | 37 |
| abstract_inverted_index.this, | 27 |
| abstract_inverted_index.usage | 46 |
| abstract_inverted_index.using | 30 |
| abstract_inverted_index.well, | 115 |
| abstract_inverted_index.which | 139 |
| abstract_inverted_index.Arena, | 93 |
| abstract_inverted_index.Hence, | 127 |
| abstract_inverted_index.battle | 96 |
| abstract_inverted_index.expert | 169 |
| abstract_inverted_index.judges | 34, 78, 104 |
| abstract_inverted_index.models | 38 |
| abstract_inverted_index.reveal | 100 |
| abstract_inverted_index.strong | 31, 102 |
| abstract_inverted_index.verify | 73 |
| abstract_inverted_index.votes, | 170 |
| abstract_inverted_index.Chatbot | 92 |
| abstract_inverted_index.Vicuna. | 164 |
| abstract_inverted_index.address | 26 |
| abstract_inverted_index.between | 76, 125 |
| abstract_inverted_index.biases, | 56 |
| abstract_inverted_index.examine | 44 |
| abstract_inverted_index.explore | 29 |
| abstract_inverted_index.humans. | 126 |
| abstract_inverted_index.limited | 60 |
| abstract_inverted_index.obtain. | 145 |
| abstract_inverted_index.propose | 64 |
| abstract_inverted_index.results | 99 |
| abstract_inverted_index.several | 159 |
| abstract_inverted_index.MT-bench | 166 |
| abstract_inverted_index.ability, | 62 |
| abstract_inverted_index.evaluate | 36 |
| abstract_inverted_index.existing | 19 |
| abstract_inverted_index.language | 2 |
| abstract_inverted_index.mitigate | 67 |
| abstract_inverted_index.publicly | 178 |
| abstract_inverted_index.question | 89 |
| abstract_inverted_index.scalable | 131 |
| abstract_inverted_index.variants | 160 |
| abstract_inverted_index.MT-bench, | 86 |
| abstract_inverted_index.achieving | 116 |
| abstract_inverted_index.agreement | 75, 124 |
| abstract_inverted_index.available | 179 |
| abstract_inverted_index.benchmark | 150 |
| abstract_inverted_index.expensive | 143 |
| abstract_inverted_index.including | 51 |
| abstract_inverted_index.measuring | 22 |
| abstract_inverted_index.otherwise | 141 |
| abstract_inverted_index.platform. | 97 |
| abstract_inverted_index.position, | 52 |
| abstract_inverted_index.reasoning | 61 |
| abstract_inverted_index.solutions | 65 |
| abstract_inverted_index.Evaluating | 0 |
| abstract_inverted_index.agreement, | 119 |
| abstract_inverted_index.assistants | 7 |
| abstract_inverted_index.benchmarks | 20, 153 |
| abstract_inverted_index.complement | 154 |
| abstract_inverted_index.controlled | 110 |
| abstract_inverted_index.evaluating | 158 |
| abstract_inverted_index.inadequacy | 17 |
| abstract_inverted_index.multi-turn | 88 |
| abstract_inverted_index.open-ended | 41 |
| abstract_inverted_index.questions, | 167 |
| abstract_inverted_index.questions. | 42 |
| abstract_inverted_index.verbosity, | 53 |
| abstract_inverted_index.approximate | 136 |
| abstract_inverted_index.benchmarks: | 85 |
| abstract_inverted_index.challenging | 9 |
| abstract_inverted_index.explainable | 133 |
| abstract_inverted_index.introducing | 83 |
| abstract_inverted_index.limitations | 48 |
| abstract_inverted_index.preferences | 81, 114, 176 |
| abstract_inverted_index.traditional | 152 |
| abstract_inverted_index.capabilities | 14 |
| abstract_inverted_index.crowdsourced | 95, 112 |
| abstract_inverted_index.preferences, | 138 |
| abstract_inverted_index.preferences. | 24 |
| abstract_inverted_index.Additionally, | 146 |
| abstract_inverted_index.conversations | 173 |
| abstract_inverted_index.LLM-as-a-judge | 128 |
| abstract_inverted_index.LLM-as-a-judge, | 50 |
| abstract_inverted_index.self-enhancement | 55 |
| abstract_inverted_index.https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge. | 181 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 13 |
| citation_normalized_percentile |
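
The `abstract_inverted_index.*` rows in the payload map each token of the abstract to the word positions at which it occurs. A minimal sketch of turning such a mapping back into running text follows; the function name `rebuild_abstract` is an illustrative assumption, and the sample dictionary contains only the first ten positions copied from the table above, not the full index.

```python
from typing import Dict, List, Mapping


def rebuild_abstract(inverted_index: Mapping[str, List[int]]) -> str:
    """Rebuild text from a token -> positions mapping by sorting on position."""
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Join tokens in ascending position order; gaps in the index are simply skipped.
    return " ".join(token_at[position] for position in sorted(token_at))


# First ten positions of the abstract, taken from the payload rows above.
sample_index = {
    "Evaluating": [0],
    "large": [1],
    "language": [2],
    "model": [3],
    "(LLM)": [4],
    "based": [5],
    "chat": [6],
    "assistants": [7],
    "is": [8],
    "challenging": [9],
}

print(rebuild_abstract(sample_index))
# prints: Evaluating large language model (LLM) based chat assistants is challenging
```

Because punctuation stays attached to its token in the index, the rebuilt text needs no further spacing fixes.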