SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2411.16077
Large Language Model (LLM) integrations into applications like the Microsoft365 suite and Google Workspace for creating/processing documents, emails, presentations, etc. have led to considerable enhancements in productivity and time savings. But as these integrations become more complex, it is paramount to ensure that the quality of output from the LLM-integrated applications is relevant and appropriate for use. Identifying the need to develop robust evaluation approaches for natural language generation, wherein references/ground-truth labels don't exist or aren't amply available, this paper introduces a novel framework called "SAGEval", which utilizes a critiquing Agent to provide feedback on scores generated by LLM evaluators. We show that the critiquing Agent is able to rectify scores from LLM evaluators in the absence of references/ground-truth labels, thereby reducing the need for labeled data even for complex NLG evaluation scenarios, like the generation of JSON-structured forms/surveys with responses in different styles like multiple choice, Likert ratings, single-choice questions, etc.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2411.16077
- https://arxiv.org/pdf/2411.16077
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4404987074
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404987074 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2411.16077 (Digital Object Identifier)
- Title: SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-11-25 (Full publication date if available)
- Authors: Reshmi Ghosh, Tianyi Yao, L. Chen, Sadid A. Hasan, Tianwei Chen, Dario Bernal, H. Jiao, H M Sajjad Hossain (List of authors in order)
- Landing page: https://arxiv.org/abs/2411.16077 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2411.16077 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2411.16077 (Direct OA link when available)
- Concepts: Computer science, Information retrieval, Artificial intelligence, Natural language processing (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404987074 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2411.16077 |
| ids.doi | https://doi.org/10.48550/arxiv.2411.16077 |
| ids.openalex | https://openalex.org/W4404987074 |
| fwci | 0.0 |
| type | preprint |
| title | SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13083 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8234999775886536 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Advanced Text Analysis Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8116999864578247 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T12031 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.8051000237464905 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech and dialogue systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5904430150985718 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C23123220 |
| concepts[1].level | 1 |
| concepts[1].score | 0.49140465259552 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[1].display_name | Information retrieval |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.407698392868042 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.36968499422073364 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5904430150985718 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/information-retrieval |
| keywords[1].score | 0.49140465259552 |
| keywords[1].display_name | Information retrieval |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.407698392868042 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.36968499422073364 |
| keywords[3].display_name | Natural language processing |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2411.16077 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2411.16077 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2411.16077 |
| locations[1].id | doi:10.48550/arxiv.2411.16077 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2411.16077 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5019507987 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1814-2133 |
| authorships[0].author.display_name | Reshmi Ghosh |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ghosh, Reshmi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5103122148 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8380-2866 |
| authorships[1].author.display_name | Tianyi Yao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yao, Tianyi |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5113334610 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | L. Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Lizzy |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5103789387 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5665-7752 |
| authorships[3].author.display_name | Sadid A. Hasan |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hasan, Sadid |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5020041714 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-8225-3244 |
| authorships[4].author.display_name | Tianwei Chen |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Chen, Tianwei |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5114987714 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Dario Bernal |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Bernal, Dario |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5109612501 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | H. Jiao |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Jiao, Huitian |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5027555291 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-9847-6493 |
| authorships[7].author.display_name | H M Sajjad Hossain |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Hossain, H M Sajjad |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.16077 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13083 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8234999775886536 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Advanced Text Analysis Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W3204019825 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.16077 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.16077 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.16077 |
| primary_location.id | pmh:oai:arXiv.org:2411.16077 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2411.16077 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2411.16077 |
| publication_date | 2024-11-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 81, 88 |
| abstract_inverted_index.We | 100 |
| abstract_inverted_index.as | 30 |
| abstract_inverted_index.by | 97 |
| abstract_inverted_index.in | 24, 114, 140 |
| abstract_inverted_index.is | 38, 106 |
| abstract_inverted_index.it | 37 |
| abstract_inverted_index.of | 45, 116, 135 |
| abstract_inverted_index.on | 94 |
| abstract_inverted_index.or | 74 |
| abstract_inverted_index.to | 21, 40, 60, 91, 108 |
| abstract_inverted_index.But | 29 |
| abstract_inverted_index.LLM | 98, 112 |
| abstract_inverted_index.NLG | 129 |
| abstract_inverted_index.and | 10, 26, 53 |
| abstract_inverted_index.are | 51 |
| abstract_inverted_index.for | 13, 55, 65, 123, 127 |
| abstract_inverted_index.has | 19 |
| abstract_inverted_index.led | 20 |
| abstract_inverted_index.the | 43, 48, 58, 103, 121, 133 |
| abstract_inverted_index.able | 107 |
| abstract_inverted_index.data | 125 |
| abstract_inverted_index.etc. | 18, 151 |
| abstract_inverted_index.even | 126 |
| abstract_inverted_index.from | 47, 111 |
| abstract_inverted_index.into | 5 |
| abstract_inverted_index.like | 7, 132, 143 |
| abstract_inverted_index.more | 34, 35 |
| abstract_inverted_index.need | 59, 122 |
| abstract_inverted_index.show | 101 |
| abstract_inverted_index.that | 42, 102 |
| abstract_inverted_index.this | 78 |
| abstract_inverted_index.time | 27 |
| abstract_inverted_index.use. | 56 |
| abstract_inverted_index.with | 138 |
| abstract_inverted_index.(LLM) | 3 |
| abstract_inverted_index.Agent | 90, 105 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.Model | 2 |
| abstract_inverted_index.amply | 76 |
| abstract_inverted_index.exist | 73 |
| abstract_inverted_index.isn't | 75 |
| abstract_inverted_index.novel | 82 |
| abstract_inverted_index.paper | 79 |
| abstract_inverted_index.suite | 9 |
| abstract_inverted_index.these | 31 |
| abstract_inverted_index.which | 86 |
| abstract_inverted_index.Google | 11 |
| abstract_inverted_index.become | 33 |
| abstract_inverted_index.called | 84 |
| abstract_inverted_index.choice | 149 |
| abstract_inverted_index.ensure | 41 |
| abstract_inverted_index.labels | 71 |
| abstract_inverted_index.likert | 146 |
| abstract_inverted_index.output | 46 |
| abstract_inverted_index.robust | 62 |
| abstract_inverted_index.scores | 95, 110 |
| abstract_inverted_index.single | 148 |
| abstract_inverted_index.styles | 142 |
| abstract_inverted_index.absence | 115 |
| abstract_inverted_index.choice, | 145 |
| abstract_inverted_index.complex | 128 |
| abstract_inverted_index.develop | 61 |
| abstract_inverted_index.doesn't | 72 |
| abstract_inverted_index.emails, | 16 |
| abstract_inverted_index.labeled | 124 |
| abstract_inverted_index.labels, | 118 |
| abstract_inverted_index.natural | 66 |
| abstract_inverted_index.provide | 92 |
| abstract_inverted_index.quality | 44 |
| abstract_inverted_index.rectify | 109 |
| abstract_inverted_index.thereby | 119 |
| abstract_inverted_index.wherein | 69 |
| abstract_inverted_index.Language | 1 |
| abstract_inverted_index.complex, | 36 |
| abstract_inverted_index.feedback | 93 |
| abstract_inverted_index.language | 67 |
| abstract_inverted_index.multiple | 144 |
| abstract_inverted_index.ratings, | 147 |
| abstract_inverted_index.reducing | 120 |
| abstract_inverted_index.relevant | 52 |
| abstract_inverted_index.savings. | 28 |
| abstract_inverted_index.utilizes | 87 |
| abstract_inverted_index."SAGEval" | 85 |
| abstract_inverted_index.Workspace | 12 |
| abstract_inverted_index.different | 141 |
| abstract_inverted_index.framework | 83 |
| abstract_inverted_index.generated | 96 |
| abstract_inverted_index.paramount | 39 |
| abstract_inverted_index.responses | 139 |
| abstract_inverted_index.approaches | 64 |
| abstract_inverted_index.available, | 77 |
| abstract_inverted_index.critiquing | 89, 104 |
| abstract_inverted_index.documents, | 15 |
| abstract_inverted_index.evaluation | 63, 130 |
| abstract_inverted_index.generation | 134 |
| abstract_inverted_index.introduces | 80 |
| abstract_inverted_index.questions, | 150 |
| abstract_inverted_index.scenarios, | 131 |
| abstract_inverted_index.Identifying | 57 |
| abstract_inverted_index.appropriate | 54 |
| abstract_inverted_index.evaluators, | 113 |
| abstract_inverted_index.evaluators. | 99 |
| abstract_inverted_index.generation, | 68 |
| abstract_inverted_index.Microsoft365 | 8 |
| abstract_inverted_index.applications | 6, 50 |
| abstract_inverted_index.considerable | 22 |
| abstract_inverted_index.enhancements | 23 |
| abstract_inverted_index.integrations | 4, 32 |
| abstract_inverted_index.productivity | 25 |
| abstract_inverted_index.forms/surveys | 137 |
| abstract_inverted_index.LLM-integrated | 49 |
| abstract_inverted_index.presentations, | 17 |
| abstract_inverted_index.JSON-structured | 136 |
| abstract_inverted_index.references/ground | 70 |
| abstract_inverted_index.creating/processing | 14 |
| abstract_inverted_index.references/ground-truth | 117 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile |
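
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs, rather than as running text. A minimal sketch of how such a mapping could be turned back into readable text is shown below. It assumes the index has already been loaded as a Python dict of token to list of integer positions; the function name `rebuild_abstract` and the truncated `sample` dict are illustrative and not part of the record.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping.

    Assumption: each position belongs to exactly one token, as in the
    rows of the payload table above.
    """
    token_at: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            token_at[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(token_at[pos] for pos in sorted(token_at))


# Illustrative sample: only positions 0-9 of the index listed above.
sample = {
    "Large": [0],
    "Language": [1],
    "Model": [2],
    "(LLM)": [3],
    "integrations": [4],
    "into": [5],
    "applications": [6],
    "like": [7],
    "Microsoft365": [8],
    "suite": [9],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
```

Because punctuation is attached to the tokens themselves (for example `complex,` and `use.`), joining on spaces is enough to recover the original sentences when the full index is supplied.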