Multi-group Uncertainty Quantification for Long-form Text Generation
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.21057
While past works have shown how uncertainty quantification can be applied to large language model (LLM) outputs, the question of whether resulting uncertainty guarantees still hold within sub-groupings of data remains open. In our work, given some long-form text generated by an LLM, we study uncertainty at both the level of individual claims contained within the output (via calibration) and across the entire output itself (via conformal prediction). Using biography generation as a testbed for this study, we derive a set of (demographic) attributes (e.g., whether some text describes a man or woman) for each generation to form such "subgroups" of data. We find that although canonical methods for both types of uncertainty quantification perform well when measuring across the entire dataset, such guarantees break down when examining particular subgroups. Having established this issue, we invoke group-conditional methods for uncertainty quantification -- multicalibration and multivalid conformal prediction -- and find that across a variety of approaches, additional subgroup information consistently improves calibration and conformal prediction within subgroups (while crucially retaining guarantees across the entire dataset). As the problems of calibration, conformal prediction, and their multi-group counterparts have not been extensively explored in the context of long-form text generation, we consider these results to form a benchmark for this setting.
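The gap between dataset-level and subgroup-level calibration described in the abstract can be illustrated with a small sketch. This is not the authors' code or evaluation metric: the function names, the equal-width binning, and the toy data are assumptions chosen only to show how a confidence score can look calibrated overall while being miscalibrated inside each subgroup.

```python
from collections import defaultdict


def calibration_error(confidences, labels, n_bins=10):
    """Expected calibration error using equal-width confidence bins."""
    bins = defaultdict(list)
    for p, y in zip(confidences, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((p, y))
    total = len(confidences)
    err = 0.0
    for items in bins.values():
        mean_conf = sum(p for p, _ in items) / len(items)
        mean_acc = sum(y for _, y in items) / len(items)
        err += len(items) / total * abs(mean_conf - mean_acc)
    return err


def groupwise_calibration_error(confidences, labels, groups, n_bins=10):
    """Compute the same error separately for each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for p, y, g in zip(confidences, labels, groups):
        by_group[g][0].append(p)
        by_group[g][1].append(y)
    return {g: calibration_error(ps, ys, n_bins) for g, (ps, ys) in by_group.items()}


# Hypothetical toy data: every claim gets confidence 0.7.
# Group "a" claims are correct 90% of the time, group "b" claims 50% of the time,
# so the overall correctness rate is 70% and matches the stated confidence.
confidences = [0.7] * 20
labels = [1] * 9 + [0] * 1 + [1] * 5 + [0] * 5
groups = ["a"] * 10 + ["b"] * 10

overall = calibration_error(confidences, labels)
per_group = groupwise_calibration_error(confidences, labels, groups)
print(f"overall: {overall:.3f}")
for g, e in sorted(per_group.items()):
    print(f"group {g}: {e:.3f}")
```

In this toy setting the overall error is essentially zero while each subgroup is off by about 0.2, which is the kind of failure that group-conditional methods such as multicalibration are meant to address.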
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2407.21057, https://arxiv.org/pdf/2407.21057
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4401306282
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4401306282 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.21057 (Digital Object Identifier)
- Title: Multi-group Uncertainty Quantification for Long-form Text Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-25 (full publication date if available)
- Authors: Terrance Liu (displayed by OpenAlex as Tongxuan Liu; the raw author name in the payload is "Liu, Terrance"), Zhiwei Steven Wu (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.21057 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.21057 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.21057 (direct OA link when available)
- Concepts: Group (periodic table), Uncertainty quantification, Computer science, Chemistry, Machine learning, Organic chemistry (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4401306282 |
| doi | https://doi.org/10.48550/arxiv.2407.21057 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.21057 |
| ids.openalex | https://openalex.org/W4401306282 |
| fwci | |
| type | preprint |
| title | Multi-group Uncertainty Quantification for Long-form Text Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9549000263214111 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2781311116 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5596073269844055 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q83306 |
| concepts[0].display_name | Group (periodic table) |
| concepts[1].id | https://openalex.org/C32230216 |
| concepts[1].level | 2 |
| concepts[1].score | 0.419118732213974 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7882499 |
| concepts[1].display_name | Uncertainty quantification |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4097161293029785 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C185592680 |
| concepts[3].level | 0 |
| concepts[3].score | 0.16544640064239502 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[3].display_name | Chemistry |
| concepts[4].id | https://openalex.org/C119857082 |
| concepts[4].level | 1 |
| concepts[4].score | 0.109456866979599 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[4].display_name | Machine learning |
| concepts[5].id | https://openalex.org/C178790620 |
| concepts[5].level | 1 |
| concepts[5].score | 0.056021809577941895 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11351 |
| concepts[5].display_name | Organic chemistry |
| keywords[0].id | https://openalex.org/keywords/group |
| keywords[0].score | 0.5596073269844055 |
| keywords[0].display_name | Group (periodic table) |
| keywords[1].id | https://openalex.org/keywords/uncertainty-quantification |
| keywords[1].score | 0.419118732213974 |
| keywords[1].display_name | Uncertainty quantification |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4097161293029785 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/chemistry |
| keywords[3].score | 0.16544640064239502 |
| keywords[3].display_name | Chemistry |
| keywords[4].id | https://openalex.org/keywords/machine-learning |
| keywords[4].score | 0.109456866979599 |
| keywords[4].display_name | Machine learning |
| keywords[5].id | https://openalex.org/keywords/organic-chemistry |
| keywords[5].score | 0.056021809577941895 |
| keywords[5].display_name | Organic chemistry |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.21057 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.21057 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.21057 |
| locations[1].id | doi:10.48550/arxiv.2407.21057 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.21057 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5072530962 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Tongxuan Liu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liu, Terrance |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5001070941 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8125-8227 |
| authorships[1].author.display_name | Zhiwei Steven Wu |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Wu, Zhiwei Steven |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.21057 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Multi-group Uncertainty Quantification for Long-form Text Generation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9549000263214111 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W1991093342, https://openalex.org/W4396696052, https://openalex.org/W2382290278 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.21057 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.21057 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.21057 |
| primary_location.id | pmh:oai:arXiv.org:2407.21057 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.21057 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.21057 |
| publication_date | 2024-07-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 72, 79, 89, 152, 204 |
| abstract_inverted_index.-- | 141, 147 |
| abstract_inverted_index.As | 175 |
| abstract_inverted_index.In | 32 |
| abstract_inverted_index.We | 102 |
| abstract_inverted_index.an | 41 |
| abstract_inverted_index.as | 71 |
| abstract_inverted_index.at | 46 |
| abstract_inverted_index.be | 9 |
| abstract_inverted_index.by | 40 |
| abstract_inverted_index.in | 191 |
| abstract_inverted_index.of | 19, 28, 50, 81, 100, 111, 154, 178, 194 |
| abstract_inverted_index.or | 91 |
| abstract_inverted_index.to | 11, 96, 202 |
| abstract_inverted_index.we | 43, 77, 134, 198 |
| abstract_inverted_index.and | 59, 143, 148, 162, 182 |
| abstract_inverted_index.can | 8 |
| abstract_inverted_index.for | 74, 93, 108, 138, 206 |
| abstract_inverted_index.how | 5 |
| abstract_inverted_index.man | 90 |
| abstract_inverted_index.not | 187 |
| abstract_inverted_index.our | 33 |
| abstract_inverted_index.set | 80 |
| abstract_inverted_index.the | 17, 48, 55, 61, 119, 172, 176, 192 |
| abstract_inverted_index.(via | 57, 65 |
| abstract_inverted_index.LLM, | 42 |
| abstract_inverted_index.been | 188 |
| abstract_inverted_index.both | 47, 109 |
| abstract_inverted_index.data | 29 |
| abstract_inverted_index.down | 125 |
| abstract_inverted_index.each | 94 |
| abstract_inverted_index.find | 103, 149 |
| abstract_inverted_index.form | 97, 203 |
| abstract_inverted_index.have | 3, 186 |
| abstract_inverted_index.hold | 25 |
| abstract_inverted_index.past | 1 |
| abstract_inverted_index.some | 36, 86 |
| abstract_inverted_index.such | 98, 122 |
| abstract_inverted_index.text | 38, 87, 196 |
| abstract_inverted_index.that | 104, 150 |
| abstract_inverted_index.this | 75, 132, 207 |
| abstract_inverted_index.well | 115 |
| abstract_inverted_index.when | 116, 126 |
| abstract_inverted_index.(LLM) | 15 |
| abstract_inverted_index.Using | 68 |
| abstract_inverted_index.While | 0 |
| abstract_inverted_index.break | 124 |
| abstract_inverted_index.data. | 101 |
| abstract_inverted_index.given | 35 |
| abstract_inverted_index.large | 12 |
| abstract_inverted_index.level | 49 |
| abstract_inverted_index.model | 14 |
| abstract_inverted_index.open. | 31 |
| abstract_inverted_index.shown | 4 |
| abstract_inverted_index.still | 24 |
| abstract_inverted_index.study | 44 |
| abstract_inverted_index.their | 183 |
| abstract_inverted_index.these | 200 |
| abstract_inverted_index.types | 110 |
| abstract_inverted_index.work, | 34 |
| abstract_inverted_index.works | 2 |
| abstract_inverted_index.(e.g., | 84 |
| abstract_inverted_index.(while | 167 |
| abstract_inverted_index.Having | 130 |
| abstract_inverted_index.across | 60, 118, 151, 171 |
| abstract_inverted_index.claims | 52 |
| abstract_inverted_index.derive | 78 |
| abstract_inverted_index.entire | 62, 120, 173 |
| abstract_inverted_index.invoke | 135 |
| abstract_inverted_index.issue, | 133 |
| abstract_inverted_index.itself | 64 |
| abstract_inverted_index.output | 56, 63 |
| abstract_inverted_index.study, | 76 |
| abstract_inverted_index.within | 26, 54, 165 |
| abstract_inverted_index.woman) | 92 |
| abstract_inverted_index.applied | 10 |
| abstract_inverted_index.context | 193 |
| abstract_inverted_index.methods | 107, 137 |
| abstract_inverted_index.perform | 114 |
| abstract_inverted_index.remains | 30 |
| abstract_inverted_index.results | 201 |
| abstract_inverted_index.testbed | 73 |
| abstract_inverted_index.variety | 153 |
| abstract_inverted_index.whether | 20, 85 |
| abstract_inverted_index.although | 105 |
| abstract_inverted_index.consider | 199 |
| abstract_inverted_index.dataset, | 121 |
| abstract_inverted_index.explored | 190 |
| abstract_inverted_index.improves | 160 |
| abstract_inverted_index.language | 13 |
| abstract_inverted_index.outputs, | 16 |
| abstract_inverted_index.problems | 177 |
| abstract_inverted_index.question | 18 |
| abstract_inverted_index.setting. | 208 |
| abstract_inverted_index.subgroup | 157 |
| abstract_inverted_index.benchmark | 205 |
| abstract_inverted_index.biography | 69 |
| abstract_inverted_index.canonical | 106 |
| abstract_inverted_index.conformal | 66, 145, 163, 180 |
| abstract_inverted_index.contained | 53 |
| abstract_inverted_index.crucially | 168 |
| abstract_inverted_index.dataset). | 174 |
| abstract_inverted_index.describes | 88 |
| abstract_inverted_index.examining | 127 |
| abstract_inverted_index.generated | 39 |
| abstract_inverted_index.long-form | 37, 195 |
| abstract_inverted_index.measuring | 117 |
| abstract_inverted_index.resulting | 21 |
| abstract_inverted_index.retaining | 169 |
| abstract_inverted_index.subgroups | 166 |
| abstract_inverted_index.additional | 156 |
| abstract_inverted_index.attributes | 83 |
| abstract_inverted_index.generation | 70, 95 |
| abstract_inverted_index.guarantees | 23, 123, 170 |
| abstract_inverted_index.individual | 51 |
| abstract_inverted_index.multivalid | 144 |
| abstract_inverted_index.particular | 128 |
| abstract_inverted_index.prediction | 146, 164 |
| abstract_inverted_index.subgroups. | 129 |
| abstract_inverted_index."subgroups" | 99 |
| abstract_inverted_index.approaches, | 155 |
| abstract_inverted_index.calibration | 161 |
| abstract_inverted_index.established | 131 |
| abstract_inverted_index.extensively | 189 |
| abstract_inverted_index.generation, | 197 |
| abstract_inverted_index.information | 158 |
| abstract_inverted_index.multi-group | 184 |
| abstract_inverted_index.prediction, | 181 |
| abstract_inverted_index.uncertainty | 6, 22, 45, 112, 139 |
| abstract_inverted_index.calibration) | 58 |
| abstract_inverted_index.calibration, | 179 |
| abstract_inverted_index.consistently | 159 |
| abstract_inverted_index.counterparts | 185 |
| abstract_inverted_index.prediction). | 67 |
| abstract_inverted_index.(demographic) | 82 |
| abstract_inverted_index.sub-groupings | 27 |
| abstract_inverted_index.quantification | 7, 113, 140 |
| abstract_inverted_index.multicalibration | 142 |
| abstract_inverted_index.group-conditional | 136 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile |
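
The `abstract_inverted_index.*` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text is shown here. The function name `rebuild_abstract` is hypothetical, and `sample` is a hand-copied subset of the rows above limited to positions 0 through 7 (for example, `have` also occurs at position 186 in the full index, which is omitted).

```python
def rebuild_abstract(inverted_index):
    """Turn a {token: [positions]} mapping back into running text."""
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(token for _, token in sorted(positioned))


# Truncated sample: only the first eight word positions of the abstract.
sample = {
    "While": [0],
    "past": [1],
    "works": [2],
    "have": [3],
    "shown": [4],
    "how": [5],
    "uncertainty": [6],
    "quantification": [7],
}

text = rebuild_abstract(sample)
print(text)  # While past works have shown how uncertainty quantification
```

Applied to the full set of rows, the same function would be expected to reproduce the abstract given at the top of this record, with tokens keeping their original punctuation (for example `LLM,` and `(via`).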