TarGEN: Targeted Data Generation with Large Language Models
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2310.17876
The rapid advancement of large language models (LLMs) has sparked interest in data synthesis techniques that aim to generate diverse, high-quality synthetic datasets. However, such datasets often suffer from a lack of diversity and from added noise. In this paper, we present TarGEN, a multi-step prompting strategy for generating high-quality synthetic datasets using an LLM. An advantage of TarGEN is its seedless nature: it does not require specific task instances, which broadens its applicability beyond task replication. We augment TarGEN with self-correction, a method that empowers LLMs to rectify inaccurately labeled instances during dataset creation, ensuring reliable labels. To assess our technique's effectiveness, we emulate 8 tasks from the SuperGLUE benchmark and finetune various language models, including encoder-only, encoder-decoder, and decoder-only models, on both synthetic and original training sets. Evaluation on the original test set reveals that models trained on datasets generated by TarGEN perform approximately 1-2 percentage points better than those trained on the original datasets (82.84% with synthetic vs. 81.12% with original data, using Flan-T5). When instruction tuning is incorporated, performance increases to 84.54% on synthetic data vs. 81.49% on original data with Flan-T5. A comprehensive comparison of the synthetic and original datasets reveals that the synthetic dataset demonstrates similar or higher levels of dataset complexity and diversity. Furthermore, the synthetic dataset displays a bias level that aligns closely with that of the original dataset. Finally, when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive results on the OpenLLM leaderboard, surpassing the model trained on the Self-Instruct dataset by 4.14 percentage points. We hope that TarGEN can be helpful for quality data generation and for reducing the human effort needed to create complex benchmarks.
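The Flan-T5 figures quoted in the abstract can be turned into explicit gaps with a few lines of arithmetic. The sketch below uses only the four scores stated above; the dictionary keys and the helper name `point_gap` are illustrative labels, not terms from the paper.

```python
# Flan-T5 scores as quoted in the abstract (percent).
# The key names below are hypothetical labels chosen for this sketch.
SCORES = {
    "finetuned": {"synthetic": 82.84, "original": 81.12},
    "instruction_tuned": {"synthetic": 84.54, "original": 81.49},
}


def point_gap(synthetic: float, original: float) -> float:
    """Return the synthetic-minus-original gap in percentage points."""
    return round(synthetic - original, 2)


# Gap for each training setup, in percentage points.
GAPS = {setup: point_gap(**pair) for setup, pair in SCORES.items()}

if __name__ == "__main__":
    for setup, gap in GAPS.items():
        print(f"{setup}: +{gap:.2f} points for synthetic data")
```

Under these numbers the gap is 1.72 points without instruction tuning and 3.05 points with it, so the instruction-tuned setting shows the larger advantage for synthetic data.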
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2310.17876 (PDF: https://arxiv.org/pdf/2310.17876)
- OA Status: green
- Cited By: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4388032353
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4388032353 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2310.17876 (Digital Object Identifier)
- Title: TarGEN: Targeted Data Generation with Large Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-10-27 (full publication date if available)
- Authors: Himanshu Gupta, Kevin Scaria, Ujjwala Anantheswaran, Shreyas Verma, Mihir Parmar, Saurabh Arjun Sawant, Swaroop Mishra, Chitta Baral (list of authors in order)
- Landing page: https://arxiv.org/abs/2310.17876 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2310.17876 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2310.17876 (direct OA link when available)
- Concepts: Computer science, Synthetic data, Benchmark (surveying), Task (project management), Set (abstract data type), Machine learning, Encoder, Artificial intelligence, Language model, Quality (philosophy), Noise (video), Data mining, Epistemology, Geography, Economics, Image (mathematics), Management, Geodesy, Programming language, Operating system, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 3 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 2, 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4388032353 |
| doi | https://doi.org/10.48550/arxiv.2310.17876 |
| ids.doi | https://doi.org/10.48550/arxiv.2310.17876 |
| ids.openalex | https://openalex.org/W4388032353 |
| fwci | |
| type | preprint |
| title | TarGEN: Targeted Data Generation with Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9936000108718872 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9936000108718872 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9158999919891357 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8323259353637695 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C160920958 |
| concepts[1].level | 2 |
| concepts[1].score | 0.726253867149353 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q7662746 |
| concepts[1].display_name | Synthetic data |
| concepts[2].id | https://openalex.org/C185798385 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6890455484390259 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[2].display_name | Benchmark (surveying) |
| concepts[3].id | https://openalex.org/C2780451532 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6218836307525635 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[3].display_name | Task (project management) |
| concepts[4].id | https://openalex.org/C177264268 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5546471476554871 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[4].display_name | Set (abstract data type) |
| concepts[5].id | https://openalex.org/C119857082 |
| concepts[5].level | 1 |
| concepts[5].score | 0.49037933349609375 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[5].display_name | Machine learning |
| concepts[6].id | https://openalex.org/C118505674 |
| concepts[6].level | 2 |
| concepts[6].score | 0.48942580819129944 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[6].display_name | Encoder |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.48319971561431885 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C137293760 |
| concepts[8].level | 2 |
| concepts[8].score | 0.47921139001846313 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[8].display_name | Language model |
| concepts[9].id | https://openalex.org/C2779530757 |
| concepts[9].level | 2 |
| concepts[9].score | 0.45935678482055664 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1207505 |
| concepts[9].display_name | Quality (philosophy) |
| concepts[10].id | https://openalex.org/C99498987 |
| concepts[10].level | 3 |
| concepts[10].score | 0.42818483710289 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2210247 |
| concepts[10].display_name | Noise (video) |
| concepts[11].id | https://openalex.org/C124101348 |
| concepts[11].level | 1 |
| concepts[11].score | 0.3689226508140564 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[11].display_name | Data mining |
| concepts[12].id | https://openalex.org/C111472728 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[12].display_name | Epistemology |
| concepts[13].id | https://openalex.org/C205649164 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[13].display_name | Geography |
| concepts[14].id | https://openalex.org/C162324750 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[14].display_name | Economics |
| concepts[15].id | https://openalex.org/C115961682 |
| concepts[15].level | 2 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[15].display_name | Image (mathematics) |
| concepts[16].id | https://openalex.org/C187736073 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[16].display_name | Management |
| concepts[17].id | https://openalex.org/C13280743 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[17].display_name | Geodesy |
| concepts[18].id | https://openalex.org/C199360897 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[18].display_name | Programming language |
| concepts[19].id | https://openalex.org/C111919701 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[19].display_name | Operating system |
| concepts[20].id | https://openalex.org/C138885662 |
| concepts[20].level | 0 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[20].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8323259353637695 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/synthetic-data |
| keywords[1].score | 0.726253867149353 |
| keywords[1].display_name | Synthetic data |
| keywords[2].id | https://openalex.org/keywords/benchmark |
| keywords[2].score | 0.6890455484390259 |
| keywords[2].display_name | Benchmark (surveying) |
| keywords[3].id | https://openalex.org/keywords/task |
| keywords[3].score | 0.6218836307525635 |
| keywords[3].display_name | Task (project management) |
| keywords[4].id | https://openalex.org/keywords/set |
| keywords[4].score | 0.5546471476554871 |
| keywords[4].display_name | Set (abstract data type) |
| keywords[5].id | https://openalex.org/keywords/machine-learning |
| keywords[5].score | 0.49037933349609375 |
| keywords[5].display_name | Machine learning |
| keywords[6].id | https://openalex.org/keywords/encoder |
| keywords[6].score | 0.48942580819129944 |
| keywords[6].display_name | Encoder |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.48319971561431885 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/language-model |
| keywords[8].score | 0.47921139001846313 |
| keywords[8].display_name | Language model |
| keywords[9].id | https://openalex.org/keywords/quality |
| keywords[9].score | 0.45935678482055664 |
| keywords[9].display_name | Quality (philosophy) |
| keywords[10].id | https://openalex.org/keywords/noise |
| keywords[10].score | 0.42818483710289 |
| keywords[10].display_name | Noise (video) |
| keywords[11].id | https://openalex.org/keywords/data-mining |
| keywords[11].score | 0.3689226508140564 |
| keywords[11].display_name | Data mining |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2310.17876 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2310.17876 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2310.17876 |
| locations[1].id | doi:10.48550/arxiv.2310.17876 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2310.17876 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101643160 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-3271-3032 |
| authorships[0].author.display_name | Himanshu Gupta |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gupta, Himanshu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5038616311 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Kevin Scaria |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Scaria, Kevin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5059844013 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Ujjwala Anantheswaran |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Anantheswaran, Ujjwala |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5013095307 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Shreyas Verma |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Verma, Shreyas |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5106524687 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Mihir Parmar |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Parmar, Mihir |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5080454694 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Saurabh Arjun Sawant |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Sawant, Saurabh Arjun |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5063722751 |
| authorships[6].author.orcid | https://orcid.org/0009-0001-6413-7001 |
| authorships[6].author.display_name | Swaroop Mishra |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Mishra, Swaroop |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5083735830 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-7549-723X |
| authorships[7].author.display_name | Chitta Baral |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Baral, Chitta |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2310.17876 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | TarGEN: Targeted Data Generation with Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9936000108718872 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W2378211422, https://openalex.org/W2745001401, https://openalex.org/W4321353415, https://openalex.org/W2130974462, https://openalex.org/W2028665553, https://openalex.org/W2086519370, https://openalex.org/W972276598, https://openalex.org/W2087343574, https://openalex.org/W4246352526, https://openalex.org/W2121910908 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2310.17876 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2310.17876 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2310.17876 |
| primary_location.id | pmh:oai:arXiv.org:2310.17876 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2310.17876 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2310.17876 |
| publication_date | 2023-10-27 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.8 | 105 |
| abstract_inverted_index.A | 183 |
| abstract_inverted_index.a | 30, 43, 53, 80, 215 |
| abstract_inverted_index.An | 55 |
| abstract_inverted_index.In | 37 |
| abstract_inverted_index.To | 98 |
| abstract_inverted_index.We | 76, 252 |
| abstract_inverted_index.as | 83 |
| abstract_inverted_index.be | 257 |
| abstract_inverted_index.by | 142, 181, 249 |
| abstract_inverted_index.in | 11 |
| abstract_inverted_index.is | 59 |
| abstract_inverted_index.it | 63 |
| abstract_inverted_index.of | 3, 32, 57, 186, 205 |
| abstract_inverted_index.on | 122, 130, 139, 152, 160, 173, 178, 228, 237, 245 |
| abstract_inverted_index.or | 202 |
| abstract_inverted_index.to | 16, 87, 171, 191, 268 |
| abstract_inverted_index.we | 40, 103 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.and | 19, 34, 111, 119, 125, 208, 263 |
| abstract_inverted_index.can | 256 |
| abstract_inverted_index.for | 47, 259 |
| abstract_inverted_index.has | 8 |
| abstract_inverted_index.its | 60, 71 |
| abstract_inverted_index.not | 65 |
| abstract_inverted_index.og. | 161 |
| abstract_inverted_index.our | 100, 229 |
| abstract_inverted_index.set | 134 |
| abstract_inverted_index.the | 108, 131, 168, 187, 192, 197, 211, 222, 238, 242, 246, 265 |
| abstract_inverted_index.via | 156 |
| abstract_inverted_index.vs. | 158, 176 |
| abstract_inverted_index.1-2% | 146 |
| abstract_inverted_index.LLM. | 54 |
| abstract_inverted_index.LLMs | 86 |
| abstract_inverted_index.When | 164 |
| abstract_inverted_index.bias | 216 |
| abstract_inverted_index.both | 123 |
| abstract_inverted_index.data | 12, 175, 180, 261 |
| abstract_inverted_index.does | 64 |
| abstract_inverted_index.from | 29, 107 |
| abstract_inverted_index.hope | 253 |
| abstract_inverted_index.lack | 31 |
| abstract_inverted_index.syn. | 157 |
| abstract_inverted_index.task | 68, 74 |
| abstract_inverted_index.test | 133 |
| abstract_inverted_index.than | 149 |
| abstract_inverted_index.that | 136, 196, 218, 254 |
| abstract_inverted_index.this | 38 |
| abstract_inverted_index.when | 226 |
| abstract_inverted_index.with | 79, 221 |
| abstract_inverted_index.4.14% | 250 |
| abstract_inverted_index.T5-3B | 233 |
| abstract_inverted_index.added | 35 |
| abstract_inverted_index.human | 266 |
| abstract_inverted_index.known | 82 |
| abstract_inverted_index.large | 4 |
| abstract_inverted_index.level | 217 |
| abstract_inverted_index.model | 243 |
| abstract_inverted_index.often | 27 |
| abstract_inverted_index.rapid | 1 |
| abstract_inverted_index.sets. | 128 |
| abstract_inverted_index.tasks | 106 |
| abstract_inverted_index.these | 24 |
| abstract_inverted_index.those | 150 |
| abstract_inverted_index.using | 162 |
| abstract_inverted_index.(LLMs) | 7 |
| abstract_inverted_index.81.12% | 159 |
| abstract_inverted_index.81.49% | 177 |
| abstract_inverted_index.84.54% | 172 |
| abstract_inverted_index.TarGEN | 58, 78, 143, 255 |
| abstract_inverted_index.aiming | 15 |
| abstract_inverted_index.aligns | 219 |
| abstract_inverted_index.assess | 99 |
| abstract_inverted_index.better | 148 |
| abstract_inverted_index.beyond | 73 |
| abstract_inverted_index.create | 269 |
| abstract_inverted_index.during | 92 |
| abstract_inverted_index.higher | 203 |
| abstract_inverted_index.levels | 204 |
| abstract_inverted_index.method | 81 |
| abstract_inverted_index.models | 6, 121, 137 |
| abstract_inverted_index.noise. | 36 |
| abstract_inverted_index.paper, | 39 |
| abstract_inverted_index.points | 147 |
| abstract_inverted_index.suffer | 28 |
| abstract_inverted_index.yields | 234 |
| abstract_inverted_index.(82.84% | 155 |
| abstract_inverted_index.OpenLLM | 239 |
| abstract_inverted_index.TarGEN, | 42 |
| abstract_inverted_index.augment | 77 |
| abstract_inverted_index.closely | 220 |
| abstract_inverted_index.complex | 270 |
| abstract_inverted_index.dataset | 93, 189, 194, 199, 206, 213, 248 |
| abstract_inverted_index.diverse | 18 |
| abstract_inverted_index.efforts | 267 |
| abstract_inverted_index.emulate | 104 |
| abstract_inverted_index.helpful | 258 |
| abstract_inverted_index.labeled | 90 |
| abstract_inverted_index.labels. | 97 |
| abstract_inverted_index.models, | 115 |
| abstract_inverted_index.nature; | 62 |
| abstract_inverted_index.perform | 144 |
| abstract_inverted_index.points. | 251 |
| abstract_inverted_index.present | 41 |
| abstract_inverted_index.quality | 260 |
| abstract_inverted_index.rectify | 88 |
| abstract_inverted_index.require | 66 |
| abstract_inverted_index.results | 236 |
| abstract_inverted_index.reveals | 135, 195 |
| abstract_inverted_index.similar | 201 |
| abstract_inverted_index.sparked | 9 |
| abstract_inverted_index.trained | 138, 151, 244 |
| abstract_inverted_index.tuning, | 167 |
| abstract_inverted_index.various | 113 |
| abstract_inverted_index.Finally, | 225 |
| abstract_inverted_index.Flan-T5. | 182 |
| abstract_inverted_index.However, | 23 |
| abstract_inverted_index.analysis | 185 |
| abstract_inverted_index.compared | 190 |
| abstract_inverted_index.dataset, | 232 |
| abstract_inverted_index.dataset. | 224 |
| abstract_inverted_index.datasets | 26, 51, 140, 154 |
| abstract_inverted_index.displays | 214 |
| abstract_inverted_index.ensuring | 95 |
| abstract_inverted_index.finetune | 112 |
| abstract_inverted_index.generate | 17 |
| abstract_inverted_index.interest | 10 |
| abstract_inverted_index.language | 5, 114 |
| abstract_inverted_index.original | 126, 132, 153, 179, 193, 223 |
| abstract_inverted_index.reducing | 264 |
| abstract_inverted_index.reliable | 96 |
| abstract_inverted_index.seedless | 61 |
| abstract_inverted_index.specific | 67 |
| abstract_inverted_index.strategy | 46 |
| abstract_inverted_index.training | 127 |
| abstract_inverted_index.Flan-T5). | 163 |
| abstract_inverted_index.SuperGLUE | 109, 231 |
| abstract_inverted_index.advantage | 56 |
| abstract_inverted_index.benchmark | 110 |
| abstract_inverted_index.creation, | 94 |
| abstract_inverted_index.datasets. | 22 |
| abstract_inverted_index.diversity | 33 |
| abstract_inverted_index.generated | 141 |
| abstract_inverted_index.including | 116 |
| abstract_inverted_index.increases | 170 |
| abstract_inverted_index.instances | 91 |
| abstract_inverted_index.prompting | 45 |
| abstract_inverted_index.synthesis | 13 |
| abstract_inverted_index.synthetic | 21, 25, 50, 124, 174, 188, 198, 212, 230 |
| abstract_inverted_index.utilizing | 52 |
| abstract_inverted_index.Evaluation | 129 |
| abstract_inverted_index.broadening | 70 |
| abstract_inverted_index.complexity | 207 |
| abstract_inverted_index.diversity. | 209 |
| abstract_inverted_index.empowering | 85 |
| abstract_inverted_index.generating | 48 |
| abstract_inverted_index.generation | 262 |
| abstract_inverted_index.impressive | 235 |
| abstract_inverted_index.instances, | 69 |
| abstract_inverted_index.multi-step | 44 |
| abstract_inverted_index.surpassing | 241 |
| abstract_inverted_index.advancement | 2 |
| abstract_inverted_index.benchmarks. | 271 |
| abstract_inverted_index.instruction | 166 |
| abstract_inverted_index.performance | 169 |
| abstract_inverted_index.technique's | 101 |
| abstract_inverted_index.techniques, | 14 |
| abstract_inverted_index.Furthermore, | 210 |
| abstract_inverted_index.decoder-only | 120 |
| abstract_inverted_index.demonstrates | 200 |
| abstract_inverted_index.high-quality | 20, 49 |
| abstract_inverted_index.inaccurately | 89 |
| abstract_inverted_index.leaderboard, | 240 |
| abstract_inverted_index.replication. | 75 |
| abstract_inverted_index.Self-Instruct | 247 |
| abstract_inverted_index.applicability | 72 |
| abstract_inverted_index.approximately | 145 |
| abstract_inverted_index.comprehensive | 184 |
| abstract_inverted_index.encoder-only, | 117 |
| abstract_inverted_index.incorporating | 165 |
| abstract_inverted_index.pre-finetuned | 227 |
| abstract_inverted_index.effectiveness, | 102 |
| abstract_inverted_index.self-correction | 84 |
| abstract_inverted_index.encoder-decoder, | 118 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5600000023841858 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
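
The `abstract_inverted_index.*` rows in the payload map each token of the abstract to the word positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows; the function name `rebuild_text` and the `EXCERPT` dictionary are illustrative, and the excerpt keeps only the entries for positions 0 to 7 from the table above.

```python
# Excerpt of the inverted index from the payload table, restricted to
# word positions 0-7 (later positions of "of", "language", and "models"
# are left out to keep the example small).
EXCERPT = {
    "The": [0],
    "rapid": [1],
    "advancement": [2],
    "of": [3],
    "large": [4],
    "language": [5],
    "models": [6],
    "(LLMs)": [7],
}


def rebuild_text(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


if __name__ == "__main__":
    print(rebuild_text(EXCERPT))
    # prints: The rapid advancement of large language models (LLMs)
```

Applied to the complete set of `abstract_inverted_index` rows, the same approach should reproduce the abstract as a single whitespace-separated string; punctuation stays attached to its token because the index stores tokens such as `LLM.` and `paper,` as-is.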