Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis
2023 · Open Access · DOI: https://doi.org/10.18653/v1/2023.emnlp-main.277
Recent advances in prompt engineering enable large language models (LLMs) to solve multi-hop logical reasoning problems with impressive accuracy. However, there is little existing work investigating the robustness of LLMs with few-shot prompting techniques. Therefore, we introduce a systematic approach to test the robustness of LLMs in multi-hop reasoning tasks via domain-agnostic perturbations. We include perturbations at multiple levels of abstraction (e.g., lexical perturbations such as typos, and semantic perturbations such as the inclusion of intermediate reasoning steps in the questions) to conduct behavioral analysis on the LLMs. Throughout our experiments, we find that models are more sensitive to certain perturbations, such as replacing words with their synonyms. We also demonstrate that increasing the proportion of perturbed exemplars in the prompts improves the robustness of few-shot prompting methods.
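To make the perturbation idea concrete, the following is a small illustrative sketch (not the authors' code) of one lexical perturbation mentioned in the abstract: injecting character-level typos into a chosen fraction of few-shot exemplars before appending the test question. The exemplar texts, the noise rate, and the helper names are assumptions for illustration only.

```python
import random

def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent letters at random to simulate lexical (typo) perturbations."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def build_prompt(exemplars: list[str], question: str, perturbed_fraction: float = 0.5) -> str:
    """Perturb a fraction of the exemplars, then concatenate them with the test question."""
    k = int(len(exemplars) * perturbed_fraction)
    noisy = [add_typos(ex, seed=i) if i < k else ex for i, ex in enumerate(exemplars)]
    return "\n\n".join(noisy + [question])

if __name__ == "__main__":
    demos = [
        "Q: Every cat is a mammal. Tom is a cat. Is Tom a mammal? A: Tom is a cat, so Tom is a mammal. Yes.",
        "Q: Every rose is a flower. Lily is a rose. Is Lily a flower? A: Lily is a rose, so Lily is a flower. Yes.",
    ]
    print(build_prompt(demos, "Q: Every whale is a mammal. Moby is a whale. Is Moby a mammal? A:"))
```

Varying `perturbed_fraction` corresponds to the paper's finding that a higher proportion of perturbed exemplars in the prompt improves robustness; semantic perturbations (e.g., adding intermediate reasoning steps) would require different, task-aware helpers.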
Metadata
- Type: article
- Language: en
- Landing Page: https://doi.org/10.18653/v1/2023.emnlp-main.277
- PDF: https://aclanthology.org/2023.emnlp-main.277.pdf
- OA Status: gold
- Cited By: 2
- References: 22
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4389524026
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4389524026 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.18653/v1/2023.emnlp-main.277 (Digital Object Identifier)
- Title: Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-01-01 (full publication date if available)
- Authors: Hong‐Yi Zheng, Abulhair Saparov (list of authors in order)
- Landing page: https://doi.org/10.18653/v1/2023.emnlp-main.277 (publisher landing page)
- PDF URL: https://aclanthology.org/2023.emnlp-main.277.pdf (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://aclanthology.org/2023.emnlp-main.277.pdf (direct OA link when available)
- Concepts: Robustness (evolution), Computer science, Artificial intelligence, Natural language processing, Biochemistry, Gene, Chemistry (top concepts/fields attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 2 (per-year citation counts for the last 5 years)
- References (count): 22 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
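The same record can be retrieved programmatically. Below is a minimal sketch (not part of the original page) that fetches this work from the public OpenAlex API with the `requests` library and prints a few of the fields that appear in the payload table below; the email address is a placeholder, and error handling is kept to a bare minimum.

```python
import requests  # third-party HTTP client: pip install requests

# Fetch the raw OpenAlex record summarized above. Adding a mailto parameter
# is recommended by OpenAlex for polite API use (address below is a placeholder).
WORK_ID = "W4389524026"
URL = f"https://api.openalex.org/works/{WORK_ID}"

resp = requests.get(URL, params={"mailto": "you@example.com"}, timeout=30)
resp.raise_for_status()
work = resp.json()

# A few of the fields listed in the payload table below.
print(work["display_name"])
print("Type:", work["type"])
print("Cited by:", work["cited_by_count"])
print("OA URL:", work["open_access"]["oa_url"])
```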
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4389524026 |
| doi | https://doi.org/10.18653/v1/2023.emnlp-main.277 |
| ids.doi | https://doi.org/10.18653/v1/2023.emnlp-main.277 |
| ids.openalex | https://openalex.org/W4389524026 |
| fwci | 0.51088578 |
| type | article |
| title | Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 4568 |
| biblio.first_page | 4560 |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998000264167786 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9997000098228455 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10260 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9950000047683716 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Software Engineering Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C63479239 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7760473489761353 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[0].display_name | Robustness (evolution) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6740607619285583 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.4469095766544342 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3565889000892639 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C55493867 |
| concepts[4].level | 1 |
| concepts[4].score | 0.0 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[4].display_name | Biochemistry |
| concepts[5].id | https://openalex.org/C104317684 |
| concepts[5].level | 2 |
| concepts[5].score | 0.0 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[5].display_name | Gene |
| concepts[6].id | https://openalex.org/C185592680 |
| concepts[6].level | 0 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[6].display_name | Chemistry |
| keywords[0].id | https://openalex.org/keywords/robustness |
| keywords[0].score | 0.7760473489761353 |
| keywords[0].display_name | Robustness (evolution) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6740607619285583 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.4469095766544342 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.3565889000892639 |
| keywords[3].display_name | Natural language processing |
| language | en |
| locations[0].id | doi:10.18653/v1/2023.emnlp-main.277 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://aclanthology.org/2023.emnlp-main.277.pdf |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| locations[0].landing_page_url | https://doi.org/10.18653/v1/2023.emnlp-main.277 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5107843482 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Hong‐Yi Zheng |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I57206974 |
| authorships[0].affiliations[0].raw_affiliation_string | New York University |
| authorships[0].institutions[0].id | https://openalex.org/I57206974 |
| authorships[0].institutions[0].ror | https://ror.org/0190ak572 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I57206974 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | New York University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Hongyi Zheng |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | New York University |
| authorships[1].author.id | https://openalex.org/A5062813195 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Abulhair Saparov |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I57206974 |
| authorships[1].affiliations[0].raw_affiliation_string | New York University |
| authorships[1].institutions[0].id | https://openalex.org/I57206974 |
| authorships[1].institutions[0].ror | https://ror.org/0190ak572 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I57206974 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | New York University |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Abulhair Saparov |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | New York University |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://aclanthology.org/2023.emnlp-main.277.pdf |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998000264167786 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W4402327032, https://openalex.org/W3204019825 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 2 |
| locations_count | 1 |
| best_oa_location.id | doi:10.18653/v1/2023.emnlp-main.277 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://aclanthology.org/2023.emnlp-main.277.pdf |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | proceedings-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| best_oa_location.landing_page_url | https://doi.org/10.18653/v1/2023.emnlp-main.277 |
| primary_location.id | doi:10.18653/v1/2023.emnlp-main.277 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://aclanthology.org/2023.emnlp-main.277.pdf |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| primary_location.landing_page_url | https://doi.org/10.18653/v1/2023.emnlp-main.277 |
| publication_date | 2023-01-01 |
| publication_year | 2023 |
| referenced_works | https://openalex.org/W4302305823, https://openalex.org/W4221143046, https://openalex.org/W4281557260, https://openalex.org/W2076253536, https://openalex.org/W2906152891, https://openalex.org/W4385570291, https://openalex.org/W4221161695, https://openalex.org/W4292779060, https://openalex.org/W2963969878, https://openalex.org/W3159959439, https://openalex.org/W4281483047, https://openalex.org/W4376654357, https://openalex.org/W4319049323, https://openalex.org/W4286892945, https://openalex.org/W4386506836, https://openalex.org/W4281250694, https://openalex.org/W4306294746, https://openalex.org/W2962800603, https://openalex.org/W2964048171, https://openalex.org/W3001279689, https://openalex.org/W3035507081, https://openalex.org/W4378510422 |
| referenced_works_count | 22 |
| abstract_inverted_index | (word-to-position inverted index of the abstract; the full abstract is reproduced at the top of this page, and a reconstruction sketch follows this table) |
| cited_by_percentile_year.max | 97 |
| cited_by_percentile_year.min | 95 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.6000000238418579 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| citation_normalized_percentile.value | 0.69930781 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
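The `abstract_inverted_index` field collapsed in the table above maps each word of the abstract to its token positions. The sketch below shows how such an index can be turned back into readable text; the example input is truncated for brevity and keeps only the first few tokens.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild abstract text from an OpenAlex-style word -> positions mapping."""
    positioned = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    return " ".join(word for _, word in sorted(positioned))

# Truncated example in the same shape as the payload's abstract_inverted_index.
example = {
    "Recent": [0],
    "advances": [1],
    "in": [2],
    "prompt": [3],
    "engineering": [4],
}
print(reconstruct_abstract(example))  # -> "Recent advances in prompt engineering"
```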