Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.07067
Vision Language Action (VLA) models are widely used in Embodied AI, enabling robots to interpret and execute language instructions. However, their robustness to natural language variability in real-world scenarios has not been thoroughly investigated. In this work, we present a novel systematic study of the robustness of state-of-the-art VLA models under linguistic perturbations. Specifically, we evaluate model performance under two types of instruction noise: (1) human-generated paraphrasing and (2) the addition of irrelevant context. We further categorize irrelevant contexts into two groups according to their length and their semantic and lexical proximity to robot commands. In this study, we observe consistent performance degradation as context size expands. We also demonstrate that the model can exhibit relative robustness to random context, with a performance drop within 10%, while semantically and lexically similar context of the same length can trigger a quality decline of around 50%. Human paraphrases of instructions lead to a drop of nearly 20%. To mitigate this, we propose an LLM-based filtering framework that extracts core commands from noisy inputs. Incorporating our filtering step allows models to recover up to 98.5% of their original performance under noisy conditions.
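The perturb, filter, and score loop summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not give the distractor corpora, the filtering prompt, or the LLM used, so `add_irrelevant_context`, `filter_command`, `FILTER_PROMPT`, and the `llm` callable are hypothetical names. The score helpers also assume that the reported drops and the 98.5% recovery figure are ratios relative to the clean-instruction success rate; the abstract does not say whether they are relative or absolute percentage points.

```python
import random
from typing import Callable, Sequence

# Hypothetical prompt: the abstract does not publish the authors' filtering prompt.
FILTER_PROMPT = (
    "The text below mixes a robot command with irrelevant context.\n"
    "Reply with the command only, without any other words.\n\n"
    "Text: {text}\n"
    "Command:"
)


def add_irrelevant_context(
    command: str,
    distractors: Sequence[str],
    n_sentences: int,
    rng: random.Random,
) -> str:
    """Prepend n_sentences distinct distractor sentences to a robot command.

    Context length is set by n_sentences; semantic and lexical proximity is set
    by the pool the caller passes (e.g. unrelated chit-chat vs. sentences that
    mention household objects and actions).
    """
    # rng.sample raises ValueError if the pool has fewer than n_sentences items.
    picked = rng.sample(list(distractors), k=n_sentences)
    return " ".join(picked + [command])


def filter_command(noisy: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM (any prompt -> reply callable) to extract the core command."""
    reply = llm(FILTER_PROMPT.format(text=noisy))
    # Keep only the first non-empty line in case the model adds commentary.
    for line in reply.splitlines():
        if line.strip():
            return line.strip()
    # Fall back to the unfiltered instruction if the model returned nothing usable.
    return noisy


def relative_drop(clean: float, perturbed: float) -> float:
    """Relative performance drop; 0.5 means the success rate was halved."""
    if clean <= 0:
        raise ValueError("clean success rate must be positive")
    return (clean - perturbed) / clean


def recovery(clean: float, filtered: float) -> float:
    """Share of the clean-instruction success rate retained after filtering."""
    if clean <= 0:
        raise ValueError("clean success rate must be positive")
    return filtered / clean
```

Keeping the LLM behind a plain `str -> str` callable leaves the choice of model and client library open. Varying `n_sentences`, and swapping between a pool of random sentences and a pool of command-like sentences, mirrors the two axes (length; semantic and lexical proximity) along which the study groups irrelevant context.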
OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4415317969
- DOI: https://doi.org/10.48550/arxiv.2510.07067
- Title: Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-10-08
- Authors: Daria Pugacheva, Andrey Moskalenko, Denis Shepelev, A. Kuznetsov, Vlad Shakhuro, Elena Tutubalina
- Landing page: https://arxiv.org/abs/2510.07067
- PDF URL: https://arxiv.org/pdf/2510.07067
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2510.07067
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415317969 |
| doi | https://doi.org/10.48550/arxiv.2510.07067 |
| ids.doi | https://doi.org/10.48550/arxiv.2510.07067 |
| ids.openalex | https://openalex.org/W4415317969 |
| fwci | |
| type | preprint |
| title | Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12026 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9811000227928162 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Explainable Artificial Intelligence (XAI) |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2510.07067 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-sa |
| locations[0].pdf_url | https://arxiv.org/pdf/2510.07067 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-sa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2510.07067 |
| locations[1].id | doi:10.48550/arxiv.2510.07067 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2510.07067 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5120050366 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Daria Pugacheva |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Pugacheva, Daria |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5087983729 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4965-0867 |
| authorships[1].author.display_name | Andrey Moskalenko |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Moskalenko, Andrey |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5035906827 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9170-3064 |
| authorships[2].author.display_name | Denis Shepelev |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Shepelev, Denis |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5002910559 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-1782-6584 |
| authorships[3].author.display_name | A. Kuznetsov |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Kuznetsov, Andrey |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5069058935 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1586-9257 |
| authorships[4].author.display_name | Vlad Shakhuro |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Shakhuro, Vlad |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5012311258 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7936-0284 |
| authorships[5].author.display_name | Elena Tutubalina |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Tutubalina, Elena |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2510.07067 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-18T00:00:00 |
| display_name | Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12026 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9811000227928162 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Explainable Artificial Intelligence (XAI) |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2510.07067 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-sa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2510.07067 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2510.07067 |
| primary_location.id | pmh:oai:arXiv.org:2510.07067 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-sa |
| primary_location.pdf_url | https://arxiv.org/pdf/2510.07067 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2510.07067 |
| publication_date | 2025-10-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |