WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2412.09951
The emergence of general human knowledge and impressive logical reasoning capacity in rapidly progressing vision-language models (VLMs) has driven increasing interest in applying VLMs to high-level autonomous driving tasks, such as scene understanding and decision-making. However, the relationship between knowledge proficiency, especially essential driving expertise, and closed-loop autonomous driving performance requires further exploration. In this paper, we investigate the effects of the depth and breadth of fundamental driving knowledge on closed-loop trajectory planning and introduce WiseAD, a specialized VLM tailored for end-to-end autonomous driving that is capable of driving reasoning, action justification, object recognition, risk analysis, driving suggestions, and trajectory planning across diverse scenarios. We employ joint training on driving knowledge and planning datasets, enabling the model to perform knowledge-aligned trajectory planning. Extensive experiments indicate that as the diversity of driving knowledge grows, critical accidents are notably reduced, contributing improvements of 11.9% and 12.4% in driving score and route completion on the CARLA closed-loop evaluations and achieving state-of-the-art performance. WiseAD also demonstrates remarkable performance in knowledge evaluations on both in-domain and out-of-domain datasets.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.09951, https://arxiv.org/pdf/2412.09951
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405433131
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405433131 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.09951 (Digital Object Identifier)
- Title: WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-13 (full publication date if available)
- Authors: Songyan Zhang, Wenhui Huang, Zihui Gao, Hao Chen, Chen Lv (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.09951 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.09951 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.09951 (direct OA link when available)
- Concepts: End-to-end principle, End of history, Computer science, Artificial intelligence, Computer vision, Political science, Law, Politics (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4405433131 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2412.09951 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.09951 |
| ids.openalex | https://openalex.org/W4405433131 |
| fwci | |
| type | preprint |
| title | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10036 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9549999833106995 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Neural Network Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.90829998254776 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T11099 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.904699981212616 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2203 |
| topics[2].subfield.display_name | Automotive Engineering |
| topics[2].display_name | Autonomous Vehicle Technology and Safety |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C74296488 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8249378204345703 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2527392 |
| concepts[0].display_name | End-to-end principle |
| concepts[1].id | https://openalex.org/C2778935963 |
| concepts[1].level | 3 |
| concepts[1].score | 0.5768283009529114 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q13218530 |
| concepts[1].display_name | End of history |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.5133932828903198 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3566986918449402 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C31972630 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3416728079319 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[4].display_name | Computer vision |
| concepts[5].id | https://openalex.org/C17744445 |
| concepts[5].level | 0 |
| concepts[5].score | 0.14132723212242126 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[5].display_name | Political science |
| concepts[6].id | https://openalex.org/C199539241 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[6].display_name | Law |
| concepts[7].id | https://openalex.org/C94625758 |
| concepts[7].level | 2 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[7].display_name | Politics |
| keywords[0].id | https://openalex.org/keywords/end-to-end-principle |
| keywords[0].score | 0.8249378204345703 |
| keywords[0].display_name | End-to-end principle |
| keywords[1].id | https://openalex.org/keywords/end-of-history |
| keywords[1].score | 0.5768283009529114 |
| keywords[1].display_name | End of history |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.5133932828903198 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.3566986918449402 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/computer-vision |
| keywords[4].score | 0.3416728079319 |
| keywords[4].display_name | Computer vision |
| keywords[5].id | https://openalex.org/keywords/political-science |
| keywords[5].score | 0.14132723212242126 |
| keywords[5].display_name | Political science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.09951 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.09951 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.09951 |
| locations[1].id | doi:10.48550/arxiv.2412.09951 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.09951 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5104805610 |
| authorships[0].author.orcid | https://orcid.org/0009-0006-2853-8875 |
| authorships[0].author.display_name | Songyan Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhang, Songyan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100724377 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5435-8775 |
| authorships[1].author.display_name | Wenhui Huang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Huang, Wenhui |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5026036197 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Zihui Gao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Gao, Zihui |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100353595 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8295-4566 |
| authorships[3].author.display_name | Hao Chen |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chen, Hao |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5072073374 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6897-4512 |
| authorships[4].author.display_name | Chen Lv |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Lv, Chen |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.09951 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-12-17T00:00:00 |
| display_name | WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10036 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9549999833106995 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Neural Network Applications |
| related_works | https://openalex.org/W3016188207, https://openalex.org/W2772917594, https://openalex.org/W2036807459, https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.09951 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.09951 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.09951 |
| primary_location.id | pmh:oai:arXiv.org:2412.09951 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.09951 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.09951 |
| publication_date | 2024-12-13 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 80 |
| abstract_inverted_index.In | 57 |
| abstract_inverted_index.We | 106 |
| abstract_inverted_index.an | 36 |
| abstract_inverted_index.as | 30, 129 |
| abstract_inverted_index.in | 11, 21, 146, 167 |
| abstract_inverted_index.of | 2, 64, 69, 89, 132 |
| abstract_inverted_index.on | 39, 73, 110, 153, 170 |
| abstract_inverted_index.to | 24, 119 |
| abstract_inverted_index.we | 60 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.VLM | 82 |
| abstract_inverted_index.and | 6, 33, 49, 67, 77, 100, 113, 143, 150, 173 |
| abstract_inverted_index.are | 138 |
| abstract_inverted_index.for | 84 |
| abstract_inverted_index.the | 40, 62, 65, 117, 130, 147, 154 |
| abstract_inverted_index.VLMs | 23 |
| abstract_inverted_index.also | 163 |
| abstract_inverted_index.both | 171 |
| abstract_inverted_index.have | 17 |
| abstract_inverted_index.risk | 96 |
| abstract_inverted_index.such | 29 |
| abstract_inverted_index.that | 128 |
| abstract_inverted_index.this | 58 |
| abstract_inverted_index.11.9% | 142 |
| abstract_inverted_index.12.4% | 144 |
| abstract_inverted_index.Carla | 155 |
| abstract_inverted_index.depth | 66 |
| abstract_inverted_index.human | 4 |
| abstract_inverted_index.joint | 108 |
| abstract_inverted_index.model | 118 |
| abstract_inverted_index.route | 151 |
| abstract_inverted_index.scene | 31 |
| abstract_inverted_index.score | 149 |
| abstract_inverted_index.study | 38 |
| abstract_inverted_index.(VLMs) | 16 |
| abstract_inverted_index.WiseAD | 162 |
| abstract_inverted_index.across | 103 |
| abstract_inverted_index.action | 92 |
| abstract_inverted_index.driven | 18 |
| abstract_inverted_index.employ | 107 |
| abstract_inverted_index.models | 15 |
| abstract_inverted_index.object | 94 |
| abstract_inverted_index.paper, | 59 |
| abstract_inverted_index.tasks, | 28 |
| abstract_inverted_index.WiseAD, | 79 |
| abstract_inverted_index.between | 42 |
| abstract_inverted_index.breadth | 68 |
| abstract_inverted_index.capable | 88 |
| abstract_inverted_index.diverse | 104 |
| abstract_inverted_index.driving | 27, 47, 52, 71, 87, 90, 98, 111, 133, 148 |
| abstract_inverted_index.effects | 63 |
| abstract_inverted_index.further | 55 |
| abstract_inverted_index.general | 3 |
| abstract_inverted_index.logical | 8 |
| abstract_inverted_index.notably | 139 |
| abstract_inverted_index.perform | 120 |
| abstract_inverted_index.rapidly | 12 |
| abstract_inverted_index.However, | 35 |
| abstract_inverted_index.applying | 22 |
| abstract_inverted_index.capacity | 10 |
| abstract_inverted_index.critical | 136 |
| abstract_inverted_index.enabling | 116 |
| abstract_inverted_index.extends, | 135 |
| abstract_inverted_index.in-depth | 37 |
| abstract_inverted_index.indicate | 127 |
| abstract_inverted_index.interest | 20 |
| abstract_inverted_index.planning | 76, 102, 114, 123 |
| abstract_inverted_index.reduced, | 140 |
| abstract_inverted_index.requires | 54 |
| abstract_inverted_index.tailored | 83 |
| abstract_inverted_index.training | 109 |
| abstract_inverted_index.Extensive | 125 |
| abstract_inverted_index.Moreover, | 161 |
| abstract_inverted_index.accidents | 137 |
| abstract_inverted_index.achieving | 158 |
| abstract_inverted_index.analysis, | 97 |
| abstract_inverted_index.datasets, | 115 |
| abstract_inverted_index.datasets. | 175 |
| abstract_inverted_index.diversity | 131 |
| abstract_inverted_index.emergence | 1 |
| abstract_inverted_index.essential | 46 |
| abstract_inverted_index.in-domain | 172 |
| abstract_inverted_index.introduce | 78 |
| abstract_inverted_index.knowledge | 5, 43, 72, 112, 134, 168 |
| abstract_inverted_index.reasoning | 9 |
| abstract_inverted_index.autonomous | 26, 51, 86 |
| abstract_inverted_index.completion | 152 |
| abstract_inverted_index.end-to-end | 85 |
| abstract_inverted_index.especially | 45 |
| abstract_inverted_index.expertise, | 48 |
| abstract_inverted_index.high-level | 25 |
| abstract_inverted_index.impressive | 7 |
| abstract_inverted_index.increasing | 19 |
| abstract_inverted_index.progressed | 13 |
| abstract_inverted_index.reasoning, | 91 |
| abstract_inverted_index.remarkable | 165 |
| abstract_inverted_index.scenarios. | 105 |
| abstract_inverted_index.trajectory | 75, 101, 122 |
| abstract_inverted_index.closed-loop | 50, 74, 156 |
| abstract_inverted_index.evaluations | 169 |
| abstract_inverted_index.experiments | 126 |
| abstract_inverted_index.fundamental | 70 |
| abstract_inverted_index.investigate | 61 |
| abstract_inverted_index.performance | 53, 166 |
| abstract_inverted_index.specialized | 81 |
| abstract_inverted_index.accordingly. | 124 |
| abstract_inverted_index.contributing | 141 |
| abstract_inverted_index.demonstrates | 164 |
| abstract_inverted_index.evaluations, | 157 |
| abstract_inverted_index.exploration. | 56 |
| abstract_inverted_index.improvements | 145 |
| abstract_inverted_index.performance. | 160 |
| abstract_inverted_index.proficiency, | 44 |
| abstract_inverted_index.recognition, | 95 |
| abstract_inverted_index.relationship | 41 |
| abstract_inverted_index.suggestions, | 99 |
| abstract_inverted_index.out-of-domain | 174 |
| abstract_inverted_index.understanding | 32 |
| abstract_inverted_index.justification, | 93 |
| abstract_inverted_index.vision-language | 14 |
| abstract_inverted_index.decision-making. | 34 |
| abstract_inverted_index.state-of-the-art | 159 |
| abstract_inverted_index.knowledge-aligned | 121 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
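
The `abstract_inverted_index.*` rows above map each word of the abstract to the zero-based positions at which it occurs. A minimal sketch of how such an index could be turned back into running text is shown below. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, and `sample` holds only the first seven positions taken from the table, not the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a word -> positions mapping (hypothetical helper)."""
    # Invert the mapping: each position points back to the word found there.
    words_by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Join the words in ascending position order.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# Subset of the table above (positions 0 through 6 only; an assumption
# made to keep the example short).
sample = {
    "The": [0],
    "emergence": [1],
    "of": [2],
    "general": [3],
    "human": [4],
    "knowledge": [5],
    "and": [6],
}

print(rebuild_abstract(sample))
```

Sorting by position rather than relying on dictionary order matters because a word such as `driving` appears at many positions, so the index cannot be read off row by row.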