Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.15354
The Bird's-Eye-View (BEV) representation is a critical factor that directly impacts 3D object detection performance, but the traditional BEV grid representation incurs a computational cost that grows quadratically with spatial resolution. To address this limitation, we present VectorFormer, a new camera-based 3D object detector with a high-resolution vector representation. The high-resolution vector representation is combined with the lower-resolution BEV representation to efficiently exploit 3D geometry from multi-camera images at high resolution through two novel modules: vector scattering and gathering. The learned vector representation, which carries richer scene context, can then serve as the decoding query for final predictions. We conduct extensive experiments on the nuScenes dataset and demonstrate state-of-the-art performance in NDS and inference time. Furthermore, we incorporate the proposed vector representation into query-BEV-based methods and observe a consistent performance improvement.
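To make the scaling argument concrete, the minimal sketch below compares how many feature slots a dense BEV grid needs against a pair of axis-aligned vectors as the per-axis resolution grows. The abstract does not describe the layout of the vector representation, so the two-vector layout (one vector per BEV axis), the function names, and the resolutions are assumptions chosen for illustration only; this is not VectorFormer's implementation.

```python
def grid_cells(resolution: int) -> int:
    """Feature slots in a dense square BEV grid: grows quadratically."""
    return resolution * resolution


def vector_cells(resolution: int) -> int:
    """Feature slots in two axis-aligned vectors (assumed layout): grows linearly."""
    return 2 * resolution


# Hypothetical per-axis resolutions, picked only to show the trend.
for r in (50, 100, 200, 400):
    print(f"resolution={r:4d}  grid={grid_cells(r):7d}  vectors={vector_cells(r):4d}")
```

Doubling the resolution quadruples the grid size but only doubles the size of the assumed vector layout, which is the gap the abstract refers to.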
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.15354, https://arxiv.org/pdf/2407.15354
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4406073071
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4406073071 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.15354 (Digital Object Identifier)
- Title: Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-22 (full publication date if available)
- Authors: Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit‐Yan Yeung, Qifeng Chen (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.15354 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.15354 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.15354 (direct OA link when available)
- Concepts: Artificial intelligence, Computer vision, Representation (politics), Computer science, Object (grammar), Object detection, Resolution (logic), High resolution, Pattern recognition (psychology), Remote sensing, Geography, Political science, Law, Politics (top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4406073071 |
| doi | https://doi.org/10.48550/arxiv.2407.15354 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.15354 |
| ids.openalex | https://openalex.org/W4406073071 |
| fwci | |
| type | preprint |
| title | Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12111 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9685999751091003 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2209 |
| topics[0].subfield.display_name | Industrial and Manufacturing Engineering |
| topics[0].display_name | Industrial Vision Systems and Defect Detection |
| topics[1].id | https://openalex.org/T10036 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9648000001907349 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Neural Network Applications |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9483000040054321 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C154945302 |
| concepts[0].level | 1 |
| concepts[0].score | 0.7368059754371643 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[0].display_name | Artificial intelligence |
| concepts[1].id | https://openalex.org/C31972630 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6954893469810486 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[1].display_name | Computer vision |
| concepts[2].id | https://openalex.org/C2776359362 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6522197723388672 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[2].display_name | Representation (politics) |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.6250434517860413 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C2781238097 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5484963655471802 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[4].display_name | Object (grammar) |
| concepts[5].id | https://openalex.org/C2776151529 |
| concepts[5].level | 3 |
| concepts[5].score | 0.45439815521240234 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3045304 |
| concepts[5].display_name | Object detection |
| concepts[6].id | https://openalex.org/C138268822 |
| concepts[6].level | 2 |
| concepts[6].score | 0.43230748176574707 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1051925 |
| concepts[6].display_name | Resolution (logic) |
| concepts[7].id | https://openalex.org/C3020199158 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4241112470626831 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q210521 |
| concepts[7].display_name | High resolution |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.41566234827041626 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C62649853 |
| concepts[9].level | 1 |
| concepts[9].score | 0.18725329637527466 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q199687 |
| concepts[9].display_name | Remote sensing |
| concepts[10].id | https://openalex.org/C205649164 |
| concepts[10].level | 0 |
| concepts[10].score | 0.16095814108848572 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[10].display_name | Geography |
| concepts[11].id | https://openalex.org/C17744445 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[11].display_name | Political science |
| concepts[12].id | https://openalex.org/C199539241 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[12].display_name | Law |
| concepts[13].id | https://openalex.org/C94625758 |
| concepts[13].level | 2 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[13].display_name | Politics |
| keywords[0].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[0].score | 0.7368059754371643 |
| keywords[0].display_name | Artificial intelligence |
| keywords[1].id | https://openalex.org/keywords/computer-vision |
| keywords[1].score | 0.6954893469810486 |
| keywords[1].display_name | Computer vision |
| keywords[2].id | https://openalex.org/keywords/representation |
| keywords[2].score | 0.6522197723388672 |
| keywords[2].display_name | Representation (politics) |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.6250434517860413 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/object |
| keywords[4].score | 0.5484963655471802 |
| keywords[4].display_name | Object (grammar) |
| keywords[5].id | https://openalex.org/keywords/object-detection |
| keywords[5].score | 0.45439815521240234 |
| keywords[5].display_name | Object detection |
| keywords[6].id | https://openalex.org/keywords/resolution |
| keywords[6].score | 0.43230748176574707 |
| keywords[6].display_name | Resolution (logic) |
| keywords[7].id | https://openalex.org/keywords/high-resolution |
| keywords[7].score | 0.4241112470626831 |
| keywords[7].display_name | High resolution |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.41566234827041626 |
| keywords[8].display_name | Pattern recognition (psychology) |
| keywords[9].id | https://openalex.org/keywords/remote-sensing |
| keywords[9].score | 0.18725329637527466 |
| keywords[9].display_name | Remote sensing |
| keywords[10].id | https://openalex.org/keywords/geography |
| keywords[10].score | 0.16095814108848572 |
| keywords[10].display_name | Geography |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.15354 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.15354 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.15354 |
| locations[1].id | doi:10.48550/arxiv.2407.15354 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.15354 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5081842988 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8272-156X |
| authorships[0].author.display_name | Zhili Chen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Chen, Zhili |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5041668587 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0150-7068 |
| authorships[1].author.display_name | Shuangjie Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Shuangjie |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5050109730 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Maosheng Ye |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ye, Maosheng |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5046711084 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5147-9689 |
| authorships[3].author.display_name | Zian Qian |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Qian, Zian |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100932903 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-0074-1135 |
| authorships[4].author.display_name | Xiaoyi Zou |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zou, Xiaoyi |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5073139380 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3716-8125 |
| authorships[5].author.display_name | Dit‐Yan Yeung |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Yeung, Dit-Yan |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100719529 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-2199-3948 |
| authorships[6].author.display_name | Qifeng Chen |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Chen, Qifeng |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.15354 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12111 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9685999751091003 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2209 |
| primary_topic.subfield.display_name | Industrial and Manufacturing Engineering |
| primary_topic.display_name | Industrial Vision Systems and Defect Detection |
| related_works | https://openalex.org/W2062195135, https://openalex.org/W1517180214, https://openalex.org/W2082780921, https://openalex.org/W2737719445, https://openalex.org/W1834370135, https://openalex.org/W4292830139, https://openalex.org/W4319309705, https://openalex.org/W4212954839, https://openalex.org/W3190051883, https://openalex.org/W4401570279 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.15354 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.15354 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.15354 |
| primary_location.id | pmh:oai:arXiv.org:2407.15354 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.15354 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.15354 |
| publication_date | 2024-07-22 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 37, 69, 131 |
| abstract_inverted_index.3D | 12, 40, 63 |
| abstract_inverted_index.To | 31, 81 |
| abstract_inverted_index.We | 101 |
| abstract_inverted_index.as | 26, 94 |
| abstract_inverted_index.at | 68 |
| abstract_inverted_index.in | 113 |
| abstract_inverted_index.is | 4, 53 |
| abstract_inverted_index.on | 105 |
| abstract_inverted_index.to | 60 |
| abstract_inverted_index.we | 35, 119 |
| abstract_inverted_index.BEV | 19, 58 |
| abstract_inverted_index.NDS | 114 |
| abstract_inverted_index.The | 0, 48 |
| abstract_inverted_index.and | 79, 109, 115, 129 |
| abstract_inverted_index.but | 16 |
| abstract_inverted_index.can | 92 |
| abstract_inverted_index.for | 98 |
| abstract_inverted_index.new | 38 |
| abstract_inverted_index.our | 73, 125 |
| abstract_inverted_index.the | 11, 17, 27, 56, 84, 95, 106 |
| abstract_inverted_index.two | 74 |
| abstract_inverted_index.cost | 25 |
| abstract_inverted_index.end, | 83 |
| abstract_inverted_index.from | 65 |
| abstract_inverted_index.grid | 20 |
| abstract_inverted_index.high | 70 |
| abstract_inverted_index.that | 8 |
| abstract_inverted_index.this | 33, 82 |
| abstract_inverted_index.with | 43, 55, 88, 124 |
| abstract_inverted_index.(BEV) | 2 |
| abstract_inverted_index.final | 99 |
| abstract_inverted_index.novel | 75 |
| abstract_inverted_index.query | 97 |
| abstract_inverted_index.scene | 90 |
| abstract_inverted_index.serve | 93 |
| abstract_inverted_index.time. | 117 |
| abstract_inverted_index.factor | 7 |
| abstract_inverted_index.grows. | 30 |
| abstract_inverted_index.images | 67 |
| abstract_inverted_index.object | 13, 41 |
| abstract_inverted_index.richer | 89 |
| abstract_inverted_index.vector | 45, 51, 77, 86, 127 |
| abstract_inverted_index.address | 32 |
| abstract_inverted_index.conduct | 102 |
| abstract_inverted_index.dataset | 108 |
| abstract_inverted_index.exploit | 62 |
| abstract_inverted_index.impacts | 10 |
| abstract_inverted_index.induces | 22 |
| abstract_inverted_index.learned | 85 |
| abstract_inverted_index.methods | 122 |
| abstract_inverted_index.observe | 130 |
| abstract_inverted_index.present | 36 |
| abstract_inverted_index.spatial | 28 |
| abstract_inverted_index.through | 72 |
| abstract_inverted_index.combined | 54 |
| abstract_inverted_index.contexts | 91 |
| abstract_inverted_index.critical | 6 |
| abstract_inverted_index.decoding | 96 |
| abstract_inverted_index.detector | 42 |
| abstract_inverted_index.directly | 9 |
| abstract_inverted_index.geometry | 64 |
| abstract_inverted_index.modules: | 76 |
| abstract_inverted_index.nuScenes | 107 |
| abstract_inverted_index.proposed | 126 |
| abstract_inverted_index.detection | 14 |
| abstract_inverted_index.extensive | 103 |
| abstract_inverted_index.inference | 116 |
| abstract_inverted_index.presented | 49 |
| abstract_inverted_index.quadratic | 23 |
| abstract_inverted_index.consistent | 132 |
| abstract_inverted_index.gathering. | 80 |
| abstract_inverted_index.resolution | 29, 71 |
| abstract_inverted_index.scattering | 78 |
| abstract_inverted_index.demonstrate | 110 |
| abstract_inverted_index.efficiently | 61 |
| abstract_inverted_index.experiments | 104 |
| abstract_inverted_index.investigate | 120 |
| abstract_inverted_index.limitation, | 34 |
| abstract_inverted_index.performance | 112, 133 |
| abstract_inverted_index.traditional | 18 |
| abstract_inverted_index.Furthermore, | 118 |
| abstract_inverted_index.camera-based | 39 |
| abstract_inverted_index.improvement. | 134 |
| abstract_inverted_index.incorporated | 123 |
| abstract_inverted_index.multi-camera | 66 |
| abstract_inverted_index.performance, | 15 |
| abstract_inverted_index.predictions. | 100 |
| abstract_inverted_index.VectorFormer. | 47 |
| abstract_inverted_index.computational | 24 |
| abstract_inverted_index.representation | 3, 21, 52, 59, 87, 128 |
| abstract_inverted_index.Bird's-Eye-View | 1 |
| abstract_inverted_index.high-resolution | 44, 50 |
| abstract_inverted_index.query-BEV-based | 121 |
| abstract_inverted_index.representation: | 46 |
| abstract_inverted_index.lower-resolution | 57 |
| abstract_inverted_index.state-of-the-art | 111 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
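
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the zero-based positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and `sample` is a hand-copied excerpt covering only the first eight positions of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a word -> positions map by sorting on position."""
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt only: positions 0 through 7 taken from the table rows above.
sample = {
    "The": [0],
    "Bird's-Eye-View": [1],
    "(BEV)": [2],
    "representation": [3],
    "is": [4],
    "a": [5],
    "critical": [6],
    "factor": [7],
}

print(rebuild_abstract(sample))
# prints: The Bird's-Eye-View (BEV) representation is a critical factor
```

Because punctuation is stored as part of each token (for example `grows.` and `limitation,`), joining the words with single spaces is enough to recover the sentence structure.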