Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection

Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit‐Yan Yeung, Qifeng Chen

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.15354

The Bird's-Eye-View (BEV) representation is a critical factor that directly impacts the 3D object detection performance, but the traditional BEV grid representation induces quadratic computational cost as the spatial resolution grows. To address this limitation, we present a new camera-based 3D object detector with high-resolution vector representation: VectorFormer. The presented high-resolution vector representation is combined with the lower-resolution BEV representation to efficiently exploit 3D geometry from multi-camera images at a high resolution through our two novel modules: vector scattering and gathering. To this end, the learned vector representation with richer scene contexts can serve as the decoding query for final predictions. We conduct extensive experiments on the nuScenes dataset and demonstrate state-of-the-art performance in NDS and inference time. Furthermore, we investigate query-BEV-based methods incorporated with our proposed vector representation and observe a consistent performance improvement.
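The quadratic cost mentioned in the abstract can be illustrated with a small sketch. It assumes a square BEV grid; the function name `bev_cell_count`, the 102.4 m range, and the cell sizes are hypothetical values chosen for illustration and are not taken from the paper.

```python
def bev_cell_count(range_m: float, cell_size_m: float) -> int:
    """Number of cells in a square BEV grid covering range_m x range_m metres."""
    side = round(range_m / cell_size_m)  # cells along one side of the grid
    return side * side                   # total grows with the square of the side


if __name__ == "__main__":
    # Hypothetical setup: a 102.4 m x 102.4 m area at three cell sizes.
    for cell_size in (0.8, 0.4, 0.2):
        print(f"{cell_size} m cells -> {bev_cell_count(102.4, cell_size)} grid cells")
```

Halving the cell size doubles the resolution along each axis and therefore quadruples the number of grid cells that must be stored and processed.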

Related Topics

Artificial Intelligence

Concepts

Artificial intelligence, Computer vision, Representation (politics), Computer science, Object (grammar), Object detection, Resolution (logic), High resolution, Pattern recognition (psychology), Remote sensing, Geography, Political science, Law, Politics

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2407.15354
PDF: https://arxiv.org/pdf/2407.15354
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4406073071

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4406073071
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2407.15354
Digital Object Identifier

Title: Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2024
Year of publication

Publication date: 2024-07-22
Full publication date if available

Authors: Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit‐Yan Yeung, Qifeng Chen
List of authors in order

Landing page: https://arxiv.org/abs/2407.15354
Publisher landing page

PDF URL: https://arxiv.org/pdf/2407.15354
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2407.15354
Direct OA link when available

Concepts: Artificial intelligence, Computer vision, Representation (politics), Computer science, Object (grammar), Object detection, Resolution (logic), High resolution, Pattern recognition (psychology), Remote sensing, Geography, Political science, Law, Politics
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4406073071
doi	https://doi.org/10.48550/arxiv.2407.15354
ids.doi	https://doi.org/10.48550/arxiv.2407.15354
ids.openalex	https://openalex.org/W4406073071
fwci
type	preprint
title	Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T12111
topics[0].field.id	https://openalex.org/fields/22
topics[0].field.display_name	Engineering
topics[0].score	0.9685999751091003
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/2209
topics[0].subfield.display_name	Industrial and Manufacturing Engineering
topics[0].display_name	Industrial Vision Systems and Defect Detection
topics[1].id	https://openalex.org/T10036
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9648000001907349
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1707
topics[1].subfield.display_name	Computer Vision and Pattern Recognition
topics[1].display_name	Advanced Neural Network Applications
topics[2].id	https://openalex.org/T10627
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9483000040054321
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1707
topics[2].subfield.display_name	Computer Vision and Pattern Recognition
topics[2].display_name	Advanced Image and Video Retrieval Techniques
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C154945302
concepts[0].level	1
concepts[0].score	0.7368059754371643
concepts[0].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[0].display_name	Artificial intelligence
concepts[1].id	https://openalex.org/C31972630
concepts[1].level	1
concepts[1].score	0.6954893469810486
concepts[1].wikidata	https://www.wikidata.org/wiki/Q844240
concepts[1].display_name	Computer vision
concepts[2].id	https://openalex.org/C2776359362
concepts[2].level	3
concepts[2].score	0.6522197723388672
concepts[2].wikidata	https://www.wikidata.org/wiki/Q2145286
concepts[2].display_name	Representation (politics)
concepts[3].id	https://openalex.org/C41008148
concepts[3].level	0
concepts[3].score	0.6250434517860413
concepts[3].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[3].display_name	Computer science
concepts[4].id	https://openalex.org/C2781238097
concepts[4].level	2
concepts[4].score	0.5484963655471802
concepts[4].wikidata	https://www.wikidata.org/wiki/Q175026
concepts[4].display_name	Object (grammar)
concepts[5].id	https://openalex.org/C2776151529
concepts[5].level	3
concepts[5].score	0.45439815521240234
concepts[5].wikidata	https://www.wikidata.org/wiki/Q3045304
concepts[5].display_name	Object detection
concepts[6].id	https://openalex.org/C138268822
concepts[6].level	2
concepts[6].score	0.43230748176574707
concepts[6].wikidata	https://www.wikidata.org/wiki/Q1051925
concepts[6].display_name	Resolution (logic)
concepts[7].id	https://openalex.org/C3020199158
concepts[7].level	2
concepts[7].score	0.4241112470626831
concepts[7].wikidata	https://www.wikidata.org/wiki/Q210521
concepts[7].display_name	High resolution
concepts[8].id	https://openalex.org/C153180895
concepts[8].level	2
concepts[8].score	0.41566234827041626
concepts[8].wikidata	https://www.wikidata.org/wiki/Q7148389
concepts[8].display_name	Pattern recognition (psychology)
concepts[9].id	https://openalex.org/C62649853
concepts[9].level	1
concepts[9].score	0.18725329637527466
concepts[9].wikidata	https://www.wikidata.org/wiki/Q199687
concepts[9].display_name	Remote sensing
concepts[10].id	https://openalex.org/C205649164
concepts[10].level	0
concepts[10].score	0.16095814108848572
concepts[10].wikidata	https://www.wikidata.org/wiki/Q1071
concepts[10].display_name	Geography
concepts[11].id	https://openalex.org/C17744445
concepts[11].level	0
concepts[11].score	0.0
concepts[11].wikidata	https://www.wikidata.org/wiki/Q36442
concepts[11].display_name	Political science
concepts[12].id	https://openalex.org/C199539241
concepts[12].level	1
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q7748
concepts[12].display_name	Law
concepts[13].id	https://openalex.org/C94625758
concepts[13].level	2
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q7163
concepts[13].display_name	Politics
keywords[0].id	https://openalex.org/keywords/artificial-intelligence
keywords[0].score	0.7368059754371643
keywords[0].display_name	Artificial intelligence
keywords[1].id	https://openalex.org/keywords/computer-vision
keywords[1].score	0.6954893469810486
keywords[1].display_name	Computer vision
keywords[2].id	https://openalex.org/keywords/representation
keywords[2].score	0.6522197723388672
keywords[2].display_name	Representation (politics)
keywords[3].id	https://openalex.org/keywords/computer-science
keywords[3].score	0.6250434517860413
keywords[3].display_name	Computer science
keywords[4].id	https://openalex.org/keywords/object
keywords[4].score	0.5484963655471802
keywords[4].display_name	Object (grammar)
keywords[5].id	https://openalex.org/keywords/object-detection
keywords[5].score	0.45439815521240234
keywords[5].display_name	Object detection
keywords[6].id	https://openalex.org/keywords/resolution
keywords[6].score	0.43230748176574707
keywords[6].display_name	Resolution (logic)
keywords[7].id	https://openalex.org/keywords/high-resolution
keywords[7].score	0.4241112470626831
keywords[7].display_name	High resolution
keywords[8].id	https://openalex.org/keywords/pattern-recognition
keywords[8].score	0.41566234827041626
keywords[8].display_name	Pattern recognition (psychology)
keywords[9].id	https://openalex.org/keywords/remote-sensing
keywords[9].score	0.18725329637527466
keywords[9].display_name	Remote sensing
keywords[10].id	https://openalex.org/keywords/geography
keywords[10].score	0.16095814108848572
keywords[10].display_name	Geography
language	en
locations[0].id	pmh:oai:arXiv.org:2407.15354
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2407.15354
locations[0].version	submittedVersion
locations[0].raw_type
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2407.15354
locations[1].id	doi:10.48550/arxiv.2407.15354
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2407.15354
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5081842988
authorships[0].author.orcid	https://orcid.org/0000-0002-8272-156X
authorships[0].author.display_name	Zhili Chen
authorships[0].author_position	first
authorships[0].raw_author_name	Chen, Zhili
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5041668587
authorships[1].author.orcid	https://orcid.org/0000-0003-0150-7068
authorships[1].author.display_name	Shuangjie Xu
authorships[1].author_position	middle
authorships[1].raw_author_name	Xu, Shuangjie
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5050109730
authorships[2].author.orcid
authorships[2].author.display_name	Maosheng Ye
authorships[2].author_position	middle
authorships[2].raw_author_name	Ye, Maosheng
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5046711084
authorships[3].author.orcid	https://orcid.org/0000-0001-5147-9689
authorships[3].author.display_name	Zian Qian
authorships[3].author_position	middle
authorships[3].raw_author_name	Qian, Zian
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5100932903
authorships[4].author.orcid	https://orcid.org/0000-0003-0074-1135
authorships[4].author.display_name	Xiaoyi Zou
authorships[4].author_position	middle
authorships[4].raw_author_name	Zou, Xiaoyi
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5073139380
authorships[5].author.orcid	https://orcid.org/0000-0003-3716-8125
authorships[5].author.display_name	Dit‐Yan Yeung
authorships[5].author_position	middle
authorships[5].raw_author_name	Yeung, Dit-Yan
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5100719529
authorships[6].author.orcid	https://orcid.org/0000-0003-2199-3948
authorships[6].author.display_name	Qifeng Chen
authorships[6].author_position	last
authorships[6].raw_author_name	Chen, Qifeng
authorships[6].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2407.15354
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T12111
primary_topic.field.id	https://openalex.org/fields/22
primary_topic.field.display_name	Engineering
primary_topic.score	0.9685999751091003
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/2209
primary_topic.subfield.display_name	Industrial and Manufacturing Engineering
primary_topic.display_name	Industrial Vision Systems and Defect Detection
related_works	https://openalex.org/W2062195135, https://openalex.org/W1517180214, https://openalex.org/W2082780921, https://openalex.org/W2737719445, https://openalex.org/W1834370135, https://openalex.org/W4292830139, https://openalex.org/W4319309705, https://openalex.org/W4212954839, https://openalex.org/W3190051883, https://openalex.org/W4401570279
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2407.15354
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2407.15354
best_oa_location.version	submittedVersion
best_oa_location.raw_type
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2407.15354
primary_location.id	pmh:oai:arXiv.org:2407.15354
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2407.15354
primary_location.version	submittedVersion
primary_location.raw_type
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2407.15354
publication_date	2024-07-22
publication_year	2024
referenced_works_count	0
abstract_inverted_index.a	5, 37, 69, 131
abstract_inverted_index.3D	12, 40, 63
abstract_inverted_index.To	31, 81
abstract_inverted_index.We	101
abstract_inverted_index.as	26, 94
abstract_inverted_index.at	68
abstract_inverted_index.in	113
abstract_inverted_index.is	4, 53
abstract_inverted_index.on	105
abstract_inverted_index.to	60
abstract_inverted_index.we	35, 119
abstract_inverted_index.BEV	19, 58
abstract_inverted_index.NDS	114
abstract_inverted_index.The	0, 48
abstract_inverted_index.and	79, 109, 115, 129
abstract_inverted_index.but	16
abstract_inverted_index.can	92
abstract_inverted_index.for	98
abstract_inverted_index.new	38
abstract_inverted_index.our	73, 125
abstract_inverted_index.the	11, 17, 27, 56, 84, 95, 106
abstract_inverted_index.two	74
abstract_inverted_index.cost	25
abstract_inverted_index.end,	83
abstract_inverted_index.from	65
abstract_inverted_index.grid	20
abstract_inverted_index.high	70
abstract_inverted_index.that	8
abstract_inverted_index.this	33, 82
abstract_inverted_index.with	43, 55, 88, 124
abstract_inverted_index.(BEV)	2
abstract_inverted_index.final	99
abstract_inverted_index.novel	75
abstract_inverted_index.query	97
abstract_inverted_index.scene	90
abstract_inverted_index.serve	93
abstract_inverted_index.time.	117
abstract_inverted_index.factor	7
abstract_inverted_index.grows.	30
abstract_inverted_index.images	67
abstract_inverted_index.object	13, 41
abstract_inverted_index.richer	89
abstract_inverted_index.vector	45, 51, 77, 86, 127
abstract_inverted_index.address	32
abstract_inverted_index.conduct	102
abstract_inverted_index.dataset	108
abstract_inverted_index.exploit	62
abstract_inverted_index.impacts	10
abstract_inverted_index.induces	22
abstract_inverted_index.learned	85
abstract_inverted_index.methods	122
abstract_inverted_index.observe	130
abstract_inverted_index.present	36
abstract_inverted_index.spatial	28
abstract_inverted_index.through	72
abstract_inverted_index.combined	54
abstract_inverted_index.contexts	91
abstract_inverted_index.critical	6
abstract_inverted_index.decoding	96
abstract_inverted_index.detector	42
abstract_inverted_index.directly	9
abstract_inverted_index.geometry	64
abstract_inverted_index.modules:	76
abstract_inverted_index.nuScenes	107
abstract_inverted_index.proposed	126
abstract_inverted_index.detection	14
abstract_inverted_index.extensive	103
abstract_inverted_index.inference	116
abstract_inverted_index.presented	49
abstract_inverted_index.quadratic	23
abstract_inverted_index.consistent	132
abstract_inverted_index.gathering.	80
abstract_inverted_index.resolution	29, 71
abstract_inverted_index.scattering	78
abstract_inverted_index.demonstrate	110
abstract_inverted_index.efficiently	61
abstract_inverted_index.experiments	104
abstract_inverted_index.investigate	120
abstract_inverted_index.limitation,	34
abstract_inverted_index.performance	112, 133
abstract_inverted_index.traditional	18
abstract_inverted_index.Furthermore,	118
abstract_inverted_index.camera-based	39
abstract_inverted_index.improvement.	134
abstract_inverted_index.incorporated	123
abstract_inverted_index.multi-camera	66
abstract_inverted_index.performance,	15
abstract_inverted_index.predictions.	100
abstract_inverted_index.VectorFormer.	47
abstract_inverted_index.computational	24
abstract_inverted_index.representation	3, 21, 52, 59, 87, 128
abstract_inverted_index.Bird's-Eye-View	1
abstract_inverted_index.high-resolution	44, 50
abstract_inverted_index.query-BEV-based	121
abstract_inverted_index.representation:	46
abstract_inverted_index.lower-resolution	57
abstract_inverted_index.state-of-the-art	111
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	7
citation_normalized_percentile
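
The `abstract_inverted_index` entries in the payload above map each token to the positions where it occurs in the abstract. A minimal sketch of turning such an index back into plain text follows. The function name `rebuild_abstract` is hypothetical, and `sample` contains only the first eight tokens, copied from the payload with their first positions, so it reconstructs just the opening words.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping."""
    by_position: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Partial index: the first eight tokens from the payload above.
sample = {
    "The": [0],
    "Bird's-Eye-View": [1],
    "(BEV)": [2],
    "representation": [3],
    "is": [4],
    "a": [5],
    "critical": [6],
    "factor": [7],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
    # -> The Bird's-Eye-View (BEV) representation is a critical factor
```

Passing the full index instead of `sample` should reproduce the complete abstract, assuming every position from 0 to the last token is present.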