More Pictures Say More: Visual Intersection Network for Open Set Object Detection
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2408.14032
Open Set Object Detection has seen rapid development recently, but it continues to pose significant challenges. Language-based methods, grappling with the substantial modal disparity between textual and visual modalities, require extensive computational resources to bridge this gap. Although integrating visual prompts into these frameworks shows promise for enhancing performance, it always comes with constraints related to textual semantics. In contrast, visual-only methods suffer from the low-quality fusion of multiple visual prompts. In response, we introduce a strong DETR-based model, Visual Intersection Network for Open Set Object Detection (VINO), which constructs a multi-image visual bank to preserve the semantic intersections of each category across all time steps. Our innovative multi-image visual updating mechanism learns to identify the semantic intersections from various visual prompts, enabling the flexible incorporation of new information and continuous optimization of feature representations. Our approach guarantees a more precise alignment between target category semantics and region semantics, while significantly reducing pre-training time and resource demands compared to language-based methods. Furthermore, the integration of a segmentation head illustrates the broad applicability of visual intersection in various visual tasks. VINO, which requires only 7 RTX4090 GPU days to complete one epoch on the Objects365v1 dataset, achieves competitive performance on par with vision-language models on benchmarks such as LVIS and ODinW35.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2408.14032, https://arxiv.org/pdf/2408.14032
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4402701946
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4402701946 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2408.14032 (Digital Object Identifier)
- Title: More Pictures Say More: Visual Intersection Network for Open Set Object Detection (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-08-26 (full publication date if available)
- Authors: Bingcheng Dong, Yuning Ding, Jinrong Zhang, Sifan Zhang, Shenglan Liu (list of authors in order)
- Landing page: https://arxiv.org/abs/2408.14032 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2408.14032 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2408.14032 (direct OA link when available)
- Concepts: Intersection (aeronautics), Computer science, Artificial intelligence, Computer vision, Set (abstract data type), Object (grammar), Cartography, Geography, Programming language (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4402701946 |
| doi | https://doi.org/10.48550/arxiv.2408.14032 |
| ids.doi | https://doi.org/10.48550/arxiv.2408.14032 |
| ids.openalex | https://openalex.org/W4402701946 |
| fwci | |
| type | preprint |
| title | More Pictures Say More: Visual Intersection Network for Open Set Object Detection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10036 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9900000095367432 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Neural Network Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9796000123023987 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T10531 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9789999723434448 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Vision and Imaging |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C64543145 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7805237770080566 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q162942 |
| concepts[0].display_name | Intersection (aeronautics) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6517740488052368 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6031495928764343 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C31972630 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5819099545478821 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[3].display_name | Computer vision |
| concepts[4].id | https://openalex.org/C177264268 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5661059021949768 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[4].display_name | Set (abstract data type) |
| concepts[5].id | https://openalex.org/C2781238097 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5637830495834351 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[5].display_name | Object (grammar) |
| concepts[6].id | https://openalex.org/C58640448 |
| concepts[6].level | 1 |
| concepts[6].score | 0.11463180184364319 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[6].display_name | Cartography |
| concepts[7].id | https://openalex.org/C205649164 |
| concepts[7].level | 0 |
| concepts[7].score | 0.09791037440299988 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[7].display_name | Geography |
| concepts[8].id | https://openalex.org/C199360897 |
| concepts[8].level | 1 |
| concepts[8].score | 0.06277486681938171 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[8].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/intersection |
| keywords[0].score | 0.7805237770080566 |
| keywords[0].display_name | Intersection (aeronautics) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6517740488052368 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.6031495928764343 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/computer-vision |
| keywords[3].score | 0.5819099545478821 |
| keywords[3].display_name | Computer vision |
| keywords[4].id | https://openalex.org/keywords/set |
| keywords[4].score | 0.5661059021949768 |
| keywords[4].display_name | Set (abstract data type) |
| keywords[5].id | https://openalex.org/keywords/object |
| keywords[5].score | 0.5637830495834351 |
| keywords[5].display_name | Object (grammar) |
| keywords[6].id | https://openalex.org/keywords/cartography |
| keywords[6].score | 0.11463180184364319 |
| keywords[6].display_name | Cartography |
| keywords[7].id | https://openalex.org/keywords/geography |
| keywords[7].score | 0.09791037440299988 |
| keywords[7].display_name | Geography |
| keywords[8].id | https://openalex.org/keywords/programming-language |
| keywords[8].score | 0.06277486681938171 |
| keywords[8].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2408.14032 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2408.14032 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2408.14032 |
| locations[1].id | doi:10.48550/arxiv.2408.14032 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2408.14032 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5114230977 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Bingcheng Dong |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dong, Bingcheng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5081915723 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yuning Ding |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ding, Yuning |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5089295072 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0774-8479 |
| authorships[2].author.display_name | Jinrong Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Jinrong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101638977 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6086-9678 |
| authorships[3].author.display_name | Sifan Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Sifan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101525691 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-2250-2365 |
| authorships[4].author.display_name | Shenglan Liu |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Liu, Shenglan |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2408.14032 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-09-21T00:00:00 |
| display_name | More Pictures Say More: Visual Intersection Network for Open Set Object Detection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10036 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9900000095367432 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Neural Network Applications |
| related_works | https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2772917594, https://openalex.org/W2775347418, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747, https://openalex.org/W1969923398 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2408.14032 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2408.14032 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2408.14032 |
| primary_location.id | pmh:oai:arXiv.org:2408.14032 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2408.14032 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2408.14032 |
| publication_date | 2024-08-26 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.7 | 183 |
| abstract_inverted_index.a | 75, 90, 138, 165 |
| abstract_inverted_index.In | 58, 71 |
| abstract_inverted_index.as | 206 |
| abstract_inverted_index.in | 175 |
| abstract_inverted_index.it | 10, 49 |
| abstract_inverted_index.of | 67, 99, 126, 132, 164, 172 |
| abstract_inverted_index.on | 191, 198, 203 |
| abstract_inverted_index.to | 12, 33, 55, 94, 113, 158, 187 |
| abstract_inverted_index.we | 73 |
| abstract_inverted_index.GPU | 185 |
| abstract_inverted_index.Our | 106, 135 |
| abstract_inverted_index.Set | 1, 84 |
| abstract_inverted_index.all | 103 |
| abstract_inverted_index.and | 26, 129, 146, 154, 208 |
| abstract_inverted_index.but | 9 |
| abstract_inverted_index.for | 46, 82 |
| abstract_inverted_index.has | 4 |
| abstract_inverted_index.new | 127 |
| abstract_inverted_index.one | 189 |
| abstract_inverted_index.par | 199 |
| abstract_inverted_index.the | 20, 64, 96, 115, 123, 162, 169, 192 |
| abstract_inverted_index.LVIS | 207 |
| abstract_inverted_index.Open | 0, 83 |
| abstract_inverted_index.bank | 93 |
| abstract_inverted_index.days | 186 |
| abstract_inverted_index.each | 100 |
| abstract_inverted_index.from | 63, 118 |
| abstract_inverted_index.gap. | 36 |
| abstract_inverted_index.head | 167 |
| abstract_inverted_index.into | 41 |
| abstract_inverted_index.more | 139 |
| abstract_inverted_index.only | 182 |
| abstract_inverted_index.pose | 13 |
| abstract_inverted_index.seen | 5 |
| abstract_inverted_index.such | 205 |
| abstract_inverted_index.this | 35 |
| abstract_inverted_index.time | 104, 153 |
| abstract_inverted_index.with | 19, 52, 200 |
| abstract_inverted_index.VINO, | 179 |
| abstract_inverted_index.broad | 170 |
| abstract_inverted_index.comes | 51 |
| abstract_inverted_index.epoch | 190 |
| abstract_inverted_index.modal | 22 |
| abstract_inverted_index.rapid | 6 |
| abstract_inverted_index.shows | 44 |
| abstract_inverted_index.these | 42 |
| abstract_inverted_index.which | 88, 180 |
| abstract_inverted_index.while | 149 |
| abstract_inverted_index.Object | 2, 85 |
| abstract_inverted_index.Visual | 79 |
| abstract_inverted_index.across | 102 |
| abstract_inverted_index.always | 50 |
| abstract_inverted_index.bridge | 34 |
| abstract_inverted_index.fusion | 66 |
| abstract_inverted_index.learns | 112 |
| abstract_inverted_index.model, | 78 |
| abstract_inverted_index.models | 202 |
| abstract_inverted_index.region | 147 |
| abstract_inverted_index.steps. | 105 |
| abstract_inverted_index.strong | 76 |
| abstract_inverted_index.suffer | 62 |
| abstract_inverted_index.target | 143 |
| abstract_inverted_index.tasks. | 178 |
| abstract_inverted_index.visual | 27, 39, 69, 92, 109, 120, 173, 177 |
| abstract_inverted_index.(VINO), | 87 |
| abstract_inverted_index.Network | 81 |
| abstract_inverted_index.RTX4090 | 184 |
| abstract_inverted_index.between | 24, 142 |
| abstract_inverted_index.demands | 156 |
| abstract_inverted_index.feature | 133 |
| abstract_inverted_index.methods | 61 |
| abstract_inverted_index.precise | 140 |
| abstract_inverted_index.promise | 45 |
| abstract_inverted_index.prompts | 40 |
| abstract_inverted_index.related | 54 |
| abstract_inverted_index.require | 29 |
| abstract_inverted_index.textual | 25, 56 |
| abstract_inverted_index.various | 119, 176 |
| abstract_inverted_index.Although | 37 |
| abstract_inverted_index.ODinW35. | 209 |
| abstract_inverted_index.achieves | 195 |
| abstract_inverted_index.approach | 136 |
| abstract_inverted_index.category | 101, 144 |
| abstract_inverted_index.compared | 157 |
| abstract_inverted_index.complete | 188 |
| abstract_inverted_index.dataset, | 194 |
| abstract_inverted_index.enabling | 122 |
| abstract_inverted_index.flexible | 124 |
| abstract_inverted_index.identify | 114 |
| abstract_inverted_index.methods, | 17 |
| abstract_inverted_index.methods. | 160 |
| abstract_inverted_index.multiple | 68 |
| abstract_inverted_index.preserve | 95 |
| abstract_inverted_index.prompts, | 121 |
| abstract_inverted_index.prompts. | 70 |
| abstract_inverted_index.reducing | 151 |
| abstract_inverted_index.requires | 181 |
| abstract_inverted_index.resource | 155 |
| abstract_inverted_index.semantic | 97, 116 |
| abstract_inverted_index.updating | 110 |
| abstract_inverted_index.Detection | 3, 86 |
| abstract_inverted_index.alignment | 141 |
| abstract_inverted_index.continues | 11 |
| abstract_inverted_index.contrast, | 59 |
| abstract_inverted_index.disparity | 23 |
| abstract_inverted_index.enhancing | 47 |
| abstract_inverted_index.extensive | 30 |
| abstract_inverted_index.grappling | 18 |
| abstract_inverted_index.introduce | 74 |
| abstract_inverted_index.mechanism | 111 |
| abstract_inverted_index.recently, | 8 |
| abstract_inverted_index.resources | 32 |
| abstract_inverted_index.response, | 72 |
| abstract_inverted_index.semantics | 145 |
| abstract_inverted_index.DETR-based | 77 |
| abstract_inverted_index.benchmarks | 204 |
| abstract_inverted_index.constructs | 89 |
| abstract_inverted_index.continuous | 130 |
| abstract_inverted_index.frameworks | 43 |
| abstract_inverted_index.guarantees | 137 |
| abstract_inverted_index.innovative | 107 |
| abstract_inverted_index.semantics, | 148 |
| abstract_inverted_index.semantics. | 57 |
| abstract_inverted_index.challenges. | 15 |
| abstract_inverted_index.competitive | 196 |
| abstract_inverted_index.constraints | 53 |
| abstract_inverted_index.development | 7 |
| abstract_inverted_index.illustrates | 168 |
| abstract_inverted_index.information | 128 |
| abstract_inverted_index.integrating | 38 |
| abstract_inverted_index.integration | 163 |
| abstract_inverted_index.low-quality | 65 |
| abstract_inverted_index.modalities, | 28 |
| abstract_inverted_index.multi-image | 91, 108 |
| abstract_inverted_index.performance | 197 |
| abstract_inverted_index.significant | 14 |
| abstract_inverted_index.substantial | 21 |
| abstract_inverted_index.viusal-only | 60 |
| abstract_inverted_index.Furthermore, | 161 |
| abstract_inverted_index.Intersection | 80 |
| abstract_inverted_index.Objects365v1 | 193 |
| abstract_inverted_index.intersection | 174 |
| abstract_inverted_index.optimization | 131 |
| abstract_inverted_index.performance, | 48 |
| abstract_inverted_index.pre-training | 152 |
| abstract_inverted_index.segmentation | 166 |
| abstract_inverted_index.applicability | 171 |
| abstract_inverted_index.computational | 31 |
| abstract_inverted_index.incorporation | 125 |
| abstract_inverted_index.intersections | 98, 117 |
| abstract_inverted_index.significantly | 150 |
| abstract_inverted_index.Language-based | 16 |
| abstract_inverted_index.language-based | 159 |
| abstract_inverted_index.vision-language | 201 |
| abstract_inverted_index.representations. | 134 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
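
The `abstract_inverted_index.*` rows in the payload above store each abstract token together with the word positions at which it occurs, so the abstract text can be recovered by sorting tokens by position. The following is a minimal sketch of that reconstruction, assuming rows shaped like the ones in this table; the helper names `parse_rows` and `rebuild_abstract` are hypothetical and are not part of any OpenAlex client library.

```python
from typing import Dict, Iterable, List

# Key prefix used by the flattened payload rows in this record.
PREFIX = "abstract_inverted_index."


def parse_rows(rows: Iterable[str]) -> Dict[str, List[int]]:
    """Collect token -> positions from '| abstract_inverted_index.X | 1, 2 |' rows."""
    index: Dict[str, List[int]] = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip rows that are not two-cell inverted-index entries or have no value.
        if len(cells) != 2 or not cells[0].startswith(PREFIX) or not cells[1]:
            continue
        token = cells[0][len(PREFIX):]
        index[token] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_abstract(index: Dict[str, List[int]]) -> str:
    """Order every (position, token) pair by position and join with spaces."""
    placed = sorted(
        (pos, token) for token, positions in index.items() for pos in positions
    )
    return " ".join(token for _, token in placed)


# A small excerpt of rows copied from the table (positions 4 through 8).
rows = [
    "| abstract_inverted_index.has | 4 |",
    "| abstract_inverted_index.seen | 5 |",
    "| abstract_inverted_index.rapid | 6 |",
    "| abstract_inverted_index.recently, | 8 |",
    "| abstract_inverted_index.development | 7 |",
]

print(rebuild_abstract(parse_rows(rows)))
# prints "has seen rapid development recently,"
```

Feeding every `abstract_inverted_index.*` row into `parse_rows` should yield the full abstract as indexed, including any spelling the index itself carries. Positions missing from the input are simply skipped, so a partial excerpt still produces readable text.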