OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2410.17534
Open-vocabulary object perception, which aims to identify objects of novel classes that were not seen during training, has become an important topic in artificial intelligence. Under this setting, open-vocabulary object detection (OVD) in a single image has been studied extensively in the literature. Open-vocabulary object tracking (OVT) in video has received less attention, in part because of a shortage of benchmarks. In this work, we build a new large-scale benchmark for open-vocabulary multi-object tracking, named OVT-B. OVT-B contains 1,048 object categories and 1,973 videos with 637,608 bounding box annotations, which makes it much larger than the sole existing open-vocabulary tracking dataset, OVTAO-val (200+ categories, 900+ videos). The proposed OVT-B can serve as a new benchmark to pave the way for OVT research. We also develop a simple yet effective baseline method for OVT. It integrates motion features for object tracking, which are an important cue in multi-object tracking (MOT) but are ignored by previous OVT methods. Experimental results verify the usefulness of the proposed benchmark and the effectiveness of our method. We have released the benchmark to the public at https://github.com/Coo1Sea/OVT-B-Dataset.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.17534 (PDF: https://arxiv.org/pdf/2410.17534)
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404305053
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404305053 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2410.17534 (Digital Object Identifier)
- Title: OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-10-23 (full publication date if available)
- Authors: Hai‐Wei Liang, Ruize Han (list of authors in order)
- Landing page: https://arxiv.org/abs/2410.17534 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2410.17534 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2410.17534 (direct OA link when available)
- Concepts: Benchmark (surveying), Computer science, Scale (ratio), Vocabulary, Object (grammar), Tracking (education), Artificial intelligence, Video tracking, Psychology, Linguistics, Cartography, Geography, Pedagogy, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404305053 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2410.17534 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.17534 |
| ids.openalex | https://openalex.org/W4404305053 |
| fwci | |
| type | preprint |
| title | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12031 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9358999729156494 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Speech and dialogue systems |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9218999743461609 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C185798385 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7977039813995361 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[0].display_name | Benchmark (surveying) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6593190431594849 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2778755073 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6323819160461426 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[2].display_name | Scale (ratio) |
| concepts[3].id | https://openalex.org/C2777601683 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6054797172546387 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q6499736 |
| concepts[3].display_name | Vocabulary |
| concepts[4].id | https://openalex.org/C2781238097 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5886268019676208 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[4].display_name | Object (grammar) |
| concepts[5].id | https://openalex.org/C2775936607 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5579951405525208 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q466845 |
| concepts[5].display_name | Tracking (education) |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.4906700849533081 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C202474056 |
| concepts[7].level | 3 |
| concepts[7].score | 0.4228476583957672 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1931635 |
| concepts[7].display_name | Video tracking |
| concepts[8].id | https://openalex.org/C15744967 |
| concepts[8].level | 0 |
| concepts[8].score | 0.13462486863136292 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[8].display_name | Psychology |
| concepts[9].id | https://openalex.org/C41895202 |
| concepts[9].level | 1 |
| concepts[9].score | 0.10357967019081116 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[9].display_name | Linguistics |
| concepts[10].id | https://openalex.org/C58640448 |
| concepts[10].level | 1 |
| concepts[10].score | 0.08700567483901978 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[10].display_name | Cartography |
| concepts[11].id | https://openalex.org/C205649164 |
| concepts[11].level | 0 |
| concepts[11].score | 0.08036667108535767 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[11].display_name | Geography |
| concepts[12].id | https://openalex.org/C19417346 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7922 |
| concepts[12].display_name | Pedagogy |
| concepts[13].id | https://openalex.org/C138885662 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[13].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/benchmark |
| keywords[0].score | 0.7977039813995361 |
| keywords[0].display_name | Benchmark (surveying) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6593190431594849 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/scale |
| keywords[2].score | 0.6323819160461426 |
| keywords[2].display_name | Scale (ratio) |
| keywords[3].id | https://openalex.org/keywords/vocabulary |
| keywords[3].score | 0.6054797172546387 |
| keywords[3].display_name | Vocabulary |
| keywords[4].id | https://openalex.org/keywords/object |
| keywords[4].score | 0.5886268019676208 |
| keywords[4].display_name | Object (grammar) |
| keywords[5].id | https://openalex.org/keywords/tracking |
| keywords[5].score | 0.5579951405525208 |
| keywords[5].display_name | Tracking (education) |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.4906700849533081 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/video-tracking |
| keywords[7].score | 0.4228476583957672 |
| keywords[7].display_name | Video tracking |
| keywords[8].id | https://openalex.org/keywords/psychology |
| keywords[8].score | 0.13462486863136292 |
| keywords[8].display_name | Psychology |
| keywords[9].id | https://openalex.org/keywords/linguistics |
| keywords[9].score | 0.10357967019081116 |
| keywords[9].display_name | Linguistics |
| keywords[10].id | https://openalex.org/keywords/cartography |
| keywords[10].score | 0.08700567483901978 |
| keywords[10].display_name | Cartography |
| keywords[11].id | https://openalex.org/keywords/geography |
| keywords[11].score | 0.08036667108535767 |
| keywords[11].display_name | Geography |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.17534 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.17534 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.17534 |
| locations[1].id | doi:10.48550/arxiv.2410.17534 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.17534 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5026669428 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0128-0222 |
| authorships[0].author.display_name | Hai‐Wei Liang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Liang, Haiji |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5004806106 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6587-8936 |
| authorships[1].author.display_name | Ruize Han |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Han, Ruize |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.17534 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-11-13T00:00:00 |
| display_name | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12031 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9358999729156494 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Speech and dialogue systems |
| related_works | https://openalex.org/W2378211422, https://openalex.org/W4285271403, https://openalex.org/W2542007731, https://openalex.org/W2968379562, https://openalex.org/W2091015105, https://openalex.org/W4388689193, https://openalex.org/W2110899030, https://openalex.org/W29633852, https://openalex.org/W2985362983, https://openalex.org/W4327670844 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.17534 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.17534 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.17534 |
| primary_location.id | pmh:oai:arXiv.org:2410.17534 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.17534 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.17534 |
| publication_date | 2024-10-23 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 34, 49, 69, 117, 130 |
| abstract_inverted_index.In | 63 |
| abstract_inverted_index.It | 138 |
| abstract_inverted_index.We | 127, 176 |
| abstract_inverted_index.an | 5, 148 |
| abstract_inverted_index.as | 116 |
| abstract_inverted_index.at | 184 |
| abstract_inverted_index.be | 114 |
| abstract_inverted_index.in | 8, 33, 40, 156 |
| abstract_inverted_index.is | 58, 94, 147, 154 |
| abstract_inverted_index.of | 61, 83, 166, 173 |
| abstract_inverted_index.to | 13, 120, 181 |
| abstract_inverted_index.we | 66 |
| abstract_inverted_index.MOT | 152 |
| abstract_inverted_index.OVT | 125, 158 |
| abstract_inverted_index.The | 110 |
| abstract_inverted_index.and | 55, 85, 170 |
| abstract_inverted_index.box | 91 |
| abstract_inverted_index.but | 153 |
| abstract_inverted_index.can | 113 |
| abstract_inverted_index.for | 73, 124, 136, 143, 151 |
| abstract_inverted_index.has | 3, 37, 51 |
| abstract_inverted_index.new | 70, 118 |
| abstract_inverted_index.not | 21 |
| abstract_inverted_index.one | 56 |
| abstract_inverted_index.our | 174 |
| abstract_inverted_index.the | 59, 98, 122, 140, 164, 167, 171, 179, 182 |
| abstract_inverted_index.way | 123 |
| abstract_inverted_index.yet | 132 |
| abstract_inverted_index.900+ | 108 |
| abstract_inverted_index.OVT. | 137 |
| abstract_inverted_index.aims | 12 |
| abstract_inverted_index.also | 128 |
| abstract_inverted_index.been | 22, 38, 52 |
| abstract_inverted_index.from | 48 |
| abstract_inverted_index.have | 20, 67, 162, 177 |
| abstract_inverted_index.many | 41 |
| abstract_inverted_index.much | 95 |
| abstract_inverted_index.pave | 121 |
| abstract_inverted_index.seen | 23 |
| abstract_inverted_index.sole | 99 |
| abstract_inverted_index.than | 97 |
| abstract_inverted_index.that | 19 |
| abstract_inverted_index.this | 27, 64 |
| abstract_inverted_index.used | 115 |
| abstract_inverted_index.with | 16, 88 |
| abstract_inverted_index.(200+ | 106 |
| abstract_inverted_index.(OVD) | 32 |
| abstract_inverted_index.(OVT) | 47 |
| abstract_inverted_index.1,048 | 81 |
| abstract_inverted_index.1,973 | 86 |
| abstract_inverted_index.OVT-B | 79, 112 |
| abstract_inverted_index.Under | 26 |
| abstract_inverted_index.built | 68 |
| abstract_inverted_index.i.e., | 103 |
| abstract_inverted_index.image | 36 |
| abstract_inverted_index.less, | 54 |
| abstract_inverted_index.novel | 17 |
| abstract_inverted_index.topic | 7 |
| abstract_inverted_index.video | 50 |
| abstract_inverted_index.which | 11, 93, 146 |
| abstract_inverted_index.work, | 65 |
| abstract_inverted_index.OVT-B. | 78 |
| abstract_inverted_index.become | 4 |
| abstract_inverted_index.during | 24 |
| abstract_inverted_index.larger | 96 |
| abstract_inverted_index.method | 135 |
| abstract_inverted_index.motion | 141 |
| abstract_inverted_index.namely | 77 |
| abstract_inverted_index.object | 1, 30, 45, 144 |
| abstract_inverted_index.public | 183 |
| abstract_inverted_index.reason | 57 |
| abstract_inverted_index.simple | 131 |
| abstract_inverted_index.single | 35 |
| abstract_inverted_index.videos | 87 |
| abstract_inverted_index.637,608 | 89 |
| abstract_inverted_index.classes | 18 |
| abstract_inverted_index.dataset | 105 |
| abstract_inverted_index.develop | 129 |
| abstract_inverted_index.feature | 150 |
| abstract_inverted_index.ignored | 155 |
| abstract_inverted_index.method. | 175 |
| abstract_inverted_index.objects | 15, 84 |
| abstract_inverted_index.results | 161 |
| abstract_inverted_index.studied | 39, 53 |
| abstract_inverted_index.However, | 43 |
| abstract_inverted_index.baseline | 134 |
| abstract_inverted_index.bounding | 90 |
| abstract_inverted_index.contains | 80 |
| abstract_inverted_index.dataset, | 102 |
| abstract_inverted_index.features | 142 |
| abstract_inverted_index.identify | 14 |
| abstract_inverted_index.methods. | 159 |
| abstract_inverted_index.previous | 157 |
| abstract_inverted_index.proposed | 111, 168 |
| abstract_inverted_index.released | 178 |
| abstract_inverted_index.setting, | 28 |
| abstract_inverted_index.shortage | 60 |
| abstract_inverted_index.tracking | 46, 76, 101 |
| abstract_inverted_index.verified | 163 |
| abstract_inverted_index.videos). | 109 |
| abstract_inverted_index.OVTAO-val | 104 |
| abstract_inverted_index.benchmark | 72, 119, 169, 180 |
| abstract_inverted_index.detection | 31 |
| abstract_inverted_index.effective | 133 |
| abstract_inverted_index.important | 6, 149 |
| abstract_inverted_index.research. | 126 |
| abstract_inverted_index.tracking, | 145 |
| abstract_inverted_index.training. | 25 |
| abstract_inverted_index.artificial | 9 |
| abstract_inverted_index.categories | 82 |
| abstract_inverted_index.integrates | 139 |
| abstract_inverted_index.perception | 2 |
| abstract_inverted_index.usefulness | 165 |
| abstract_inverted_index.benchmarks. | 62 |
| abstract_inverted_index.categories, | 107 |
| abstract_inverted_index.large-scale | 71 |
| abstract_inverted_index.literature. | 42 |
| abstract_inverted_index.Experimental | 160 |
| abstract_inverted_index.annotations, | 92 |
| abstract_inverted_index.multi-object | 75 |
| abstract_inverted_index.effectiveness | 172 |
| abstract_inverted_index.intelligence, | 10 |
| abstract_inverted_index.Open-vocabulary | 0 |
| abstract_inverted_index.open-vocabulary | 29, 44, 74, 100 |
| abstract_inverted_index.https://github.com/Coo1Sea/OVT-B-Dataset. | 185 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile | |
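
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the positions where it occurs. The following is a minimal sketch of how such a mapping could be turned back into running text, assuming the rows have already been loaded into a Python dictionary. The function name `rebuild_abstract` and the `sample` dictionary are illustrative and not part of OpenAlex; `sample` is a hand-copied excerpt covering positions 0 to 7 only.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> list-of-positions mapping."""
    # Expand every (token, positions) pair into one (position, token) pair
    # per occurrence, then order the pairs by position.
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()
    return " ".join(token for _, token in positioned)


# Excerpt of the payload rows, truncated to positions 0-7 for illustration.
sample = {
    "Open-vocabulary": [0],
    "object": [1],
    "perception": [2],
    "has": [3],
    "become": [4],
    "an": [5],
    "important": [6],
    "topic": [7],
}

print(rebuild_abstract(sample))
# prints: Open-vocabulary object perception has become an important topic
```

Sorting by position rather than relying on dictionary order matters because a token that occurs several times (for example `the` in the full payload) carries several positions that are scattered through the text.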