VLMine: Long-Tail Data Mining with Vision Language Models
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2409.15486
Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM). Our approach utilizes a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples when compared to conventional methods based on model uncertainty. Therefore, we propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class variation is the primary source of data diversity, and on 3D object detection, where intra-class variation is the main concern. Furthermore, through the detection task, we demonstrate that the knowledge extracted from 2D images is transferable to the 3D domain. Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques on several representative benchmarks: ImageNet-LT, Places-LT, and the Waymo Open Dataset.
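
The keyword-frequency idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the scoring rule, so the choice here (score an image by its least frequent keyword) is an assumption, as are the function names `rarity_scores` and `mine_rare` and the toy keyword lists, which stand in for VLM output.

```python
from collections import Counter
from typing import Dict, List


def rarity_scores(keywords_per_image: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each image by its rarest keyword (higher score = rarer image).

    Assumption: rarity is 1 - (document frequency of the rarest keyword / N).
    The source abstract only says rare examples are found "based on keyword
    frequency"; the exact rule here is illustrative.
    """
    # Count in how many images each keyword appears (duplicates within one
    # image are counted once).
    doc_freq: Counter = Counter()
    for keywords in keywords_per_image.values():
        doc_freq.update(set(keywords))

    n_images = len(keywords_per_image)
    scores: Dict[str, float] = {}
    for image_id, keywords in keywords_per_image.items():
        if not keywords:
            # No keywords means no evidence of rarity under this rule.
            scores[image_id] = 0.0
            continue
        min_freq = min(doc_freq[k] for k in set(keywords))
        scores[image_id] = 1.0 - min_freq / n_images
    return scores


def mine_rare(keywords_per_image: Dict[str, List[str]], top_k: int) -> List[str]:
    """Return the top_k image ids by rarity, ties broken by image id."""
    scores = rarity_scores(keywords_per_image)
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))[:top_k]


# Hypothetical keyword sets, standing in for what a VLM might return.
example = {
    "img_a": ["car", "road"],
    "img_b": ["car", "road", "pedestrian"],
    "img_c": ["car", "road", "horse"],
    "img_d": ["car", "pedestrian"],
}
print(mine_rare(example, top_k=2))
```

In this toy corpus "horse" appears in only one image, so `img_c` ranks first. A real pipeline would also need keyword normalization (case, plurals, synonyms), which the sketch omits.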
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.15486 (PDF: https://arxiv.org/pdf/2409.15486)
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403786268
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403786268 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.15486 (Digital Object Identifier)
- Title: VLMine: Long-Tail Data Mining with Vision Language Models (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-09-23 (Full publication date if available)
- Authors: Mao Ye, Gregory P. Meyer, Zaiwei Zhang, Dennis Park, Siva Karthik Mustikovela, Yuning Chai, Eric M. Wolff (List of authors in order)
- Landing page: https://arxiv.org/abs/2409.15486 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.15486 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.15486 (Direct OA link when available)
- Concepts: Computer science, Artificial intelligence, Natural language processing (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4403786268 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2409.15486 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.15486 |
| ids.openalex | https://openalex.org/W4403786268 |
| fwci | |
| type | preprint |
| title | VLMine: Long-Tail Data Mining with Vision Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10215 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8371000289916992 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Semantic Web and Ontologies |
| topics[1].id | https://openalex.org/T11550 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.7135999798774719 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Text and Document Classification Technologies |
| topics[2].id | https://openalex.org/T11106 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.6980999708175659 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1711 |
| topics[2].subfield.display_name | Signal Processing |
| topics[2].display_name | Data Management and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5186840295791626 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.4221844971179962 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C204321447 |
| concepts[2].level | 1 |
| concepts[2].score | 0.3527592420578003 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[2].display_name | Natural language processing |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5186840295791626 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.4221844971179962 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/natural-language-processing |
| keywords[2].score | 0.3527592420578003 |
| keywords[2].display_name | Natural language processing |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.15486 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.15486 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.15486 |
| locations[1].id | doi:10.48550/arxiv.2409.15486 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.15486 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100682785 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7078-2402 |
| authorships[0].author.display_name | Mao Ye |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ye, Mao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5067267836 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-9444-5577 |
| authorships[1].author.display_name | Gregory P. Meyer |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Meyer, Gregory P. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5055489621 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Zaiwei Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Zaiwei |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5077824676 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Dennis Park |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Park, Dennis |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5021827101 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Siva Karthik Mustikovela |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Mustikovela, Siva Karthik |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5079961895 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Yuning Chai |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Chai, Yuning |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5065869782 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Eric M. Wolff |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Wolff, Eric M |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.15486 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | VLMine: Long-Tail Data Mining with Vision Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10215 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8371000289916992 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Semantic Web and Ontologies |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W3204019825 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.15486 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.15486 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.15486 |
| primary_location.id | pmh:oai:arXiv.org:2409.15486 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.15486 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.15486 |
| publication_date | 2024-09-23 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 32, 39, 52, 61, 71, 90, 109 |
| abstract_inverted_index.2D | 130, 168 |
| abstract_inverted_index.3D | 146, 174 |
| abstract_inverted_index.We | 37, 84, 121 |
| abstract_inverted_index.an | 7, 68 |
| abstract_inverted_index.as | 18 |
| abstract_inverted_index.in | 133 |
| abstract_inverted_index.is | 6, 137, 152, 170 |
| abstract_inverted_index.of | 14, 27, 34, 67, 73, 141 |
| abstract_inverted_index.on | 3, 24, 81, 103, 126, 145, 190 |
| abstract_inverted_index.to | 63, 99, 172 |
| abstract_inverted_index.we | 76, 107, 161 |
| abstract_inverted_index.Our | 58, 176 |
| abstract_inverted_index.VLM | 62, 88 |
| abstract_inverted_index.and | 41, 75, 111, 144, 184, 196 |
| abstract_inverted_index.for | 10, 93, 114 |
| abstract_inverted_index.set | 72 |
| abstract_inverted_index.the | 25, 48, 65, 87, 123, 138, 153, 158, 164, 173, 187, 197 |
| abstract_inverted_index.two | 127 |
| abstract_inverted_index.10\% | 183 |
| abstract_inverted_index.Open | 199 |
| abstract_inverted_index.This | 21 |
| abstract_inverted_index.data | 43, 142 |
| abstract_inverted_index.find | 85 |
| abstract_inverted_index.from | 117, 167 |
| abstract_inverted_index.into | 70 |
| abstract_inverted_index.main | 154 |
| abstract_inverted_index.many | 11 |
| abstract_inverted_index.over | 186 |
| abstract_inverted_index.rare | 29, 78 |
| abstract_inverted_index.show | 179 |
| abstract_inverted_index.such | 17 |
| abstract_inverted_index.that | 46, 86, 163 |
| abstract_inverted_index.when | 97 |
| abstract_inverted_index.work | 22 |
| abstract_inverted_index.50\%) | 185 |
| abstract_inverted_index.Waymo | 198 |
| abstract_inverted_index.based | 80, 102 |
| abstract_inverted_index.data. | 36 |
| abstract_inverted_index.image | 69, 131 |
| abstract_inverted_index.large | 53, 180 |
| abstract_inverted_index.model | 56, 104 |
| abstract_inverted_index.task, | 160 |
| abstract_inverted_index.where | 149 |
| abstract_inverted_index.which | 134 |
| abstract_inverted_index.(VLM). | 57 |
| abstract_inverted_index.corpus | 33 |
| abstract_inverted_index.images | 169 |
| abstract_inverted_index.method | 125 |
| abstract_inverted_index.mining | 44, 119 |
| abstract_inverted_index.object | 147 |
| abstract_inverted_index.offers | 89 |
| abstract_inverted_index.robust | 1 |
| abstract_inverted_index.signal | 92 |
| abstract_inverted_index.simple | 40, 110 |
| abstract_inverted_index.source | 140 |
| abstract_inverted_index.tasks: | 129 |
| abstract_inverted_index.vision | 54 |
| abstract_inverted_index.within | 31, 51 |
| abstract_inverted_index.content | 66 |
| abstract_inverted_index.diverse | 128 |
| abstract_inverted_index.domain. | 175 |
| abstract_inverted_index.focuses | 23 |
| abstract_inverted_index.general | 112 |
| abstract_inverted_index.keyword | 82 |
| abstract_inverted_index.machine | 15 |
| abstract_inverted_index.methods | 101 |
| abstract_inverted_index.primary | 139 |
| abstract_inverted_index.problem | 9, 26 |
| abstract_inverted_index.propose | 38, 108 |
| abstract_inverted_index.several | 191 |
| abstract_inverted_index.signals | 116 |
| abstract_inverted_index.through | 157 |
| abstract_inverted_index.(between | 182 |
| abstract_inverted_index.Dataset. | 200 |
| abstract_inverted_index.Ensuring | 0 |
| abstract_inverted_index.approach | 45, 59, 113 |
| abstract_inverted_index.baseline | 188 |
| abstract_inverted_index.compared | 98 |
| abstract_inverted_index.concern. | 155 |
| abstract_inverted_index.distinct | 91 |
| abstract_inverted_index.driving. | 20 |
| abstract_inverted_index.evaluate | 122 |
| abstract_inverted_index.examples | 5, 30, 79, 96 |
| abstract_inverted_index.identify | 77 |
| abstract_inverted_index.language | 55 |
| abstract_inverted_index.multiple | 118 |
| abstract_inverted_index.proposed | 124 |
| abstract_inverted_index.scalable | 42 |
| abstract_inverted_index.utilizes | 60 |
| abstract_inverted_index.contained | 50 |
| abstract_inverted_index.detection | 159 |
| abstract_inverted_index.extracted | 166 |
| abstract_inverted_index.important | 8 |
| abstract_inverted_index.keywords, | 74 |
| abstract_inverted_index.knowledge | 49, 165 |
| abstract_inverted_index.learning, | 16 |
| abstract_inverted_index.leverages | 47 |
| abstract_inverted_index.long-tail | 4, 95 |
| abstract_inverted_index.summarize | 64 |
| abstract_inverted_index.unlabeled | 35 |
| abstract_inverted_index.variation | 136, 151 |
| abstract_inverted_index.Places-LT, | 195 |
| abstract_inverted_index.Therefore, | 106 |
| abstract_inverted_index.autonomous | 19 |
| abstract_inverted_index.detection, | 148 |
| abstract_inverted_index.diversity, | 143 |
| abstract_inverted_index.frequency. | 83 |
| abstract_inverted_index.real-world | 12 |
| abstract_inverted_index.techniques | 189 |
| abstract_inverted_index.algorithms. | 120 |
| abstract_inverted_index.benchmarks: | 193 |
| abstract_inverted_index.demonstrate | 162 |
| abstract_inverted_index.experiments | 177 |
| abstract_inverted_index.identifying | 28, 94 |
| abstract_inverted_index.integrating | 115 |
| abstract_inverted_index.inter-class | 135 |
| abstract_inverted_index.intra-class | 150 |
| abstract_inverted_index.performance | 2 |
| abstract_inverted_index.Furthermore, | 156 |
| abstract_inverted_index.ImageNet-LT, | 194 |
| abstract_inverted_index.applications | 13 |
| abstract_inverted_index.consistently | 178 |
| abstract_inverted_index.conventional | 100 |
| abstract_inverted_index.improvements | 181 |
| abstract_inverted_index.transferable | 171 |
| abstract_inverted_index.uncertainty. | 105 |
| abstract_inverted_index.representative | 192 |
| abstract_inverted_index.classification, | 132 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
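
The `abstract_inverted_index.*` rows in the table are a flattened form of a token-to-positions mapping: each token lists the word positions at which it occurs in the abstract. The following Python sketch shows how such a mapping could be turned back into running text. The function name `rebuild_abstract` is an illustrative choice, and the sample dictionary covers only positions 0 through 5 taken from the rows above, not the full index.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a mapping of token -> list of word positions."""
    # Invert the mapping: position -> token.
    position_to_token: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Join tokens in position order; missing positions are simply skipped.
    return " ".join(position_to_token[i] for i in sorted(position_to_token))


# Fragment of the index above, restricted to the first six positions.
fragment = {
    "Ensuring": [0],
    "robust": [1],
    "performance": [2],
    "on": [3],
    "long-tail": [4],
    "examples": [5],
}
print(rebuild_abstract(fragment))
# prints "Ensuring robust performance on long-tail examples"
```

Tokens keep their original punctuation and escapes (for example `10\%` and `(VLM).`), so the rebuilt text matches the abstract as stored, including any markup artifacts.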