SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2311.16241
In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs) are able to learn diverse semantic knowledge from image-caption datasets but produce noisy segmentation due to the image-level training. In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries. To adapt the VLM from global to local reasoning, we introduce a spatial fine-tuning strategy for label-efficient learning. Further, we design a language-guided decoder to jointly reason over vision and language. Finally, we propose to handle inherent ambiguities in class labels by providing the model with language guidance in the form of class definitions. We evaluate SemiVL on 4 semantic segmentation datasets, where it significantly outperforms previous semi-supervised methods. For instance, SemiVL improves the state-of-the-art by +13.5 mIoU on COCO with 232 annotated images and by +6.1 mIoU on Pascal VOC with 92 labels. Project page: https://github.com/google-research/semivl
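The gains quoted above are in mIoU (mean intersection over union). The abstract does not define the metric, so the following is only a minimal sketch of the standard definition on flat label lists. The function name `mean_iou` and the toy data are illustrative assumptions, not the authors' evaluation code; benchmark protocols typically accumulate intersections and unions over the whole dataset and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length sequences of class ids.

    Hypothetical helper for illustration; not taken from the SemiVL code base.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where prediction and ground truth agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes and 6 "pixels" (made-up values).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.5
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. On that scale, a "+6.1 mIoU" improvement corresponds to 0.061 when the metric is reported in percent.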
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2311.16241 (PDF: https://arxiv.org/pdf/2311.16241)
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4389157406
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4389157406 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2311.16241
- Title: SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
- Type: preprint (OpenAlex work type)
- Language: en
- Publication year: 2023
- Publication date: 2023-11-27
- Authors: Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari (in author order)
- Landing page: https://arxiv.org/abs/2311.16241
- PDF URL: https://arxiv.org/pdf/2311.16241
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2311.16241
- Concepts: Segmentation, Computer science, Pascal (unit), Artificial intelligence, Annotation, Class (philosophy), Supervised learning, Natural language processing, Prior probability, Machine learning, Pattern recognition (psychology), Artificial neural network, Bayesian probability, Programming language (top concepts attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1 (last 5 years)
- Related works (count): 10 (algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4389157406 |
| doi | https://doi.org/10.48550/arxiv.2311.16241 |
| ids.doi | https://doi.org/10.48550/arxiv.2311.16241 |
| ids.openalex | https://openalex.org/W4389157406 |
| fwci | |
| type | preprint |
| title | SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9995999932289124 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T11307 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9925000071525574 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Domain Adaptation and Few-Shot Learning |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9603000283241272 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C89600930 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8105100989341736 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[0].display_name | Segmentation |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.8031894564628601 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C75608658 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7661548852920532 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q44395 |
| concepts[2].display_name | Pascal (unit) |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.7317554950714111 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C2776321320 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5541738271713257 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q857525 |
| concepts[4].display_name | Annotation |
| concepts[5].id | https://openalex.org/C2777212361 |
| concepts[5].level | 2 |
| concepts[5].score | 0.46693792939186096 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5127848 |
| concepts[5].display_name | Class (philosophy) |
| concepts[6].id | https://openalex.org/C136389625 |
| concepts[6].level | 3 |
| concepts[6].score | 0.4608457088470459 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q334384 |
| concepts[6].display_name | Supervised learning |
| concepts[7].id | https://openalex.org/C204321447 |
| concepts[7].level | 1 |
| concepts[7].score | 0.41368138790130615 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[7].display_name | Natural language processing |
| concepts[8].id | https://openalex.org/C177769412 |
| concepts[8].level | 3 |
| concepts[8].score | 0.41022202372550964 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q278090 |
| concepts[8].display_name | Prior probability |
| concepts[9].id | https://openalex.org/C119857082 |
| concepts[9].level | 1 |
| concepts[9].score | 0.4039109945297241 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[9].display_name | Machine learning |
| concepts[10].id | https://openalex.org/C153180895 |
| concepts[10].level | 2 |
| concepts[10].score | 0.37048250436782837 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[10].display_name | Pattern recognition (psychology) |
| concepts[11].id | https://openalex.org/C50644808 |
| concepts[11].level | 2 |
| concepts[11].score | 0.07821473479270935 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[11].display_name | Artificial neural network |
| concepts[12].id | https://openalex.org/C107673813 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[12].display_name | Bayesian probability |
| concepts[13].id | https://openalex.org/C199360897 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[13].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/segmentation |
| keywords[0].score | 0.8105100989341736 |
| keywords[0].display_name | Segmentation |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.8031894564628601 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/pascal |
| keywords[2].score | 0.7661548852920532 |
| keywords[2].display_name | Pascal (unit) |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.7317554950714111 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/annotation |
| keywords[4].score | 0.5541738271713257 |
| keywords[4].display_name | Annotation |
| keywords[5].id | https://openalex.org/keywords/class |
| keywords[5].score | 0.46693792939186096 |
| keywords[5].display_name | Class (philosophy) |
| keywords[6].id | https://openalex.org/keywords/supervised-learning |
| keywords[6].score | 0.4608457088470459 |
| keywords[6].display_name | Supervised learning |
| keywords[7].id | https://openalex.org/keywords/natural-language-processing |
| keywords[7].score | 0.41368138790130615 |
| keywords[7].display_name | Natural language processing |
| keywords[8].id | https://openalex.org/keywords/prior-probability |
| keywords[8].score | 0.41022202372550964 |
| keywords[8].display_name | Prior probability |
| keywords[9].id | https://openalex.org/keywords/machine-learning |
| keywords[9].score | 0.4039109945297241 |
| keywords[9].display_name | Machine learning |
| keywords[10].id | https://openalex.org/keywords/pattern-recognition |
| keywords[10].score | 0.37048250436782837 |
| keywords[10].display_name | Pattern recognition (psychology) |
| keywords[11].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[11].score | 0.07821473479270935 |
| keywords[11].display_name | Artificial neural network |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2311.16241 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2311.16241 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2311.16241 |
| locations[1].id | doi:10.48550/arxiv.2311.16241 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2311.16241 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5053328232 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7391-0676 |
| authorships[0].author.display_name | Lukas Hoyer |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Hoyer, Lukas |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5051298870 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-7835-6280 |
| authorships[1].author.display_name | David Joseph Tan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tan, David Joseph |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5103091877 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7455-7280 |
| authorships[2].author.display_name | Muhammad Ferjad Naeem |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Naeem, Muhammad Ferjad |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5001254143 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-3445-5711 |
| authorships[3].author.display_name | Luc Van Gool |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Van Gool, Luc |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5041092666 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5598-5212 |
| authorships[4].author.display_name | Federico Tombari |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Tombari, Federico |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2311.16241 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-11-30T00:00:00 |
| display_name | SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9995999932289124 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2361861616, https://openalex.org/W2263699433, https://openalex.org/W2580650124, https://openalex.org/W2377979023, https://openalex.org/W4386190339, https://openalex.org/W4286681602, https://openalex.org/W3209312100, https://openalex.org/W2787282551, https://openalex.org/W2963676873, https://openalex.org/W2020477327 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2311.16241 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2311.16241 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2311.16241 |
| primary_location.id | pmh:oai:arXiv.org:2311.16241 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2311.16241 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2311.16241 |
| publication_date | 2023-11-27 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.4 | 159 |
| abstract_inverted_index.a | 4, 9, 17, 112, 122 |
| abstract_inverted_index.92 | 193 |
| abstract_inverted_index.In | 0, 80 |
| abstract_inverted_index.On | 54 |
| abstract_inverted_index.To | 101 |
| abstract_inverted_index.We | 155 |
| abstract_inverted_index.by | 142, 176, 186 |
| abstract_inverted_index.in | 139, 149 |
| abstract_inverted_index.is | 6 |
| abstract_inverted_index.it | 164 |
| abstract_inverted_index.of | 12, 20, 152 |
| abstract_inverted_index.on | 158, 179, 189 |
| abstract_inverted_index.to | 23, 34, 42, 50, 63, 76, 84, 95, 107, 125, 135 |
| abstract_inverted_index.we | 82, 110, 120, 133 |
| abstract_inverted_index.232 | 182 |
| abstract_inverted_index.For | 170 |
| abstract_inverted_index.VLM | 89, 104 |
| abstract_inverted_index.VOC | 191 |
| abstract_inverted_index.and | 130, 185 |
| abstract_inverted_index.are | 32, 40, 61 |
| abstract_inverted_index.but | 71 |
| abstract_inverted_index.due | 49, 75 |
| abstract_inverted_index.for | 116 |
| abstract_inverted_index.the | 25, 51, 55, 77, 103, 144, 150, 174 |
| abstract_inverted_index.+6.1 | 187 |
| abstract_inverted_index.COCO | 180 |
| abstract_inverted_index.able | 33, 62 |
| abstract_inverted_index.form | 151 |
| abstract_inverted_index.from | 68, 88, 105 |
| abstract_inverted_index.good | 36 |
| abstract_inverted_index.high | 26 |
| abstract_inverted_index.into | 91 |
| abstract_inverted_index.mIoU | 178, 188 |
| abstract_inverted_index.over | 128 |
| abstract_inverted_index.rich | 86 |
| abstract_inverted_index.they | 39 |
| abstract_inverted_index.with | 8, 16, 45, 146, 181, 192 |
| abstract_inverted_index.+13.5 | 177 |
| abstract_inverted_index.While | 29 |
| abstract_inverted_index.adapt | 102 |
| abstract_inverted_index.along | 15 |
| abstract_inverted_index.class | 140, 153 |
| abstract_inverted_index.hand, | 57 |
| abstract_inverted_index.large | 18 |
| abstract_inverted_index.learn | 35, 64, 96 |
| abstract_inverted_index.local | 108 |
| abstract_inverted_index.model | 5, 145 |
| abstract_inverted_index.noisy | 73 |
| abstract_inverted_index.other | 56 |
| abstract_inverted_index.page: | 196 |
| abstract_inverted_index.prone | 41 |
| abstract_inverted_index.where | 163 |
| abstract_inverted_index.(VLMs) | 60 |
| abstract_inverted_index.Pascal | 190 |
| abstract_inverted_index.SemiVL | 157, 172 |
| abstract_inverted_index.better | 97 |
| abstract_inverted_index.corpus | 19 |
| abstract_inverted_index.design | 121 |
| abstract_inverted_index.global | 106 |
| abstract_inverted_index.handle | 136 |
| abstract_inverted_index.images | 14, 22, 184 |
| abstract_inverted_index.labels | 141 |
| abstract_inverted_index.models | 59 |
| abstract_inverted_index.number | 11 |
| abstract_inverted_index.priors | 87 |
| abstract_inverted_index.reason | 127 |
| abstract_inverted_index.reduce | 24 |
| abstract_inverted_index.vision | 129 |
| abstract_inverted_index.visual | 47 |
| abstract_inverted_index.Project | 195 |
| abstract_inverted_index.SemiVL, | 81 |
| abstract_inverted_index.classes | 44 |
| abstract_inverted_index.confuse | 43 |
| abstract_inverted_index.decoder | 124 |
| abstract_inverted_index.diverse | 65 |
| abstract_inverted_index.effort. | 28 |
| abstract_inverted_index.jointly | 126 |
| abstract_inverted_index.labeled | 13 |
| abstract_inverted_index.labels. | 194 |
| abstract_inverted_index.limited | 10, 52 |
| abstract_inverted_index.methods | 31 |
| abstract_inverted_index.produce | 72 |
| abstract_inverted_index.propose | 83, 134 |
| abstract_inverted_index.similar | 46 |
| abstract_inverted_index.spatial | 113 |
| abstract_inverted_index.trained | 7 |
| abstract_inverted_index.Finally, | 132 |
| abstract_inverted_index.Further, | 119 |
| abstract_inverted_index.datasets | 70 |
| abstract_inverted_index.decision | 99 |
| abstract_inverted_index.evaluate | 156 |
| abstract_inverted_index.guidance | 148 |
| abstract_inverted_index.improves | 173 |
| abstract_inverted_index.inherent | 137 |
| abstract_inverted_index.language | 147 |
| abstract_inverted_index.methods. | 169 |
| abstract_inverted_index.previous | 30, 167 |
| abstract_inverted_index.semantic | 2, 66, 93, 98, 160 |
| abstract_inverted_index.strategy | 115 |
| abstract_inverted_index.annotated | 183 |
| abstract_inverted_index.datasets, | 162 |
| abstract_inverted_index.instance, | 171 |
| abstract_inverted_index.integrate | 85 |
| abstract_inverted_index.introduce | 111 |
| abstract_inverted_index.knowledge | 67 |
| abstract_inverted_index.language. | 131 |
| abstract_inverted_index.learning. | 118 |
| abstract_inverted_index.providing | 143 |
| abstract_inverted_index.training. | 79 |
| abstract_inverted_index.unlabeled | 21 |
| abstract_inverted_index.annotation | 27 |
| abstract_inverted_index.appearance | 48 |
| abstract_inverted_index.reasoning, | 109 |
| abstract_inverted_index.ambiguities | 138 |
| abstract_inverted_index.boundaries, | 38 |
| abstract_inverted_index.boundaries. | 100 |
| abstract_inverted_index.fine-tuning | 114 |
| abstract_inverted_index.image-level | 78 |
| abstract_inverted_index.outperforms | 166 |
| abstract_inverted_index.definitions. | 154 |
| abstract_inverted_index.pre-training | 90 |
| abstract_inverted_index.segmentation | 37, 74, 94, 161 |
| abstract_inverted_index.supervision. | 53 |
| abstract_inverted_index.image-caption | 69 |
| abstract_inverted_index.segmentation, | 3 |
| abstract_inverted_index.significantly | 165 |
| abstract_inverted_index.label-efficient | 117 |
| abstract_inverted_index.language-guided | 123 |
| abstract_inverted_index.semi-supervised | 1, 92, 168 |
| abstract_inverted_index.vision-language | 58 |
| abstract_inverted_index.state-of-the-art | 175 |
| abstract_inverted_index.https://github.com/google-research/semivl | 197 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.46000000834465027 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
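
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch of how such a mapping can be turned back into running text, the snippet below inverts it and joins the tokens in position order. The function name `rebuild_abstract` is a hypothetical helper, and `sample` contains only the positions 0 to 7 copied from the table, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} mapping.

    Hypothetical helper for illustration; it is not part of any OpenAlex client.
    """
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the table rows: only the entries for positions 0..7 are kept.
sample = {
    "In": [0],
    "semi-supervised": [1],
    "semantic": [2],
    "segmentation,": [3],
    "a": [4],
    "model": [5],
    "is": [6],
    "trained": [7],
}
print(rebuild_abstract(sample))
# In semi-supervised semantic segmentation, a model is trained
```

Applied to all rows of the index, the same procedure should reproduce the abstract shown at the top of this page, since punctuation is kept attached to the tokens.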