Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2312.12479
Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering the scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning for the building images based on the vocabularies pertinent to structural and civil engineering. These two components generate descriptive captions by computing feature representations of the image and the vocabularies, and facilitating a semantic match between the visual and textual representations. Consequently, our framework offers a promising avenue to enhance AI-driven captioning for building attribute extraction in the structural and civil engineering domains, ultimately reducing reliance on human annotations while bolstering performance and adaptability.
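The matching step described in the abstract (feature representations of the image and of the vocabulary, followed by a semantic match) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the cosine-similarity scoring, the roof-type vocabulary, and the three-dimensional toy vectors are hypothetical stand-ins, not the authors' models or data. In practice, the embeddings would come from the image and text encoders of a vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(image_vec, vocab_vecs):
    """Return (term, score) for the vocabulary entry closest to the image."""
    scores = {term: cosine(image_vec, vec) for term, vec in vocab_vecs.items()}
    term = max(scores, key=scores.get)
    return term, scores[term]


# Hypothetical toy embeddings; real encoders would produce these vectors.
vocab = {
    "gable roof": [0.9, 0.1, 0.0],
    "flat roof": [0.1, 0.9, 0.0],
    "hip roof": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

term, score = best_match(image_embedding, vocab)
print(term, round(score, 3))
```

No labeled training data enters this step, which is what makes the matching zero-shot: changing the attribute set only requires changing the vocabulary.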
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2312.12479, https://arxiv.org/pdf/2312.12479
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4390091556
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4390091556 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2312.12479 (Digital Object Identifier)
- Title: Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-12-19 (full publication date if available)
- Authors: Fei Pan, Sangryul Jeon, Brian Wang, Frank McKenna, Stella X. Yu (list of authors in order)
- Landing page: https://arxiv.org/abs/2312.12479 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2312.12479 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2312.12479 (direct OA link when available)
- Concepts: Closed captioning, Computer science, Workflow, Robustness (evolution), Artificial intelligence, Scalability, Natural language processing, Feature extraction, Annotation, Information retrieval, Image (mathematics), Database, Chemistry, Gene, Biochemistry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
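
The payload table that follows uses flattened keys such as `topics[0].field.id`, with lists of plain values (for example `related_works`) joined by commas. The sketch below shows one way such keys could be produced from a nested record. The `flatten` helper and the shortened `record` are hypothetical, and how this page actually generates its table is an assumption.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf) pairs with dotted keys and [i] list indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        if all(not isinstance(item, (dict, list)) for item in value):
            # Lists of scalars are joined into one comma-separated cell.
            yield prefix, ", ".join(str(item) for item in value)
        else:
            for i, child in enumerate(value):
                yield from flatten(child, f"{prefix}[{i}]")
    else:
        yield prefix, value


# Shortened, hand-built excerpt of the record shown in the table.
record = {
    "id": "https://openalex.org/W4390091556",
    "topics": [
        {
            "id": "https://openalex.org/T11714",
            "field": {"display_name": "Computer Science"},
        }
    ],
    "indexed_in": ["arxiv", "datacite"],
}

flat = dict(flatten(record))
for path, leaf in flat.items():
    print(f"| {path} | {leaf} |")
```
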
| Field | Value |
|---|---|
| id | https://openalex.org/W4390091556 |
| doi | https://doi.org/10.48550/arxiv.2312.12479 |
| ids.doi | https://doi.org/10.48550/arxiv.2312.12479 |
| ids.openalex | https://openalex.org/W4390091556 |
| fwci | |
| type | preprint |
| title | Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9980000257492065 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9779000282287598 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9555000066757202 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C157657479 |
| concepts[0].level | 3 |
| concepts[0].score | 0.9629337787628174 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2367247 |
| concepts[0].display_name | Closed captioning |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7990620136260986 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C177212765 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7107390761375427 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q627335 |
| concepts[2].display_name | Workflow |
| concepts[3].id | https://openalex.org/C63479239 |
| concepts[3].level | 3 |
| concepts[3].score | 0.5860669016838074 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[3].display_name | Robustness (evolution) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5768523216247559 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C48044578 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5717222094535828 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[5].display_name | Scalability |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.5507630705833435 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C52622490 |
| concepts[7].level | 2 |
| concepts[7].score | 0.47364914417266846 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1026626 |
| concepts[7].display_name | Feature extraction |
| concepts[8].id | https://openalex.org/C2776321320 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4590649902820587 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q857525 |
| concepts[8].display_name | Annotation |
| concepts[9].id | https://openalex.org/C23123220 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3750884234905243 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[9].display_name | Information retrieval |
| concepts[10].id | https://openalex.org/C115961682 |
| concepts[10].level | 2 |
| concepts[10].score | 0.25876569747924805 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[10].display_name | Image (mathematics) |
| concepts[11].id | https://openalex.org/C77088390 |
| concepts[11].level | 1 |
| concepts[11].score | 0.17680829763412476 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[11].display_name | Database |
| concepts[12].id | https://openalex.org/C185592680 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[12].display_name | Chemistry |
| concepts[13].id | https://openalex.org/C104317684 |
| concepts[13].level | 2 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[13].display_name | Gene |
| concepts[14].id | https://openalex.org/C55493867 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[14].display_name | Biochemistry |
| keywords[0].id | https://openalex.org/keywords/closed-captioning |
| keywords[0].score | 0.9629337787628174 |
| keywords[0].display_name | Closed captioning |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7990620136260986 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/workflow |
| keywords[2].score | 0.7107390761375427 |
| keywords[2].display_name | Workflow |
| keywords[3].id | https://openalex.org/keywords/robustness |
| keywords[3].score | 0.5860669016838074 |
| keywords[3].display_name | Robustness (evolution) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5768523216247559 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/scalability |
| keywords[5].score | 0.5717222094535828 |
| keywords[5].display_name | Scalability |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.5507630705833435 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/feature-extraction |
| keywords[7].score | 0.47364914417266846 |
| keywords[7].display_name | Feature extraction |
| keywords[8].id | https://openalex.org/keywords/annotation |
| keywords[8].score | 0.4590649902820587 |
| keywords[8].display_name | Annotation |
| keywords[9].id | https://openalex.org/keywords/information-retrieval |
| keywords[9].score | 0.3750884234905243 |
| keywords[9].display_name | Information retrieval |
| keywords[10].id | https://openalex.org/keywords/image |
| keywords[10].score | 0.25876569747924805 |
| keywords[10].display_name | Image (mathematics) |
| keywords[11].id | https://openalex.org/keywords/database |
| keywords[11].score | 0.17680829763412476 |
| keywords[11].display_name | Database |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2312.12479 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2312.12479 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2312.12479 |
| locations[1].id | doi:10.48550/arxiv.2312.12479 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2312.12479 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5063873349 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-6361-0936 |
| authorships[0].author.display_name | Fei Pan |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Pan, Fei |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5014123447 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0991-6165 |
| authorships[1].author.display_name | Sangryul Jeon |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jeon, Sangryul |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5073241086 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5903-4593 |
| authorships[2].author.display_name | Brian Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Brian |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5049642625 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Frank McKenna |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Mckenna, Frank |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5042014034 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3507-5761 |
| authorships[4].author.display_name | Stella X. Yu |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Yu, Stella X. |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2312.12479 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9980000257492065 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4210416330, https://openalex.org/W2775506363, https://openalex.org/W3088136942, https://openalex.org/W4290852288, https://openalex.org/W2949362007, https://openalex.org/W4388893791, https://openalex.org/W4283207562, https://openalex.org/W2963177403, https://openalex.org/W2330246314, https://openalex.org/W2949522393 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2312.12479 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2312.12479 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2312.12479 |
| primary_location.id | pmh:oai:arXiv.org:2312.12479 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2312.12479 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2312.12479 |
| publication_date | 2023-12-19 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 44, 109, 122 |
| abstract_inverted_index.In | 40 |
| abstract_inverted_index.by | 5, 97 |
| abstract_inverted_index.in | 133 |
| abstract_inverted_index.of | 101 |
| abstract_inverted_index.on | 62, 82, 143 |
| abstract_inverted_index.to | 10, 34, 59, 86, 125 |
| abstract_inverted_index.we | 42 |
| abstract_inverted_index.The | 65 |
| abstract_inverted_index.and | 15, 20, 32, 37, 56, 74, 88, 104, 107, 115, 136, 149 |
| abstract_inverted_index.for | 18, 48, 77, 129 |
| abstract_inverted_index.key | 70 |
| abstract_inverted_index.new | 45 |
| abstract_inverted_index.our | 119 |
| abstract_inverted_index.the | 30, 78, 83, 102, 105, 113, 134 |
| abstract_inverted_index.two | 69, 92 |
| abstract_inverted_index.each | 23 |
| abstract_inverted_index.from | 13 |
| abstract_inverted_index.task | 24 |
| abstract_inverted_index.that | 52 |
| abstract_inverted_index.These | 91 |
| abstract_inverted_index.based | 81 |
| abstract_inverted_index.civil | 89, 137 |
| abstract_inverted_index.data, | 28 |
| abstract_inverted_index.human | 144 |
| abstract_inverted_index.image | 103 |
| abstract_inverted_index.match | 111 |
| abstract_inverted_index.while | 146 |
| abstract_inverted_index.avenue | 124 |
| abstract_inverted_index.images | 17, 80 |
| abstract_inverted_index.models | 58 |
| abstract_inverted_index.module | 25 |
| abstract_inverted_index.offers | 121 |
| abstract_inverted_index.vision | 55 |
| abstract_inverted_index.visual | 114 |
| abstract_inverted_index.BRAILS, | 6 |
| abstract_inverted_index.between | 112 |
| abstract_inverted_index.enhance | 126 |
| abstract_inverted_index.extract | 11 |
| abstract_inverted_index.feature | 99 |
| abstract_inverted_index.propose | 43 |
| abstract_inverted_index.textual | 116 |
| abstract_inverted_index.utilize | 7 |
| abstract_inverted_index.Existing | 0 |
| abstract_inverted_index.However, | 22 |
| abstract_inverted_index.building | 1, 49, 79, 130 |
| abstract_inverted_index.captions | 96 |
| abstract_inverted_index.contains | 68 |
| abstract_inverted_index.domains, | 139 |
| abstract_inverted_index.external | 63 |
| abstract_inverted_index.generate | 94 |
| abstract_inverted_index.language | 57 |
| abstract_inverted_index.learning | 9 |
| abstract_inverted_index.methods, | 3 |
| abstract_inverted_index.mitigate | 60 |
| abstract_inverted_index.proposed | 66 |
| abstract_inverted_index.reducing | 141 |
| abstract_inverted_index.regional | 35 |
| abstract_inverted_index.reliance | 61, 142 |
| abstract_inverted_index.requires | 26 |
| abstract_inverted_index.semantic | 110 |
| abstract_inverted_index.utilizes | 53 |
| abstract_inverted_index.workflow | 47, 67 |
| abstract_inverted_index.AI-driven | 127 |
| abstract_inverted_index.attribute | 50, 131 |
| abstract_inverted_index.computing | 98 |
| abstract_inverted_index.framework | 120 |
| abstract_inverted_index.hindering | 29 |
| abstract_inverted_index.pertinent | 85 |
| abstract_inverted_index.promising | 123 |
| abstract_inverted_index.response, | 41 |
| abstract_inverted_index.satellite | 14 |
| abstract_inverted_index.zero-shot | 46 |
| abstract_inverted_index.annotation | 38 |
| abstract_inverted_index.bolstering | 147 |
| abstract_inverted_index.captioning | 73, 76, 128 |
| abstract_inverted_index.components | 93 |
| abstract_inverted_index.extraction | 51, 132 |
| abstract_inverted_index.robustness | 33 |
| abstract_inverted_index.structural | 87, 135 |
| abstract_inverted_index.supervised | 8 |
| abstract_inverted_index.ultimately | 140 |
| abstract_inverted_index.variations | 36 |
| abstract_inverted_index.annotations | 145 |
| abstract_inverted_index.components: | 71 |
| abstract_inverted_index.descriptive | 95 |
| abstract_inverted_index.engineering | 138 |
| abstract_inverted_index.exemplified | 4 |
| abstract_inverted_index.image-level | 72 |
| abstract_inverted_index.imbalances. | 39 |
| abstract_inverted_index.information | 12 |
| abstract_inverted_index.large-scale | 54 |
| abstract_inverted_index.performance | 148 |
| abstract_inverted_index.recognition | 2 |
| abstract_inverted_index.scalability | 31 |
| abstract_inverted_index.street-view | 16 |
| abstract_inverted_index.annotations. | 64 |
| abstract_inverted_index.engineering. | 90 |
| abstract_inverted_index.facilitating | 108 |
| abstract_inverted_index.vocabularies | 84 |
| abstract_inverted_index.Consequently, | 118 |
| abstract_inverted_index.adaptability. | 150 |
| abstract_inverted_index.segment-level | 75 |
| abstract_inverted_index.segmentation. | 21 |
| abstract_inverted_index.vocabularies, | 106 |
| abstract_inverted_index.classification | 19 |
| abstract_inverted_index.human-annotated | 27 |
| abstract_inverted_index.representations | 100 |
| abstract_inverted_index.representations. | 117 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
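
The `abstract_inverted_index.*` rows above map each token to the word positions where it occurs in the abstract. A minimal sketch of how the text can be rebuilt from such an index follows. The `rebuild_abstract` helper is a hypothetical name, and the sample covers only positions 0 to 9 taken from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Place each token at its listed positions and join with spaces."""
    positions = {}
    for token, indices in inverted_index.items():
        for i in indices:
            positions[i] = token
    return " ".join(positions[i] for i in sorted(positions))


# Excerpt of the index, restricted to word positions 0-9.
sample = {
    "by": [5],
    "utilize": [7],
    "BRAILS,": [6],
    "Existing": [0],
    "building": [1],
    "learning": [9],
    "methods,": [3],
    "supervised": [8],
    "exemplified": [4],
    "recognition": [2],
}

text = rebuild_abstract(sample)
print(text)
```

Sorting by position is what restores word order; the order of keys in the index itself carries no meaning.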