Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models

Fei Pan, Sangryul Jeon, Brian Wang, Frank McKenna, Stella X. Yu

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2312.12479

Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering the scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning for the building images based on the vocabularies pertinent to structural and civil engineering. These two components generate descriptive captions by computing feature representations of the image and the vocabularies, and facilitating a semantic match between the visual and textual representations. Consequently, our framework offers a promising avenue to enhance AI-driven captioning for building attribute extraction in the structural and civil engineering domains, ultimately reducing reliance on human annotations while bolstering performance and adaptability.
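
The semantic match described in the abstract can be illustrated with a small sketch. The abstract does not name the encoders or the similarity measure, so everything here is an assumption made for illustration: cosine similarity stands in for the matching step, the three-dimensional vectors are toy stand-ins for real image and text embeddings, and the roof-type vocabulary is hypothetical. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(image_vec, vocab_vecs):
    """Score every vocabulary term against the image and return the top term.

    image_vec  -- embedding of the image (or of one image segment)
    vocab_vecs -- mapping from vocabulary term to its text embedding
    """
    scores = {term: cosine(image_vec, vec) for term, vec in vocab_vecs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical vocabulary with toy text embeddings (assumed values).
vocab = {
    "gable roof": [0.9, 0.1, 0.0],
    "flat roof": [0.1, 0.9, 0.0],
    "hip roof": [0.0, 0.2, 0.9],
}

# Toy image embedding (assumed value); a real one would come from a vision encoder.
image_embedding = [0.8, 0.3, 0.1]

label, scores = best_match(image_embedding, vocab)
print(label)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

No labelled training images appear anywhere in the sketch: changing the vocabulary changes the set of possible outputs, which is what makes the matching zero-shot. Applying `best_match` to whole-image embeddings or to per-segment embeddings would correspond loosely to the image-level and segment-level components mentioned above.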

Related Topics

Computer Science

Artificial Intelligence

Database

Chemistry

Biochemistry

Concepts

Closed captioning, Computer science, Workflow, Robustness (evolution), Artificial intelligence, Scalability, Natural language processing, Feature extraction, Annotation, Information retrieval, Image (mathematics), Database, Chemistry, Gene, Biochemistry

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2312.12479
PDF: https://arxiv.org/pdf/2312.12479
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4390091556

All OpenAlex metadata

Raw OpenAlex JSON

OpenAlex ID: https://openalex.org/W4390091556
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2312.12479
Digital Object Identifier

Title: Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2023
Year of publication

Publication date: 2023-12-19
Full publication date if available

Authors: Fei Pan, Sangryul Jeon, Brian Wang, Frank McKenna, Stella X. Yu
List of authors in order

Landing page: https://arxiv.org/abs/2312.12479
Publisher landing page

PDF URL: https://arxiv.org/pdf/2312.12479
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2312.12479
Direct OA link when available

Concepts: Closed captioning, Computer science, Workflow, Robustness (evolution), Artificial intelligence, Scalability, Natural language processing, Feature extraction, Annotation, Information retrieval, Image (mathematics), Database, Chemistry, Gene, Biochemistry
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4390091556
doi	https://doi.org/10.48550/arxiv.2312.12479
ids.doi	https://doi.org/10.48550/arxiv.2312.12479
ids.openalex	https://openalex.org/W4390091556
fwci
type	preprint
title	Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T11714
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9980000257492065
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Multimodal Machine Learning Applications
topics[1].id	https://openalex.org/T10627
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9779000282287598
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1707
topics[1].subfield.display_name	Computer Vision and Pattern Recognition
topics[1].display_name	Advanced Image and Video Retrieval Techniques
topics[2].id	https://openalex.org/T11307
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9555000066757202
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1702
topics[2].subfield.display_name	Artificial Intelligence
topics[2].display_name	Domain Adaptation and Few-Shot Learning
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C157657479
concepts[0].level	3
concepts[0].score	0.9629337787628174
concepts[0].wikidata	https://www.wikidata.org/wiki/Q2367247
concepts[0].display_name	Closed captioning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7990620136260986
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C177212765
concepts[2].level	2
concepts[2].score	0.7107390761375427
concepts[2].wikidata	https://www.wikidata.org/wiki/Q627335
concepts[2].display_name	Workflow
concepts[3].id	https://openalex.org/C63479239
concepts[3].level	3
concepts[3].score	0.5860669016838074
concepts[3].wikidata	https://www.wikidata.org/wiki/Q7353546
concepts[3].display_name	Robustness (evolution)
concepts[4].id	https://openalex.org/C154945302
concepts[4].level	1
concepts[4].score	0.5768523216247559
concepts[4].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[4].display_name	Artificial intelligence
concepts[5].id	https://openalex.org/C48044578
concepts[5].level	2
concepts[5].score	0.5717222094535828
concepts[5].wikidata	https://www.wikidata.org/wiki/Q727490
concepts[5].display_name	Scalability
concepts[6].id	https://openalex.org/C204321447
concepts[6].level	1
concepts[6].score	0.5507630705833435
concepts[6].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[6].display_name	Natural language processing
concepts[7].id	https://openalex.org/C52622490
concepts[7].level	2
concepts[7].score	0.47364914417266846
concepts[7].wikidata	https://www.wikidata.org/wiki/Q1026626
concepts[7].display_name	Feature extraction
concepts[8].id	https://openalex.org/C2776321320
concepts[8].level	2
concepts[8].score	0.4590649902820587
concepts[8].wikidata	https://www.wikidata.org/wiki/Q857525
concepts[8].display_name	Annotation
concepts[9].id	https://openalex.org/C23123220
concepts[9].level	1
concepts[9].score	0.3750884234905243
concepts[9].wikidata	https://www.wikidata.org/wiki/Q816826
concepts[9].display_name	Information retrieval
concepts[10].id	https://openalex.org/C115961682
concepts[10].level	2
concepts[10].score	0.25876569747924805
concepts[10].wikidata	https://www.wikidata.org/wiki/Q860623
concepts[10].display_name	Image (mathematics)
concepts[11].id	https://openalex.org/C77088390
concepts[11].level	1
concepts[11].score	0.17680829763412476
concepts[11].wikidata	https://www.wikidata.org/wiki/Q8513
concepts[11].display_name	Database
concepts[12].id	https://openalex.org/C185592680
concepts[12].level	0
concepts[12].score	0.0
concepts[12].wikidata	https://www.wikidata.org/wiki/Q2329
concepts[12].display_name	Chemistry
concepts[13].id	https://openalex.org/C104317684
concepts[13].level	2
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q7187
concepts[13].display_name	Gene
concepts[14].id	https://openalex.org/C55493867
concepts[14].level	1
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q7094
concepts[14].display_name	Biochemistry
keywords[0].id	https://openalex.org/keywords/closed-captioning
keywords[0].score	0.9629337787628174
keywords[0].display_name	Closed captioning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7990620136260986
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/workflow
keywords[2].score	0.7107390761375427
keywords[2].display_name	Workflow
keywords[3].id	https://openalex.org/keywords/robustness
keywords[3].score	0.5860669016838074
keywords[3].display_name	Robustness (evolution)
keywords[4].id	https://openalex.org/keywords/artificial-intelligence
keywords[4].score	0.5768523216247559
keywords[4].display_name	Artificial intelligence
keywords[5].id	https://openalex.org/keywords/scalability
keywords[5].score	0.5717222094535828
keywords[5].display_name	Scalability
keywords[6].id	https://openalex.org/keywords/natural-language-processing
keywords[6].score	0.5507630705833435
keywords[6].display_name	Natural language processing
keywords[7].id	https://openalex.org/keywords/feature-extraction
keywords[7].score	0.47364914417266846
keywords[7].display_name	Feature extraction
keywords[8].id	https://openalex.org/keywords/annotation
keywords[8].score	0.4590649902820587
keywords[8].display_name	Annotation
keywords[9].id	https://openalex.org/keywords/information-retrieval
keywords[9].score	0.3750884234905243
keywords[9].display_name	Information retrieval
keywords[10].id	https://openalex.org/keywords/image
keywords[10].score	0.25876569747924805
keywords[10].display_name	Image (mathematics)
keywords[11].id	https://openalex.org/keywords/database
keywords[11].score	0.17680829763412476
keywords[11].display_name	Database
language	en
locations[0].id	pmh:oai:arXiv.org:2312.12479
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license	cc-by
locations[0].pdf_url	https://arxiv.org/pdf/2312.12479
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id	https://openalex.org/licenses/cc-by
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2312.12479
locations[1].id	doi:10.48550/arxiv.2312.12479
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2312.12479
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5063873349
authorships[0].author.orcid	https://orcid.org/0000-0002-6361-0936
authorships[0].author.display_name	Fei Pan
authorships[0].author_position	first
authorships[0].raw_author_name	Pan, Fei
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5014123447
authorships[1].author.orcid	https://orcid.org/0000-0003-0991-6165
authorships[1].author.display_name	Sangryul Jeon
authorships[1].author_position	middle
authorships[1].raw_author_name	Jeon, Sangryul
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5073241086
authorships[2].author.orcid	https://orcid.org/0000-0002-5903-4593
authorships[2].author.display_name	Brian Wang
authorships[2].author_position	middle
authorships[2].raw_author_name	Wang, Brian
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5049642625
authorships[3].author.orcid
authorships[3].author.display_name	Frank McKenna
authorships[3].author_position	middle
authorships[3].raw_author_name	Mckenna, Frank
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5042014034
authorships[4].author.orcid	https://orcid.org/0000-0002-3507-5761
authorships[4].author.display_name	Stella X. Yu
authorships[4].author_position	last
authorships[4].raw_author_name	Yu, Stella X.
authorships[4].is_corresponding	False
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2312.12479
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T11714
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9980000257492065
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Multimodal Machine Learning Applications
related_works	https://openalex.org/W4210416330, https://openalex.org/W2775506363, https://openalex.org/W3088136942, https://openalex.org/W4290852288, https://openalex.org/W2949362007, https://openalex.org/W4388893791, https://openalex.org/W4283207562, https://openalex.org/W2963177403, https://openalex.org/W2330246314, https://openalex.org/W2949522393
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2312.12479
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license	cc-by
best_oa_location.pdf_url	https://arxiv.org/pdf/2312.12479
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id	https://openalex.org/licenses/cc-by
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2312.12479
primary_location.id	pmh:oai:arXiv.org:2312.12479
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license	cc-by
primary_location.pdf_url	https://arxiv.org/pdf/2312.12479
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id	https://openalex.org/licenses/cc-by
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2312.12479
publication_date	2023-12-19
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	44, 109, 122
abstract_inverted_index.In	40
abstract_inverted_index.by	5, 97
abstract_inverted_index.in	133
abstract_inverted_index.of	101
abstract_inverted_index.on	62, 82, 143
abstract_inverted_index.to	10, 34, 59, 86, 125
abstract_inverted_index.we	42
abstract_inverted_index.The	65
abstract_inverted_index.and	15, 20, 32, 37, 56, 74, 88, 104, 107, 115, 136, 149
abstract_inverted_index.for	18, 48, 77, 129
abstract_inverted_index.key	70
abstract_inverted_index.new	45
abstract_inverted_index.our	119
abstract_inverted_index.the	30, 78, 83, 102, 105, 113, 134
abstract_inverted_index.two	69, 92
abstract_inverted_index.each	23
abstract_inverted_index.from	13
abstract_inverted_index.task	24
abstract_inverted_index.that	52
abstract_inverted_index.These	91
abstract_inverted_index.based	81
abstract_inverted_index.civil	89, 137
abstract_inverted_index.data,	28
abstract_inverted_index.human	144
abstract_inverted_index.image	103
abstract_inverted_index.match	111
abstract_inverted_index.while	146
abstract_inverted_index.avenue	124
abstract_inverted_index.images	17, 80
abstract_inverted_index.models	58
abstract_inverted_index.module	25
abstract_inverted_index.offers	121
abstract_inverted_index.vision	55
abstract_inverted_index.visual	114
abstract_inverted_index.BRAILS,	6
abstract_inverted_index.between	112
abstract_inverted_index.enhance	126
abstract_inverted_index.extract	11
abstract_inverted_index.feature	99
abstract_inverted_index.propose	43
abstract_inverted_index.textual	116
abstract_inverted_index.utilize	7
abstract_inverted_index.Existing	0
abstract_inverted_index.However,	22
abstract_inverted_index.building	1, 49, 79, 130
abstract_inverted_index.captions	96
abstract_inverted_index.contains	68
abstract_inverted_index.domains,	139
abstract_inverted_index.external	63
abstract_inverted_index.generate	94
abstract_inverted_index.language	57
abstract_inverted_index.learning	9
abstract_inverted_index.methods,	3
abstract_inverted_index.mitigate	60
abstract_inverted_index.proposed	66
abstract_inverted_index.reducing	141
abstract_inverted_index.regional	35
abstract_inverted_index.reliance	61, 142
abstract_inverted_index.requires	26
abstract_inverted_index.semantic	110
abstract_inverted_index.utilizes	53
abstract_inverted_index.workflow	47, 67
abstract_inverted_index.AI-driven	127
abstract_inverted_index.attribute	50, 131
abstract_inverted_index.computing	98
abstract_inverted_index.framework	120
abstract_inverted_index.hindering	29
abstract_inverted_index.pertinent	85
abstract_inverted_index.promising	123
abstract_inverted_index.response,	41
abstract_inverted_index.satellite	14
abstract_inverted_index.zero-shot	46
abstract_inverted_index.annotation	38
abstract_inverted_index.bolstering	147
abstract_inverted_index.captioning	73, 76, 128
abstract_inverted_index.components	93
abstract_inverted_index.extraction	51, 132
abstract_inverted_index.robustness	33
abstract_inverted_index.structural	87, 135
abstract_inverted_index.supervised	8
abstract_inverted_index.ultimately	140
abstract_inverted_index.variations	36
abstract_inverted_index.annotations	145
abstract_inverted_index.components:	71
abstract_inverted_index.descriptive	95
abstract_inverted_index.engineering	138
abstract_inverted_index.exemplified	4
abstract_inverted_index.image-level	72
abstract_inverted_index.imbalances.	39
abstract_inverted_index.information	12
abstract_inverted_index.large-scale	54
abstract_inverted_index.performance	148
abstract_inverted_index.recognition	2
abstract_inverted_index.scalability	31
abstract_inverted_index.street-view	16
abstract_inverted_index.annotations.	64
abstract_inverted_index.engineering.	90
abstract_inverted_index.facilitating	108
abstract_inverted_index.vocabularies	84
abstract_inverted_index.Consequently,	118
abstract_inverted_index.adaptability.	150
abstract_inverted_index.segment-level	75
abstract_inverted_index.segmentation.	21
abstract_inverted_index.vocabularies,	106
abstract_inverted_index.classification	19
abstract_inverted_index.human-annotated	27
abstract_inverted_index.representations	100
abstract_inverted_index.representations.	117
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	5
citation_normalized_percentile
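
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. The sketch below shows one way to turn those rows back into running text. It assumes each row is a key and a value separated by a tab, with positions separated by commas; the function names `parse_flat_lines` and `rebuild_abstract` are hypothetical helpers, not part of any OpenAlex client.

```python
PREFIX = "abstract_inverted_index."


def parse_flat_lines(lines):
    """Collect word -> positions from flattened 'key<TAB>value' rows."""
    index = {}
    for line in lines:
        key, _, value = line.partition("\t")
        # Skip rows from other parts of the payload and rows with no value.
        if not key.startswith(PREFIX) or not value.strip():
            continue
        word = key[len(PREFIX):]
        index[word] = [int(pos) for pos in value.split(",")]
    return index


def rebuild_abstract(index):
    """Place each word at its positions and join the words in position order."""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


# A few rows copied from the payload above (tab assumed as the separator).
rows = [
    "abstract_inverted_index.Existing\t0",
    "abstract_inverted_index.building\t1, 49, 79, 130",
    "abstract_inverted_index.recognition\t2",
    "abstract_inverted_index.methods,\t3",
    "abstract_inverted_index.exemplified\t4",
    "abstract_inverted_index.by\t5, 97",
    "abstract_inverted_index.BRAILS,\t6",
    "cited_by_count\t0",
]

index = parse_flat_lines(rows)
print(rebuild_abstract(index))
```

With only a subset of rows, missing positions are skipped, so the later occurrences of "building" and "by" appear directly after the opening words. Fed every `abstract_inverted_index.*` row, the same function should reproduce the abstract shown at the top of this record.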