PLIP: Language-Image Pre-training for Person Representation Learning

Jialong Zuo, Changqian Yu, Nong Sang, Changxin Gao

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2305.08386

Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when applied directly to person representation learning, these general pre-training methods perform unsatisfactorily because they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we design three pretext tasks: 1) Text-guided Image Colorization, which aims to establish the correspondence between person-related image regions and fine-grained color-part textual phrases; 2) Image-guided Attributes Prediction, which aims to mine fine-grained attribute information of the person body in the image; and 3) Identity-based Vision-Language Contrast, which aims to correlate the cross-modal representations at the identity level rather than the instance level. Moreover, to implement our pre-training framework, we construct a large-scale person dataset of image-text pairs, named SYNTH-PEDES, by automatically generating textual annotations. We pre-train PLIP on SYNTH-PEDES and evaluate our models on a range of downstream person-centric tasks. PLIP not only significantly improves existing methods on all these tasks, but also shows great ability in the zero-shot and domain generalization settings. The code, dataset and weights will be released at https://github.com/Zplusdragon/PLIP.
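The abstract contrasts identity-level with instance-level correlation but gives no formula. The following is a minimal sketch of one way such an objective could look, not the authors' implementation: a generic InfoNCE-style image-to-text loss in which every caption sharing the image's identity label counts as a positive. The function names, the cosine similarity, the temperature value of 0.07 and the toy features are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_contrast_loss(image_feats, text_feats, identities, temperature=0.07):
    """Image-to-text contrastive loss with identity-level positives.

    For image i, every text j with identities[j] == identities[i] is a
    positive, not only the paired caption j == i (the instance-level case).
    The temperature is an assumed hyperparameter.
    """
    n = len(image_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_feats[i], text_feats[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all texts in the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        positives = [j for j in range(n) if identities[j] == identities[i]]
        # Average negative log-likelihood over the same-identity texts.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


# Toy batch (hypothetical features): samples 0 and 1 share identity 0.
image_feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identities = [0, 0, 1]
aligned_loss = identity_contrast_loss(image_feats, text_feats, identities)

# Captions that point away from their identity should score worse.
mismatched_texts = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
mismatched_loss = identity_contrast_loss(image_feats, mismatched_texts, identities)
print(aligned_loss < mismatched_loss)
```

Only the image-to-text direction is shown; a symmetric variant would add the same term with the roles of images and texts swapped.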

Related Topics

Computer Science

Artificial Intelligence

Mathematical Analysis

Concepts

Closed captioning, Computer science, Discriminative model, Artificial intelligence, Representation (politics), Code (set theory), Generalization, Matching (statistics), Image (mathematics), Natural language processing, Feature learning, Machine learning, Set (abstract data type), Law, Political science, Mathematical analysis, Mathematics, Politics, Programming language, Statistics

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2305.08386
PDF: https://arxiv.org/pdf/2305.08386
OA Status: green
Cited By: 19
Related Works: 10
OpenAlex ID: https://openalex.org/W4376653864

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4376653864
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2305.08386
Digital Object Identifier

Title: PLIP: Language-Image Pre-training for Person Representation Learning
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2023
Year of publication

Publication date: 2023-05-15
Full publication date if available

Authors: Jialong Zuo, Changqian Yu, Nong Sang, Changxin Gao
List of authors in order

Landing page: https://arxiv.org/abs/2305.08386
Publisher landing page

PDF URL: https://arxiv.org/pdf/2305.08386
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2305.08386
Direct OA link when available

Concepts: Closed captioning, Computer science, Discriminative model, Artificial intelligence, Representation (politics), Code (set theory), Generalization, Matching (statistics), Image (mathematics), Natural language processing, Feature learning, Machine learning, Set (abstract data type), Law, Political science, Mathematical analysis, Mathematics, Politics, Programming language, Statistics
Top concepts (fields/topics) attached by OpenAlex

Cited by: 19
Total citation count in OpenAlex

Citations by year (recent): 2025: 3, 2024: 14, 2023: 2
Per-year citation counts (last 5 years)

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4376653864
doi	https://doi.org/10.48550/arxiv.2305.08386
ids.doi	https://doi.org/10.48550/arxiv.2305.08386
ids.openalex	https://openalex.org/W4376653864
fwci
type	preprint
title	PLIP: Language-Image Pre-training for Person Representation Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10331
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9976999759674072
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Video Surveillance and Tracking Methods
topics[1].id	https://openalex.org/T10812
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9941999912261963
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1707
topics[1].subfield.display_name	Computer Vision and Pattern Recognition
topics[1].display_name	Human Pose and Action Recognition
topics[2].id	https://openalex.org/T12740
topics[2].field.id	https://openalex.org/fields/22
topics[2].field.display_name	Engineering
topics[2].score	0.9679999947547913
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/2204
topics[2].subfield.display_name	Biomedical Engineering
topics[2].display_name	Gait Recognition and Analysis
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C157657479
concepts[0].level	3
concepts[0].score	0.8663817644119263
concepts[0].wikidata	https://www.wikidata.org/wiki/Q2367247
concepts[0].display_name	Closed captioning
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7870161533355713
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C97931131
concepts[2].level	2
concepts[2].score	0.6996609568595886
concepts[2].wikidata	https://www.wikidata.org/wiki/Q5282087
concepts[2].display_name	Discriminative model
concepts[3].id	https://openalex.org/C154945302
concepts[3].level	1
concepts[3].score	0.6152242422103882
concepts[3].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[3].display_name	Artificial intelligence
concepts[4].id	https://openalex.org/C2776359362
concepts[4].level	3
concepts[4].score	0.6103115081787109
concepts[4].wikidata	https://www.wikidata.org/wiki/Q2145286
concepts[4].display_name	Representation (politics)
concepts[5].id	https://openalex.org/C2776760102
concepts[5].level	3
concepts[5].score	0.6098970174789429
concepts[5].wikidata	https://www.wikidata.org/wiki/Q5139990
concepts[5].display_name	Code (set theory)
concepts[6].id	https://openalex.org/C177148314
concepts[6].level	2
concepts[6].score	0.5879033803939819
concepts[6].wikidata	https://www.wikidata.org/wiki/Q170084
concepts[6].display_name	Generalization
concepts[7].id	https://openalex.org/C165064840
concepts[7].level	2
concepts[7].score	0.5434309840202332
concepts[7].wikidata	https://www.wikidata.org/wiki/Q1321061
concepts[7].display_name	Matching (statistics)
concepts[8].id	https://openalex.org/C115961682
concepts[8].level	2
concepts[8].score	0.5063034296035767
concepts[8].wikidata	https://www.wikidata.org/wiki/Q860623
concepts[8].display_name	Image (mathematics)
concepts[9].id	https://openalex.org/C204321447
concepts[9].level	1
concepts[9].score	0.5006155967712402
concepts[9].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[9].display_name	Natural language processing
concepts[10].id	https://openalex.org/C59404180
concepts[10].level	2
concepts[10].score	0.4634082019329071
concepts[10].wikidata	https://www.wikidata.org/wiki/Q17013334
concepts[10].display_name	Feature learning
concepts[11].id	https://openalex.org/C119857082
concepts[11].level	1
concepts[11].score	0.4050605297088623
concepts[11].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[11].display_name	Machine learning
concepts[12].id	https://openalex.org/C177264268
concepts[12].level	2
concepts[12].score	0.11431330442428589
concepts[12].wikidata	https://www.wikidata.org/wiki/Q1514741
concepts[12].display_name	Set (abstract data type)
concepts[13].id	https://openalex.org/C199539241
concepts[13].level	1
concepts[13].score	0.0
concepts[13].wikidata	https://www.wikidata.org/wiki/Q7748
concepts[13].display_name	Law
concepts[14].id	https://openalex.org/C17744445
concepts[14].level	0
concepts[14].score	0.0
concepts[14].wikidata	https://www.wikidata.org/wiki/Q36442
concepts[14].display_name	Political science
concepts[15].id	https://openalex.org/C134306372
concepts[15].level	1
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q7754
concepts[15].display_name	Mathematical analysis
concepts[16].id	https://openalex.org/C33923547
concepts[16].level	0
concepts[16].score	0.0
concepts[16].wikidata	https://www.wikidata.org/wiki/Q395
concepts[16].display_name	Mathematics
concepts[17].id	https://openalex.org/C94625758
concepts[17].level	2
concepts[17].score	0.0
concepts[17].wikidata	https://www.wikidata.org/wiki/Q7163
concepts[17].display_name	Politics
concepts[18].id	https://openalex.org/C199360897
concepts[18].level	1
concepts[18].score	0.0
concepts[18].wikidata	https://www.wikidata.org/wiki/Q9143
concepts[18].display_name	Programming language
concepts[19].id	https://openalex.org/C105795698
concepts[19].level	1
concepts[19].score	0.0
concepts[19].wikidata	https://www.wikidata.org/wiki/Q12483
concepts[19].display_name	Statistics
keywords[0].id	https://openalex.org/keywords/closed-captioning
keywords[0].score	0.8663817644119263
keywords[0].display_name	Closed captioning
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7870161533355713
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/discriminative-model
keywords[2].score	0.6996609568595886
keywords[2].display_name	Discriminative model
keywords[3].id	https://openalex.org/keywords/artificial-intelligence
keywords[3].score	0.6152242422103882
keywords[3].display_name	Artificial intelligence
keywords[4].id	https://openalex.org/keywords/representation
keywords[4].score	0.6103115081787109
keywords[4].display_name	Representation (politics)
keywords[5].id	https://openalex.org/keywords/code
keywords[5].score	0.6098970174789429
keywords[5].display_name	Code (set theory)
keywords[6].id	https://openalex.org/keywords/generalization
keywords[6].score	0.5879033803939819
keywords[6].display_name	Generalization
keywords[7].id	https://openalex.org/keywords/matching
keywords[7].score	0.5434309840202332
keywords[7].display_name	Matching (statistics)
keywords[8].id	https://openalex.org/keywords/image
keywords[8].score	0.5063034296035767
keywords[8].display_name	Image (mathematics)
keywords[9].id	https://openalex.org/keywords/natural-language-processing
keywords[9].score	0.5006155967712402
keywords[9].display_name	Natural language processing
keywords[10].id	https://openalex.org/keywords/feature-learning
keywords[10].score	0.4634082019329071
keywords[10].display_name	Feature learning
keywords[11].id	https://openalex.org/keywords/machine-learning
keywords[11].score	0.4050605297088623
keywords[11].display_name	Machine learning
keywords[12].id	https://openalex.org/keywords/set
keywords[12].score	0.11431330442428589
keywords[12].display_name	Set (abstract data type)
language	en
locations[0].id	pmh:oai:arXiv.org:2305.08386
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2305.08386
locations[0].version	submittedVersion
locations[0].raw_type
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2305.08386
locations[1].id	doi:10.48550/arxiv.2305.08386
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2305.08386
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5100580693
authorships[0].author.orcid
authorships[0].author.display_name	Jialong Zuo
authorships[0].author_position	first
authorships[0].raw_author_name	Zuo, Jialong
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5013651586
authorships[1].author.orcid	https://orcid.org/0000-0002-4488-4157
authorships[1].author.display_name	Changqian Yu
authorships[1].author_position	middle
authorships[1].raw_author_name	Yu, Changqian
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5013734579
authorships[2].author.orcid	https://orcid.org/0000-0002-9167-1496
authorships[2].author.display_name	Nong Sang
authorships[2].author_position	middle
authorships[2].raw_author_name	Sang, Nong
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5035295689
authorships[3].author.orcid	https://orcid.org/0000-0003-2736-3920
authorships[3].author.display_name	Changxin Gao
authorships[3].author_position	last
authorships[3].raw_author_name	Gao, Changxin
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2305.08386
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	PLIP: Language-Image Pre-training for Person Representation Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10331
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9976999759674072
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Video Surveillance and Tracking Methods
related_works	https://openalex.org/W4210416330, https://openalex.org/W2775506363, https://openalex.org/W3088136942, https://openalex.org/W2949362007, https://openalex.org/W3208297503, https://openalex.org/W3119773509, https://openalex.org/W2889153461, https://openalex.org/W2964117661, https://openalex.org/W4388405611, https://openalex.org/W2619127353
cited_by_count	19
counts_by_year[0].year	2025
counts_by_year[0].cited_by_count	3
counts_by_year[1].year	2024
counts_by_year[1].cited_by_count	14
counts_by_year[2].year	2023
counts_by_year[2].cited_by_count	2
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2305.08386
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2305.08386
best_oa_location.version	submittedVersion
best_oa_location.raw_type
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2305.08386
primary_location.id	pmh:oai:arXiv.org:2305.08386
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2305.08386
primary_location.version	submittedVersion
primary_location.raw_type
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2305.08386
publication_date	2023-05-15
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	49, 132
abstract_inverted_index.1)	67
abstract_inverted_index.2)	87
abstract_inverted_index.3)	105
abstract_inverted_index.To	43
abstract_inverted_index.We	146
abstract_inverted_index.an	3
abstract_inverted_index.at	115
abstract_inverted_index.be	189
abstract_inverted_index.by	141, 155
abstract_inverted_index.in	10, 101, 176
abstract_inverted_index.is	2, 31
abstract_inverted_index.of	97
abstract_inverted_index.on	149, 167
abstract_inverted_index.to	17, 72, 92, 110, 125
abstract_inverted_index.we	47, 61, 130
abstract_inverted_index.The	29, 183
abstract_inverted_index.all	168
abstract_inverted_index.and	41, 81, 104, 151, 179, 186
abstract_inverted_index.but	171
abstract_inverted_index.for	6, 54
abstract_inverted_index.not	161
abstract_inverted_index.our	127, 153
abstract_inverted_index.the	74, 77, 82, 98, 102, 112, 116, 121, 177
abstract_inverted_index.PLIP	148, 160
abstract_inverted_index.aims	71, 91, 109
abstract_inverted_index.also	172
abstract_inverted_index.body	100
abstract_inverted_index.from	26
abstract_inverted_index.mine	93
abstract_inverted_index.only	162
abstract_inverted_index.than	120
abstract_inverted_index.that	32
abstract_inverted_index.they	33
abstract_inverted_index.this	45
abstract_inverted_index.when	14
abstract_inverted_index.will	188
abstract_inverted_index.with	136
abstract_inverted_index.Image	69
abstract_inverted_index.PLIP.	59
abstract_inverted_index.code,	184
abstract_inverted_index.great	174
abstract_inverted_index.i.e.,	38
abstract_inverted_index.image	79
abstract_inverted_index.level	118
abstract_inverted_index.named	139
abstract_inverted_index.novel	50
abstract_inverted_index.pairs	138
abstract_inverted_index.shows	173
abstract_inverted_index.these	21, 169
abstract_inverted_index.three	64
abstract_inverted_index.design	63
abstract_inverted_index.domain	180
abstract_inverted_index.image;	103
abstract_inverted_index.issue,	46
abstract_inverted_index.level.	123
abstract_inverted_index.models	154
abstract_inverted_index.person	18, 55, 99, 134
abstract_inverted_index.rather	119
abstract_inverted_index.reason	30
abstract_inverted_index.suffer	25
abstract_inverted_index.tasks,	170
abstract_inverted_index.tasks.	159
abstract_inverted_index.tasks:	66
abstract_inverted_index.termed	58
abstract_inverted_index.ability	175
abstract_inverted_index.address	44
abstract_inverted_index.between	76
abstract_inverted_index.dataset	135, 185
abstract_inverted_index.general	11, 22
abstract_inverted_index.methods	24, 166
abstract_inverted_index.neglect	34
abstract_inverted_index.pretext	65
abstract_inverted_index.propose	48
abstract_inverted_index.regions	80
abstract_inverted_index.textual	85, 144
abstract_inverted_index.turning	16
abstract_inverted_index.weights	187
abstract_inverted_index.However,	13
abstract_inverted_index.critical	35
abstract_inverted_index.directly	15
abstract_inverted_index.domains.	12
abstract_inverted_index.evaluate	152
abstract_inverted_index.existing	165
abstract_inverted_index.identity	117
abstract_inverted_index.improves	164
abstract_inverted_index.instance	122
abstract_inverted_index.learning	7
abstract_inverted_index.phrases.	86
abstract_inverted_index.powerful	8
abstract_inverted_index.released	190
abstract_inverted_index.spanning	156
abstract_inverted_index.Contrast,	108
abstract_inverted_index.Moreover,	124
abstract_inverted_index.attribute	95
abstract_inverted_index.construct	131
abstract_inverted_index.correlate	111
abstract_inverted_index.effective	4
abstract_inverted_index.establish	73
abstract_inverted_index.framework	53
abstract_inverted_index.implement	126
abstract_inverted_index.learning,	20, 57
abstract_inverted_index.pre-train	128, 147
abstract_inverted_index.settings.	182
abstract_inverted_index.technique	5
abstract_inverted_index.zero-shot	178
abstract_inverted_index.Attributes	89
abstract_inverted_index.attributes	40
abstract_inverted_index.color-part	84
abstract_inverted_index.downstream	157
abstract_inverted_index.framework,	129
abstract_inverted_index.generating	143
abstract_inverted_index.image-text	137
abstract_inverted_index.Prediction,	90
abstract_inverted_index.SYNTH-PEDES	140, 150
abstract_inverted_index.Text-guided	68
abstract_inverted_index.cross-modal	113
abstract_inverted_index.elaborately	62
abstract_inverted_index.identities.	42
abstract_inverted_index.information	96
abstract_inverted_index.large-scale	133
abstract_inverted_index.Image-guided	88
abstract_inverted_index.annotations.	145
abstract_inverted_index.fine-grained	39, 83, 94
abstract_inverted_index.performance.	28
abstract_inverted_index.pre-training	1, 23, 52
abstract_inverted_index.Colorization,	70
abstract_inverted_index.Specifically,	60
abstract_inverted_index.automatically	142
abstract_inverted_index.significantly	163
abstract_inverted_index.Identity-based	106
abstract_inverted_index.Language-image	0
abstract_inverted_index.correspondence	75
abstract_inverted_index.generalization	181
abstract_inverted_index.language-image	51
abstract_inverted_index.person-centric	158
abstract_inverted_index.person-related	36, 78
abstract_inverted_index.representation	19, 56
abstract_inverted_index.unsatisfactory	27
abstract_inverted_index.Vision-Language	107
abstract_inverted_index.representations	9, 114
abstract_inverted_index.characteristics,	37
abstract_inverted_index.at~\url{https://github.com/Zplusdragon/PLIP}	191
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
sustainable_development_goals[0].id	https://metadata.un.org/sdg/10
sustainable_development_goals[0].score	0.7099999785423279
sustainable_development_goals[0].display_name	Reduced inequalities
citation_normalized_percentile
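
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is a hand-copied excerpt covering only positions 0 to 7, so the position lists for tokens that also occur later in the abstract are truncated here.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a {token: [positions]} mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the index above, limited to positions 0-7 (truncated by hand).
sample = {
    "Language-image": [0],
    "pre-training": [1],
    "is": [2],
    "an": [3],
    "effective": [4],
    "technique": [5],
    "for": [6],
    "learning": [7],
}
print(rebuild_abstract(sample))
# prints "Language-image pre-training is an effective technique for learning"
```

Because punctuation is attached to the tokens in the index (for example `domains.` and `learning,`), joining on spaces is enough to recover the sentence boundaries.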