MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

T. Q. Nguyen, Bin Yi, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi M. Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.03788

Data quality is a primary factor in the effectiveness of video-language representation learning. However, video-text pairs in existing datasets are typically not perfectly aligned with each other, which can lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, existing datasets have an uneven distribution of concepts, which hampers downstream performance on underrepresented subjects. To address these problems, we propose MAMA, a new approach to learning video-language representations that uses a contrastive objective with a subtractive angular margin to regularize cross-modal representations as they are pushed toward perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, MAMA uses a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights, enabling dynamic adjustment of the model's focus throughout training. With training guided by a small amount of unbiased meta-data and augmented by video-text data generated by a large vision-language model, MAMA improves video-language representations and achieves superior performance on commonly used video question answering and text-video retrieval datasets. The code, model, and data have been made available at https://nguyentthong.github.io/MAMA.
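The abstract names two mechanisms: a contrastive loss in which matched video-text pairs receive a subtractive angular margin, and an MLP that turns per-sample loss values into sample weights. The following is a minimal pure-Python sketch of both under stated assumptions, not the authors' implementation. It assumes the margin is applied as cos(max(θ − m, 0)) to the angle θ of each matched pair, computes only the video-to-text direction of the loss, and uses hypothetical names (`angular_margin_contrastive_losses`, `LossWeightMLP`, `weighted_batch_loss`) and hyperparameters (`margin=0.1`, `temperature=0.1`, 8 hidden units). The meta-optimization step, in which the weighting MLP would be updated on the small unbiased meta-data set, is omitted.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_contrastive_losses(videos, texts, margin=0.1, temperature=0.1):
    """Per-sample video-to-text InfoNCE losses with a subtractive angular margin.

    ASSUMPTION (not taken from the paper): the margin is subtracted from the
    angle theta of each matched pair, so the positive logit becomes
    cos(max(theta - margin, 0)); negative pairs are left unchanged.
    """
    n = len(videos)
    losses = []
    for i in range(n):
        logits = []
        for j in range(n):
            s = cosine(videos[i], texts[j])
            if i == j:
                # Clamp into acos's domain to guard against rounding error.
                theta = math.acos(max(-1.0, min(1.0, s)))
                s = math.cos(max(theta - margin, 0.0))
            logits.append(s / temperature)
        # Numerically stable -log softmax evaluated at the positive index i.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return losses


def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LossWeightMLP:
    """Hypothetical one-hidden-layer MLP mapping a scalar loss to a weight in (0, 1).

    Hidden size, ReLU activation and sigmoid output are illustrative choices.
    Meta-optimizing these parameters on unbiased meta-data is NOT implemented.
    """

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, loss):
        hidden = [max(0.0, w * loss + b) for w, b in zip(self.w1, self.b1)]  # ReLU
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return sigmoid(z)


def weighted_batch_loss(losses, weight_fn):
    """Mean of per-sample losses, each scaled by weight_fn(loss)."""
    return sum(weight_fn(l) * l for l in losses) / len(losses)


# Toy batch: row i of `videos` is paired with row i of `texts`.
videos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
texts = [[0.8, 0.5, 0.1], [0.3, 0.9, 0.3]]
plain = angular_margin_contrastive_losses(videos, texts, margin=0.0)
with_margin = angular_margin_contrastive_losses(videos, texts, margin=0.1)
weight_net = LossWeightMLP()
batch_loss = weighted_batch_loss(with_margin, weight_net)
```

Because cosine is decreasing on [0, π], subtracting the margin raises each positive logit, so every per-sample loss computed with `margin=0.1` is smaller than with `margin=0.0`. Under this assumed form, the pull toward perfect similarity is weakened for pairs that are already close, and the positive term stops contributing gradient once θ ≤ m.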

Related Topics

Computer Science

Artificial Intelligence

Concepts

Margin (machine learning), Computer science, Representation (politics), Natural language processing, Artificial intelligence, Feature learning, Machine learning, Political science, Law, Politics

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2407.03788
PDF: https://arxiv.org/pdf/2407.03788
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4400434033

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4400434033

Canonical identifier for this work in OpenAlex
DOI: https://doi.org/10.48550/arxiv.2407.03788

Digital Object Identifier
Title: MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Work title
Type: preprint

OpenAlex work type
Language: en

Primary language
Publication year: 2024

Year of publication
Publication date: 2024-07-04

Full publication date if available
Authors: T. Q. Nguyen, Bin Yi, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi M. Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

List of authors in order
Landing page: https://arxiv.org/abs/2407.03788

Publisher landing page
PDF URL: https://arxiv.org/pdf/2407.03788

Direct link to full text PDF
Open access: Yes

Whether a free full text is available
OA status: green

Open access status per OpenAlex
OA URL: https://arxiv.org/pdf/2407.03788

Direct OA link when available
Concepts: Margin (machine learning), Computer science, Representation (politics), Natural language processing, Artificial intelligence, Feature learning, Machine learning, Political science, Law, Politics

Top concepts (fields/topics) attached by OpenAlex
Cited by: 0

Total citation count in OpenAlex
Related works (count): 10

Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4400434033
doi	https://doi.org/10.48550/arxiv.2407.03788
ids.doi	https://doi.org/10.48550/arxiv.2407.03788
ids.openalex	https://openalex.org/W4400434033
fwci
type	preprint
title	MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T11714
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9941999912261963
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1707
topics[0].subfield.display_name	Computer Vision and Pattern Recognition
topics[0].display_name	Multimodal Machine Learning Applications
topics[1].id	https://openalex.org/T10812
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9929999709129333
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1707
topics[1].subfield.display_name	Computer Vision and Pattern Recognition
topics[1].display_name	Human Pose and Action Recognition
topics[2].id	https://openalex.org/T11439
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9830999970436096
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1707
topics[2].subfield.display_name	Computer Vision and Pattern Recognition
topics[2].display_name	Video Analysis and Summarization
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C774472
concepts[0].level	2
concepts[0].score	0.836654782295227
concepts[0].wikidata	https://www.wikidata.org/wiki/Q6760393
concepts[0].display_name	Margin (machine learning)
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.6203455924987793
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C2776359362
concepts[2].level	3
concepts[2].score	0.6188117265701294
concepts[2].wikidata	https://www.wikidata.org/wiki/Q2145286
concepts[2].display_name	Representation (politics)
concepts[3].id	https://openalex.org/C204321447
concepts[3].level	1
concepts[3].score	0.4554750919342041
concepts[3].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[3].display_name	Natural language processing
concepts[4].id	https://openalex.org/C154945302
concepts[4].level	1
concepts[4].score	0.44752395153045654
concepts[4].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[4].display_name	Artificial intelligence
concepts[5].id	https://openalex.org/C59404180
concepts[5].level	2
concepts[5].score	0.4399040639400482
concepts[5].wikidata	https://www.wikidata.org/wiki/Q17013334
concepts[5].display_name	Feature learning
concepts[6].id	https://openalex.org/C119857082
concepts[6].level	1
concepts[6].score	0.21887359023094177
concepts[6].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[6].display_name	Machine learning
concepts[7].id	https://openalex.org/C17744445
concepts[7].level	0
concepts[7].score	0.05697512626647949
concepts[7].wikidata	https://www.wikidata.org/wiki/Q36442
concepts[7].display_name	Political science
concepts[8].id	https://openalex.org/C199539241
concepts[8].level	1
concepts[8].score	0.0
concepts[8].wikidata	https://www.wikidata.org/wiki/Q7748
concepts[8].display_name	Law
concepts[9].id	https://openalex.org/C94625758
concepts[9].level	2
concepts[9].score	0.0
concepts[9].wikidata	https://www.wikidata.org/wiki/Q7163
concepts[9].display_name	Politics
keywords[0].id	https://openalex.org/keywords/margin
keywords[0].score	0.836654782295227
keywords[0].display_name	Margin (machine learning)
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.6203455924987793
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/representation
keywords[2].score	0.6188117265701294
keywords[2].display_name	Representation (politics)
keywords[3].id	https://openalex.org/keywords/natural-language-processing
keywords[3].score	0.4554750919342041
keywords[3].display_name	Natural language processing
keywords[4].id	https://openalex.org/keywords/artificial-intelligence
keywords[4].score	0.44752395153045654
keywords[4].display_name	Artificial intelligence
keywords[5].id	https://openalex.org/keywords/feature-learning
keywords[5].score	0.4399040639400482
keywords[5].display_name	Feature learning
keywords[6].id	https://openalex.org/keywords/machine-learning
keywords[6].score	0.21887359023094177
keywords[6].display_name	Machine learning
keywords[7].id	https://openalex.org/keywords/political-science
keywords[7].score	0.05697512626647949
keywords[7].display_name	Political science
language	en
locations[0].id	pmh:oai:arXiv.org:2407.03788
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2407.03788
locations[0].version	submittedVersion
locations[0].raw_type
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2407.03788
locations[1].id	doi:10.48550/arxiv.2407.03788
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2407.03788
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5101561868
authorships[0].author.orcid	https://orcid.org/0000-0003-3954-5131
authorships[0].author.display_name	T. Q. Nguyen
authorships[0].author_position	first
authorships[0].raw_author_name	Nguyen, Thong
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5084911441
authorships[1].author.orcid	https://orcid.org/0000-0001-5840-2086
authorships[1].author.display_name	Bin Yi
authorships[1].author_position	middle
authorships[1].raw_author_name	Bin, Yi
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5011376608
authorships[2].author.orcid	https://orcid.org/0000-0003-0076-3924
authorships[2].author.display_name	Xiaobao Wu
authorships[2].author_position	middle
authorships[2].raw_author_name	Wu, Xiaobao
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5079198353
authorships[3].author.orcid
authorships[3].author.display_name	Xinshuai Dong
authorships[3].author_position	middle
authorships[3].raw_author_name	Dong, Xinshuai
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5101468117
authorships[4].author.orcid	https://orcid.org/0000-0003-4095-0249
authorships[4].author.display_name	Zhiyuan Hu
authorships[4].author_position	middle
authorships[4].raw_author_name	Hu, Zhiyuan
authorships[4].is_corresponding	False
authorships[5].author.id	https://openalex.org/A5028581687
authorships[5].author.orcid	https://orcid.org/0000-0003-2250-0818
authorships[5].author.display_name	Khoi M. Le
authorships[5].author_position	middle
authorships[5].raw_author_name	Le, Khoi
authorships[5].is_corresponding	False
authorships[6].author.id	https://openalex.org/A5025645183
authorships[6].author.orcid	https://orcid.org/0000-0002-0931-460X
authorships[6].author.display_name	Cong-Duy Nguyen
authorships[6].author_position	middle
authorships[6].raw_author_name	Nguyen, Cong-Duy
authorships[6].is_corresponding	False
authorships[7].author.id	https://openalex.org/A5090171111
authorships[7].author.orcid	https://orcid.org/0000-0001-6565-7511
authorships[7].author.display_name	See-Kiong Ng
authorships[7].author_position	middle
authorships[7].raw_author_name	Ng, See-Kiong
authorships[7].is_corresponding	False
authorships[8].author.id	https://openalex.org/A5001659855
authorships[8].author.orcid	https://orcid.org/0000-0001-6062-207X
authorships[8].author.display_name	Luu Anh Tuan
authorships[8].author_position	last
authorships[8].raw_author_name	Tuan, Luu Anh
authorships[8].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2407.03788
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2024-07-09T00:00:00
display_name	MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T11714
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9941999912261963
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1707
primary_topic.subfield.display_name	Computer Vision and Pattern Recognition
primary_topic.display_name	Multimodal Machine Learning Applications
related_works	https://openalex.org/W3125011624, https://openalex.org/W1508631387, https://openalex.org/W2370917603, https://openalex.org/W2952760143, https://openalex.org/W2017776670, https://openalex.org/W2347897961, https://openalex.org/W2340870721, https://openalex.org/W2358318464, https://openalex.org/W2979236518, https://openalex.org/W3204019825
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2407.03788
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2407.03788
best_oa_location.version	submittedVersion
best_oa_location.raw_type
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2407.03788
primary_location.id	pmh:oai:arXiv.org:2407.03788
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2407.03788
primary_location.version	submittedVersion
primary_location.raw_type
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2407.03788
publication_date	2024-07-04
publication_year	2024
referenced_works_count	0
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	9
citation_normalized_percentile