Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2207.00733
Recently, cross-modal pre-training has become a hotspot because of its wide application in downstream tasks such as retrieval, captioning, and question answering. However, existing methods adopt a single-stream pre-training model to learn a unified vision-language representation for cross-modal retrieval, which easily suffers from computational explosion. Moreover, although conventional double-stream structures are quite efficient, they still lack the vital cross-modal interactions, resulting in low performance. Motivated by these challenges, we put forward a Contrastive Cross-Modal Knowledge Sharing Pre-training method (COOKIE) to learn joint text-image representations. Structurally, COOKIE adopts the traditional double-stream structure for its acceptable time consumption. To overcome the inherent defects of the double-stream structure mentioned above, we elaborately design two effective modules. Concretely, the first module is a weight-sharing transformer built on top of the visual and textual encoders, aiming to semantically align text and image; this design enables the visual and textual paths to focus on the same semantics. The second is a set of three specially designed contrastive learning objectives, aiming to share knowledge between different models. The shared cross-modal knowledge greatly advances unimodal representation learning, promoting single-modal retrieval tasks. Extensive experiments on multi-modal matching tasks, including cross-modal retrieval, text matching, and image retrieval, demonstrate the superiority of our pre-training model in both computational efficiency and accuracy.
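To make the described design concrete, here is a minimal, illustrative PyTorch sketch of a double-stream model with a weight-sharing transformer head trained with a symmetric contrastive (InfoNCE) loss over in-batch pairs. This is not the authors' implementation: the layer sizes, mean pooling, single shared layer, and temperature are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderWithSharedHead(nn.Module):
    """Sketch of a double-stream model whose top transformer layer is
    shared between the visual and textual paths (assumed architecture)."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Stand-ins for the unimodal encoders; any sequence encoder works here.
        self.visual_encoder = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.textual_encoder = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        # One weight-sharing transformer layer applied to BOTH streams, so the
        # two modalities are projected by the same parameters.
        self.shared_head = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

    def forward(self, image_tokens, text_tokens):
        v = self.shared_head(self.visual_encoder(image_tokens)).mean(dim=1)
        t = self.shared_head(self.textual_encoder(text_tokens)).mean(dim=1)
        return F.normalize(v, dim=-1), F.normalize(t, dim=-1)

def contrastive_loss(v, t, temperature=0.07):
    """Symmetric InfoNCE over in-batch negatives: matched image-text
    pairs sit on the diagonal of the similarity matrix."""
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random features standing in for image regions / text tokens.
model = DualEncoderWithSharedHead()
v, t = model(torch.randn(4, 49, 512), torch.randn(4, 16, 512))
loss = contrastive_loss(v, t)
```

Because the shared head sees both modalities, its gradients mix visual and textual signals, which is one plausible reading of how such a design encourages the two paths to attend to the same semantics.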
Record details
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2207.00733
- PDF: https://arxiv.org/pdf/2207.00733
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4283823073
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4283823073 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2207.00733 (Digital Object Identifier)
- Title: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-07-02 (full publication date if available)
- Authors: Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu (list of authors in order)
- Landing page: https://arxiv.org/abs/2207.00733 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2207.00733 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2207.00733 (direct OA link when available)
- Concepts: Computer science, Modal, Natural language processing, Artificial intelligence, Feature learning, GRASP, Closed captioning, Transformer, Image (mathematics), Polymer chemistry, Programming language, Quantum mechanics, Chemistry, Physics, Voltage (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4283823073 |
| doi | https://doi.org/10.48550/arxiv.2207.00733 |
| ids.doi | https://doi.org/10.48550/arxiv.2207.00733 |
| ids.openalex | https://openalex.org/W4283823073 |
| fwci | |
| type | preprint |
| title | Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9983999729156494 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9854999780654907 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8460104465484619 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C71139939 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6826146245002747 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q910194 |
| concepts[1].display_name | Modal |
| concepts[2].id | https://openalex.org/C204321447 |
| concepts[2].level | 1 |
| concepts[2].score | 0.49586352705955505 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[2].display_name | Natural language processing |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.49534526467323303 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C59404180 |
| concepts[4].level | 2 |
| concepts[4].score | 0.48526284098625183 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q17013334 |
| concepts[4].display_name | Feature learning |
| concepts[5].id | https://openalex.org/C171268870 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4510499835014343 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1486676 |
| concepts[5].display_name | GRASP |
| concepts[6].id | https://openalex.org/C157657479 |
| concepts[6].level | 3 |
| concepts[6].score | 0.43807587027549744 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2367247 |
| concepts[6].display_name | Closed captioning |
| concepts[7].id | https://openalex.org/C66322947 |
| concepts[7].level | 3 |
| concepts[7].score | 0.4326328635215759 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[7].display_name | Transformer |
| concepts[8].id | https://openalex.org/C115961682 |
| concepts[8].level | 2 |
| concepts[8].score | 0.1610700786113739 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[8].display_name | Image (mathematics) |
| concepts[9].id | https://openalex.org/C188027245 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q750446 |
| concepts[9].display_name | Polymer chemistry |
| concepts[10].id | https://openalex.org/C199360897 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[10].display_name | Programming language |
| concepts[11].id | https://openalex.org/C62520636 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[11].display_name | Quantum mechanics |
| concepts[12].id | https://openalex.org/C185592680 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[12].display_name | Chemistry |
| concepts[13].id | https://openalex.org/C121332964 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[13].display_name | Physics |
| concepts[14].id | https://openalex.org/C165801399 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[14].display_name | Voltage |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8460104465484619 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/modal |
| keywords[1].score | 0.6826146245002747 |
| keywords[1].display_name | Modal |
| keywords[2].id | https://openalex.org/keywords/natural-language-processing |
| keywords[2].score | 0.49586352705955505 |
| keywords[2].display_name | Natural language processing |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.49534526467323303 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/feature-learning |
| keywords[4].score | 0.48526284098625183 |
| keywords[4].display_name | Feature learning |
| keywords[5].id | https://openalex.org/keywords/grasp |
| keywords[5].score | 0.4510499835014343 |
| keywords[5].display_name | GRASP |
| keywords[6].id | https://openalex.org/keywords/closed-captioning |
| keywords[6].score | 0.43807587027549744 |
| keywords[6].display_name | Closed captioning |
| keywords[7].id | https://openalex.org/keywords/transformer |
| keywords[7].score | 0.4326328635215759 |
| keywords[7].display_name | Transformer |
| keywords[8].id | https://openalex.org/keywords/image |
| keywords[8].score | 0.1610700786113739 |
| keywords[8].display_name | Image (mathematics) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2207.00733 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2207.00733 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2207.00733 |
| locations[1].id | doi:10.48550/arxiv.2207.00733 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2207.00733 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5004061050 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5048-9014 |
| authorships[0].author.display_name | Keyu Wen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wen, Keyu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5064787764 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-3466-5417 |
| authorships[1].author.display_name | Zhenshan Tan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Tan, Zhenshan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101178676 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Qingrong Cheng |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Cheng, Qingrong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100420600 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-3662-0263 |
| authorships[3].author.display_name | Cheng Chen |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Chen, Cheng |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101294804 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Xiaodong Gu |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Gu, Xiaodong |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2207.00733 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-07-07T00:00:00 |
| display_name | Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4210416330, https://openalex.org/W2775506363, https://openalex.org/W3088136942, https://openalex.org/W4290852288, https://openalex.org/W4310447809, https://openalex.org/W4200243030, https://openalex.org/W2800782462, https://openalex.org/W3209117276, https://openalex.org/W4388184981, https://openalex.org/W4323777661 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2207.00733 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2207.00733 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2207.00733 |
| primary_location.id | pmh:oai:arXiv.org:2207.00733 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2207.00733 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2207.00733 |
| publication_date | 2022-07-02 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-positions index of the abstract shown above; full listing omitted) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.75 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
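
OpenAlex stores abstracts as an inverted index mapping each word to the positions where it occurs, as in the `abstract_inverted_index` payload field above. A small Python sketch shows how to reconstruct the plain text; the toy `inv` dictionary below uses the first few entries actually present in this record's index.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild an abstract from OpenAlex's abstract_inverted_index,
    which maps each word to the list of positions where it occurs."""
    positions = []
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions.append((i, word))
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positions))

# Toy example mirroring this record's payload.
inv = {"Recently,": [0], "the": [1], "cross-modal": [2],
       "pre-training": [3], "task": [4]}
print(reconstruct_abstract(inv))
# -> "Recently, the cross-modal pre-training task"
```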