Multimodal LLM-based Query Paraphrasing for Video Search
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.12341
Text-to-video retrieval answers user queries through searches based on concepts and embeddings. However, due to limitations in the size of the concept bank and the amount of training data, answering queries in the wild is not always effective because of the out-of-vocabulary problem. Furthermore, neither concept-based nor embedding-based search can perform reasoning to consolidate search results for complex queries that include logical and spatial constraints. To address these challenges, we leverage large language models (LLMs) to paraphrase queries using text-to-text (T2T), text-to-image (T2I), and image-to-text (I2T) transformations. These transformations rephrase abstract concepts into simpler terms to mitigate the out-of-vocabulary problem. Additionally, complex relationships within a query can be decomposed into simpler sub-queries, improving retrieval performance by effectively fusing the search results of these sub-queries. To mitigate the issue of LLM hallucination, this paper also proposes a novel consistency-based verification strategy to filter out factually incorrect paraphrased queries. Extensive experiments are conducted for ad-hoc video search and known-item search on the TRECVid datasets. We provide empirical insights into how traditionally difficult-to-answer queries can be effectively resolved through query paraphrasing.
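
The abstract says that results from simpler sub-queries are fused, but it does not say which fusion method is used. As a rough illustration only, the sketch below uses reciprocal rank fusion, a common stand-in technique and not necessarily the authors' method. The sub-query strings, the video IDs, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of video IDs into a single ranking.

    Each list contributes 1 / (k + rank) to the score of every video it
    contains, so videos ranked highly by several sub-queries rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, video_id in enumerate(ranking, start=1):
            scores[video_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical search results for two sub-queries of one complex query.
sub_query_results = {
    "a person holding a cup": ["v3", "v1", "v7"],
    "a person standing near a window": ["v1", "v9", "v3"],
}

fused = reciprocal_rank_fusion(list(sub_query_results.values()))
print(fused)
```

Videos returned by both sub-queries (`v1`, `v3`) end up ahead of videos returned by only one, which is the intuition behind consolidating sub-query results for a query with several constraints.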
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.12341, https://arxiv.org/pdf/2407.12341
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4402345736
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4402345736 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.12341 (Digital Object Identifier)
- Title: Multimodal LLM-based Query Paraphrasing for Video Search (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-17 (full publication date if available)
- Authors: Jiaxin Wu, Chong‐Wah Ngo, W. K. Chan, Sheng-hua Zhong (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.12341 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.12341 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.12341 (direct OA link when available)
- Concepts: Information retrieval, Computer science, Query expansion, Web search query, Search engine (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4402345736 |
| doi | https://doi.org/10.48550/arxiv.2407.12341 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.12341 |
| ids.openalex | https://openalex.org/W4402345736 |
| fwci | |
| type | preprint |
| title | Multimodal LLM-based Query Paraphrasing for Video Search |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10627 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9987999796867371 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Image and Video Retrieval Techniques |
| topics[1].id | https://openalex.org/T10824 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9973000288009644 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Image Retrieval and Classification Techniques |
| topics[2].id | https://openalex.org/T11714 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9887999892234802 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Multimodal Machine Learning Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C23123220 |
| concepts[0].level | 1 |
| concepts[0].score | 0.701397180557251 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[0].display_name | Information retrieval |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6406830549240112 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C99016210 |
| concepts[2].level | 2 |
| concepts[2].score | 0.4724317491054535 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q5488129 |
| concepts[2].display_name | Query expansion |
| concepts[3].id | https://openalex.org/C164120249 |
| concepts[3].level | 3 |
| concepts[3].score | 0.4299573302268982 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q995982 |
| concepts[3].display_name | Web search query |
| concepts[4].id | https://openalex.org/C97854310 |
| concepts[4].level | 2 |
| concepts[4].score | 0.23851177096366882 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q19541 |
| concepts[4].display_name | Search engine |
| keywords[0].id | https://openalex.org/keywords/information-retrieval |
| keywords[0].score | 0.701397180557251 |
| keywords[0].display_name | Information retrieval |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6406830549240112 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/query-expansion |
| keywords[2].score | 0.4724317491054535 |
| keywords[2].display_name | Query expansion |
| keywords[3].id | https://openalex.org/keywords/web-search-query |
| keywords[3].score | 0.4299573302268982 |
| keywords[3].display_name | Web search query |
| keywords[4].id | https://openalex.org/keywords/search-engine |
| keywords[4].score | 0.23851177096366882 |
| keywords[4].display_name | Search engine |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.12341 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.12341 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.12341 |
| locations[1].id | doi:10.48550/arxiv.2407.12341 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.12341 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5056908913 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4074-3442 |
| authorships[0].author.display_name | Jiaxin Wu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wu, Jiaxin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5010722442 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-4182-8261 |
| authorships[1].author.display_name | Chong‐Wah Ngo |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ngo, Chong-Wah |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5020936420 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7726-6235 |
| authorships[2].author.display_name | W. K. Chan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chan, Wing-Kwong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5086801574 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-7524-5999 |
| authorships[3].author.display_name | Sheng-hua Zhong |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Zhong, Sheng-Hua |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.12341 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-09-09T00:00:00 |
| display_name | Multimodal LLM-based Query Paraphrasing for Video Search |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10627 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9987999796867371 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Image and Video Retrieval Techniques |
| related_works | https://openalex.org/W2096359267, https://openalex.org/W1521725692, https://openalex.org/W3008917487, https://openalex.org/W3001245047, https://openalex.org/W1873153460, https://openalex.org/W2901901036, https://openalex.org/W3197639690, https://openalex.org/W2044231962, https://openalex.org/W2170059263, https://openalex.org/W2961567132 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.12341 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.12341 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.12341 |
| primary_location.id | pmh:oai:arXiv.org:2407.12341 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.12341 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.12341 |
| publication_date | 2024-07-17 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 104, 135 |
| abstract_inverted_index.To | 65, 124 |
| abstract_inverted_index.We | 162 |
| abstract_inverted_index.be | 107, 172 |
| abstract_inverted_index.by | 115 |
| abstract_inverted_index.in | 16, 31 |
| abstract_inverted_index.is | 34 |
| abstract_inverted_index.of | 19, 26, 39, 121, 128 |
| abstract_inverted_index.on | 8, 158 |
| abstract_inverted_index.to | 14, 52, 75, 95, 140 |
| abstract_inverted_index.we | 69 |
| abstract_inverted_index.LLM | 129 |
| abstract_inverted_index.and | 10, 23, 62, 83, 155 |
| abstract_inverted_index.are | 149 |
| abstract_inverted_index.can | 49, 106, 171 |
| abstract_inverted_index.due | 13 |
| abstract_inverted_index.for | 56, 151 |
| abstract_inverted_index.how | 167 |
| abstract_inverted_index.nor | 46 |
| abstract_inverted_index.not | 35 |
| abstract_inverted_index.out | 142 |
| abstract_inverted_index.the | 17, 20, 24, 32, 40, 97, 118, 126, 159 |
| abstract_inverted_index.also | 133 |
| abstract_inverted_index.bank | 22 |
| abstract_inverted_index.into | 92, 109, 166 |
| abstract_inverted_index.size | 18 |
| abstract_inverted_index.that | 59 |
| abstract_inverted_index.this | 131 |
| abstract_inverted_index.user | 3 |
| abstract_inverted_index.wild | 33 |
| abstract_inverted_index.(I2T) | 85 |
| abstract_inverted_index.These | 87 |
| abstract_inverted_index.based | 7 |
| abstract_inverted_index.data, | 28 |
| abstract_inverted_index.issue | 127 |
| abstract_inverted_index.large | 71 |
| abstract_inverted_index.novel | 136 |
| abstract_inverted_index.paper | 132 |
| abstract_inverted_index.query | 105, 176 |
| abstract_inverted_index.terms | 94 |
| abstract_inverted_index.these | 67, 122 |
| abstract_inverted_index.using | 78 |
| abstract_inverted_index.video | 153 |
| abstract_inverted_index.(LLMs) | 74 |
| abstract_inverted_index.(T2I), | 82 |
| abstract_inverted_index.(T2T), | 80 |
| abstract_inverted_index.ad-hoc | 152 |
| abstract_inverted_index.always | 36 |
| abstract_inverted_index.amount | 25 |
| abstract_inverted_index.filter | 141 |
| abstract_inverted_index.fusing | 117 |
| abstract_inverted_index.models | 73 |
| abstract_inverted_index.search | 48, 54, 119, 154, 157 |
| abstract_inverted_index.within | 103 |
| abstract_inverted_index.TRECVid | 160 |
| abstract_inverted_index.address | 66 |
| abstract_inverted_index.answers | 2 |
| abstract_inverted_index.because | 38 |
| abstract_inverted_index.complex | 57, 101 |
| abstract_inverted_index.concept | 21 |
| abstract_inverted_index.include | 60 |
| abstract_inverted_index.logical | 61 |
| abstract_inverted_index.neither | 44 |
| abstract_inverted_index.perform | 50 |
| abstract_inverted_index.provide | 163 |
| abstract_inverted_index.queries | 4, 30, 58, 77, 170 |
| abstract_inverted_index.results | 55, 120 |
| abstract_inverted_index.simpler | 93, 110 |
| abstract_inverted_index.spatial | 63 |
| abstract_inverted_index.through | 5, 175 |
| abstract_inverted_index.However, | 12 |
| abstract_inverted_index.abstract | 90 |
| abstract_inverted_index.concepts | 9, 91 |
| abstract_inverted_index.insights | 165 |
| abstract_inverted_index.language | 72 |
| abstract_inverted_index.leverage | 70 |
| abstract_inverted_index.mitigate | 96, 125 |
| abstract_inverted_index.problem. | 42, 99 |
| abstract_inverted_index.proposes | 134 |
| abstract_inverted_index.queries. | 146 |
| abstract_inverted_index.rephrase | 89 |
| abstract_inverted_index.resolved | 174 |
| abstract_inverted_index.searches | 6 |
| abstract_inverted_index.strategy | 139 |
| abstract_inverted_index.training | 27 |
| abstract_inverted_index.Extensive | 147 |
| abstract_inverted_index.answering | 29 |
| abstract_inverted_index.conducted | 150 |
| abstract_inverted_index.datasets. | 161 |
| abstract_inverted_index.effective | 37 |
| abstract_inverted_index.empirical | 164 |
| abstract_inverted_index.factually | 143 |
| abstract_inverted_index.improving | 112 |
| abstract_inverted_index.incorrect | 144 |
| abstract_inverted_index.reasoning | 51 |
| abstract_inverted_index.retrieval | 1, 113 |
| abstract_inverted_index.decomposed | 108 |
| abstract_inverted_index.known-item | 156 |
| abstract_inverted_index.paraphrase | 76 |
| abstract_inverted_index.challenges, | 68 |
| abstract_inverted_index.consolidate | 53 |
| abstract_inverted_index.effectively | 116, 173 |
| abstract_inverted_index.embeddings. | 11 |
| abstract_inverted_index.experiments | 148 |
| abstract_inverted_index.limitations | 15 |
| abstract_inverted_index.paraphrased | 145 |
| abstract_inverted_index.performance | 114 |
| abstract_inverted_index.Furthermore, | 43 |
| abstract_inverted_index.constraints. | 64 |
| abstract_inverted_index.sub-queries, | 111 |
| abstract_inverted_index.sub-queries. | 123 |
| abstract_inverted_index.text-to-text | 79 |
| abstract_inverted_index.verification | 138 |
| abstract_inverted_index.Additionally, | 100 |
| abstract_inverted_index.Text-to-video | 0 |
| abstract_inverted_index.concept-based | 45 |
| abstract_inverted_index.image-to-text | 84 |
| abstract_inverted_index.paraphrasing. | 177 |
| abstract_inverted_index.relationships | 102 |
| abstract_inverted_index.text-to-image | 81 |
| abstract_inverted_index.traditionally | 168 |
| abstract_inverted_index.hallucination, | 130 |
| abstract_inverted_index.embedding-based | 47 |
| abstract_inverted_index.transformations | 88 |
| abstract_inverted_index.transformations. | 86 |
| abstract_inverted_index.consistency-based | 137 |
| abstract_inverted_index.out-of-vocabulary | 41, 98 |
| abstract_inverted_index.difficult-to-answer | 169 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
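
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is deliberately truncated to the first twelve positions taken from the table (so `retrieval`, `queries`, and `and` list only their first occurrence here).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    positions = {}
    for word, indexes in inverted_index.items():
        for index in indexes:
            positions[index] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Truncated sample: only positions 0-11 from the payload table above.
sample_index = {
    "Text-to-video": [0],
    "retrieval": [1],
    "answers": [2],
    "user": [3],
    "queries": [4],
    "through": [5],
    "searches": [6],
    "based": [7],
    "on": [8],
    "concepts": [9],
    "and": [10],
    "embeddings.": [11],
}

first_sentence = rebuild_abstract(sample_index)
print(first_sentence)
```

Applied to the full set of rows, the same loop should reproduce the abstract shown at the top of this record, since every position from 0 to 177 is assigned to exactly one word.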