Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2308.02562
This study introduces a novel multimodal food recognition framework that effectively combines visual and textual modalities to enhance classification accuracy and robustness. The proposed approach employs a dynamic multimodal fusion strategy that adaptively integrates features from unimodal visual inputs and complementary textual metadata. This fusion mechanism is designed to maximize the use of informative content, while mitigating the adverse impact of missing or inconsistent modality data. The framework was rigorously evaluated on the UPMC Food-101 dataset and achieved unimodal classification accuracies of 73.60% for images and 88.84% for text. When both modalities were fused, the model achieved an accuracy of 97.84%, outperforming several state-of-the-art methods. Extensive experimental analysis demonstrated the robustness, adaptability, and computational efficiency of the proposed settings, highlighting its practical applicability to real-world multimodal food-recognition scenarios.
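
The abstract does not give the fusion equations, so the following is only an illustrative sketch of what "adaptively integrating" two modalities while tolerating a missing one could look like. The entropy-based confidence weighting, and the names `confidence` and `adaptive_fuse`, are assumptions made for this example; they are not taken from the paper and should not be read as the authors' method.

```python
import math


def confidence(probs):
    """Return 1 - normalized entropy: near 1.0 for a peaked vector, 0.0 for uniform."""
    k = len(probs)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - entropy / math.log(k))


def adaptive_fuse(image_probs=None, text_probs=None):
    """Confidence-weighted average of per-modality class probabilities.

    A modality passed as None is treated as missing and simply skipped
    (hypothetical handling, chosen for illustration).
    """
    available = [p for p in (image_probs, text_probs) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    k = len(available[0])
    if any(len(p) != k for p in available):
        raise ValueError("all modalities must score the same number of classes")

    weights = [confidence(p) for p in available]
    total = sum(weights)
    if total <= 0.0:
        # Every modality is uniform: fall back to a plain average.
        weights = [1.0] * len(available)
        total = float(len(available))

    return [
        sum(w * p[i] for w, p in zip(weights, available)) / total
        for i in range(k)
    ]


if __name__ == "__main__":
    image = [0.40, 0.35, 0.25]  # uncertain image prediction (made-up numbers)
    text = [0.05, 0.90, 0.05]   # confident text prediction (made-up numbers)
    print(adaptive_fuse(image, text))  # both modalities present
    print(adaptive_fuse(None, text))   # image missing
```

In this sketch the more confident modality dominates the fused prediction, and a missing modality leaves the remaining one unchanged. A learned gate would be the natural replacement for the hand-written confidence score.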
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2308.02562, https://arxiv.org/pdf/2308.02562
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4385680908
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4385680908 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2308.02562 (Digital Object Identifier)
- Title: Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-08-03 (full publication date if available)
- Authors: Prateek Mittal, Puneet Goyal, Joohi Chauhan (list of authors in order)
- Landing page: https://arxiv.org/abs/2308.02562 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2308.02562 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2308.02562 (direct OA link when available)
- Concepts: Computer science, Artificial intelligence, Robustness (evolution), Transformer, Pattern recognition (psychology), Recall, Machine learning, Precision and recall, Contextual image classification, Natural language processing, Image (mathematics), Biochemistry, Physics, Quantum mechanics, Chemistry, Voltage, Gene, Philosophy, Linguistics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4385680908 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2308.02562 |
| ids.doi | https://doi.org/10.48550/arxiv.2308.02562 |
| ids.openalex | https://openalex.org/W4385680908 |
| fwci | |
| type | preprint |
| title | Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11925 |
| topics[0].field.id | https://openalex.org/fields/11 |
| topics[0].field.display_name | Agricultural and Biological Sciences |
| topics[0].score | 0.7283999919891357 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1106 |
| topics[0].subfield.display_name | Food Science |
| topics[0].display_name | Culinary Culture and Tourism |
| topics[1].id | https://openalex.org/T10824 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.6996999979019165 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Image Retrieval and Classification Techniques |
| topics[2].id | https://openalex.org/T11550 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.6401000022888184 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Text and Document Classification Technologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7429693937301636 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6611276268959045 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C63479239 |
| concepts[2].level | 3 |
| concepts[2].score | 0.6561263203620911 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7353546 |
| concepts[2].display_name | Robustness (evolution) |
| concepts[3].id | https://openalex.org/C66322947 |
| concepts[3].level | 3 |
| concepts[3].score | 0.5630037784576416 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[3].display_name | Transformer |
| concepts[4].id | https://openalex.org/C153180895 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5570996999740601 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[4].display_name | Pattern recognition (psychology) |
| concepts[5].id | https://openalex.org/C100660578 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5362953543663025 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q18733 |
| concepts[5].display_name | Recall |
| concepts[6].id | https://openalex.org/C119857082 |
| concepts[6].level | 1 |
| concepts[6].score | 0.5025289058685303 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[6].display_name | Machine learning |
| concepts[7].id | https://openalex.org/C81669768 |
| concepts[7].level | 2 |
| concepts[7].score | 0.42782270908355713 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2359161 |
| concepts[7].display_name | Precision and recall |
| concepts[8].id | https://openalex.org/C75294576 |
| concepts[8].level | 3 |
| concepts[8].score | 0.4235674738883972 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q5165192 |
| concepts[8].display_name | Contextual image classification |
| concepts[9].id | https://openalex.org/C204321447 |
| concepts[9].level | 1 |
| concepts[9].score | 0.33631807565689087 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[9].display_name | Natural language processing |
| concepts[10].id | https://openalex.org/C115961682 |
| concepts[10].level | 2 |
| concepts[10].score | 0.2897205352783203 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[10].display_name | Image (mathematics) |
| concepts[11].id | https://openalex.org/C55493867 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[11].display_name | Biochemistry |
| concepts[12].id | https://openalex.org/C121332964 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[12].display_name | Physics |
| concepts[13].id | https://openalex.org/C62520636 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[13].display_name | Quantum mechanics |
| concepts[14].id | https://openalex.org/C185592680 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[14].display_name | Chemistry |
| concepts[15].id | https://openalex.org/C165801399 |
| concepts[15].level | 2 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[15].display_name | Voltage |
| concepts[16].id | https://openalex.org/C104317684 |
| concepts[16].level | 2 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[16].display_name | Gene |
| concepts[17].id | https://openalex.org/C138885662 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[17].display_name | Philosophy |
| concepts[18].id | https://openalex.org/C41895202 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[18].display_name | Linguistics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7429693937301636 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.6611276268959045 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/robustness |
| keywords[2].score | 0.6561263203620911 |
| keywords[2].display_name | Robustness (evolution) |
| keywords[3].id | https://openalex.org/keywords/transformer |
| keywords[3].score | 0.5630037784576416 |
| keywords[3].display_name | Transformer |
| keywords[4].id | https://openalex.org/keywords/pattern-recognition |
| keywords[4].score | 0.5570996999740601 |
| keywords[4].display_name | Pattern recognition (psychology) |
| keywords[5].id | https://openalex.org/keywords/recall |
| keywords[5].score | 0.5362953543663025 |
| keywords[5].display_name | Recall |
| keywords[6].id | https://openalex.org/keywords/machine-learning |
| keywords[6].score | 0.5025289058685303 |
| keywords[6].display_name | Machine learning |
| keywords[7].id | https://openalex.org/keywords/precision-and-recall |
| keywords[7].score | 0.42782270908355713 |
| keywords[7].display_name | Precision and recall |
| keywords[8].id | https://openalex.org/keywords/contextual-image-classification |
| keywords[8].score | 0.4235674738883972 |
| keywords[8].display_name | Contextual image classification |
| keywords[9].id | https://openalex.org/keywords/natural-language-processing |
| keywords[9].score | 0.33631807565689087 |
| keywords[9].display_name | Natural language processing |
| keywords[10].id | https://openalex.org/keywords/image |
| keywords[10].score | 0.2897205352783203 |
| keywords[10].display_name | Image (mathematics) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2308.02562 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2308.02562 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2308.02562 |
| locations[1].id | doi:10.48550/arxiv.2308.02562 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2308.02562 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101504494 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4727-1708 |
| authorships[0].author.display_name | Prateek Mittal |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Mittal, Prateek |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5083836357 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6196-9347 |
| authorships[1].author.display_name | Puneet Goyal |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Goyal, Puneet |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5045109981 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7331-851X |
| authorships[2].author.display_name | Joohi Chauhan |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Chauhan, Joohi |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2308.02562 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-08-09T00:00:00 |
| display_name | Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11925 |
| primary_topic.field.id | https://openalex.org/fields/11 |
| primary_topic.field.display_name | Agricultural and Biological Sciences |
| primary_topic.score | 0.7283999919891357 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1106 |
| primary_topic.subfield.display_name | Food Science |
| primary_topic.display_name | Culinary Culture and Tourism |
| related_works | https://openalex.org/W4330338194, https://openalex.org/W2118758177, https://openalex.org/W2153520307, https://openalex.org/W2770593030, https://openalex.org/W2151459719, https://openalex.org/W3154990682, https://openalex.org/W4281727072, https://openalex.org/W623261610, https://openalex.org/W2358294942, https://openalex.org/W4367460280 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2308.02562 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2308.02562 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2308.02562 |
| primary_location.id | pmh:oai:arXiv.org:2308.02562 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2308.02562 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2308.02562 |
| publication_date | 2023-08-03 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 26 |
| abstract_inverted_index.an | 97 |
| abstract_inverted_index.is | 46 |
| abstract_inverted_index.of | 52, 60, 81, 99, 115 |
| abstract_inverted_index.on | 71 |
| abstract_inverted_index.or | 62 |
| abstract_inverted_index.to | 16, 48, 123 |
| abstract_inverted_index.The | 22, 66 |
| abstract_inverted_index.and | 13, 20, 39, 76, 85, 112 |
| abstract_inverted_index.for | 83, 87 |
| abstract_inverted_index.its | 120 |
| abstract_inverted_index.the | 50, 57, 72, 94, 109, 116 |
| abstract_inverted_index.use | 51 |
| abstract_inverted_index.was | 68 |
| abstract_inverted_index.This | 0, 43 |
| abstract_inverted_index.UPMC | 73 |
| abstract_inverted_index.When | 89 |
| abstract_inverted_index.both | 90 |
| abstract_inverted_index.food | 6 |
| abstract_inverted_index.from | 35 |
| abstract_inverted_index.that | 9, 31 |
| abstract_inverted_index.were | 92 |
| abstract_inverted_index.data. | 65 |
| abstract_inverted_index.model | 95 |
| abstract_inverted_index.novel | 4 |
| abstract_inverted_index.study | 1 |
| abstract_inverted_index.text. | 88 |
| abstract_inverted_index.while | 55 |
| abstract_inverted_index.73.60% | 82 |
| abstract_inverted_index.88.84% | 86 |
| abstract_inverted_index.fused, | 93 |
| abstract_inverted_index.fusion | 29, 44 |
| abstract_inverted_index.images | 84 |
| abstract_inverted_index.impact | 59 |
| abstract_inverted_index.inputs | 38 |
| abstract_inverted_index.visual | 12, 37 |
| abstract_inverted_index.97.84%, | 100 |
| abstract_inverted_index.adverse | 58 |
| abstract_inverted_index.dataset | 75 |
| abstract_inverted_index.dynamic | 27 |
| abstract_inverted_index.employs | 25 |
| abstract_inverted_index.enhance | 17 |
| abstract_inverted_index.missing | 61 |
| abstract_inverted_index.several | 102 |
| abstract_inverted_index.textual | 14, 41 |
| abstract_inverted_index.Food-101 | 74 |
| abstract_inverted_index.accuracy | 19, 98 |
| abstract_inverted_index.achieved | 77, 96 |
| abstract_inverted_index.analysis | 107 |
| abstract_inverted_index.approach | 24 |
| abstract_inverted_index.combines | 11 |
| abstract_inverted_index.content, | 54 |
| abstract_inverted_index.designed | 47 |
| abstract_inverted_index.features | 34 |
| abstract_inverted_index.maximize | 49 |
| abstract_inverted_index.methods. | 104 |
| abstract_inverted_index.modality | 64 |
| abstract_inverted_index.proposed | 23, 117 |
| abstract_inverted_index.strategy | 30 |
| abstract_inverted_index.unimodal | 36, 78 |
| abstract_inverted_index.Extensive | 105 |
| abstract_inverted_index.evaluated | 70 |
| abstract_inverted_index.framework | 8, 67 |
| abstract_inverted_index.mechanism | 45 |
| abstract_inverted_index.metadata. | 42 |
| abstract_inverted_index.practical | 121 |
| abstract_inverted_index.settings, | 118 |
| abstract_inverted_index.accuracies | 80 |
| abstract_inverted_index.adaptively | 32 |
| abstract_inverted_index.efficiency | 114 |
| abstract_inverted_index.integrates | 33 |
| abstract_inverted_index.introduces | 2 |
| abstract_inverted_index.mitigating | 56 |
| abstract_inverted_index.modalities | 15, 91 |
| abstract_inverted_index.multimodal | 5, 28, 125 |
| abstract_inverted_index.real-world | 124 |
| abstract_inverted_index.rigorously | 69 |
| abstract_inverted_index.scenarios. | 127 |
| abstract_inverted_index.effectively | 10 |
| abstract_inverted_index.informative | 53 |
| abstract_inverted_index.recognition | 7 |
| abstract_inverted_index.robustness, | 110 |
| abstract_inverted_index.robustness. | 21 |
| abstract_inverted_index.demonstrated | 108 |
| abstract_inverted_index.experimental | 106 |
| abstract_inverted_index.highlighting | 119 |
| abstract_inverted_index.inconsistent | 63 |
| abstract_inverted_index.adaptability, | 111 |
| abstract_inverted_index.applicability | 122 |
| abstract_inverted_index.complementary | 40 |
| abstract_inverted_index.computational | 113 |
| abstract_inverted_index.outperforming | 101 |
| abstract_inverted_index.classification | 18, 79 |
| abstract_inverted_index.food-recognition | 126 |
| abstract_inverted_index.state-of-the-art | 103 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/2 |
| sustainable_development_goals[0].score | 0.5799999833106995 |
| sustainable_development_goals[0].display_name | Zero hunger |
| citation_normalized_percentile |
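
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch, assuming the rows are available as pipe-delimited text lines like those shown, the readable abstract can be rebuilt by inverting that mapping. The helper names `parse_rows` and `rebuild_abstract` and the `limit` parameter are hypothetical, introduced only for this example.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.<token> | 3, 26 |' rows into {token: [positions]}."""
    index = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that belong to other payload fields
        token = cells[0][len(PREFIX):]
        index[token] = [int(n) for n in cells[1].split(",") if n.strip()]
    return index


def rebuild_abstract(index, limit=None):
    """Place each token at its positions and join them in order.

    If limit is given, only positions below it are kept, which is handy
    when only part of the index has been loaded.
    """
    by_position = {}
    for token, positions in index.items():
        for pos in positions:
            if limit is None or pos < limit:
                by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


if __name__ == "__main__":
    # A handful of rows copied from the table above.
    rows = [
        "| abstract_inverted_index.a | 3, 26 |",
        "| abstract_inverted_index.This | 0, 43 |",
        "| abstract_inverted_index.food | 6 |",
        "| abstract_inverted_index.novel | 4 |",
        "| abstract_inverted_index.study | 1 |",
        "| abstract_inverted_index.framework | 8, 67 |",
        "| abstract_inverted_index.introduces | 2 |",
        "| abstract_inverted_index.multimodal | 5, 28, 125 |",
        "| abstract_inverted_index.recognition | 7 |",
    ]
    print(rebuild_abstract(parse_rows(rows), limit=9))
```

Feeding every `abstract_inverted_index.*` row through the same two functions, without `limit`, should reproduce the full abstract text shown at the top of the page.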