Towards End-to-End Image Compression and Analysis with Transformers
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2112.09300
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model so that it performs image classification from the compressed features and aids image compression with long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module that fuses the compressed features with selected intermediate features of the Transformer and feeds the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features draw long-term information from the self-attention mechanism of the Transformer, which improves compression performance. Finally, the rate-distortion-accuracy optimization problem is solved with a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
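The abstract names a rate-distortion-accuracy objective but does not state its form. As a rough illustration only, the sketch below assumes a common weighted-sum shape: estimated rate in bits per pixel, plus mean squared error for distortion, plus cross-entropy for classification accuracy. The function names, the weights `lambda_d` and `lambda_a`, and the choice of MSE and cross-entropy are all assumptions for illustration and are not taken from the paper; the two-step training strategy is not modelled here.

```python
import math


def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel from per-symbol likelihoods (assumed entropy model output)."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(class_probs[label])


def rda_loss(likelihoods, original, reconstructed, class_probs, label,
             lambda_d=0.01, lambda_a=1.0):
    """Hypothetical joint objective: rate + lambda_d * distortion + lambda_a * accuracy term."""
    rate = rate_bpp(likelihoods, num_pixels=len(original))
    distortion = mse(original, reconstructed)
    accuracy_term = cross_entropy(class_probs, label)
    return rate + lambda_d * distortion + lambda_a * accuracy_term


if __name__ == "__main__":
    # Toy values only; they do not come from the paper.
    loss = rda_loss(
        likelihoods=[0.5, 0.25],
        original=[0.0, 1.0],
        reconstructed=[0.0, 0.0],
        class_probs=[0.5, 0.5],
        label=0,
    )
    print(loss)
```

In a sketch like this, raising `lambda_d` favours reconstruction quality over bitrate, and raising `lambda_a` favours classification accuracy; how the authors actually balance or schedule the three terms is not described in this record.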
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2112.09300, https://arxiv.org/pdf/2112.09300
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4303418405
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4303418405 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2112.09300 (Digital Object Identifier)
- Title: Towards End-to-End Image Compression and Analysis with Transformers (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-12-17 (full publication date if available)
- Authors: Yuanchao Bai, Yang Xu, Xianming Liu, Junjun Jiang, Yaowei Wang, Xiangyang Ji, Wen Gao (list of authors in order)
- Landing page: https://arxiv.org/abs/2112.09300 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2112.09300 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2112.09300 (direct OA link when available)
- Concepts: Computer science, Artificial intelligence, Image compression, Transformer, Encoder, Computer vision, Convolutional neural network, Pattern recognition (psychology), Image processing, Image (mathematics), Engineering, Voltage, Operating system, Electrical engineering (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4303418405 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2112.09300 |
| ids.doi | https://doi.org/10.48550/arxiv.2112.09300 |
| ids.openalex | https://openalex.org/W4303418405 |
| fwci | |
| type | preprint |
| title | Towards End-to-End Image Compression and Analysis with Transformers |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11105 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9991999864578247 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Image Processing Techniques |
| topics[1].id | https://openalex.org/T10688 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.998199999332428 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Image and Signal Denoising Methods |
| topics[2].id | https://openalex.org/T13114 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9977999925613403 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2214 |
| topics[2].subfield.display_name | Media Technology |
| topics[2].display_name | Image Processing Techniques and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.712924599647522 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6882478594779968 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C13481523 |
| concepts[2].level | 4 |
| concepts[2].score | 0.6496795415878296 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q412438 |
| concepts[2].display_name | Image compression |
| concepts[3].id | https://openalex.org/C66322947 |
| concepts[3].level | 3 |
| concepts[3].score | 0.5919644236564636 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[3].display_name | Transformer |
| concepts[4].id | https://openalex.org/C118505674 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5523053407669067 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[4].display_name | Encoder |
| concepts[5].id | https://openalex.org/C31972630 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5184738039970398 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[5].display_name | Computer vision |
| concepts[6].id | https://openalex.org/C81363708 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4494365453720093 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q17084460 |
| concepts[6].display_name | Convolutional neural network |
| concepts[7].id | https://openalex.org/C153180895 |
| concepts[7].level | 2 |
| concepts[7].score | 0.41534945368766785 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[7].display_name | Pattern recognition (psychology) |
| concepts[8].id | https://openalex.org/C9417928 |
| concepts[8].level | 3 |
| concepts[8].score | 0.35598433017730713 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1070689 |
| concepts[8].display_name | Image processing |
| concepts[9].id | https://openalex.org/C115961682 |
| concepts[9].level | 2 |
| concepts[9].score | 0.2346552312374115 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[9].display_name | Image (mathematics) |
| concepts[10].id | https://openalex.org/C127413603 |
| concepts[10].level | 0 |
| concepts[10].score | 0.14252495765686035 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[10].display_name | Engineering |
| concepts[11].id | https://openalex.org/C165801399 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[11].display_name | Voltage |
| concepts[12].id | https://openalex.org/C111919701 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[12].display_name | Operating system |
| concepts[13].id | https://openalex.org/C119599485 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[13].display_name | Electrical engineering |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.712924599647522 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.6882478594779968 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/image-compression |
| keywords[2].score | 0.6496795415878296 |
| keywords[2].display_name | Image compression |
| keywords[3].id | https://openalex.org/keywords/transformer |
| keywords[3].score | 0.5919644236564636 |
| keywords[3].display_name | Transformer |
| keywords[4].id | https://openalex.org/keywords/encoder |
| keywords[4].score | 0.5523053407669067 |
| keywords[4].display_name | Encoder |
| keywords[5].id | https://openalex.org/keywords/computer-vision |
| keywords[5].score | 0.5184738039970398 |
| keywords[5].display_name | Computer vision |
| keywords[6].id | https://openalex.org/keywords/convolutional-neural-network |
| keywords[6].score | 0.4494365453720093 |
| keywords[6].display_name | Convolutional neural network |
| keywords[7].id | https://openalex.org/keywords/pattern-recognition |
| keywords[7].score | 0.41534945368766785 |
| keywords[7].display_name | Pattern recognition (psychology) |
| keywords[8].id | https://openalex.org/keywords/image-processing |
| keywords[8].score | 0.35598433017730713 |
| keywords[8].display_name | Image processing |
| keywords[9].id | https://openalex.org/keywords/image |
| keywords[9].score | 0.2346552312374115 |
| keywords[9].display_name | Image (mathematics) |
| keywords[10].id | https://openalex.org/keywords/engineering |
| keywords[10].score | 0.14252495765686035 |
| keywords[10].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2112.09300 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2112.09300 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2112.09300 |
| locations[1].id | doi:10.48550/arxiv.2112.09300 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2112.09300 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5024093994 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-3449-6537 |
| authorships[0].author.display_name | Yuanchao Bai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Bai, Yuanchao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5014490044 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-0958-8547 |
| authorships[1].author.display_name | Yang Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yang, Xu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100654390 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8857-1785 |
| authorships[2].author.display_name | Xianming Liu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Liu, Xianming |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5087165831 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5694-505X |
| authorships[3].author.display_name | Junjun Jiang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Jiang, Junjun |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100631216 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2197-9038 |
| authorships[4].author.display_name | Yaowei Wang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Wang, Yaowei |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5024401174 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-7333-9975 |
| authorships[5].author.display_name | Xiangyang Ji |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ji, Xiangyang |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5018478553 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-8070-802X |
| authorships[6].author.display_name | Wen Gao |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Gao, Wen |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2112.09300 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards End-to-End Image Compression and Analysis with Transformers |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11105 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9991999864578247 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Image Processing Techniques |
| related_works | https://openalex.org/W4293226380, https://openalex.org/W4390516098, https://openalex.org/W2181948922, https://openalex.org/W2384362569, https://openalex.org/W2142795561, https://openalex.org/W4205302943, https://openalex.org/W2561132942, https://openalex.org/W4321487865, https://openalex.org/W3155418658, https://openalex.org/W2998180244 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2112.09300 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2112.09300 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2112.09300 |
| primary_location.id | pmh:oai:arXiv.org:2112.09300 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2112.09300 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2112.09300 |
| publication_date | 2021-12-17 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 77, 83, 115, 138, 173 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.an | 2, 21, 29 |
| abstract_inverted_index.by | 82, 91, 172 |
| abstract_inverted_index.in | 186 |
| abstract_inverted_index.is | 169 |
| abstract_inverted_index.of | 19, 72, 129, 157, 182 |
| abstract_inverted_index.to | 12, 34, 41, 103, 119, 137 |
| abstract_inverted_index.we | 32, 61, 113 |
| abstract_inverted_index.The | 87, 145, 165 |
| abstract_inverted_index.ViT | 74 |
| abstract_inverted_index.aim | 33 |
| abstract_inverted_index.and | 6, 49, 70, 100, 132, 160, 191 |
| abstract_inverted_index.are | 95, 101 |
| abstract_inverted_index.can | 148 |
| abstract_inverted_index.fed | 102 |
| abstract_inverted_index.for | 106, 142 |
| abstract_inverted_index.the | 13, 36, 46, 54, 58, 64, 73, 92, 104, 121, 125, 130, 134, 150, 154, 158, 162, 180, 183, 188, 192 |
| abstract_inverted_index.bias | 99 |
| abstract_inverted_index.both | 187 |
| abstract_inverted_index.feed | 133 |
| abstract_inverted_index.from | 45, 57, 153 |
| abstract_inverted_index.fuse | 120 |
| abstract_inverted_index.stem | 66 |
| abstract_inverted_index.with | 9, 53, 76, 124 |
| abstract_inverted_index.(ViT) | 39 |
| abstract_inverted_index.after | 28 |
| abstract_inverted_index.first | 62 |
| abstract_inverted_index.image | 4, 15, 24, 30, 43, 51, 68, 79, 93, 107, 110, 143, 189 |
| abstract_inverted_index.model | 8, 26, 40, 75, 185 |
| abstract_inverted_index.(i.e., | 67 |
| abstract_inverted_index.Vision | 37 |
| abstract_inverted_index.codec, | 31 |
| abstract_inverted_index.module | 118 |
| abstract_inverted_index.neural | 85, 140 |
| abstract_inverted_index.obtain | 149 |
| abstract_inverted_index.solved | 171 |
| abstract_inverted_index.tasks. | 194 |
| abstract_inverted_index.Instead | 18 |
| abstract_inverted_index.encoder | 80, 94 |
| abstract_inverted_index.feature | 116 |
| abstract_inverted_index.finally | 170 |
| abstract_inverted_index.improve | 161 |
| abstract_inverted_index.network | 141 |
| abstract_inverted_index.perform | 42 |
| abstract_inverted_index.placing | 20 |
| abstract_inverted_index.problem | 168 |
| abstract_inverted_index.propose | 1, 114 |
| abstract_inverted_index.replace | 63 |
| abstract_inverted_index.results | 178 |
| abstract_inverted_index.analysis | 7 |
| abstract_inverted_index.directly | 27 |
| abstract_inverted_index.existing | 22 |
| abstract_inverted_index.features | 48, 89, 123, 128, 136, 147 |
| abstract_inverted_index.injected | 96 |
| abstract_inverted_index.modelled | 81 |
| abstract_inverted_index.network. | 86 |
| abstract_inverted_index.patchify | 65 |
| abstract_inverted_index.proposed | 184 |
| abstract_inverted_index.redesign | 35 |
| abstract_inverted_index.selected | 126 |
| abstract_inverted_index.training | 175 |
| abstract_inverted_index.two-step | 174 |
| abstract_inverted_index.bypassing | 109 |
| abstract_inverted_index.generated | 90 |
| abstract_inverted_index.inductive | 98 |
| abstract_inverted_index.long-term | 55, 151 |
| abstract_inverted_index.mechanism | 156 |
| abstract_inverted_index.splitting | 69 |
| abstract_inverted_index.strategy. | 176 |
| abstract_inverted_index.targeting | 11 |
| abstract_inverted_index.Meanwhile, | 112 |
| abstract_inverted_index.aggregated | 135, 146 |
| abstract_inverted_index.compressed | 47, 88, 122 |
| abstract_inverted_index.embedding) | 71 |
| abstract_inverted_index.end-to-end | 3 |
| abstract_inverted_index.facilitate | 50 |
| abstract_inverted_index.Transformer | 38, 105, 159 |
| abstract_inverted_index.aggregation | 117 |
| abstract_inverted_index.cloud-based | 14 |
| abstract_inverted_index.compression | 5, 52, 163, 190 |
| abstract_inverted_index.demonstrate | 179 |
| abstract_inverted_index.information | 56, 152 |
| abstract_inverted_index.lightweight | 78 |
| abstract_inverted_index.Experimental | 177 |
| abstract_inverted_index.Transformer, | 131 |
| abstract_inverted_index.Transformer. | 59 |
| abstract_inverted_index.application. | 17 |
| abstract_inverted_index.intermediate | 127 |
| abstract_inverted_index.optimization | 167 |
| abstract_inverted_index.performance. | 164 |
| abstract_inverted_index.Specifically, | 60 |
| abstract_inverted_index.Transformers, | 10 |
| abstract_inverted_index.convolutional | 84, 97 |
| abstract_inverted_index.effectiveness | 181 |
| abstract_inverted_index.classification | 16, 25, 44, 108, 193 |
| abstract_inverted_index.self-attention | 155 |
| abstract_inverted_index.deconvolutional | 139 |
| abstract_inverted_index.reconstruction. | 111, 144 |
| abstract_inverted_index.Transformer-based | 23 |
| abstract_inverted_index.rate-distortion-accuracy | 166 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
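
The `abstract_inverted_index.*` rows in the payload table store the abstract as a mapping from each word to the zero-based positions where it occurs. A minimal sketch of how such a mapping could be turned back into running text is shown below; the function name `rebuild_abstract` and the small `sample` dictionary are illustrative, and `sample` uses only the first nine positions listed in the table rather than the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Positions 0 through 8, copied from the payload table (other occurrences omitted).
sample = {
    "We": [0],
    "propose": [1],
    "an": [2],
    "end-to-end": [3],
    "image": [4],
    "compression": [5],
    "and": [6],
    "analysis": [7],
    "model": [8],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
    # prints: We propose an end-to-end image compression and analysis model
```

Because punctuation is stored attached to the words (for example `Transformers,` and `tasks.`), joining on spaces is enough to recover the indexed text without any further cleanup.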