Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
2025 · Open Access
DOI: https://doi.org/10.48550/arxiv.2501.08580
In the domain of computer vision, Parameter-Efficient Tuning (PET) is increasingly replacing the traditional paradigm of pre-training followed by full fine-tuning. PET is particularly favored for large foundation models, as it reduces transfer learning costs and makes better use of hardware. However, current PET methods are designed mainly for single-modal optimization. While some pioneering studies have undertaken preliminary explorations, they remain at the level of aligned encoders (e.g., CLIP) and leave misaligned encoders unexplored. Such methods show sub-optimal performance with misaligned encoders, as they fail to align the multimodal features effectively during fine-tuning. In this paper, we introduce DETRIS, a parameter-efficient tuning framework designed to enhance low-rank visual feature propagation by establishing dense interconnections between each layer and all preceding layers, which enables effective cross-modal feature interaction and adaptation to misaligned encoders. We also suggest using text adapters to improve textual features. Our simple yet efficient approach greatly surpasses state-of-the-art methods while updating only 0.9% to 1.8% of backbone parameters, evaluated on challenging benchmarks. Our project is available at \url{https://github.com/jiaqihuang01/DETRIS}.
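The dense connectivity the abstract describes — each adapter layer consuming the outputs of all preceding layers — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, dimensions, and the ReLU bottleneck are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank_adapter(dim_in, dim_out, rank, rng):
    """A LoRA-style bottleneck: down-projection then up-projection.
    The up-projection is zero-initialized, a common convention so the
    adapter initially contributes nothing (identity behavior)."""
    down = rng.normal(scale=0.02, size=(dim_in, rank))
    up = np.zeros((rank, dim_out))
    return down, up

def densely_connected_adapters(x, n_layers=4, dim=8, rank=2, rng=rng):
    """Each adapter receives the concatenation of the input and every
    preceding adapter's output, in the spirit of dense connectivity."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)   # all preceding features
        down, up = low_rank_adapter(inp.shape[-1], dim, rank, rng)
        out = np.maximum(inp @ down, 0.0) @ up    # ReLU bottleneck
        features.append(out)
    return features[-1]
```

Because the up-projections start at zero, the sketch initially outputs zeros regardless of the input; during training the adapters would learn to inject low-rank corrections while the frozen backbone stays untouched.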
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2501.08580
- PDF: https://arxiv.org/pdf/2501.08580
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4406482921
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4406482921 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2501.08580 (Digital Object Identifier)
- Title: Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-01-15
- Authors (in order): Jiaqi Huang, Zunnan Xu, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li
- Landing page: https://arxiv.org/abs/2501.08580
- PDF URL: https://arxiv.org/pdf/2501.08580 (direct link to full text)
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2501.08580
- Concepts: Artificial intelligence, Computer science, Image (mathematics), Segmentation, Computer vision, Image segmentation, Pattern recognition (psychology)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4406482921 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2501.08580 |
| ids.doi | https://doi.org/10.48550/arxiv.2501.08580 |
| ids.openalex | https://openalex.org/W4406482921 |
| fwci | |
| type | preprint |
| title | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10052 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8835999965667725 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Medical Image Segmentation Techniques |
| topics[1].id | https://openalex.org/T12702 |
| topics[1].field.id | https://openalex.org/fields/28 |
| topics[1].field.display_name | Neuroscience |
| topics[1].score | 0.8371000289916992 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2808 |
| topics[1].subfield.display_name | Neurology |
| topics[1].display_name | Brain Tumor Detection and Classification |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.8029000163078308 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C154945302 |
| concepts[0].level | 1 |
| concepts[0].score | 0.5840606093406677 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[0].display_name | Artificial intelligence |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5750250816345215 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C115961682 |
| concepts[2].level | 2 |
| concepts[2].score | 0.565159261226654 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[2].display_name | Image (mathematics) |
| concepts[3].id | https://openalex.org/C89600930 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5634405612945557 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[3].display_name | Segmentation |
| concepts[4].id | https://openalex.org/C31972630 |
| concepts[4].level | 1 |
| concepts[4].score | 0.553627610206604 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[4].display_name | Computer vision |
| concepts[5].id | https://openalex.org/C124504099 |
| concepts[5].level | 3 |
| concepts[5].score | 0.4942200481891632 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q56933 |
| concepts[5].display_name | Image segmentation |
| concepts[6].id | https://openalex.org/C153180895 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4012943506240845 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[6].display_name | Pattern recognition (psychology) |
| keywords[0].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[0].score | 0.5840606093406677 |
| keywords[0].display_name | Artificial intelligence |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5750250816345215 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/image |
| keywords[2].score | 0.565159261226654 |
| keywords[2].display_name | Image (mathematics) |
| keywords[3].id | https://openalex.org/keywords/segmentation |
| keywords[3].score | 0.5634405612945557 |
| keywords[3].display_name | Segmentation |
| keywords[4].id | https://openalex.org/keywords/computer-vision |
| keywords[4].score | 0.553627610206604 |
| keywords[4].display_name | Computer vision |
| keywords[5].id | https://openalex.org/keywords/image-segmentation |
| keywords[5].score | 0.4942200481891632 |
| keywords[5].display_name | Image segmentation |
| keywords[6].id | https://openalex.org/keywords/pattern-recognition |
| keywords[6].score | 0.4012943506240845 |
| keywords[6].display_name | Pattern recognition (psychology) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2501.08580 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2501.08580 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2501.08580 |
| locations[1].id | doi:10.48550/arxiv.2501.08580 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2501.08580 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5048493109 |
| authorships[0].author.orcid | https://orcid.org/0009-0006-6937-3130 |
| authorships[0].author.display_name | Jiaqi Huang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Huang, Jiaqi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5113145225 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Zunnan Xu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Xu, Zunnan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100418161 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0749-064X |
| authorships[2].author.display_name | Ting Liu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Liu, Ting |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101837161 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6714-9411 |
| authorships[3].author.display_name | Yong Liu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liu, Yong |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5040802446 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Haonan Han |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Han, Haonan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5102309342 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Kehong Yuan |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Yuan, Kehong |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100754504 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-0403-1923 |
| authorships[6].author.display_name | Xiu Li |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Li, Xiu |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2501.08580 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10052 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8835999965667725 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Medical Image Segmentation Techniques |
| related_works | https://openalex.org/W4379231730, https://openalex.org/W4389858081, https://openalex.org/W2501551404, https://openalex.org/W4298131179, https://openalex.org/W2113201962, https://openalex.org/W4385583601, https://openalex.org/W4395685956, https://openalex.org/W3159516372, https://openalex.org/W4398146871, https://openalex.org/W1522196789 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2501.08580 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2501.08580 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2501.08580 |
| primary_location.id | pmh:oai:arXiv.org:2501.08580 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2501.08580 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2501.08580 |
| publication_date | 2025-01-15 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index | (token → position-list encoding of the abstract shown above; omitted here as it duplicates the abstract text) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
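OpenAlex does not store abstracts as plain text; the `abstract_inverted_index` field in the payload maps each token to the list of positions where it occurs. Reconstructing the abstract is a matter of sorting tokens by position. A minimal sketch (the helper name is ours; the sample index is a tiny excerpt, not the full field):

```python
def decode_inverted_index(inv):
    """Rebuild abstract text from an OpenAlex abstract_inverted_index:
    a mapping from token -> list of positions where it occurs."""
    positions = []
    for token, idxs in inv.items():
        for i in idxs:
            positions.append((i, token))
    # Sort by position and join tokens with single spaces.
    return " ".join(token for _, token in sorted(positions))

# Tiny illustrative excerpt of such an index:
sample = {"In": [0], "the": [1], "domain": [2], "of": [3], "computer": [4]}
print(decode_inverted_index(sample))  # "In the domain of computer"
```

Note that repeated words (e.g. "the" appearing at several positions) are handled naturally, since every (position, token) pair is emitted before sorting.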