DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
2023 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2304.06668
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.
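
The abstract's two efficiency claims (image features are computed once and reused during refinement, and clicks for several objects are handled together) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `ToySegmenter`, `Click`, and `encoder_calls` are hypothetical names, the "encoder" just returns pixel coordinates, and a nearest-click rule stands in for the learned Transformer decoder. It is not the authors' architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    t: int          # interaction order (the "temporal" part of a query)
    object_id: int  # which instance this click belongs to


class ToySegmenter:
    """Illustrative stand-in only; not the DynaMITe model."""

    def __init__(self, height, width):
        self.height, self.width = height, width
        self.encoder_calls = 0   # counts how often the "backbone" runs
        self._features = None    # cached per-pixel features
        self.clicks = []         # all clicks so far, across all objects

    def _encode(self):
        # Stand-in for the expensive backbone pass: here, just coordinates.
        self.encoder_calls += 1
        return [[(x, y) for x in range(self.width)] for y in range(self.height)]

    def _decode(self):
        # Stand-in for the decoder: every pixel takes the object id of its
        # nearest click; on a distance tie the more recent click wins.
        mask = []
        for row in self._features:
            out_row = []
            for px, py in row:
                nearest = min(
                    self.clicks,
                    key=lambda c: ((c.x - px) ** 2 + (c.y - py) ** 2, -c.t),
                )
                out_row.append(nearest.object_id)
            mask.append(out_row)
        return mask

    def add_click(self, x, y, object_id):
        # Features are computed on the first click and reused afterwards.
        if self._features is None:
            self._features = self._encode()
        self.clicks.append(Click(x, y, len(self.clicks), object_id))
        # One decode pass labels every object that has received a click.
        return self._decode()


seg = ToySegmenter(height=2, width=4)
seg.add_click(0, 0, object_id=1)
mask = seg.add_click(3, 1, object_id=2)
print(seg.encoder_calls)  # 1
print(mask)               # [[1, 1, 2, 2], [1, 1, 2, 2]]
```

In this sketch the second click triggers only the cheap decode step, and a single pass returns labels for both objects, which is the behaviour the abstract contrasts with single-instance methods that rerun the whole network per click.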
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2304.06668, https://arxiv.org/pdf/2304.06668
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4365601499
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4365601499 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2304.06668 (Digital Object Identifier)
- Title: DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-04-13 (Full publication date if available)
- Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe (List of authors in order)
- Landing page: https://arxiv.org/abs/2304.06668 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2304.06668 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2304.06668 (Direct OA link when available)
- Concepts: Computer science, Segmentation, Artificial intelligence, Benchmark (surveying), Transformer, Image segmentation, Process (computing), Computer vision, Geography, Operating system, Geodesy, Physics, Quantum mechanics, Voltage (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (Per-year citation counts, last 5 years)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4365601499 |
| doi | https://doi.org/10.48550/arxiv.2304.06668 |
| ids.doi | https://doi.org/10.48550/arxiv.2304.06668 |
| ids.openalex | https://openalex.org/W4365601499 |
| fwci | |
| type | preprint |
| title | DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10036 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9994000196456909 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Neural Network Applications |
| topics[1].id | https://openalex.org/T11714 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9988999962806702 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Multimodal Machine Learning Applications |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9984999895095825 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8697614669799805 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C89600930 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7029420137405396 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[1].display_name | Segmentation |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5661047697067261 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C185798385 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4720422029495239 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[3].display_name | Benchmark (surveying) |
| concepts[4].id | https://openalex.org/C66322947 |
| concepts[4].level | 3 |
| concepts[4].score | 0.45415252447128296 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[4].display_name | Transformer |
| concepts[5].id | https://openalex.org/C124504099 |
| concepts[5].level | 3 |
| concepts[5].score | 0.4399120509624481 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q56933 |
| concepts[5].display_name | Image segmentation |
| concepts[6].id | https://openalex.org/C98045186 |
| concepts[6].level | 2 |
| concepts[6].score | 0.426987886428833 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[6].display_name | Process (computing) |
| concepts[7].id | https://openalex.org/C31972630 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3811689615249634 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[7].display_name | Computer vision |
| concepts[8].id | https://openalex.org/C205649164 |
| concepts[8].level | 0 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[8].display_name | Geography |
| concepts[9].id | https://openalex.org/C111919701 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[9].display_name | Operating system |
| concepts[10].id | https://openalex.org/C13280743 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q131089 |
| concepts[10].display_name | Geodesy |
| concepts[11].id | https://openalex.org/C121332964 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[11].display_name | Physics |
| concepts[12].id | https://openalex.org/C62520636 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[12].display_name | Quantum mechanics |
| concepts[13].id | https://openalex.org/C165801399 |
| concepts[13].level | 2 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[13].display_name | Voltage |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8697614669799805 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/segmentation |
| keywords[1].score | 0.7029420137405396 |
| keywords[1].display_name | Segmentation |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5661047697067261 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/benchmark |
| keywords[3].score | 0.4720422029495239 |
| keywords[3].display_name | Benchmark (surveying) |
| keywords[4].id | https://openalex.org/keywords/transformer |
| keywords[4].score | 0.45415252447128296 |
| keywords[4].display_name | Transformer |
| keywords[5].id | https://openalex.org/keywords/image-segmentation |
| keywords[5].score | 0.4399120509624481 |
| keywords[5].display_name | Image segmentation |
| keywords[6].id | https://openalex.org/keywords/process |
| keywords[6].score | 0.426987886428833 |
| keywords[6].display_name | Process (computing) |
| keywords[7].id | https://openalex.org/keywords/computer-vision |
| keywords[7].score | 0.3811689615249634 |
| keywords[7].display_name | Computer vision |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2304.06668 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2304.06668 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2304.06668 |
| locations[1].id | doi:10.48550/arxiv.2304.06668 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2304.06668 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5000471312 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8196-5346 |
| authorships[0].author.display_name | Amit Kumar Rana |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Rana, Amit Kumar |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015439947 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-4868-1883 |
| authorships[1].author.display_name | Sabarinath Mahadevan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mahadevan, Sabarinath |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5071563379 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2127-0782 |
| authorships[2].author.display_name | Alexander Hermans |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hermans, Alexander |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5071006649 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4225-0051 |
| authorships[3].author.display_name | Bastian Leibe |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Leibe, Bastian |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2304.06668 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-04-15T00:00:00 |
| display_name | DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10036 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9994000196456909 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Neural Network Applications |
| related_works | https://openalex.org/W2378211422, https://openalex.org/W2745001401, https://openalex.org/W4321353415, https://openalex.org/W2130974462, https://openalex.org/W2028665553, https://openalex.org/W2086519370, https://openalex.org/W972276598, https://openalex.org/W2087343574, https://openalex.org/W4246352526, https://openalex.org/W1522196789 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2304.06668 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2304.06668 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2304.06668 |
| primary_location.id | pmh:oai:arXiv.org:2304.06668 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2304.06668 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2304.06668 |
| publication_date | 2023-04-13 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 47, 51, 58, 69, 85, 89, 97, 121 |
| abstract_inverted_index.We | 67 |
| abstract_inverted_index.an | 29 |
| abstract_inverted_index.as | 37, 81 |
| abstract_inverted_index.at | 50 |
| abstract_inverted_index.in | 75, 96, 120, 149 |
| abstract_inverted_index.of | 9 |
| abstract_inverted_index.on | 6, 28, 133, 141 |
| abstract_inverted_index.to | 18, 84, 91, 106, 126 |
| abstract_inverted_index.we | 77, 147 |
| abstract_inverted_index.Our | 100 |
| abstract_inverted_index.and | 31, 53, 112, 139 |
| abstract_inverted_index.any | 104 |
| abstract_inverted_index.are | 16 |
| abstract_inverted_index.can | 44 |
| abstract_inverted_index.for | 13, 41, 116 |
| abstract_inverted_index.new | 143 |
| abstract_inverted_index.the | 32, 63, 142 |
| abstract_inverted_index.Most | 0 |
| abstract_inverted_index.also | 102, 140 |
| abstract_inverted_index.deep | 65 |
| abstract_inverted_index.each | 54 |
| abstract_inverted_index.full | 59 |
| abstract_inverted_index.help | 23 |
| abstract_inverted_index.more | 70 |
| abstract_inverted_index.need | 105 |
| abstract_inverted_index.only | 45 |
| abstract_inverted_index.pass | 61 |
| abstract_inverted_index.rely | 5 |
| abstract_inverted_index.such | 25, 36 |
| abstract_inverted_index.task | 43 |
| abstract_inverted_index.that | 146 |
| abstract_inverted_index.this | 42, 150 |
| abstract_inverted_index.time | 52 |
| abstract_inverted_index.user | 34, 55, 79 |
| abstract_inverted_index.when | 124 |
| abstract_inverted_index.with | 88 |
| abstract_inverted_index.based | 27 |
| abstract_inverted_index.fewer | 114 |
| abstract_inverted_index.image | 30, 108, 123 |
| abstract_inverted_index.large | 7 |
| abstract_inverted_index.other | 127 |
| abstract_inverted_index.which | 15, 76 |
| abstract_inverted_index.called | 73 |
| abstract_inverted_index.during | 110 |
| abstract_inverted_index.entire | 64 |
| abstract_inverted_index.object | 94 |
| abstract_inverted_index.paper. | 151 |
| abstract_inverted_index.single | 48, 98, 122 |
| abstract_inverted_index.amounts | 8 |
| abstract_inverted_index.clicks. | 38 |
| abstract_inverted_index.create. | 19 |
| abstract_inverted_index.decoder | 87 |
| abstract_inverted_index.forward | 60 |
| abstract_inverted_index.methods | 4, 40 |
| abstract_inverted_index.process | 46 |
| abstract_inverted_index.propose | 148 |
| abstract_inverted_index.queries | 83 |
| abstract_inverted_index.results | 132 |
| abstract_inverted_index.segment | 92 |
| abstract_inverted_index.through | 62 |
| abstract_inverted_index.DynaMITe | 129 |
| abstract_inverted_index.Existing | 39 |
| abstract_inverted_index.achieves | 130 |
| abstract_inverted_index.compared | 125 |
| abstract_inverted_index.existing | 135 |
| abstract_inverted_index.features | 109 |
| abstract_inverted_index.generate | 24 |
| abstract_inverted_index.instance | 2, 49 |
| abstract_inverted_index.methods. | 128 |
| abstract_inverted_index.multiple | 93, 118, 134 |
| abstract_inverted_index.network. | 66 |
| abstract_inverted_index.networks | 22 |
| abstract_inverted_index.requires | 57, 113 |
| abstract_inverted_index.DynaMITe, | 74 |
| abstract_inverted_index.approach, | 72 |
| abstract_inverted_index.benchmark | 145 |
| abstract_inverted_index.efficient | 71 |
| abstract_inverted_index.expensive | 17 |
| abstract_inverted_index.instances | 95, 119 |
| abstract_inverted_index.introduce | 68 |
| abstract_inverted_index.potential | 90 |
| abstract_inverted_index.represent | 78 |
| abstract_inverted_index.training, | 14 |
| abstract_inverted_index.alleviates | 103 |
| abstract_inverted_index.iteration. | 99 |
| abstract_inverted_index.re-compute | 107 |
| abstract_inverted_index.segmenting | 117 |
| abstract_inverted_index.Interactive | 20 |
| abstract_inverted_index.Transformer | 86 |
| abstract_inverted_index.annotations | 12, 26 |
| abstract_inverted_index.benchmarks, | 138 |
| abstract_inverted_index.interaction | 56 |
| abstract_inverted_index.interactive | 136 |
| abstract_inverted_index.refinement, | 111 |
| abstract_inverted_index.architecture | 101 |
| abstract_inverted_index.ground-truth | 11 |
| abstract_inverted_index.interactions | 35, 80, 115 |
| abstract_inverted_index.segmentation | 3, 21, 137 |
| abstract_inverted_index.corresponding | 33 |
| abstract_inverted_index.pixel-precise | 10 |
| abstract_inverted_index.multi-instance | 144 |
| abstract_inverted_index.spatio-temporal | 82 |
| abstract_inverted_index.state-of-the-art | 1, 131 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
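
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each word to the zero-based positions where it occurs. A minimal sketch of turning that mapping back into running text follows, assuming the rows have been loaded into a Python dict of word to list of positions. The function name `rebuild_abstract` is hypothetical, and the sample keeps only the first seven word positions from the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    # Flatten to (position, word) pairs, then order by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


# Positions 0 to 6 taken from the table rows above (other positions omitted).
sample = {
    "Most": [0],
    "state-of-the-art": [1],
    "instance": [2],
    "segmentation": [3],
    "methods": [4],
    "rely": [5],
    "on": [6],
}
print(rebuild_abstract(sample))
# Most state-of-the-art instance segmentation methods rely on
```

Sorting the flattened pairs, rather than pre-allocating a list by maximum position, keeps the function tolerant of gaps when only part of the index is available.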