Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation
2020 · Open Access · DOI: https://doi.org/10.48550/arxiv.2012.10782
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences. In particular, we propose three key contributions: (1) We transfer knowledge from features learned during self-supervised depth estimation to semantic segmentation, (2) we implement a strong data augmentation by blending images and labels using the geometry of the scene, and (3) we utilize the depth feature diversity as well as the level of difficulty of learning depth in a student-teacher framework to select the most useful samples to be annotated for semantic segmentation. We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains, and we achieve state-of-the-art results for semi-supervised semantic segmentation. The implementation is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
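Contribution (2) in the abstract, blending images and labels using scene geometry, can be illustrated with a small example. The abstract gives no formula, so what follows is only a minimal sketch under an assumed rule: at every pixel, keep whichever of two samples is closer to the camera, so that the mix respects occlusion. The function name `depth_blend`, its arguments, and the strict "smaller depth wins" comparison are hypothetical and are not taken from the authors' implementation. Plain nested lists stand in for image tensors.

```python
def depth_blend(img_a, lbl_a, depth_a, img_b, lbl_b, depth_b):
    """Mix two samples pixel by pixel, keeping the one nearer to the camera.

    Assumption: depth maps hold per-pixel distance (smaller means closer)
    and all inputs share the same height and width.
    """
    height, width = len(depth_a), len(depth_a[0])
    mixed_img, mixed_lbl = [], []
    for y in range(height):
        img_row, lbl_row = [], []
        for x in range(width):
            # Sample A occludes sample B wherever it is strictly closer.
            a_in_front = depth_a[y][x] < depth_b[y][x]
            img_row.append(img_a[y][x] if a_in_front else img_b[y][x])
            lbl_row.append(lbl_a[y][x] if a_in_front else lbl_b[y][x])
        mixed_img.append(img_row)
        mixed_lbl.append(lbl_row)
    return mixed_img, mixed_lbl


# Toy 2x2 example: A is black with class 1, B is white with class 2.
black, white = (0, 0, 0), (255, 255, 255)
img, lbl = depth_blend(
    [[black, black], [black, black]], [[1, 1], [1, 1]], [[1.0, 5.0], [5.0, 1.0]],
    [[white, white], [white, white]], [[2, 2], [2, 2]], [[3.0, 3.0], [3.0, 3.0]],
)
print(lbl)  # [[1, 2], [2, 1]]
```

The same mask is applied to the image and to the label map, so the mixed label stays aligned with the mixed image. A real pipeline would typically express this as a vectorized mask operation on tensors rather than Python loops.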
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2012.10782, https://arxiv.org/pdf/2012.10782
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4287550764
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4287550764
- DOI: https://doi.org/10.48550/arxiv.2012.10782
- Title: Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation
- Type: preprint
- Language: en
- Publication year: 2020
- Publication date: 2020-12-19
- Authors: Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc Van Gool
- Landing page: https://arxiv.org/abs/2012.10782
- PDF URL: https://arxiv.org/pdf/2012.10782
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2012.10782
- Concepts: Segmentation, Computer science, Artificial intelligence, Feature (linguistics), Process (computing), Semantics (computer science), Monocular, Pattern recognition (psychology), Scale-space segmentation, Semantic feature, Machine learning, Image segmentation, Linguistics, Operating system, Philosophy, Programming language
- Cited by: 0
- Related works (count): 10
Full payload
| id | https://openalex.org/W4287550764 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2012.10782 |
| ids.openalex | https://openalex.org/W4287550764 |
| fwci | 0.0 |
| type | preprint |
| title | Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10531 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9968000054359436 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Advanced Vision and Imaging |
| topics[1].id | https://openalex.org/T13114 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9915000200271606 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2214 |
| topics[1].subfield.display_name | Media Technology |
| topics[1].display_name | Image Processing Techniques and Applications |
| topics[2].id | https://openalex.org/T11606 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9872000217437744 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2205 |
| topics[2].subfield.display_name | Civil and Structural Engineering |
| topics[2].display_name | Infrastructure Maintenance and Monitoring |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C89600930 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8734179735183716 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[0].display_name | Segmentation |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.8016436100006104 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6974334716796875 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C2776401178 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5613366961479187 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[3].display_name | Feature (linguistics) |
| concepts[4].id | https://openalex.org/C98045186 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5337112545967102 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[4].display_name | Process (computing) |
| concepts[5].id | https://openalex.org/C184337299 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4920908808708191 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1437428 |
| concepts[5].display_name | Semantics (computer science) |
| concepts[6].id | https://openalex.org/C65909025 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4608535170555115 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1945033 |
| concepts[6].display_name | Monocular |
| concepts[7].id | https://openalex.org/C153180895 |
| concepts[7].level | 2 |
| concepts[7].score | 0.45716726779937744 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[7].display_name | Pattern recognition (psychology) |
| concepts[8].id | https://openalex.org/C65885262 |
| concepts[8].level | 4 |
| concepts[8].score | 0.44215235114097595 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7429708 |
| concepts[8].display_name | Scale-space segmentation |
| concepts[9].id | https://openalex.org/C2781122975 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4139872193336487 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q16928266 |
| concepts[9].display_name | Semantic feature |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.4014217257499695 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C124504099 |
| concepts[11].level | 3 |
| concepts[11].score | 0.39340871572494507 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q56933 |
| concepts[11].display_name | Image segmentation |
| concepts[12].id | https://openalex.org/C41895202 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[12].display_name | Linguistics |
| concepts[13].id | https://openalex.org/C111919701 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[13].display_name | Operating system |
| concepts[14].id | https://openalex.org/C138885662 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[14].display_name | Philosophy |
| concepts[15].id | https://openalex.org/C199360897 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[15].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/segmentation |
| keywords[0].score | 0.8734179735183716 |
| keywords[0].display_name | Segmentation |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.8016436100006104 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.6974334716796875 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/feature |
| keywords[3].score | 0.5613366961479187 |
| keywords[3].display_name | Feature (linguistics) |
| keywords[4].id | https://openalex.org/keywords/process |
| keywords[4].score | 0.5337112545967102 |
| keywords[4].display_name | Process (computing) |
| keywords[5].id | https://openalex.org/keywords/semantics |
| keywords[5].score | 0.4920908808708191 |
| keywords[5].display_name | Semantics (computer science) |
| keywords[6].id | https://openalex.org/keywords/monocular |
| keywords[6].score | 0.4608535170555115 |
| keywords[6].display_name | Monocular |
| keywords[7].id | https://openalex.org/keywords/pattern-recognition |
| keywords[7].score | 0.45716726779937744 |
| keywords[7].display_name | Pattern recognition (psychology) |
| keywords[8].id | https://openalex.org/keywords/scale-space-segmentation |
| keywords[8].score | 0.44215235114097595 |
| keywords[8].display_name | Scale-space segmentation |
| keywords[9].id | https://openalex.org/keywords/semantic-feature |
| keywords[9].score | 0.4139872193336487 |
| keywords[9].display_name | Semantic feature |
| keywords[10].id | https://openalex.org/keywords/machine-learning |
| keywords[10].score | 0.4014217257499695 |
| keywords[10].display_name | Machine learning |
| keywords[11].id | https://openalex.org/keywords/image-segmentation |
| keywords[11].score | 0.39340871572494507 |
| keywords[11].display_name | Image segmentation |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2012.10782 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2012.10782 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2012.10782 |
| indexed_in | arxiv |
| authorships[0].author.id | https://openalex.org/A5053328232 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7391-0676 |
| authorships[0].author.display_name | Lukas Hoyer |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Hoyer, Lukas |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5078838951 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5440-9678 |
| authorships[1].author.display_name | Dengxin Dai |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Dai, Dengxin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100384516 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4124-1148 |
| authorships[2].author.display_name | Yuhua Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chen, Yuhua |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5016670590 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Adrian Köring |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Köring, Adrian |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5114686140 |
| authorships[4].author.orcid | https://orcid.org/0009-0005-9440-6785 |
| authorships[4].author.display_name | Suman Saha |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Saha, Suman |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5001254143 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-3445-5711 |
| authorships[5].author.display_name | Luc Van Gool |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Van Gool, Luc |
| authorships[5].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2012.10782 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10531 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9968000054359436 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Advanced Vision and Imaging |
| related_works | https://openalex.org/W2185902295, https://openalex.org/W2103507220, https://openalex.org/W3144569342, https://openalex.org/W2945274617, https://openalex.org/W4313052709, https://openalex.org/W2022929107, https://openalex.org/W2055202857, https://openalex.org/W4205800335, https://openalex.org/W2758994127, https://openalex.org/W2386644571 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2012.10782 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2012.10782 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2012.10782 |
| primary_location.id | pmh:oai:arXiv.org:2012.10782 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2012.10782 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2012.10782 |
| publication_date | 2020-12-19 |
| publication_year | 2020 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 14, 23, 32, 70, 101 |
| abstract_inverted_index.In | 48 |
| abstract_inverted_index.To | 27 |
| abstract_inverted_index.We | 115 |
| abstract_inverted_index.as | 91, 93 |
| abstract_inverted_index.be | 110 |
| abstract_inverted_index.by | 40, 73 |
| abstract_inverted_index.in | 17, 100 |
| abstract_inverted_index.is | 22, 38, 140 |
| abstract_inverted_index.of | 81, 96 |
| abstract_inverted_index.on | 119 |
| abstract_inverted_index.to | 64, 104, 109 |
| abstract_inverted_index.we | 30, 50, 68, 85, 131 |
| abstract_inverted_index.(1) | 55 |
| abstract_inverted_index.(2) | 67 |
| abstract_inverted_index.The | 138 |
| abstract_inverted_index.all | 124 |
| abstract_inverted_index.and | 76, 130 |
| abstract_inverted_index.for | 3, 34, 112 |
| abstract_inverted_index.key | 53 |
| abstract_inverted_index.the | 79, 82, 87, 94, 106, 117, 120 |
| abstract_inverted_index.deep | 1 |
| abstract_inverted_index.from | 45, 58 |
| abstract_inverted_index.most | 107 |
| abstract_inverted_index.well | 92 |
| abstract_inverted_index.data, | 11 |
| abstract_inverted_index.depth | 43, 88, 99 |
| abstract_inverted_index.large | 7 |
| abstract_inverted_index.level | 95 |
| abstract_inverted_index.major | 15 |
| abstract_inverted_index.masks | 21 |
| abstract_inverted_index.three | 52, 125 |
| abstract_inverted_index.using | 78 |
| abstract_inverted_index.where | 123 |
| abstract_inverted_index.which | 12 |
| abstract_inverted_index.during | 61 |
| abstract_inverted_index.gains, | 129 |
| abstract_inverted_index.highly | 24 |
| abstract_inverted_index.images | 75 |
| abstract_inverted_index.issue, | 29 |
| abstract_inverted_index.labels | 77 |
| abstract_inverted_index.scene, | 83 |
| abstract_inverted_index.select | 105 |
| abstract_inverted_index.strong | 71 |
| abstract_inverted_index.achieve | 132 |
| abstract_inverted_index.amounts | 8 |
| abstract_inverted_index.feature | 89 |
| abstract_inverted_index.learned | 60 |
| abstract_inverted_index.modules | 126 |
| abstract_inverted_index.present | 31 |
| abstract_inverted_index.propose | 51 |
| abstract_inverted_index.results | 134 |
| abstract_inverted_index.utilize | 86 |
| abstract_inverted_index.Training | 0 |
| abstract_inverted_index.and\n(3) | 84 |
| abstract_inverted_index.blending | 74 |
| abstract_inverted_index.dataset, | 122 |
| abstract_inverted_index.enhanced | 39 |
| abstract_inverted_index.features | 59 |
| abstract_inverted_index.geometry | 80 |
| abstract_inverted_index.learning | 98 |
| abstract_inverted_index.networks | 2 |
| abstract_inverted_index.presents | 13 |
| abstract_inverted_index.process. | 26 |
| abstract_inverted_index.requires | 6 |
| abstract_inverted_index.semantic | 4, 36, 65, 113, 136 |
| abstract_inverted_index.training | 10 |
| abstract_inverted_index.validate | 116 |
| abstract_inverted_index.annotated | 111 |
| abstract_inverted_index.available | 141 |
| abstract_inverted_index.challenge | 16 |
| abstract_inverted_index.diversity | 90 |
| abstract_inverted_index.framework | 33, 103 |
| abstract_inverted_index.implement | 69 |
| abstract_inverted_index.knowledge | 57 |
| abstract_inverted_index.monocular | 42 |
| abstract_inverted_index.practice, | 18 |
| abstract_inverted_index.Cityscapes | 121 |
| abstract_inverted_index.estimation | 44 |
| abstract_inverted_index.sequences. | 47 |
| abstract_inverted_index.of\nlabeled | 9 |
| abstract_inverted_index.particular, | 49 |
| abstract_inverted_index.performance | 128 |
| abstract_inverted_index.We\ntransfer | 56 |
| abstract_inverted_index.as\nlabeling | 19 |
| abstract_inverted_index.segmentation | 5, 20 |
| abstract_inverted_index.address\nthis | 28 |
| abstract_inverted_index.segmentation, | 66 |
| abstract_inverted_index.segmentation. | 114, 137 |
| abstract_inverted_index.contributions: | 54 |
| abstract_inverted_index.difficulty\nof | 97 |
| abstract_inverted_index.implementation | 139 |
| abstract_inverted_index.labor-intensive | 25 |
| abstract_inverted_index.proposed\nmodel | 118 |
| abstract_inverted_index.self-supervised | 41, 62 |
| abstract_inverted_index.semi-supervised | 35 |
| abstract_inverted_index.student-teacher | 102 |
| abstract_inverted_index.useful\nsamples | 108 |
| abstract_inverted_index.state-of-the-art | 133 |
| abstract_inverted_index.unlabeled\nimage | 46 |
| abstract_inverted_index.depth\nestimation | 63 |
| abstract_inverted_index.data\naugmentation | 72 |
| abstract_inverted_index.for\nsemi-supervised | 135 |
| abstract_inverted_index.segmentation,\nwhich | 37 |
| abstract_inverted_index.demonstrate\nsignificant | 127 |
| abstract_inverted_index.at\nhttps://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.\n | 142 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile.value | 0.23826513 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
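The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes the index is available as a Python dict of token to list of positions, which is how the flattened rows read; `rebuild_abstract` is a hypothetical helper name, and `sample` is a hand-trimmed excerpt of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    # Flatten to (position, token) pairs so sorting restores word order.
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(token for _, token in sorted(positioned))


# Excerpt covering word positions 0 to 5 of the abstract.
sample = {
    "segmentation": [5],
    "Training": [0],
    "for": [3],
    "deep": [1],
    "semantic": [4],
    "networks": [2],
}
print(rebuild_abstract(sample))  # Training deep networks for semantic segmentation
```

Tokens in the index keep whatever whitespace the source abstract had, so entries such as `of\nlabeled` would need their embedded line breaks replaced with spaces after reconstruction.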