Towards Visual Foundational Models of Physical Scenes
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2306.03727
We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represent the physical scene, as they lack extrapolation mechanisms. Those, however, could be provided by Diffusion Models, at least in theory. To test this hypothesis empirically, NeRFs can be combined with Diffusion Models, a process we refer to as NeRF Diffusion, used as unsupervised representations of the physical scene. Our analysis is limited to visual data, without external grounding mechanisms that can be provided by independent sensory modalities.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2306.03727, https://arxiv.org/pdf/2306.03727
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4379924972
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4379924972 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2306.03727 (Digital Object Identifier)
- Title: Towards Visual Foundational Models of Physical Scenes (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-06-06 (Full publication date if available)
- Authors: Chethan M. Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Matthew Trager, Ashwin Swaminathan, Chris Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto (List of authors in order)
- Landing page: https://arxiv.org/abs/2306.03727 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2306.03727 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2306.03727 (Direct OA link when available)
- Concepts: Computer science, Extrapolation, Artificial intelligence, Process (computing), Diffusion, Computer vision, Mathematics, Physics, Mathematical analysis, Operating system, Thermodynamics (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (Per-year citation counts, last 5 years)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4379924972 |
| doi | https://doi.org/10.48550/arxiv.2306.03727 |
| ids.doi | https://doi.org/10.48550/arxiv.2306.03727 |
| ids.openalex | https://openalex.org/W4379924972 |
| fwci | |
| type | preprint |
| title | Towards Visual Foundational Models of Physical Scenes |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10775 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9085999727249146 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Generative Adversarial Networks and Image Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6644890904426575 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C132459708 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6503655910491943 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q744069 |
| concepts[1].display_name | Extrapolation |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5659711956977844 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C98045186 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4708801209926605 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[3].display_name | Process (computing) |
| concepts[4].id | https://openalex.org/C69357855 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4437289535999298 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[4].display_name | Diffusion |
| concepts[5].id | https://openalex.org/C31972630 |
| concepts[5].level | 1 |
| concepts[5].score | 0.38769668340682983 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[5].display_name | Computer vision |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.21773439645767212 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C121332964 |
| concepts[7].level | 0 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[7].display_name | Physics |
| concepts[8].id | https://openalex.org/C134306372 |
| concepts[8].level | 1 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[8].display_name | Mathematical analysis |
| concepts[9].id | https://openalex.org/C111919701 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[9].display_name | Operating system |
| concepts[10].id | https://openalex.org/C97355855 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[10].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6644890904426575 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/extrapolation |
| keywords[1].score | 0.6503655910491943 |
| keywords[1].display_name | Extrapolation |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5659711956977844 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/process |
| keywords[3].score | 0.4708801209926605 |
| keywords[3].display_name | Process (computing) |
| keywords[4].id | https://openalex.org/keywords/diffusion |
| keywords[4].score | 0.4437289535999298 |
| keywords[4].display_name | Diffusion |
| keywords[5].id | https://openalex.org/keywords/computer-vision |
| keywords[5].score | 0.38769668340682983 |
| keywords[5].display_name | Computer vision |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.21773439645767212 |
| keywords[6].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2306.03727 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2306.03727 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2306.03727 |
| locations[1].id | doi:10.48550/arxiv.2306.03727 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2306.03727 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5075055783 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Chethan M. Parameshwara |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Parameshwara, Chethan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5065386783 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8163-8326 |
| authorships[1].author.display_name | Alessandro Achille |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Achille, Alessandro |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5030073760 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-9304-6427 |
| authorships[2].author.display_name | Matthew Trager |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Trager, Matthew |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100371535 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7493-2650 |
| authorships[3].author.display_name | Xiaolong Li |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Li, Xiaolong |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5087799311 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1043-5353 |
| authorships[4].author.display_name | Jiawei Mo |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Mo, Jiawei |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5030073760 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-9304-6427 |
| authorships[5].author.display_name | Matthew Trager |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Trager, Matthew |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5084355679 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-4279-369X |
| authorships[6].author.display_name | Ashwin Swaminathan |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Swaminathan, Ashwin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5067224766 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-7867-9533 |
| authorships[7].author.display_name | Chris Taylor |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Taylor, CJ |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5018993812 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Dheera Venkatraman |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Venkatraman, Dheera |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5087030407 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-1030-2286 |
| authorships[9].author.display_name | Xiaohan Fei |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Fei, Xiaohan |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5038328783 |
| authorships[10].author.orcid | https://orcid.org/0000-0003-2902-6362 |
| authorships[10].author.display_name | Stefano Soatto |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Soatto, Stefano |
| authorships[10].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2306.03727 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards Visual Foundational Models of Physical Scenes |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10775 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9085999727249146 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Generative Adversarial Networks and Image Synthesis |
| related_works | https://openalex.org/W2058170566, https://openalex.org/W2755342338, https://openalex.org/W2772917594, https://openalex.org/W2775347418, https://openalex.org/W2166024367, https://openalex.org/W3116076068, https://openalex.org/W2229312674, https://openalex.org/W2951359407, https://openalex.org/W2079911747, https://openalex.org/W1969923398 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2306.03727 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2306.03727 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2306.03727 |
| primary_location.id | pmh:oai:arXiv.org:2306.03727 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2306.03727 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2306.03727 |
| publication_date | 2023-06-06 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2, 18, 93 |
| abstract_inverted_index.To | 21, 81 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.as | 17, 64, 98, 102 |
| abstract_inverted_index.at | 77 |
| abstract_inverted_index.be | 50, 72, 88, 122 |
| abstract_inverted_index.by | 74, 124 |
| abstract_inverted_index.do | 22 |
| abstract_inverted_index.in | 79 |
| abstract_inverted_index.is | 52, 111 |
| abstract_inverted_index.of | 10, 40, 105 |
| abstract_inverted_index.to | 97, 113 |
| abstract_inverted_index.we | 24, 55, 95 |
| abstract_inverted_index.Our | 109 |
| abstract_inverted_index.and | 29 |
| abstract_inverted_index.can | 49, 87, 121 |
| abstract_inverted_index.may | 36 |
| abstract_inverted_index.so, | 23 |
| abstract_inverted_index.the | 41, 44, 61, 106 |
| abstract_inverted_index.NeRF | 99 |
| abstract_inverted_index.even | 32 |
| abstract_inverted_index.lack | 66 |
| abstract_inverted_index.only | 14 |
| abstract_inverted_index.same | 42 |
| abstract_inverted_index.show | 30, 56 |
| abstract_inverted_index.step | 4 |
| abstract_inverted_index.test | 82 |
| abstract_inverted_index.that | 48, 57, 120 |
| abstract_inverted_index.they | 65 |
| abstract_inverted_index.this | 83 |
| abstract_inverted_index.used | 101 |
| abstract_inverted_index.with | 90 |
| abstract_inverted_index.NeRFs | 58, 86 |
| abstract_inverted_index.Then, | 54 |
| abstract_inverted_index.could | 71 |
| abstract_inverted_index.data, | 115 |
| abstract_inverted_index.first | 3, 25 |
| abstract_inverted_index.image | 15 |
| abstract_inverted_index.least | 78 |
| abstract_inverted_index.refer | 96 |
| abstract_inverted_index.scene | 47 |
| abstract_inverted_index.that, | 31 |
| abstract_inverted_index.using | 13 |
| abstract_inverted_index.Those, | 69 |
| abstract_inverted_index.agents | 35 |
| abstract_inverted_index.cannot | 59 |
| abstract_inverted_index.define | 26 |
| abstract_inverted_index.scene" | 28 |
| abstract_inverted_index.scene, | 43, 63 |
| abstract_inverted_index.scene. | 108 |
| abstract_inverted_index.scenes | 12 |
| abstract_inverted_index.though | 33 |
| abstract_inverted_index.visual | 8, 114 |
| abstract_inverted_index.Models, | 76, 92 |
| abstract_inverted_index.limited | 112 |
| abstract_inverted_index.process | 94 |
| abstract_inverted_index.sensory | 126 |
| abstract_inverted_index.theory. | 80 |
| abstract_inverted_index.towards | 5 |
| abstract_inverted_index.unique. | 53 |
| abstract_inverted_index.without | 116 |
| abstract_inverted_index.analysis | 110 |
| abstract_inverted_index.combined | 89 |
| abstract_inverted_index.describe | 1 |
| abstract_inverted_index.external | 117 |
| abstract_inverted_index.however, | 70 |
| abstract_inverted_index.inferred | 51 |
| abstract_inverted_index.learning | 6 |
| abstract_inverted_index.maintain | 37 |
| abstract_inverted_index.physical | 11, 46, 62, 107 |
| abstract_inverted_index.provided | 73, 123 |
| abstract_inverted_index.training | 19 |
| abstract_inverted_index."physical | 27 |
| abstract_inverted_index.Diffusion | 75, 91 |
| abstract_inverted_index.different | 34, 38 |
| abstract_inverted_index.grounding | 118 |
| abstract_inverted_index.represent | 60 |
| abstract_inverted_index.Diffusion, | 100 |
| abstract_inverted_index.criterion. | 20 |
| abstract_inverted_index.hypothesis | 84 |
| abstract_inverted_index.mechanisms | 119 |
| abstract_inverted_index.prediction | 16 |
| abstract_inverted_index.underlying | 45 |
| abstract_inverted_index.independent | 125 |
| abstract_inverted_index.mechanisms. | 68 |
| abstract_inverted_index.modalities. | 127 |
| abstract_inverted_index.empirically, | 85 |
| abstract_inverted_index.unsupervised | 103 |
| abstract_inverted_index.extrapolation | 67 |
| abstract_inverted_index.general-purpose | 7 |
| abstract_inverted_index.representations | 9, 39, 104 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/10 |
| sustainable_development_goals[0].score | 0.5199999809265137 |
| sustainable_development_goals[0].display_name | Reduced inequalities |
| citation_normalized_percentile |
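
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the positions where it occurs, so the readable text has to be rebuilt by placing every token back at its positions. The following is a minimal sketch of that reconstruction, assuming the rows are available as plain `| key | value |` strings as they appear in this table. The function names `parse_rows` and `rebuild_abstract` are hypothetical helpers introduced for illustration and are not part of any OpenAlex client.

```python
from typing import Dict, Iterable, List

PREFIX = "abstract_inverted_index."  # key prefix used by the rows in the table above


def parse_rows(rows: Iterable[str]) -> Dict[str, List[int]]:
    """Turn '| abstract_inverted_index.<token> | p1, p2 |' rows into {token: [p1, p2]}."""
    index: Dict[str, List[int]] = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not inverted-index entries
        token = cells[0][len(PREFIX):]
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index: Dict[str, List[int]]) -> str:
    """Place each token at its positions and join them in order.

    Assumption of this sketch: reconstruction stops at the first missing
    position, so a partial index yields only its contiguous prefix.
    """
    by_position = {pos: token for token, positions in index.items() for pos in positions}
    words = []
    pos = 0
    while pos in by_position:
        words.append(by_position[pos])
        pos += 1
    return " ".join(words)


# Five rows copied from the table above (a partial index, for illustration only).
sample_rows = [
    "| abstract_inverted_index.a | 2, 18, 93 |",
    "| abstract_inverted_index.We | 0 |",
    "| abstract_inverted_index.describe | 1 |",
    "| abstract_inverted_index.first | 3, 25 |",
    "| abstract_inverted_index.step | 4 |",
]

print(rebuild_abstract(parse_rows(sample_rows)))  # prints "We describe a first step"
```

Passing all `abstract_inverted_index.*` rows instead of the five sample rows should give back the full abstract, since the positions in the table run without gaps from 0 to 127.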