Understanding generative AI output with embedding models
2025 · Open Access · DOI: https://doi.org/10.1126/sciadv.adx4082
Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully handcrafting data representations on the basis of domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models—which are trained to be useful across many contexts—we demonstrate that simple and well-studied dimensionality-reduction techniques such as principal components analysis uncover inherent heterogeneity in input data concordant with human-understandable explanations. Of the many applications for this framework, we find empirical evidence that there is intrinsic separability between real samples and those generated by artificial intelligence.
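As a rough illustration of the idea in the abstract, the sketch below applies principal components analysis to a matrix of embedding vectors and checks whether two groups separate along the first component. It is not the authors' pipeline: the embeddings are synthetic Gaussian stand-ins rather than foundation-model output, and the names `pca_project`, `real`, `generated`, and `shift` are assumptions introduced only for this example.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project the rows of `embeddings` onto their top principal components."""
    # PCA requires mean-centered data.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered matrix: the rows of `vt` are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]


# Synthetic stand-in for embedding vectors (an assumption, not real model output):
# the "generated" group is offset from the "real" group in a few coordinates.
rng = np.random.default_rng(0)
dim = 64
shift = np.zeros(dim)
shift[:4] = 3.0
real = rng.normal(size=(200, dim))
generated = rng.normal(size=(200, dim)) + shift

embeddings = np.vstack([real, generated])
labels = np.array([0] * 200 + [1] * 200)

scores, explained = pca_project(embeddings)

# Distance between the two group means along the first principal component.
pc1 = scores[:, 0]
gap = abs(pc1[labels == 0].mean() - pc1[labels == 1].mean())
print("explained variance ratio:", explained)
print("gap between group means on PC1:", gap)
```

With real data, the rows of `embeddings` would come from an embedding model, and any separation seen in the leading components would still need to be checked against held-out samples before drawing conclusions.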
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1126/sciadv.adx4082
- OA Status: gold
- References: 47
- OpenAlex ID: https://openalex.org/W4416775306
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416775306 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1126/sciadv.adx4082 (Digital Object Identifier)
- Title: Understanding generative AI output with embedding models (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-11-26 (full publication date if available)
- Authors: Max Vargas, Andrew G. Engel, Tony Chiang (list of authors in order)
- Landing page: https://doi.org/10.1126/sciadv.adx4082 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://doi.org/10.1126/sciadv.adx4082 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
- References (count): 47 (number of works referenced by this work)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416775306 |
| doi | https://doi.org/10.1126/sciadv.adx4082 |
| ids.doi | https://doi.org/10.1126/sciadv.adx4082 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/41296845 |
| ids.openalex | https://openalex.org/W4416775306 |
| fwci | |
| type | article |
| title | Understanding generative AI output with embedding models |
| biblio.issue | 48 |
| biblio.volume | 11 |
| biblio.last_page | eadx4082 |
| biblio.first_page | eadx4082 |
| is_xpac | False |
| apc_list.value | 4500 |
| apc_list.currency | USD |
| apc_list.value_usd | 4500 |
| apc_paid.value | 4500 |
| apc_paid.currency | USD |
| apc_paid.value_usd | 4500 |
| language | en |
| locations[0].id | doi:10.1126/sciadv.adx4082 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S2737427234 |
| locations[0].source.issn | 2375-2548 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2375-2548 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | Science Advances |
| locations[0].source.host_organization | https://openalex.org/P4310315823 |
| locations[0].source.host_organization_name | American Association for the Advancement of Science |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310315823 |
| locations[0].source.host_organization_lineage_names | American Association for the Advancement of Science |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Science Advances |
| locations[0].landing_page_url | https://doi.org/10.1126/sciadv.adx4082 |
| locations[1].id | pmid:41296845 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | Science advances |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/41296845 |
| indexed_in | crossref, doaj, pubmed |
| authorships[0].author.id | https://openalex.org/A5108695327 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5977-046X |
| authorships[0].author.display_name | Max Vargas |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I142606810 |
| authorships[0].affiliations[0].raw_affiliation_string | Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| authorships[0].institutions[0].id | https://openalex.org/I142606810 |
| authorships[0].institutions[0].ror | https://ror.org/05h992307 |
| authorships[0].institutions[0].type | facility |
| authorships[0].institutions[0].lineage | https://openalex.org/I1325736334, https://openalex.org/I1330989302, https://openalex.org/I142606810, https://openalex.org/I39565521 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Pacific Northwest National Laboratory |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Max Vargas |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| authorships[1].author.id | https://openalex.org/A5020189594 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2348-483X |
| authorships[1].author.display_name | Andrew G. Engel |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I142606810 |
| authorships[1].affiliations[0].raw_affiliation_string | Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| authorships[1].institutions[0].id | https://openalex.org/I142606810 |
| authorships[1].institutions[0].ror | https://ror.org/05h992307 |
| authorships[1].institutions[0].type | facility |
| authorships[1].institutions[0].lineage | https://openalex.org/I1325736334, https://openalex.org/I1330989302, https://openalex.org/I142606810, https://openalex.org/I39565521 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Pacific Northwest National Laboratory |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Andrew Engel |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| authorships[2].author.id | https://openalex.org/A5088679899 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9817-2526 |
| authorships[2].author.display_name | Tony Chiang |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I201448701 |
| authorships[2].affiliations[0].raw_affiliation_string | Department of Mathematics, University of Washington, Seattle, WA 98195, USA. |
| authorships[2].affiliations[1].institution_ids | https://openalex.org/I142606810 |
| authorships[2].affiliations[1].raw_affiliation_string | Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| authorships[2].institutions[0].id | https://openalex.org/I142606810 |
| authorships[2].institutions[0].ror | https://ror.org/05h992307 |
| authorships[2].institutions[0].type | facility |
| authorships[2].institutions[0].lineage | https://openalex.org/I1325736334, https://openalex.org/I1330989302, https://openalex.org/I142606810, https://openalex.org/I39565521 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Pacific Northwest National Laboratory |
| authorships[2].institutions[1].id | https://openalex.org/I201448701 |
| authorships[2].institutions[1].ror | https://ror.org/00cvxb145 |
| authorships[2].institutions[1].type | education |
| authorships[2].institutions[1].lineage | https://openalex.org/I201448701 |
| authorships[2].institutions[1].country_code | US |
| authorships[2].institutions[1].display_name | University of Washington |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Tony Chiang |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Department of Mathematics, University of Washington, Seattle, WA 98195, USA., Pacific Northwest National Laboratory, Richland, WA 99354, USA. |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.1126/sciadv.adx4082 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-27T00:00:00 |
| display_name | Understanding generative AI output with embedding models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T23:14:17.795251 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | doi:10.1126/sciadv.adx4082 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S2737427234 |
| best_oa_location.source.issn | 2375-2548 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2375-2548 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | Science Advances |
| best_oa_location.source.host_organization | https://openalex.org/P4310315823 |
| best_oa_location.source.host_organization_name | American Association for the Advancement of Science |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310315823 |
| best_oa_location.source.host_organization_lineage_names | American Association for the Advancement of Science |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Science Advances |
| best_oa_location.landing_page_url | https://doi.org/10.1126/sciadv.adx4082 |
| primary_location.id | doi:10.1126/sciadv.adx4082 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S2737427234 |
| primary_location.source.issn | 2375-2548 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2375-2548 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | Science Advances |
| primary_location.source.host_organization | https://openalex.org/P4310315823 |
| primary_location.source.host_organization_name | American Association for the Advancement of Science |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310315823 |
| primary_location.source.host_organization_lineage_names | American Association for the Advancement of Science |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Science Advances |
| primary_location.landing_page_url | https://doi.org/10.1126/sciadv.adx4082 |
| publication_date | 2025-11-26 |
| publication_year | 2025 |
| referenced_works | https://openalex.org/W4312933868, https://openalex.org/W2163922914, https://openalex.org/W4312910992, https://openalex.org/W2999634272, https://openalex.org/W3173798466, https://openalex.org/W4389518382, https://openalex.org/W2282821441, https://openalex.org/W2757251151, https://openalex.org/W2096192437, https://openalex.org/W4411337932, https://openalex.org/W4411403346, https://openalex.org/W3202070718, https://openalex.org/W2296719434, https://openalex.org/W3217648845, https://openalex.org/W2100664256, https://openalex.org/W2517041761, https://openalex.org/W3018638193, https://openalex.org/W4392251648, https://openalex.org/W3039883906, https://openalex.org/W2031342017, https://openalex.org/W3181414820, https://openalex.org/W4327743697, https://openalex.org/W3216556018, https://openalex.org/W4231610351, https://openalex.org/W2046686077, https://openalex.org/W4399365598, https://openalex.org/W2108598243, https://openalex.org/W2100056901, https://openalex.org/W4404396281, https://openalex.org/W4200501644, https://openalex.org/W4415800446, https://openalex.org/W4367051110, https://openalex.org/W4388002357, https://openalex.org/W4213187648, https://openalex.org/W4401132252, https://openalex.org/W4404783034, https://openalex.org/W4389520749, https://openalex.org/W2995523160, https://openalex.org/W4399521940, https://openalex.org/W4400949264, https://openalex.org/W2294798173, https://openalex.org/W2001619934, https://openalex.org/W4386576685, https://openalex.org/W1901129140, https://openalex.org/W3102690631, https://openalex.org/W2370342765, https://openalex.org/W4312388283 |
| referenced_works_count | 47 |
| abstract_inverted_index.a | 33 |
| abstract_inverted_index.Of | 89 |
| abstract_inverted_index.as | 75 |
| abstract_inverted_index.be | 62 |
| abstract_inverted_index.by | 16, 41, 56, 111 |
| abstract_inverted_index.in | 82 |
| abstract_inverted_index.is | 3, 102 |
| abstract_inverted_index.of | 24 |
| abstract_inverted_index.on | 21 |
| abstract_inverted_index.to | 5, 61 |
| abstract_inverted_index.we | 96 |
| abstract_inverted_index.For | 52 |
| abstract_inverted_index.and | 70, 108 |
| abstract_inverted_index.any | 6 |
| abstract_inverted_index.are | 59 |
| abstract_inverted_index.for | 93 |
| abstract_inverted_index.now | 31 |
| abstract_inverted_index.the | 22, 90 |
| abstract_inverted_index.was | 13 |
| abstract_inverted_index.DNNs | 37 |
| abstract_inverted_index.data | 8, 19, 45, 84 |
| abstract_inverted_index.deep | 27 |
| abstract_inverted_index.find | 97 |
| abstract_inverted_index.into | 46 |
| abstract_inverted_index.many | 65, 91 |
| abstract_inverted_index.real | 106 |
| abstract_inverted_index.such | 74 |
| abstract_inverted_index.that | 68, 100 |
| abstract_inverted_index.this | 94 |
| abstract_inverted_index.with | 86 |
| abstract_inverted_index.While | 10 |
| abstract_inverted_index.basis | 23 |
| abstract_inverted_index.input | 44, 83 |
| abstract_inverted_index.offer | 32 |
| abstract_inverted_index.their | 43 |
| abstract_inverted_index.there | 101 |
| abstract_inverted_index.those | 109 |
| abstract_inverted_index.(DNNs) | 30 |
| abstract_inverted_index.across | 64 |
| abstract_inverted_index.called | 50 |
| abstract_inverted_index.domain | 25 |
| abstract_inverted_index.hidden | 47 |
| abstract_inverted_index.neural | 28 |
| abstract_inverted_index.simple | 69 |
| abstract_inverted_index.useful | 63 |
| abstract_inverted_index.between | 105 |
| abstract_inverted_index.feature | 11, 48 |
| abstract_inverted_index.samples | 107 |
| abstract_inverted_index.trained | 60 |
| abstract_inverted_index.uncover | 79 |
| abstract_inverted_index.vectors | 49, 54 |
| abstract_inverted_index.analysis | 78 |
| abstract_inverted_index.critical | 4 |
| abstract_inverted_index.engineer | 39 |
| abstract_inverted_index.evidence | 99 |
| abstract_inverted_index.features | 2, 40 |
| abstract_inverted_index.inherent | 80 |
| abstract_inverted_index.networks | 29 |
| abstract_inverted_index.produced | 55 |
| abstract_inverted_index.addressed | 15 |
| abstract_inverted_index.analysis. | 9 |
| abstract_inverted_index.approach. | 36 |
| abstract_inverted_index.carefully | 17 |
| abstract_inverted_index.different | 35 |
| abstract_inverted_index.embedding | 53 |
| abstract_inverted_index.empirical | 98 |
| abstract_inverted_index.generated | 110 |
| abstract_inverted_index.intrinsic | 103 |
| abstract_inverted_index.principal | 76 |
| abstract_inverted_index.radically | 34 |
| abstract_inverted_index.artificial | 112 |
| abstract_inverted_index.components | 77 |
| abstract_inverted_index.concordant | 85 |
| abstract_inverted_index.expertise, | 26 |
| abstract_inverted_index.foundation | 57 |
| abstract_inverted_index.framework, | 95 |
| abstract_inverted_index.implicitly | 38 |
| abstract_inverted_index.techniques | 73 |
| abstract_inverted_index.demonstrate | 67 |
| abstract_inverted_index.embeddings. | 51 |
| abstract_inverted_index.engineering | 12 |
| abstract_inverted_index.Constructing | 0 |
| abstract_inverted_index.applications | 92 |
| abstract_inverted_index.handcrafting | 18 |
| abstract_inverted_index.high-quality | 1 |
| abstract_inverted_index.historically | 14 |
| abstract_inverted_index.quantitative | 7 |
| abstract_inverted_index.separability | 104 |
| abstract_inverted_index.transforming | 42 |
| abstract_inverted_index.well-studied | 71 |
| abstract_inverted_index.contexts—we | 66 |
| abstract_inverted_index.explanations. | 88 |
| abstract_inverted_index.heterogeneity | 81 |
| abstract_inverted_index.intelligence. | 113 |
| abstract_inverted_index.models—which | 58 |
| abstract_inverted_index.representations | 20 |
| abstract_inverted_index.human-understandable | 87 |
| abstract_inverted_index.dimensionality-reduction | 72 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
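
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text follows; the helper name `abstract_from_inverted_index` is hypothetical, and the `sample` dictionary copies only the entries for positions 0 through 9 from the table.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild text from a {token: [positions]} mapping."""
    # Flatten to (position, token) pairs, then order by position.
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(token for _, token in sorted(positioned))


# Entries copied from the payload table, restricted to positions 0-9.
sample = {
    "Constructing": [0],
    "high-quality": [1],
    "features": [2],
    "is": [3],
    "critical": [4],
    "to": [5],
    "any": [6],
    "quantitative": [7],
    "data": [8],
    "analysis.": [9],
}

opening = abstract_from_inverted_index(sample)
print(opening)
```

Passing the full mapping instead of `sample` would reproduce the whole abstract, since tokens that occur several times simply list several positions.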