Latent Thought Models with Variational Bayes Inference-Time Computation
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.01567
We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.
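The dual-rate scheme described in the abstract (fast updates of per-sample variational parameters, slow updates of shared decoder parameters) can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a single scalar latent z with a standard normal prior, swaps the Transformer decoder for a hypothetical linear decoder x = w·z + noise, and uses plain gradient ascent on a closed-form ELBO. The function names, learning rates, step counts, and the fixed noise variance are all illustrative assumptions.

```python
import math
import random

NOISE_VAR = 1.0  # assumption: fixed, known observation noise variance


def elbo(x, w, mu, log_sigma):
    """ELBO (up to an additive constant) for x = w*z + noise,
    prior z ~ N(0, 1), variational posterior q(z) = N(mu, sigma^2)."""
    var = math.exp(2.0 * log_sigma)
    sq_err = sum((xi - wi * mu) ** 2 for xi, wi in zip(x, w))
    ww = sum(wi * wi for wi in w)
    recon = -(sq_err + var * ww) / (2.0 * NOISE_VAR)   # E_q[log p(x | z)]
    kl = 0.5 * (mu * mu + var - 1.0 - 2.0 * log_sigma)  # KL(q || prior)
    return recon - kl


def infer_posterior(x, w, steps=20, fast_lr=0.1):
    """Fast loop: gradient ascent on the local parameters (mu, log_sigma)
    of one sample, with the decoder weights w held fixed."""
    mu, log_sigma = 0.0, 0.0
    ww = sum(wi * wi for wi in w)
    for _ in range(steps):
        var = math.exp(2.0 * log_sigma)
        g_mu = sum(wi * (xi - wi * mu) for xi, wi in zip(x, w)) / NOISE_VAR - mu
        g_ls = 1.0 - var * (ww / NOISE_VAR + 1.0)
        mu += fast_lr * g_mu
        log_sigma += fast_lr * g_ls
    return mu, log_sigma


def train(data, w_init, epochs=20, slow_lr=0.01, steps=20):
    """Slow loop: after each per-sample inference, take one small
    gradient step on the global decoder weights w."""
    w = list(w_init)
    for _ in range(epochs):
        for x in data:
            mu, log_sigma = infer_posterior(x, w, steps=steps)
            var = math.exp(2.0 * log_sigma)
            grad = [((xi - wi * mu) * mu - wi * var) / NOISE_VAR
                    for xi, wi in zip(x, w)]
            w = [wi + slow_lr * gi for wi, gi in zip(w, grad)]
    return w


def mean_elbo(data, w, steps=20):
    """Average ELBO over the data, re-running fast inference per sample."""
    total = 0.0
    for x in data:
        mu, log_sigma = infer_posterior(x, w, steps=steps)
        total += elbo(x, w, mu, log_sigma)
    return total / len(data)


def make_data(n=200, seed=0):
    """Synthetic data from a made-up ground-truth decoder."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5]
    noise_std = math.sqrt(NOISE_VAR)
    return [[wi * z + rng.gauss(0.0, noise_std) for wi in w_true]
            for z in (rng.gauss(0.0, 1.0) for _ in range(n))]


if __name__ == "__main__":
    data = make_data()
    w0 = [0.1, -0.1, 0.1]
    w = train(data, w0)
    print("mean ELBO before:", mean_elbo(data, w0))
    print("mean ELBO after: ", mean_elbo(data, w))
```

In this toy setting the `steps` argument plays the role of the "number of iterations in inference-time computation" that the abstract names as a scaling dimension: raising it spends more compute per sample on the posterior without adding decoder parameters.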
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2502.01567, https://arxiv.org/pdf/2502.01567
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407184045
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407184045 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.01567 (Digital Object Identifier)
- Title: Latent Thought Models with Variational Bayes Inference-Time Computation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-03 (full publication date if available)
- Authors: Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, E Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Wu (list of authors in order)
- Landing page: https://arxiv.org/abs/2502.01567 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.01567 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.01567 (direct OA link when available)
- Concepts: Inference, Scalability, Computer science, Artificial intelligence, Language model, Natural language processing, Database (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407184045 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.01567 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.01567 |
| ids.openalex | https://openalex.org/W4407184045 |
| fwci | |
| type | preprint |
| title | Latent Thought Models with Variational Bayes Inference-Time Computation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9932000041007996 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9240000247955322 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776214188 |
| concepts[0].level | 2 |
| concepts[0].score | 0.736868143081665 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q408386 |
| concepts[0].display_name | Inference |
| concepts[1].id | https://openalex.org/C48044578 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5869746804237366 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[1].display_name | Scalability |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.579572856426239 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4501476585865021 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C137293760 |
| concepts[4].level | 2 |
| concepts[4].score | 0.43721163272857666 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[4].display_name | Language model |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.41556647419929504 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C77088390 |
| concepts[6].level | 1 |
| concepts[6].score | 0.05797320604324341 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[6].display_name | Database |
| keywords[0].id | https://openalex.org/keywords/inference |
| keywords[0].score | 0.736868143081665 |
| keywords[0].display_name | Inference |
| keywords[1].id | https://openalex.org/keywords/scalability |
| keywords[1].score | 0.5869746804237366 |
| keywords[1].display_name | Scalability |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.579572856426239 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.4501476585865021 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/language-model |
| keywords[4].score | 0.43721163272857666 |
| keywords[4].display_name | Language model |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.41556647419929504 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/database |
| keywords[6].score | 0.05797320604324341 |
| keywords[6].display_name | Database |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.01567 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.01567 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.01567 |
| locations[1].id | doi:10.48550/arxiv.2502.01567 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.01567 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101209671 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Deqian Kong |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kong, Deqian |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101414990 |
| authorships[1].author.orcid | https://orcid.org/0009-0005-3632-4167 |
| authorships[1].author.display_name | Minglu Zhao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhao, Minglu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5099054886 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Dehong Xu |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xu, Dehong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101799605 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4521-6369 |
| authorships[3].author.display_name | Bo Pang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Pang, Bo |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5041713522 |
| authorships[4].author.orcid | https://orcid.org/0009-0001-7554-1835 |
| authorships[4].author.display_name | Shu Wang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Wang, Shu |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5012169627 |
| authorships[5].author.orcid | https://orcid.org/0009-0001-4591-3546 |
| authorships[5].author.display_name | E Honig |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Honig, Edouardo |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5013974468 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Zhangzhang Si |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Si, Zhangzhang |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5037372838 |
| authorships[7].author.orcid | https://orcid.org/0009-0004-8041-2538 |
| authorships[7].author.display_name | Chuan Li |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Li, Chuan |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5104112430 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Jianwen Xie |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Xie, Jianwen |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5003810400 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-0295-2588 |
| authorships[9].author.display_name | Sirui Xie |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Xie, Sirui |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5101780958 |
| authorships[10].author.orcid | https://orcid.org/0009-0001-6768-5118 |
| authorships[10].author.display_name | Ying Wu |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Wu, Ying Nian |
| authorships[10].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.01567 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-02-06T00:00:00 |
| display_name | Latent Thought Models with Variational Bayes Inference-Time Computation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9932000041007996 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W2389214306, https://openalex.org/W2965083567, https://openalex.org/W4235240664, https://openalex.org/W1838576100, https://openalex.org/W2095886385, https://openalex.org/W2889616422, https://openalex.org/W2089704382, https://openalex.org/W1983399550, https://openalex.org/W97075385, https://openalex.org/W3204019825 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.01567 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.01567 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.01567 |
| primary_location.id | pmh:oai:arXiv.org:2502.01567 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.01567 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.01567 |
| publication_date | 2025-02-03 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2, 39, 44 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.an | 20 |
| abstract_inverted_index.as | 92 |
| abstract_inverted_index.be | 110 |
| abstract_inverted_index.by | 112, 122 |
| abstract_inverted_index.in | 24, 97, 156, 181 |
| abstract_inverted_index.of | 5, 35, 56, 64, 72, 95, 102 |
| abstract_inverted_index.on | 132 |
| abstract_inverted_index.to | 144 |
| abstract_inverted_index.and | 69, 100, 140, 147, 159, 177, 183 |
| abstract_inverted_index.can | 109 |
| abstract_inverted_index.for | 60, 126 |
| abstract_inverted_index.per | 116 |
| abstract_inverted_index.the | 32, 49, 61, 93 |
| abstract_inverted_index.LTMs | 80, 136, 165 |
| abstract_inverted_index.They | 151 |
| abstract_inverted_index.fast | 54 |
| abstract_inverted_index.more | 127 |
| abstract_inverted_index.size | 125 |
| abstract_inverted_index.slow | 70 |
| abstract_inverted_index.such | 91 |
| abstract_inverted_index.text | 185 |
| abstract_inverted_index.that | 18, 79, 172 |
| abstract_inverted_index.with | 118, 174 |
| abstract_inverted_index.Bayes | 52 |
| abstract_inverted_index.Large | 87 |
| abstract_inverted_index.These | 27 |
| abstract_inverted_index.based | 131 |
| abstract_inverted_index.class | 4 |
| abstract_inverted_index.gains | 120 |
| abstract_inverted_index.guide | 31 |
| abstract_inverted_index.local | 57 |
| abstract_inverted_index.model | 23, 124, 175 |
| abstract_inverted_index.novel | 3 |
| abstract_inverted_index.prior | 22 |
| abstract_inverted_index.scale | 173 |
| abstract_inverted_index.size, | 176 |
| abstract_inverted_index.these | 133, 154 |
| abstract_inverted_index.which | 12 |
| abstract_inverted_index.Higher | 106 |
| abstract_inverted_index.Latent | 8 |
| abstract_inverted_index.Models | 10, 89 |
| abstract_inverted_index.beyond | 85 |
| abstract_inverted_index.follow | 19 |
| abstract_inverted_index.global | 73 |
| abstract_inverted_index.ground | 36 |
| abstract_inverted_index.latent | 15, 25, 28, 65, 103 |
| abstract_inverted_index.models | 146 |
| abstract_inverted_index.number | 94, 101 |
| abstract_inverted_index.reveal | 78 |
| abstract_inverted_index.sample | 107, 139 |
| abstract_inverted_index.space. | 26 |
| abstract_inverted_index.steps. | 129 |
| abstract_inverted_index.tasks. | 163 |
| abstract_inverted_index.token, | 117 |
| abstract_inverted_index.tokens | 37 |
| abstract_inverted_index.within | 48 |
| abstract_inverted_index.(LLMs), | 90 |
| abstract_inverted_index.(LTMs), | 11 |
| abstract_inverted_index.Thought | 9 |
| abstract_inverted_index.achieve | 178 |
| abstract_inverted_index.compute | 115 |
| abstract_inverted_index.decoder | 74 |
| abstract_inverted_index.employs | 43 |
| abstract_inverted_index.exhibit | 166 |
| abstract_inverted_index.further | 119 |
| abstract_inverted_index.models, | 7 |
| abstract_inverted_index.models. | 150 |
| abstract_inverted_index.possess | 81 |
| abstract_inverted_index.process | 47 |
| abstract_inverted_index.propose | 1 |
| abstract_inverted_index.scaling | 83, 134 |
| abstract_inverted_index.studies | 77 |
| abstract_inverted_index.thought | 16, 29, 104 |
| abstract_inverted_index.through | 38 |
| abstract_inverted_index.trading | 123 |
| abstract_inverted_index.vectors | 17, 30, 66 |
| abstract_inverted_index.Designed | 130 |
| abstract_inverted_index.Language | 88 |
| abstract_inverted_index.Training | 42 |
| abstract_inverted_index.achieved | 111 |
| abstract_inverted_index.compared | 143 |
| abstract_inverted_index.decoder. | 41 |
| abstract_inverted_index.discrete | 148 |
| abstract_inverted_index.emergent | 167 |
| abstract_inverted_index.explicit | 14, 21 |
| abstract_inverted_index.few-shot | 168 |
| abstract_inverted_index.language | 6, 161 |
| abstract_inverted_index.learning | 55, 71 |
| abstract_inverted_index.modeling | 162 |
| abstract_inverted_index.possible | 121 |
| abstract_inverted_index.superior | 138 |
| abstract_inverted_index.training | 114 |
| abstract_inverted_index.vectors. | 105 |
| abstract_inverted_index.Empirical | 76 |
| abstract_inverted_index.classical | 50 |
| abstract_inverted_index.diffusion | 149 |
| abstract_inverted_index.dual-rate | 45 |
| abstract_inverted_index.inference | 128 |
| abstract_inverted_index.parameter | 141 |
| abstract_inverted_index.posterior | 62 |
| abstract_inverted_index.reasoning | 170 |
| abstract_inverted_index.zero-shot | 160 |
| abstract_inverted_index.additional | 82 |
| abstract_inverted_index.dimensions | 84 |
| abstract_inverted_index.efficiency | 108, 142 |
| abstract_inverted_index.framework: | 53 |
| abstract_inverted_index.generation | 34 |
| abstract_inverted_index.in-context | 169 |
| abstract_inverted_index.increasing | 113 |
| abstract_inverted_index.iterations | 96 |
| abstract_inverted_index.outperform | 153 |
| abstract_inverted_index.parameters | 59 |
| abstract_inverted_index.perplexity | 158 |
| abstract_inverted_index.validation | 157 |
| abstract_inverted_index.Transformer | 40 |
| abstract_inverted_index.competitive | 179 |
| abstract_inverted_index.computation | 99 |
| abstract_inverted_index.conditional | 182 |
| abstract_inverted_index.demonstrate | 137 |
| abstract_inverted_index.generation. | 186 |
| abstract_inverted_index.incorporate | 13 |
| abstract_inverted_index.parameters. | 75 |
| abstract_inverted_index.performance | 180 |
| abstract_inverted_index.properties, | 135 |
| abstract_inverted_index.traditional | 86 |
| abstract_inverted_index.variational | 51, 58 |
| abstract_inverted_index.capabilities | 171 |
| abstract_inverted_index.counterparts | 155 |
| abstract_inverted_index.distribution | 63 |
| abstract_inverted_index.optimization | 46 |
| abstract_inverted_index.Additionally, | 164 |
| abstract_inverted_index.computation), | 68 |
| abstract_inverted_index.significantly | 152 |
| abstract_inverted_index.unconditional | 184 |
| abstract_inverted_index.autoregressive | 33, 145 |
| abstract_inverted_index.inference-time | 98 |
| abstract_inverted_index.(inference-time | 67 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| citation_normalized_percentile |
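
The `abstract_inverted_index.*` rows above store the abstract as an inverted index: each token maps to the zero-based word positions where it occurs. A minimal sketch of turning such rows back into text follows. The helper names `parse_rows` and `rebuild_abstract` and the `stop` parameter are hypothetical, and the parser assumes the two-column `| key | positions |` layout used in this table.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(lines):
    """Parse table rows of the form '| abstract_inverted_index.<token> | 1, 2 |'
    into a dict mapping token -> list of integer positions."""
    index = {}
    for line in lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not inverted-index entries
        token = cells[0][len(PREFIX):]
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_abstract(index, stop=None):
    """Place each token at its positions and join them in order.
    If stop is given, only positions below stop are kept."""
    by_position = {}
    for token, positions in index.items():
        for p in positions:
            if stop is None or p < stop:
                by_position[p] = token
    return " ".join(by_position[p] for p in sorted(by_position))


# A few rows copied from the table above; they cover positions 0 to 7.
rows = [
    "| abstract_inverted_index.a | 2, 39, 44 |",
    "| abstract_inverted_index.We | 0 |",
    "| abstract_inverted_index.of | 5, 35, 56, 64, 72, 95, 102 |",
    "| abstract_inverted_index.class | 4 |",
    "| abstract_inverted_index.novel | 3 |",
    "| abstract_inverted_index.propose | 1 |",
    "| abstract_inverted_index.language | 6, 161 |",
    "| abstract_inverted_index.models, | 7 |",
]

opening = rebuild_abstract(parse_rows(rows), stop=8)
print(opening)  # We propose a novel class of language models,
```

Passing every `abstract_inverted_index.*` row and leaving `stop` unset should recover the full abstract, since the positions run contiguously from 0 to 186.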