Bayesian Paragraph Vectors
· 2017
· Open Access
· DOI: https://doi.org/10.48550/arxiv.1711.03946
Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and sentences. In this work, we propose a novel interpretation for neural-network-based paragraph vectors by developing an unsupervised generative model whose maximum likelihood solution corresponds to traditional paragraph vectors. This probabilistic formulation allows us to go beyond point estimates of parameters and to perform Bayesian posterior inference. We find that the entropy of paragraph vectors decreases with the length of documents, and that information about posterior uncertainty improves performance in supervised learning tasks such as sentiment analysis and paraphrase detection.
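To illustrate the claim that posterior entropy shrinks for longer documents, here is a minimal sketch. It assumes, purely for illustration, that each paragraph vector's posterior is approximated by a Gaussian with diagonal covariance; the abstract does not state the posterior family, and the function name and variance values below are hypothetical, not results from the paper.

```python
import math
from typing import Sequence


def diagonal_gaussian_entropy(variances: Sequence[float]) -> float:
    """Differential entropy (in nats) of a Gaussian with diagonal covariance.

    For independent dimensions the entropy is the sum of the per-dimension
    entropies: H = 0.5 * sum_k log(2 * pi * e * sigma_k^2).
    """
    return 0.5 * sum(math.log(2.0 * math.pi * math.e * v) for v in variances)


# Hypothetical posterior variances (assumed values, not taken from the paper):
# a short document leaves the vector poorly determined, a long one pins it down.
short_doc_variances = [0.5, 0.5, 0.5]
long_doc_variances = [0.05, 0.05, 0.05]

print(diagonal_gaussian_entropy(short_doc_variances))
print(diagonal_gaussian_entropy(long_doc_variances))
```

Under this assumption, smaller posterior variances give a lower entropy, which is the direction of the effect the abstract reports for longer documents.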
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/1711.03946
- PDF: https://arxiv.org/pdf/1711.03946
- OA Status: green
- Cited By: 4
- References: 17
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2767262701
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2767262701 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1711.03946 (Digital Object Identifier)
- Title: Bayesian Paragraph Vectors (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2017 (year of publication)
- Publication date: 2017-11-10 (full publication date if available)
- Authors: Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt (list of authors in order)
- Landing page: https://arxiv.org/abs/1711.03946 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/1711.03946 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1711.03946 (direct OA link when available)
- Concepts: Paragraph, Bayesian probability, Computer science, Artificial intelligence, World Wide Web (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 4 (total citation count in OpenAlex)
- Citations by year (recent): 2019: 4 (per-year citation counts, last 5 years)
- References (count): 17 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W2767262701 |
| doi | https://doi.org/10.48550/arxiv.1711.03946 |
| ids.doi | https://doi.org/10.48550/arxiv.1711.03946 |
| ids.mag | 2767262701 |
| ids.openalex | https://openalex.org/W2767262701 |
| fwci | |
| type | preprint |
| title | Bayesian Paragraph Vectors |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9991000294685364 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T11303 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9916999936103821 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Bayesian Modeling and Causal Inference |
| topics[2].id | https://openalex.org/T12072 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.991100013256073 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning and Algorithms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2777206241 |
| concepts[0].level | 2 |
| concepts[0].score | 0.795383870601654 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q194431 |
| concepts[0].display_name | Paragraph |
| concepts[1].id | https://openalex.org/C107673813 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5812726616859436 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q812534 |
| concepts[1].display_name | Bayesian probability |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4446234107017517 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3745230436325073 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C136764020 |
| concepts[4].level | 1 |
| concepts[4].score | 0.10180148482322693 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[4].display_name | World Wide Web |
| keywords[0].id | https://openalex.org/keywords/paragraph |
| keywords[0].score | 0.795383870601654 |
| keywords[0].display_name | Paragraph |
| keywords[1].id | https://openalex.org/keywords/bayesian-probability |
| keywords[1].score | 0.5812726616859436 |
| keywords[1].display_name | Bayesian probability |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4446234107017517 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.3745230436325073 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/world-wide-web |
| keywords[4].score | 0.10180148482322693 |
| keywords[4].display_name | World Wide Web |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1711.03946 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1711.03946 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1711.03946 |
| locations[1].id | doi:10.48550/arxiv.1711.03946 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1711.03946 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5112194243 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Geng Ji |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Geng Ji |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5045460222 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3135-8107 |
| authorships[1].author.display_name | Robert Bamler |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Robert Bamler |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5076761279 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0595-9726 |
| authorships[2].author.display_name | Erik B. Sudderth |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Erik B. Sudderth |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5036302820 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7836-7839 |
| authorships[3].author.display_name | Stephan Mandt |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Stephan Mandt |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1711.03946 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Bayesian Paragraph Vectors |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9991000294685364 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2377059580, https://openalex.org/W4200355488, https://openalex.org/W127000293, https://openalex.org/W3215892509, https://openalex.org/W2928616779, https://openalex.org/W2412592434, https://openalex.org/W2010523086, https://openalex.org/W4244602709 |
| cited_by_count | 4 |
| counts_by_year[0].year | 2019 |
| counts_by_year[0].cited_by_count | 4 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1711.03946 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1711.03946 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1711.03946 |
| primary_location.id | pmh:oai:arXiv.org:1711.03946 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1711.03946 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1711.03946 |
| publication_date | 2017-11-10 |
| publication_year | 2017 |
| referenced_works | https://openalex.org/W2950726992, https://openalex.org/W1909320841, https://openalex.org/W1980776243, https://openalex.org/W2250539671, https://openalex.org/W1814992895, https://openalex.org/W2949547296, https://openalex.org/W2605035112, https://openalex.org/W1959608418, https://openalex.org/W2539792571, https://openalex.org/W3101380508, https://openalex.org/W2963173382, https://openalex.org/W3122775348, https://openalex.org/W2153579005, https://openalex.org/W2113459411, https://openalex.org/W2131571251, https://openalex.org/W2964231305, https://openalex.org/W2950577311 |
| referenced_works_count | 17 |
| abstract_inverted_index.a | 55 |
| abstract_inverted_index.In | 50 |
| abstract_inverted_index.We | 95 |
| abstract_inverted_index.an | 64 |
| abstract_inverted_index.as | 45, 122 |
| abstract_inverted_index.be | 8 |
| abstract_inverted_index.by | 14, 62 |
| abstract_inverted_index.et | 2 |
| abstract_inverted_index.go | 83 |
| abstract_inverted_index.in | 10, 117 |
| abstract_inverted_index.of | 25, 39, 87, 100, 107 |
| abstract_inverted_index.on | 23 |
| abstract_inverted_index.to | 7, 73, 82, 90 |
| abstract_inverted_index.us | 81 |
| abstract_inverted_index.we | 53 |
| abstract_inverted_index.(Le | 30 |
| abstract_inverted_index.and | 31, 48, 89, 109, 125 |
| abstract_inverted_index.for | 37, 58 |
| abstract_inverted_index.has | 5 |
| abstract_inverted_index.the | 16, 98, 105 |
| abstract_inverted_index.top | 24 |
| abstract_inverted_index.This | 77 |
| abstract_inverted_index.al., | 3 |
| abstract_inverted_index.find | 34, 96 |
| abstract_inverted_index.such | 44, 121 |
| abstract_inverted_index.text | 40 |
| abstract_inverted_index.that | 97, 110 |
| abstract_inverted_index.this | 51 |
| abstract_inverted_index.with | 41, 104 |
| abstract_inverted_index.2013) | 4 |
| abstract_inverted_index.2014) | 33 |
| abstract_inverted_index.Built | 22 |
| abstract_inverted_index.about | 112 |
| abstract_inverted_index.model | 67 |
| abstract_inverted_index.novel | 56 |
| abstract_inverted_index.point | 85 |
| abstract_inverted_index.tasks | 120 |
| abstract_inverted_index.whose | 68 |
| abstract_inverted_index.work, | 52 |
| abstract_inverted_index.allows | 80 |
| abstract_inverted_index.beyond | 84 |
| abstract_inverted_index.length | 106 |
| abstract_inverted_index.pieces | 38 |
| abstract_inverted_index.proven | 6 |
| abstract_inverted_index.words. | 21 |
| abstract_inverted_index.between | 19 |
| abstract_inverted_index.entropy | 99 |
| abstract_inverted_index.maximum | 69 |
| abstract_inverted_index.natural | 11 |
| abstract_inverted_index.perform | 91 |
| abstract_inverted_index.propose | 54 |
| abstract_inverted_index.vectors | 29, 61, 102 |
| abstract_inverted_index.(Mikolov | 1 |
| abstract_inverted_index.Bayesian | 92 |
| abstract_inverted_index.Mikolov, | 32 |
| abstract_inverted_index.Word2vec | 0 |
| abstract_inverted_index.analysis | 124 |
| abstract_inverted_index.improves | 115 |
| abstract_inverted_index.language | 12 |
| abstract_inverted_index.learning | 119 |
| abstract_inverted_index.lengths, | 43 |
| abstract_inverted_index.semantic | 17 |
| abstract_inverted_index.solution | 71 |
| abstract_inverted_index.vectors. | 76 |
| abstract_inverted_index.arbitrary | 42 |
| abstract_inverted_index.capturing | 15 |
| abstract_inverted_index.decreases | 103 |
| abstract_inverted_index.different | 20 |
| abstract_inverted_index.estimates | 86 |
| abstract_inverted_index.paragraph | 28, 60, 75, 101 |
| abstract_inverted_index.posterior | 93, 113 |
| abstract_inverted_index.sentiment | 123 |
| abstract_inverted_index.detection. | 127 |
| abstract_inverted_index.developing | 63 |
| abstract_inverted_index.documents, | 46, 108 |
| abstract_inverted_index.generative | 66 |
| abstract_inverted_index.inference. | 94 |
| abstract_inverted_index.likelihood | 70 |
| abstract_inverted_index.parameters | 88 |
| abstract_inverted_index.paraphrase | 126 |
| abstract_inverted_index.processing | 13 |
| abstract_inverted_index.sentences. | 49 |
| abstract_inverted_index.successful | 9 |
| abstract_inverted_index.supervised | 118 |
| abstract_inverted_index.corresponds | 72 |
| abstract_inverted_index.embeddings, | 27 |
| abstract_inverted_index.formulation | 79 |
| abstract_inverted_index.information | 111 |
| abstract_inverted_index.paragraphs, | 47 |
| abstract_inverted_index.performance | 116 |
| abstract_inverted_index.single-word | 26 |
| abstract_inverted_index.traditional | 74 |
| abstract_inverted_index.uncertainty | 114 |
| abstract_inverted_index.fixed-length | 35 |
| abstract_inverted_index.unsupervised | 65 |
| abstract_inverted_index.probabilistic | 78 |
| abstract_inverted_index.relationships | 18 |
| abstract_inverted_index.interpretation | 57 |
| abstract_inverted_index.representations | 36 |
| abstract_inverted_index.neural-network-based | 59 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7200000286102295 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
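
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. The following is a minimal sketch of how such a map can be turned back into running text. The function name `rebuild_abstract` is a hypothetical helper, not part of any OpenAlex client, and the sample dictionary is a truncated excerpt covering only positions 0 to 9 of the rows above.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a token -> positions mapping."""
    # Invert the mapping: each position holds exactly one token.
    token_at: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(token_at[i] for i in sorted(token_at))


# Excerpt of the payload rows (positions 0-9 only; later positions omitted).
sample_index = {
    "Word2vec": [0],
    "(Mikolov": [1],
    "et": [2],
    "al.,": [3],
    "2013)": [4],
    "has": [5],
    "proven": [6],
    "to": [7],
    "be": [8],
    "successful": [9],
}

print(rebuild_abstract(sample_index))
# prints: Word2vec (Mikolov et al., 2013) has proven to be successful
```

Tokens that occur more than once, such as `paragraph` at positions 28, 60, 75, and 101, simply list several positions, so the same loop places them at each occurrence.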