Replication Data for: Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures
2021 · Open Access · DOI: https://doi.org/10.7910/dvn/s02ebf
Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported. To supplement current practices, we refine an existing crowd-sourcing method for validating topic quality (Chang et al., 2009) and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing general-purpose tools to validate topics as measures.
- Type: dataset
- Language: en
- Landing Page: https://doi.org/10.7910/dvn/s02ebf
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4398883917
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4398883917 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.7910/dvn/s02ebf (Digital Object Identifier)
- Title: Replication Data for: Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures (work title)
- Type: dataset (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-04-27 (full publication date if available)
- Authors: Luwei Ying, Jacob Montgomery, Brandon Stewart (list of authors in order)
- Landing page: https://doi.org/10.7910/dvn/s02ebf (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://doi.org/10.7910/dvn/s02ebf (direct OA link when available)
- Concepts: Replication (statistics), Computer science, Data science, Statistics, Mathematics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4398883917 |
| doi | https://doi.org/10.7910/dvn/s02ebf |
| ids.doi | https://doi.org/10.7910/dvn/s02ebf |
| ids.openalex | https://openalex.org/W4398883917 |
| fwci | |
| type | dataset |
| title | Replication Data for: Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13274 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.28839999437332153 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Expert finding and Q&A systems |
| topics[1].id | https://openalex.org/T11719 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.2485000044107437 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Data Quality and Management |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C12590798 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8835293054580688 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3933199 |
| concepts[0].display_name | Replication (statistics) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5813887119293213 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2522767166 |
| concepts[2].level | 1 |
| concepts[2].score | 0.532718300819397 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[2].display_name | Data science |
| concepts[3].id | https://openalex.org/C105795698 |
| concepts[3].level | 1 |
| concepts[3].score | 0.13115018606185913 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[3].display_name | Statistics |
| concepts[4].id | https://openalex.org/C33923547 |
| concepts[4].level | 0 |
| concepts[4].score | 0.13097882270812988 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[4].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/replication |
| keywords[0].score | 0.8835293054580688 |
| keywords[0].display_name | Replication (statistics) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5813887119293213 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/data-science |
| keywords[2].score | 0.532718300819397 |
| keywords[2].display_name | Data science |
| keywords[3].id | https://openalex.org/keywords/statistics |
| keywords[3].score | 0.13115018606185913 |
| keywords[3].display_name | Statistics |
| keywords[4].id | https://openalex.org/keywords/mathematics |
| keywords[4].score | 0.13097882270812988 |
| keywords[4].display_name | Mathematics |
| language | en |
| locations[0].id | doi:10.7910/dvn/s02ebf |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4377196806 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Harvard Dataverse |
| locations[0].source.host_organization | https://openalex.org/I136199984 |
| locations[0].source.host_organization_name | Harvard University |
| locations[0].source.host_organization_lineage | https://openalex.org/I136199984 |
| locations[0].license | public-domain |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | dataset |
| locations[0].license_id | https://openalex.org/licenses/public-domain |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.7910/dvn/s02ebf |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5049746675 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7307-4834 |
| authorships[0].author.display_name | Luwei Ying |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I204465549 |
| authorships[0].affiliations[0].raw_affiliation_string | (Washington University in St. Louis) |
| authorships[0].institutions[0].id | https://openalex.org/I204465549 |
| authorships[0].institutions[0].ror | https://ror.org/01yc7t268 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I204465549 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Washington University in St. Louis |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Luwei Ying |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | (Washington University in St. Louis) |
| authorships[1].author.id | https://openalex.org/A5049220804 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5632-2437 |
| authorships[1].author.display_name | Jacob Montgomery |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I204465549 |
| authorships[1].affiliations[0].raw_affiliation_string | (Washington University in St. Louis) |
| authorships[1].institutions[0].id | https://openalex.org/I204465549 |
| authorships[1].institutions[0].ror | https://ror.org/01yc7t268 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I204465549 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Washington University in St. Louis |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jacob M. Montgomery |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | (Washington University in St. Louis) |
| authorships[2].author.id | https://openalex.org/A5113226689 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Brandon Stewart |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I20089843 |
| authorships[2].affiliations[0].raw_affiliation_string | (Princeton University) |
| authorships[2].institutions[0].id | https://openalex.org/I20089843 |
| authorships[2].institutions[0].ror | https://ror.org/00hx57361 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I20089843 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Princeton University |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Brandon M. Stewart |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | (Princeton University) |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.7910/dvn/s02ebf |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Replication Data for: Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13274 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.28839999437332153 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Expert finding and Q&A systems |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W4205713785, https://openalex.org/W2001405890, https://openalex.org/W3016766501, https://openalex.org/W4398287560 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.7910/dvn/s02ebf |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4377196806 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Harvard Dataverse |
| best_oa_location.source.host_organization | https://openalex.org/I136199984 |
| best_oa_location.source.host_organization_name | Harvard University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I136199984 |
| best_oa_location.license | public-domain |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | dataset |
| best_oa_location.license_id | https://openalex.org/licenses/public-domain |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.7910/dvn/s02ebf |
| primary_location.id | doi:10.7910/dvn/s02ebf |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4377196806 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Harvard Dataverse |
| primary_location.source.host_organization | https://openalex.org/I136199984 |
| primary_location.source.host_organization_name | Harvard University |
| primary_location.source.host_organization_lineage | https://openalex.org/I136199984 |
| primary_location.license | public-domain |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | dataset |
| primary_location.license_id | https://openalex.org/licenses/public-domain |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.7910/dvn/s02ebf |
| publication_date | 2021-04-27 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 30 |
| abstract_inverted_index.In | 49 |
| abstract_inverted_index.To | 77 |
| abstract_inverted_index.We | 110 |
| abstract_inverted_index.an | 83, 115 |
| abstract_inverted_index.as | 2, 159 |
| abstract_inverted_index.at | 72 |
| abstract_inverted_index.be | 144 |
| abstract_inverted_index.by | 107, 120, 152 |
| abstract_inverted_index.et | 92 |
| abstract_inverted_index.go | 96 |
| abstract_inverted_index.in | 4, 19, 59 |
| abstract_inverted_index.is | 68 |
| abstract_inverted_index.of | 47, 117 |
| abstract_inverted_index.on | 97 |
| abstract_inverted_index.or | 71 |
| abstract_inverted_index.to | 36, 98, 131, 148, 156 |
| abstract_inverted_index.we | 52, 81, 146 |
| abstract_inverted_index.aim | 147 |
| abstract_inverted_index.and | 12, 62, 95, 123, 126 |
| abstract_inverted_index.are | 7, 25 |
| abstract_inverted_index.for | 10, 28, 56, 87, 102, 128 |
| abstract_inverted_index.new | 100 |
| abstract_inverted_index.not | 74 |
| abstract_inverted_index.our | 112 |
| abstract_inverted_index.own | 134 |
| abstract_inverted_index.the | 39, 44, 60, 108 |
| abstract_inverted_index.U.S. | 121 |
| abstract_inverted_index.When | 17 |
| abstract_inverted_index.al., | 93 |
| abstract_inverted_index.show | 63 |
| abstract_inverted_index.task | 31 |
| abstract_inverted_index.that | 32, 38, 64 |
| abstract_inverted_index.they | 24 |
| abstract_inverted_index.this | 50 |
| abstract_inverted_index.used | 27 |
| abstract_inverted_index.will | 142 |
| abstract_inverted_index.with | 114 |
| abstract_inverted_index.2009) | 94 |
| abstract_inverted_index.Topic | 0 |
| abstract_inverted_index.While | 137 |
| abstract_inverted_index.best, | 145 |
| abstract_inverted_index.field | 61 |
| abstract_inverted_index.large | 14 |
| abstract_inverted_index.least | 73 |
| abstract_inverted_index.model | 40, 66 |
| abstract_inverted_index.posts | 119 |
| abstract_inverted_index.rare, | 70 |
| abstract_inverted_index.their | 133 |
| abstract_inverted_index.tools | 9, 155 |
| abstract_inverted_index.topic | 57, 89, 135 |
| abstract_inverted_index.(Chang | 91 |
| abstract_inverted_index.always | 143 |
| abstract_inverted_index.create | 99 |
| abstract_inverted_index.ensure | 37 |
| abstract_inverted_index.labels | 105 |
| abstract_inverted_index.method | 86, 113 |
| abstract_inverted_index.paper, | 51 |
| abstract_inverted_index.refine | 82 |
| abstract_inverted_index.review | 53 |
| abstract_inverted_index.social | 20 |
| abstract_inverted_index.topics | 158 |
| abstract_inverted_index.applied | 18 |
| abstract_inverted_index.capture | 43 |
| abstract_inverted_index.careful | 34 |
| abstract_inverted_index.concept | 46 |
| abstract_inverted_index.current | 54, 79 |
| abstract_inverted_index.desired | 45 |
| abstract_inverted_index.improve | 149 |
| abstract_inverted_index.models, | 1 |
| abstract_inverted_index.models. | 136 |
| abstract_inverted_index.outputs | 41 |
| abstract_inverted_index.provide | 124 |
| abstract_inverted_index.quality | 90 |
| abstract_inverted_index.science | 21 |
| abstract_inverted_index.wishing | 130 |
| abstract_inverted_index.Facebook | 118 |
| abstract_inverted_index.Senators | 122 |
| abstract_inverted_index.actually | 42 |
| abstract_inverted_index.analysis | 116 |
| abstract_inverted_index.commonly | 26 |
| abstract_inverted_index.computer | 5 |
| abstract_inverted_index.document | 15 |
| abstract_inverted_index.existing | 84 |
| abstract_inverted_index.guidance | 127 |
| abstract_inverted_index.however, | 23 |
| abstract_inverted_index.provided | 106 |
| abstract_inverted_index.requires | 33 |
| abstract_inverted_index.science, | 6 |
| abstract_inverted_index.software | 125 |
| abstract_inverted_index.standard | 150 |
| abstract_inverted_index.validate | 132, 157 |
| abstract_inverted_index.developed | 3 |
| abstract_inverted_index.effective | 8 |
| abstract_inverted_index.exercises | 141 |
| abstract_inverted_index.exploring | 11 |
| abstract_inverted_index.extensive | 65 |
| abstract_inverted_index.interest. | 48 |
| abstract_inverted_index.measures. | 160 |
| abstract_inverted_index.practices | 55, 151 |
| abstract_inverted_index.providing | 153 |
| abstract_inverted_index.reported. | 76 |
| abstract_inverted_index.research, | 22 |
| abstract_inverted_index.tailored, | 138 |
| abstract_inverted_index.conceptual | 104 |
| abstract_inverted_index.illustrate | 111 |
| abstract_inverted_index.practices, | 80 |
| abstract_inverted_index.procedures | 101 |
| abstract_inverted_index.supplement | 78 |
| abstract_inverted_index.validating | 88, 103 |
| abstract_inverted_index.validation | 35, 58, 67, 140 |
| abstract_inverted_index.researcher. | 109 |
| abstract_inverted_index.researchers | 129 |
| abstract_inverted_index.summarizing | 13 |
| abstract_inverted_index.collections. | 16 |
| abstract_inverted_index.increasingly | 69 |
| abstract_inverted_index.measurement, | 29 |
| abstract_inverted_index.case-specific | 139 |
| abstract_inverted_index.crowd-sourcing | 85 |
| abstract_inverted_index.systematically | 75 |
| abstract_inverted_index.general-purpose | 154 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
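
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. The following is a minimal sketch of how such a mapping could be turned back into running text. It assumes the positions are available as lists of integers (the table shows them as comma-separated values). The function name `rebuild_abstract`, its `limit` parameter, and the `sample` dictionary are illustrative and not part of the OpenAlex record; `sample` copies only the ten tokens that occupy positions 0 through 9.

```python
def rebuild_abstract(inverted_index, limit=None):
    """Rebuild text from a {token: [positions]} mapping.

    `limit` (hypothetical helper argument) keeps only positions below
    that value, which is handy when the mapping is a partial excerpt.
    """
    # Flatten to (position, token) pairs, then sort by position.
    pairs = sorted(
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
        if limit is None or pos < limit
    )
    return " ".join(token for _, token in pairs)


# Excerpt of the rows above: the tokens found at positions 0-9,
# with their full position lists as shown in the table.
sample = {
    "Topic": [0],
    "models,": [1],
    "as": [2, 159],
    "developed": [3],
    "in": [4, 19, 59],
    "computer": [5],
    "science,": [6],
    "are": [7, 25],
    "effective": [8],
    "tools": [9, 155],
}

print(rebuild_abstract(sample, limit=10))
```

With `limit=10`, the later occurrences of "as", "in", "are", and "tools" are ignored, so the excerpt yields the opening words of the abstract. Passing the complete mapping with no limit would rebuild the full text.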