Replication Data for: Why Propensity Scores Should Not Be Used for Matching
2018 · Open Access · DOI: https://doi.org/10.7910/dvn/c0dbae
We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal --- increasing imbalance, inefficiency, model dependence, and bias. PSM supposedly makes it easier to find matches by projecting a large number of covariates to a scalar propensity score and applying a single model to produce an unbiased estimate. However, in observational analysis the data generation process is rarely known and so users often try many models before choosing one to present. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers always replace PSM with one of the other available matching methods, propensity scores have many other productive uses.
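
The claim that PSM approximates random matching in already-balanced data can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' replication code: treatment is assumed to be assigned independently of two standard-normal covariates, so the true propensity score is constant and matching on it amounts to pairing units at random. Euclidean nearest-neighbor matching stands in for the covariate-based matching methods the abstract refers to. The function names and sample sizes are hypothetical.

```python
import math
import random


def simulate(n_treated=50, n_control=200, seed=0):
    """Draw two standard-normal covariates per unit, independent of treatment."""
    rng = random.Random(seed)
    treated = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_treated)]
    control = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_control)]
    return treated, control, rng


def mean_pair_distance(pairs):
    """Average covariate distance within matched pairs (lower = better balance)."""
    return sum(math.dist(t, c) for t, c in pairs) / len(pairs)


def random_matching(treated, control, rng):
    """Assumption: with a constant propensity score, every control is an
    equally good match, so pairing is effectively random."""
    chosen = rng.sample(control, len(treated))
    return list(zip(treated, chosen))


def nearest_neighbor_matching(treated, control):
    """Greedy one-to-one matching on the covariates themselves,
    without replacement."""
    available = list(control)
    pairs = []
    for t in treated:
        best = min(available, key=lambda c: math.dist(t, c))
        available.remove(best)
        pairs.append((t, best))
    return pairs


treated, control, rng = simulate()
random_pairs = random_matching(treated, control, rng)
nn_pairs = nearest_neighbor_matching(treated, control)

print("random (constant-score) matching:", mean_pair_distance(random_pairs))
print("covariate nearest-neighbor matching:", mean_pair_distance(nn_pairs))
```

In this setup the covariate-based pairs should be much closer to each other than the randomly paired ones, which is the within-pair imbalance that matching on a scalar score cannot see. An estimated propensity score would behave less cleanly than the constant score assumed here.
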
- Type: dataset
- Language: en
- Landing Page: https://doi.org/10.7910/DVN/C0DBAE
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4398342447
Raw OpenAlex JSON
| Field | Value | Description |
|---|---|---|
| OpenAlex ID | https://openalex.org/W4398342447 | Canonical identifier for this work in OpenAlex |
| DOI | https://doi.org/10.7910/dvn/c0dbae | Digital Object Identifier |
| Title | Replication Data for: Why Propensity Scores Should Not Be Used for Matching | Work title |
| Type | dataset | OpenAlex work type |
| Language | en | Primary language |
| Publication year | 2018 | Year of publication |
| Publication date | 2018-01-01 | Full publication date if available |
| Authors | Gary King, Richard A. Nielsen | List of authors in order |
| Landing page | https://doi.org/10.7910/DVN/C0DBAE | Publisher landing page |
| Open access | Yes | Whether a free full text is available |
| OA status | green | Open access status per OpenAlex |
| OA URL | https://doi.org/10.7910/dvn/c0dbae | Direct OA link when available |
| Concepts | Replication (statistics), Matching (statistics), Propensity score matching, Computer science, Psychology, Statistics, Mathematics | Top concepts (fields/topics) attached by OpenAlex |
| Cited by | 0 | Total citation count in OpenAlex |
| Related works (count) | 10 | Other works algorithmically related by OpenAlex |

Full payload
| id | https://openalex.org/W4398342447 |
|---|---|
| doi | https://doi.org/10.7910/dvn/c0dbae |
| ids.doi | https://doi.org/10.7910/dvn/c0dbae |
| ids.openalex | https://openalex.org/W4398342447 |
| fwci | |
| type | dataset |
| title | Replication Data for: Why Propensity Scores Should Not Be Used for Matching |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10845 |
| topics[0].field.id | https://openalex.org/fields/26 |
| topics[0].field.display_name | Mathematics |
| topics[0].score | 0.4765999913215637 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2613 |
| topics[0].subfield.display_name | Statistics and Probability |
| topics[0].display_name | Advanced Causal Inference Techniques |
| topics[1].id | https://openalex.org/T11303 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.4196000099182129 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Bayesian Modeling and Causal Inference |
| topics[2].id | https://openalex.org/T10136 |
| topics[2].field.id | https://openalex.org/fields/26 |
| topics[2].field.display_name | Mathematics |
| topics[2].score | 0.40709999203681946 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2613 |
| topics[2].subfield.display_name | Statistics and Probability |
| topics[2].display_name | Statistical Methods and Inference |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C12590798 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8795894980430603 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3933199 |
| concepts[0].display_name | Replication (statistics) |
| concepts[1].id | https://openalex.org/C165064840 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6249352693557739 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1321061 |
| concepts[1].display_name | Matching (statistics) |
| concepts[2].id | https://openalex.org/C17923572 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5748071670532227 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7250160 |
| concepts[2].display_name | Propensity score matching |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.4943685531616211 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C15744967 |
| concepts[4].level | 0 |
| concepts[4].score | 0.4140927195549011 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[4].display_name | Psychology |
| concepts[5].id | https://openalex.org/C105795698 |
| concepts[5].level | 1 |
| concepts[5].score | 0.2995528280735016 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[5].display_name | Statistics |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.22264614701271057 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/replication |
| keywords[0].score | 0.8795894980430603 |
| keywords[0].display_name | Replication (statistics) |
| keywords[1].id | https://openalex.org/keywords/matching |
| keywords[1].score | 0.6249352693557739 |
| keywords[1].display_name | Matching (statistics) |
| keywords[2].id | https://openalex.org/keywords/propensity-score-matching |
| keywords[2].score | 0.5748071670532227 |
| keywords[2].display_name | Propensity score matching |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.4943685531616211 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/psychology |
| keywords[4].score | 0.4140927195549011 |
| keywords[4].display_name | Psychology |
| keywords[5].id | https://openalex.org/keywords/statistics |
| keywords[5].score | 0.2995528280735016 |
| keywords[5].display_name | Statistics |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.22264614701271057 |
| keywords[6].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:doi:10.7910/DVN/C0DBAE |
| locations[0].is_oa | False |
| locations[0].source | |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.7910/DVN/C0DBAE |
| locations[1].id | doi:10.7910/dvn/c0dbae |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4377196806 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | Harvard Dataverse |
| locations[1].source.host_organization | https://openalex.org/I136199984 |
| locations[1].source.host_organization_name | Harvard University |
| locations[1].source.host_organization_lineage | https://openalex.org/I136199984 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | dataset |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.7910/dvn/c0dbae |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5109724351 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Gary King |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I2801851002 |
| authorships[0].affiliations[0].raw_affiliation_string | (Harvard University) |
| authorships[0].institutions[0].id | https://openalex.org/I2801851002 |
| authorships[0].institutions[0].ror | https://ror.org/006v7bf86 |
| authorships[0].institutions[0].type | other |
| authorships[0].institutions[0].lineage | https://openalex.org/I136199984, https://openalex.org/I2801851002 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Harvard University Press |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gary King |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | (Harvard University) |
| authorships[1].author.id | https://openalex.org/A5005611176 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Richard A. Nielsen |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Richard Nielsen |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.7910/dvn/c0dbae |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Replication Data for: Why Propensity Scores Should Not Be Used for Matching |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10845 |
| primary_topic.field.id | https://openalex.org/fields/26 |
| primary_topic.field.display_name | Mathematics |
| primary_topic.score | 0.4765999913215637 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2613 |
| primary_topic.subfield.display_name | Statistics and Probability |
| primary_topic.display_name | Advanced Causal Inference Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2026576563, https://openalex.org/W3196761963, https://openalex.org/W2036193982, https://openalex.org/W213628847, https://openalex.org/W2065417422, https://openalex.org/W4232168831, https://openalex.org/W4253956144, https://openalex.org/W3023923059 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | doi:10.7910/dvn/c0dbae |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4377196806 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Harvard Dataverse |
| best_oa_location.source.host_organization | https://openalex.org/I136199984 |
| best_oa_location.source.host_organization_name | Harvard University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I136199984 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | dataset |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.7910/dvn/c0dbae |
| primary_location.id | pmh:doi:10.7910/DVN/C0DBAE |
| primary_location.is_oa | False |
| primary_location.source | |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.7910/DVN/C0DBAE |
| publication_date | 2018-01-01 |
| publication_year | 2018 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 43, 49, 55, 96, 107 |
| abstract_inverted_index.We | 0 |
| abstract_inverted_index.an | 7, 60 |
| abstract_inverted_index.as | 102 |
| abstract_inverted_index.be | 128 |
| abstract_inverted_index.by | 41, 130 |
| abstract_inverted_index.in | 64, 139 |
| abstract_inverted_index.is | 71, 115 |
| abstract_inverted_index.it | 36 |
| abstract_inverted_index.of | 11, 21, 46, 88, 124, 181 |
| abstract_inverted_index.or | 151 |
| abstract_inverted_index.so | 75 |
| abstract_inverted_index.to | 38, 48, 58, 84, 94, 119, 143, 148, 167 |
| abstract_inverted_index.we | 161 |
| abstract_inverted_index.--- | 25 |
| abstract_inverted_index.PSM | 33, 89, 114, 156, 178 |
| abstract_inverted_index.The | 86 |
| abstract_inverted_index.and | 31, 53, 74 |
| abstract_inverted_index.can | 127 |
| abstract_inverted_index.for | 14 |
| abstract_inverted_index.its | 22, 92 |
| abstract_inverted_index.one | 83, 180 |
| abstract_inverted_index.the | 19, 67, 120, 168, 182 |
| abstract_inverted_index.try | 78 |
| abstract_inverted_index.data | 13, 68, 140 |
| abstract_inverted_index.even | 165 |
| abstract_inverted_index.find | 39 |
| abstract_inverted_index.from | 91 |
| abstract_inverted_index.full | 132 |
| abstract_inverted_index.goal | 24 |
| abstract_inverted_index.have | 189 |
| abstract_inverted_index.many | 79, 190 |
| abstract_inverted_index.more | 108 |
| abstract_inverted_index.show | 1 |
| abstract_inverted_index.some | 154 |
| abstract_inverted_index.that | 2, 126 |
| abstract_inverted_index.thus | 116 |
| abstract_inverted_index.with | 103, 134, 150, 179 |
| abstract_inverted_index.after | 152 |
| abstract_inverted_index.begin | 149 |
| abstract_inverted_index.bias. | 32 |
| abstract_inverted_index.blind | 118 |
| abstract_inverted_index.comes | 90 |
| abstract_inverted_index.data. | 170 |
| abstract_inverted_index.fully | 110 |
| abstract_inverted_index.known | 73 |
| abstract_inverted_index.large | 44, 122 |
| abstract_inverted_index.makes | 35 |
| abstract_inverted_index.model | 29, 57 |
| abstract_inverted_index.often | 17, 77, 121 |
| abstract_inverted_index.other | 104, 135, 183, 191 |
| abstract_inverted_index.score | 4, 52 |
| abstract_inverted_index.show, | 162 |
| abstract_inverted_index.than, | 101 |
| abstract_inverted_index.these | 172 |
| abstract_inverted_index.users | 76 |
| abstract_inverted_index.uses. | 193 |
| abstract_inverted_index.(PSM), | 6 |
| abstract_inverted_index.always | 176 |
| abstract_inverted_index.before | 81 |
| abstract_inverted_index.causal | 15 |
| abstract_inverted_index.easier | 37 |
| abstract_inverted_index.either | 147 |
| abstract_inverted_index.enough | 142 |
| abstract_inverted_index.method | 10 |
| abstract_inverted_index.models | 80 |
| abstract_inverted_index.number | 45 |
| abstract_inverted_index.random | 158 |
| abstract_inverted_index.rarely | 72 |
| abstract_inverted_index.rather | 100 |
| abstract_inverted_index.scalar | 50 |
| abstract_inverted_index.scores | 188 |
| abstract_inverted_index.single | 56 |
| abstract_inverted_index.which, | 160 |
| abstract_inverted_index.blocked | 111 |
| abstract_inverted_index.matches | 40 |
| abstract_inverted_index.popular | 9 |
| abstract_inverted_index.portion | 123 |
| abstract_inverted_index.process | 70 |
| abstract_inverted_index.produce | 59 |
| abstract_inverted_index.pruning | 153 |
| abstract_inverted_index.replace | 177 |
| abstract_inverted_index.results | 173 |
| abstract_inverted_index.suggest | 174 |
| abstract_inverted_index.Although | 171 |
| abstract_inverted_index.However, | 63 |
| abstract_inverted_index.analysis | 66 |
| abstract_inverted_index.applying | 54 |
| abstract_inverted_index.attempts | 93 |
| abstract_inverted_index.balanced | 141 |
| abstract_inverted_index.blocking | 133 |
| abstract_inverted_index.choosing | 82 |
| abstract_inverted_index.complete | 145 |
| abstract_inverted_index.intended | 23 |
| abstract_inverted_index.matching | 5, 105, 136, 159, 185 |
| abstract_inverted_index.methods, | 106, 186 |
| abstract_inverted_index.methods. | 137 |
| abstract_inverted_index.opposite | 20 |
| abstract_inverted_index.original | 169 |
| abstract_inverted_index.present. | 85 |
| abstract_inverted_index.relative | 166 |
| abstract_inverted_index.unbiased | 61 |
| abstract_inverted_index.uniquely | 117 |
| abstract_inverted_index.weakness | 87 |
| abstract_inverted_index.Moreover, | 138 |
| abstract_inverted_index.available | 184 |
| abstract_inverted_index.efficient | 109 |
| abstract_inverted_index.estimate. | 62 |
| abstract_inverted_index.imbalance | 125, 164 |
| abstract_inverted_index.increases | 163 |
| abstract_inverted_index.completely | 97 |
| abstract_inverted_index.covariates | 47 |
| abstract_inverted_index.eliminated | 129 |
| abstract_inverted_index.enormously | 8 |
| abstract_inverted_index.generation | 69 |
| abstract_inverted_index.imbalance, | 27 |
| abstract_inverted_index.increasing | 26 |
| abstract_inverted_index.inference, | 16 |
| abstract_inverted_index.productive | 192 |
| abstract_inverted_index.projecting | 42 |
| abstract_inverted_index.propensity | 3, 51, 187 |
| abstract_inverted_index.randomized | 98, 112 |
| abstract_inverted_index.supposedly | 34 |
| abstract_inverted_index.approximate | 95, 144 |
| abstract_inverted_index.dependence, | 30 |
| abstract_inverted_index.experiment, | 99 |
| abstract_inverted_index.experiment. | 113 |
| abstract_inverted_index.researchers | 175 |
| abstract_inverted_index.accomplishes | 18 |
| abstract_inverted_index.approximates | 157 |
| abstract_inverted_index.approximating | 131 |
| abstract_inverted_index.inefficiency, | 28 |
| abstract_inverted_index.observational | 65 |
| abstract_inverted_index.observations, | 155 |
| abstract_inverted_index.preprocessing | 12 |
| abstract_inverted_index.randomization, | 146 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile | |
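
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first seven tokens from the table (with only their first positions), not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} map by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First seven tokens of the abstract, taken from the rows above.
sample_index = {
    "We": [0],
    "show": [1],
    "that": [2],
    "propensity": [3],
    "score": [4],
    "matching": [5],
    "(PSM),": [6],
}

opening = rebuild_abstract(sample_index)
print(opening)  # We show that propensity score matching (PSM),
```

Applied to the complete index, the same function should reproduce the abstract shown at the top of the record, since every position from 0 to 193 appears exactly once.
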