AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2409.08823
Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these models are fit as parametric mixed-effects models of the probability that a test taker answers a test item (i.e., question) correctly. Neural net extensions of these models, such as BERT-IRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop, which trains a non-parametric AutoML grade model using item features, followed by an item-specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate its effectiveness by applying it to the Duolingo English Test, a high-stakes, online English proficiency test. We show that the resulting model is typically better calibrated, achieves better predictive performance, and produces more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models like BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibrating item parameters for CATs.
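To make "the probability of a test taker getting the correct answer to a test item" concrete, here is a minimal Python sketch of a parametric item response function. The abstract does not say which parametric form the authors use, so the two-parameter logistic (2PL) model below is an assumption chosen because it is a common textbook IRT form. The names `p_correct`, `log_likelihood`, `discrimination`, and `difficulty` are illustrative and are not taken from the paper.

```python
import math


def p_correct(theta: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function (assumed form, not confirmed by the abstract).

    Returns the probability that a test taker with ability `theta`
    answers an item with the given parameters correctly.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of binary responses for one test taker.

    `items` is a list of (discrimination, difficulty) pairs and
    `responses` is a list of 0/1 grades, one per item.
    """
    total = 0.0
    for (a, b), y in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if y else math.log(1.0 - p)
    return total
```

Under these assumptions, scoring a test taker amounts to finding the `theta` that best explains the observed responses given calibrated item parameters. Calibration, the subject of the paper, is the step that estimates those item parameters in the first place.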
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2409.08823, https://arxiv.org/pdf/2409.08823
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403662143
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403662143 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2409.08823 (Digital Object Identifier)
- Title: AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-09-13 (Full publication date if available)
- Authors: James Sharpnack, Phoebe Mulcaire, Klinton Bicknell, Geoffrey T. LaFlair, Kevin Yancey (List of authors in order)
- Landing page: https://arxiv.org/abs/2409.08823 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2409.08823 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2409.08823 (Direct OA link when available)
- Concepts: Item response theory, Computer science, Artificial intelligence, Machine learning, Mathematics, Statistics, Psychometrics (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4403662143 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2409.08823 |
| ids.doi | https://doi.org/10.48550/arxiv.2409.08823 |
| ids.openalex | https://openalex.org/W4403662143 |
| fwci | |
| type | preprint |
| title | AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10467 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.6290000081062317 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Psychometric Methodologies and Testing |
| topics[1].id | https://openalex.org/T14484 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.6194000244140625 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Technology and Data Analysis |
| topics[2].id | https://openalex.org/T13748 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.5927000045776367 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Advanced Statistical Modeling Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C19875794 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7292227745056152 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1207340 |
| concepts[0].display_name | Item response theory |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6063409447669983 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5248938798904419 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C119857082 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5032097697257996 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[3].display_name | Machine learning |
| concepts[4].id | https://openalex.org/C33923547 |
| concepts[4].level | 0 |
| concepts[4].score | 0.15519008040428162 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[4].display_name | Mathematics |
| concepts[5].id | https://openalex.org/C105795698 |
| concepts[5].level | 1 |
| concepts[5].score | 0.13051876425743103 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[5].display_name | Statistics |
| concepts[6].id | https://openalex.org/C171606756 |
| concepts[6].level | 2 |
| concepts[6].score | 0.07627314329147339 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q506132 |
| concepts[6].display_name | Psychometrics |
| keywords[0].id | https://openalex.org/keywords/item-response-theory |
| keywords[0].score | 0.7292227745056152 |
| keywords[0].display_name | Item response theory |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6063409447669983 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5248938798904419 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/machine-learning |
| keywords[3].score | 0.5032097697257996 |
| keywords[3].display_name | Machine learning |
| keywords[4].id | https://openalex.org/keywords/mathematics |
| keywords[4].score | 0.15519008040428162 |
| keywords[4].display_name | Mathematics |
| keywords[5].id | https://openalex.org/keywords/statistics |
| keywords[5].score | 0.13051876425743103 |
| keywords[5].display_name | Statistics |
| keywords[6].id | https://openalex.org/keywords/psychometrics |
| keywords[6].score | 0.07627314329147339 |
| keywords[6].display_name | Psychometrics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2409.08823 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2409.08823 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2409.08823 |
| locations[1].id | doi:10.48550/arxiv.2409.08823 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2409.08823 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5023062037 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-7193-0972 |
| authorships[0].author.display_name | James Sharpnack |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sharpnack, James |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5019045595 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Phoebe Mulcaire |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mulcaire, Phoebe |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5029982656 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3404-7432 |
| authorships[2].author.display_name | Klinton Bicknell |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Bicknell, Klinton |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5020799829 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0306-6550 |
| authorships[3].author.display_name | Geoffrey T. LaFlair |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | LaFlair, Geoff |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5066843321 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5017-5675 |
| authorships[4].author.display_name | Kevin Yancey |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Yancey, Kevin |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2409.08823 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10467 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.6290000081062317 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Psychometric Methodologies and Testing |
| related_works | https://openalex.org/W2961085424, https://openalex.org/W4306674287, https://openalex.org/W3046775127, https://openalex.org/W3107602296, https://openalex.org/W4394896187, https://openalex.org/W3170094116, https://openalex.org/W4386462264, https://openalex.org/W4364306694, https://openalex.org/W4312192474, https://openalex.org/W4283697347 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2409.08823 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2409.08823 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2409.08823 |
| primary_location.id | pmh:oai:arXiv.org:2409.08823 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2409.08823 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2409.08823 |
| publication_date | 2024-09-13 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 38, 46, 68, 86, 94, 101, 137, 180 |
| abstract_inverted_index.EM | 89 |
| abstract_inverted_index.It | 82 |
| abstract_inverted_index.We | 66, 125, 144 |
| abstract_inverted_index.an | 111 |
| abstract_inverted_index.as | 21, 58 |
| abstract_inverted_index.by | 110, 129 |
| abstract_inverted_index.in | 15 |
| abstract_inverted_index.is | 4, 73, 83, 150 |
| abstract_inverted_index.it | 131 |
| abstract_inverted_index.of | 7, 37, 54, 183, 189 |
| abstract_inverted_index.on | 34, 85 |
| abstract_inverted_index.to | 45, 132 |
| abstract_inverted_index.we | 178 |
| abstract_inverted_index.IRT | 167, 171 |
| abstract_inverted_index.and | 63, 159, 169 |
| abstract_inverted_index.are | 12, 27 |
| abstract_inverted_index.fit | 28 |
| abstract_inverted_index.for | 122, 187, 192 |
| abstract_inverted_index.its | 127 |
| abstract_inverted_index.net | 52 |
| abstract_inverted_index.the | 35, 42, 119, 133, 147, 176 |
| abstract_inverted_index.two | 95 |
| abstract_inverted_index.Item | 0 |
| abstract_inverted_index.This | 116 |
| abstract_inverted_index.gets | 155 |
| abstract_inverted_index.high | 138 |
| abstract_inverted_index.item | 48, 107, 112, 190 |
| abstract_inverted_index.like | 173 |
| abstract_inverted_index.loop | 92 |
| abstract_inverted_index.more | 152, 160 |
| abstract_inverted_index.show | 145 |
| abstract_inverted_index.such | 20, 57 |
| abstract_inverted_index.test | 39, 47 |
| abstract_inverted_index.than | 163 |
| abstract_inverted_index.that | 11, 72, 146 |
| abstract_inverted_index.used | 14 |
| abstract_inverted_index.way, | 177 |
| abstract_inverted_index.well | 153 |
| abstract_inverted_index.with | 75, 93 |
| abstract_inverted_index.(IRT) | 3 |
| abstract_inverted_index.Along | 175 |
| abstract_inverted_index.CATs. | 193 |
| abstract_inverted_index.Carlo | 88 |
| abstract_inverted_index.Monte | 87 |
| abstract_inverted_index.Test, | 136 |
| abstract_inverted_index.based | 84 |
| abstract_inverted_index.brief | 181 |
| abstract_inverted_index.class | 6 |
| abstract_inverted_index.grade | 104 |
| abstract_inverted_index.inner | 97 |
| abstract_inverted_index.loop, | 98 |
| abstract_inverted_index.mixed | 31 |
| abstract_inverted_index.model | 105, 149 |
| abstract_inverted_index.outer | 91 |
| abstract_inverted_index.stage | 96 |
| abstract_inverted_index.taker | 40 |
| abstract_inverted_index.test. | 143 |
| abstract_inverted_index.tests | 18 |
| abstract_inverted_index.these | 26, 55 |
| abstract_inverted_index.using | 29, 106 |
| abstract_inverted_index.which | 99 |
| abstract_inverted_index.(MCEM) | 90 |
| abstract_inverted_index.(i.e., | 49 |
| abstract_inverted_index.AutoML | 103 |
| abstract_inverted_index.Neural | 51 |
| abstract_inverted_index.answer | 44 |
| abstract_inverted_index.better | 156 |
| abstract_inverted_index.factor | 9 |
| abstract_inverted_index.model. | 115 |
| abstract_inverted_index.models | 10, 33, 168, 172 |
| abstract_inverted_index.online | 140 |
| abstract_inverted_index.scores | 162 |
| abstract_inverted_index.survey | 182 |
| abstract_inverted_index.tests. | 24, 124 |
| abstract_inverted_index.theory | 2 |
| abstract_inverted_index.tools. | 81 |
| abstract_inverted_index.trains | 100 |
| abstract_inverted_index.widely | 13 |
| abstract_inverted_index.(CATs), | 19 |
| abstract_inverted_index.English | 135, 141 |
| abstract_inverted_index.Machine | 78 |
| abstract_inverted_index.correct | 43 |
| abstract_inverted_index.effects | 32 |
| abstract_inverted_index.fitting | 70 |
| abstract_inverted_index.getting | 41 |
| abstract_inverted_index.greatly | 117 |
| abstract_inverted_index.machine | 184 |
| abstract_inverted_index.methods | 165, 186 |
| abstract_inverted_index.models, | 56 |
| abstract_inverted_index.propose | 67 |
| abstract_inverted_index.provide | 179 |
| abstract_inverted_index.require | 60 |
| abstract_inverted_index.scoring | 123 |
| abstract_inverted_index.stakes, | 139 |
| abstract_inverted_index.tuning. | 65 |
| abstract_inverted_index.(AutoML) | 80 |
| abstract_inverted_index.BertIRT, | 59 |
| abstract_inverted_index.Duolingo | 134 |
| abstract_inverted_index.Learning | 79 |
| abstract_inverted_index.accurate | 161 |
| abstract_inverted_index.adaptive | 17 |
| abstract_inverted_index.applying | 130 |
| abstract_inverted_index.existing | 164 |
| abstract_inverted_index.features | 108 |
| abstract_inverted_index.followed | 109 |
| abstract_inverted_index.language | 22 |
| abstract_inverted_index.learning | 185 |
| abstract_inverted_index.modeling | 120 |
| abstract_inverted_index.response | 1 |
| abstract_inverted_index.specific | 113 |
| abstract_inverted_index.workflow | 121 |
| abstract_inverted_index.Automated | 77 |
| abstract_inverted_index.parameter | 64 |
| abstract_inverted_index.procedure | 71 |
| abstract_inverted_index.resulting | 148 |
| abstract_inverted_index.typically | 151 |
| abstract_inverted_index.BERT-IRT). | 174 |
| abstract_inverted_index.compatible | 74 |
| abstract_inverted_index.extensions | 53 |
| abstract_inverted_index.multistage | 69 |
| abstract_inverted_index.parameters | 191 |
| abstract_inverted_index.parametric | 30, 114 |
| abstract_inverted_index.predictive | 157 |
| abstract_inverted_index.question). | 50 |
| abstract_inverted_index.accelerates | 118 |
| abstract_inverted_index.calibrated, | 154 |
| abstract_inverted_index.calibration | 188 |
| abstract_inverted_index.demonstrate | 126 |
| abstract_inverted_index.explanatory | 170 |
| abstract_inverted_index.probability | 36 |
| abstract_inverted_index.proficiency | 23, 142 |
| abstract_inverted_index.specialized | 61 |
| abstract_inverted_index.computerized | 16 |
| abstract_inverted_index.performance, | 158 |
| abstract_inverted_index.architectures | 62 |
| abstract_inverted_index.effectiveness | 128 |
| abstract_inverted_index.interpretable | 8 |
| abstract_inverted_index.Traditionally, | 25 |
| abstract_inverted_index.non-parametric | 102 |
| abstract_inverted_index.out-of-the-box | 76 |
| abstract_inverted_index.(non-explanatory | 166 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |