Human Control: Definitions and Algorithms
Ryan M. Carey, Tom Everitt · 2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2305.19861
How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer, without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm. We also analyse the related concepts of non-obstruction and shutdown alignment, three previously proposed algorithms for human control, and one new algorithm.
Metadata
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2305.19861 (PDF: https://arxiv.org/pdf/2305.19861)
- OA status: green
- Cited by: 1
- Related works: 10
- OpenAlex ID: https://openalex.org/W4379089460
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4379089460 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2305.19861 (Digital Object Identifier)
- Title: Human Control: Definitions and Algorithms (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-05-31 (full publication date if available)
- Authors: Ryan M. Carey, Tom Everitt (list of authors in order)
- Landing page: https://arxiv.org/abs/2305.19861 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2305.19861 (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2305.19861 (direct OA link when available)
- Concepts: Shutdown, Harm, Computer science, Control (management), Autonomy, Algorithm, Artificial intelligence, Psychology, Engineering, Social psychology, Political science, Nuclear engineering, Law (top fields and topics attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts for the last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4379089460 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2305.19861 |
| ids.doi | https://doi.org/10.48550/arxiv.2305.19861 |
| ids.openalex | https://openalex.org/W4379089460 |
| fwci | |
| type | preprint |
| title | Human Control: Definitions and Algorithms |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10525 |
| topics[0].field.id | https://openalex.org/fields/32 |
| topics[0].field.display_name | Psychology |
| topics[0].score | 0.9079999923706055 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3207 |
| topics[0].subfield.display_name | Social Psychology |
| topics[0].display_name | Human-Automation Interaction and Safety |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2780263472 |
| concepts[0].level | 2 |
| concepts[0].score | 0.904884934425354 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q331902 |
| concepts[0].display_name | Shutdown |
| concepts[1].id | https://openalex.org/C2777363581 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6855061054229736 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q15098235 |
| concepts[1].display_name | Harm |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6204217672348022 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C2775924081 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6050634384155273 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q55608371 |
| concepts[3].display_name | Control (management) |
| concepts[4].id | https://openalex.org/C65414064 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5956887006759644 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q484105 |
| concepts[4].display_name | Autonomy |
| concepts[5].id | https://openalex.org/C11413529 |
| concepts[5].level | 1 |
| concepts[5].score | 0.423421710729599 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[5].display_name | Algorithm |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3658026456832886 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C15744967 |
| concepts[7].level | 0 |
| concepts[7].score | 0.21011632680892944 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[7].display_name | Psychology |
| concepts[8].id | https://openalex.org/C127413603 |
| concepts[8].level | 0 |
| concepts[8].score | 0.14015617966651917 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[8].display_name | Engineering |
| concepts[9].id | https://openalex.org/C77805123 |
| concepts[9].level | 1 |
| concepts[9].score | 0.09519746899604797 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[9].display_name | Social psychology |
| concepts[10].id | https://openalex.org/C17744445 |
| concepts[10].level | 0 |
| concepts[10].score | 0.07739639282226562 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[10].display_name | Political science |
| concepts[11].id | https://openalex.org/C116915560 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q83504 |
| concepts[11].display_name | Nuclear engineering |
| concepts[12].id | https://openalex.org/C199539241 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[12].display_name | Law |
| keywords[0].id | https://openalex.org/keywords/shutdown |
| keywords[0].score | 0.904884934425354 |
| keywords[0].display_name | Shutdown |
| keywords[1].id | https://openalex.org/keywords/harm |
| keywords[1].score | 0.6855061054229736 |
| keywords[1].display_name | Harm |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6204217672348022 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/control |
| keywords[3].score | 0.6050634384155273 |
| keywords[3].display_name | Control (management) |
| keywords[4].id | https://openalex.org/keywords/autonomy |
| keywords[4].score | 0.5956887006759644 |
| keywords[4].display_name | Autonomy |
| keywords[5].id | https://openalex.org/keywords/algorithm |
| keywords[5].score | 0.423421710729599 |
| keywords[5].display_name | Algorithm |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.3658026456832886 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/psychology |
| keywords[7].score | 0.21011632680892944 |
| keywords[7].display_name | Psychology |
| keywords[8].id | https://openalex.org/keywords/engineering |
| keywords[8].score | 0.14015617966651917 |
| keywords[8].display_name | Engineering |
| keywords[9].id | https://openalex.org/keywords/social-psychology |
| keywords[9].score | 0.09519746899604797 |
| keywords[9].display_name | Social psychology |
| keywords[10].id | https://openalex.org/keywords/political-science |
| keywords[10].score | 0.07739639282226562 |
| keywords[10].display_name | Political science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2305.19861 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2305.19861 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2305.19861 |
| locations[1].id | doi:10.48550/arxiv.2305.19861 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2305.19861 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5061187034 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0186-2265 |
| authorships[0].author.display_name | Ryan M. Carey |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Carey, Ryan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5020224050 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1210-9866 |
| authorships[1].author.display_name | Tom Everitt |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Everitt, Tom |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2305.19861 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Human Control: Definitions and Algorithms |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10525 |
| primary_topic.field.id | https://openalex.org/fields/32 |
| primary_topic.field.display_name | Psychology |
| primary_topic.score | 0.9079999923706055 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3207 |
| primary_topic.subfield.display_name | Social Psychology |
| primary_topic.display_name | Human-Automation Interaction and Safety |
| related_works | https://openalex.org/W284773248, https://openalex.org/W3021445509, https://openalex.org/W2387852758, https://openalex.org/W4243903966, https://openalex.org/W4242784450, https://openalex.org/W2341076016, https://openalex.org/W2374473947, https://openalex.org/W1603859412, https://openalex.org/W311515923, https://openalex.org/W2051172214 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2305.19861 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2305.19861 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2305.19861 |
| primary_location.id | pmh:oai:arXiv.org:2305.19861 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2305.19861 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2305.19861 |
| publication_date | 2023-05-31 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 24, 37 |
| abstract_inverted_index.In | 31 |
| abstract_inverted_index.We | 61 |
| abstract_inverted_index.in | 4 |
| abstract_inverted_index.is | 13 |
| abstract_inverted_index.it | 47 |
| abstract_inverted_index.of | 6, 23, 39, 53, 58, 67 |
| abstract_inverted_index.to | 19 |
| abstract_inverted_index.we | 34 |
| abstract_inverted_index.How | 0 |
| abstract_inverted_index.One | 11 |
| abstract_inverted_index.and | 44, 56, 69, 79 |
| abstract_inverted_index.can | 1 |
| abstract_inverted_index.for | 76 |
| abstract_inverted_index.new | 81 |
| abstract_inverted_index.one | 80 |
| abstract_inverted_index.the | 17, 21, 64 |
| abstract_inverted_index.also | 62 |
| abstract_inverted_index.show | 45 |
| abstract_inverted_index.stay | 3 |
| abstract_inverted_index.that | 46 |
| abstract_inverted_index.this | 32 |
| abstract_inverted_index.user | 59 |
| abstract_inverted_index.agent | 18 |
| abstract_inverted_index.harm. | 60 |
| abstract_inverted_index.human | 25, 54, 77 |
| abstract_inverted_index.them. | 30 |
| abstract_inverted_index.three | 72 |
| abstract_inverted_index.which | 15 |
| abstract_inverted_index.called | 41 |
| abstract_inverted_index.define | 36 |
| abstract_inverted_index.follow | 20 |
| abstract_inverted_index.humans | 2 |
| abstract_inverted_index.paper, | 33 |
| abstract_inverted_index.analyse | 63 |
| abstract_inverted_index.control | 5 |
| abstract_inverted_index.implies | 48 |
| abstract_inverted_index.related | 65 |
| abstract_inverted_index.variant | 38 |
| abstract_inverted_index.without | 27 |
| abstract_inverted_index.advanced | 7 |
| abstract_inverted_index.concepts | 66 |
| abstract_inverted_index.control, | 78 |
| abstract_inverted_index.formally | 35 |
| abstract_inverted_index.proposal | 12 |
| abstract_inverted_index.proposed | 74 |
| abstract_inverted_index.requires | 16 |
| abstract_inverted_index.shutdown | 42, 50, 70 |
| abstract_inverted_index.systems? | 10 |
| abstract_inverted_index.autonomy, | 55 |
| abstract_inverted_index.avoidance | 57 |
| abstract_inverted_index.behavior, | 51 |
| abstract_inverted_index.overseer, | 26 |
| abstract_inverted_index.retention | 52 |
| abstract_inverted_index.algorithm. | 82 |
| abstract_inverted_index.algorithms | 75 |
| abstract_inverted_index.alignment, | 71 |
| abstract_inverted_index.artificial | 8 |
| abstract_inverted_index.previously | 73 |
| abstract_inverted_index.appropriate | 49 |
| abstract_inverted_index.influencing | 29 |
| abstract_inverted_index.instructions | 22 |
| abstract_inverted_index.intelligence | 9 |
| abstract_inverted_index.corrigibility | 40 |
| abstract_inverted_index.corrigibility, | 14 |
| abstract_inverted_index.inappropriately | 28 |
| abstract_inverted_index.non-obstruction | 68 |
| abstract_inverted_index.instructability, | 43 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.8100000023841858 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile | |
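The `abstract_inverted_index` rows store the abstract as a mapping from each token (punctuation attached) to the positions where it occurs. The sketch below shows how such an index can be turned back into running text. It assumes the positions are zero-based integers, as the rows above suggest (for example, `How` is 0 and `systems?` is 10); the function name `reconstruct_abstract` is hypothetical and not part of any OpenAlex client library.

```python
from typing import Dict, List


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from an abstract_inverted_index-style mapping.

    Hypothetical helper: assumes each key is a token and each value lists
    the zero-based positions at which that token appears.
    """
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    # Positions absent from the mapping are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Entries copied from the table above, restricted to positions 0-10.
sample = {
    "How": [0], "can": [1], "humans": [2], "stay": [3], "in": [4],
    "control": [5], "of": [6], "advanced": [7], "artificial": [8],
    "intelligence": [9], "systems?": [10],
}

print(reconstruct_abstract(sample))
# How can humans stay in control of advanced artificial intelligence systems?
```

Feeding in every row of the index (positions 0 to 82) should reproduce the abstract shown at the top of the page; because tokens keep their attached punctuation, joining with single spaces is enough.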