NEXUS: On Explaining Confounding Bias
2023 · Open Access · DOI: https://doi.org/10.1145/3555041.3589728
When analyzing large datasets, analysts are often interested in explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected association between variables. For example, a SQL query that computes the average Covid-19 death rate in each country may expose a puzzling correlation between the country and the death rate. We demonstrate NEXUS, a system that explains the unexpected correlation observed in a query in terms of a set of potential confounding variables. NEXUS mines candidate confounding variables from external sources because, in many real-life scenarios, the explanations are not fully contained in the input data. For instance, NEXUS might extract data about factors that explain the association between countries and the Covid-19 death rate, such as information about countries' economies and health outcomes. We will demonstrate the utility of NEXUS for investigating unexpected query results by interacting with the SIGMOD'23 participants, who will act as data analysts.
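The idea in the abstract can be illustrated with a small sketch. This is not the NEXUS algorithm, whose details are not given here; it only shows how an attribute taken from an external source can account for a correlation exposed by an aggregate SQL query. The table names `covid` and `external_facts`, the column `health_spending`, and all values are invented for illustration.

```python
import sqlite3

# Toy data, entirely hypothetical: death rates per country, plus an
# "external" table carrying one candidate confounder per country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE covid (country TEXT, death_rate REAL);
INSERT INTO covid VALUES ('A', 0.01), ('A', 0.01), ('B', 0.01), ('B', 0.01),
                         ('C', 0.03), ('C', 0.03), ('D', 0.03), ('D', 0.03);
CREATE TABLE external_facts (country TEXT, health_spending TEXT);
INSERT INTO external_facts VALUES ('A', 'high'), ('B', 'high'),
                                  ('C', 'low'),  ('D', 'low');
""")

# The analyst's aggregate query: average death rate in each country.
avg_by_country = dict(conn.execute(
    "SELECT country, AVG(death_rate) FROM covid GROUP BY country"))
overall_spread = max(avg_by_country.values()) - min(avg_by_country.values())

# Condition on the candidate confounder: within each health_spending level,
# how much do the per-country averages still differ?
spread_within_level = dict(conn.execute("""
    SELECT e.health_spending, MAX(a.rate) - MIN(a.rate)
    FROM (SELECT country, AVG(death_rate) AS rate
          FROM covid GROUP BY country) AS a
    JOIN external_facts AS e ON e.country = a.country
    GROUP BY e.health_spending
"""))

print("spread across countries:", overall_spread)
print("spread within each level:", spread_within_level)
```

In this toy data the per-country averages differ overall but are identical within each `health_spending` level, which is the pattern one would expect if that attribute explained the country/death-rate association.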
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1145/3555041.3589728 (PDF: https://dl.acm.org/doi/pdf/10.1145/3555041.3589728)
- OA Status: gold
- Cited By: 1
- References: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4379390291
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4379390291 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1145/3555041.3589728 (Digital Object Identifier)
- Title: NEXUS: On Explaining Confounding Bias (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-06-04 (full publication date if available)
- Authors: Brit Youngmann, Michael Cafarella, Yuval Moskovitch, Babak Salimi (list of authors in order)
- Landing page: https://doi.org/10.1145/3555041.3589728 (publisher landing page)
- PDF URL: https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 (direct OA link when available)
- Concepts: Nexus (standard), Confounding, Computer science, Set (abstract data type), SQL, Econometrics, Data mining, Statistics, Database, Mathematics, Embedded system, Programming language (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 1 (per-year citation counts, last 5 years)
- References (count): 3 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4379390291 |
| doi | https://doi.org/10.1145/3555041.3589728 |
| ids.doi | https://doi.org/10.1145/3555041.3589728 |
| ids.openalex | https://openalex.org/W4379390291 |
| fwci | 0.32348904 |
| type | article |
| title | NEXUS: On Explaining Confounding Bias |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 174 |
| biblio.first_page | 171 |
| topics[0].id | https://openalex.org/T11719 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.9876000285148621 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Data Quality and Management |
| topics[1].id | https://openalex.org/T11512 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9781000018119812 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Anomaly Detection Techniques and Applications |
| topics[2].id | https://openalex.org/T11819 |
| topics[2].field.id | https://openalex.org/fields/27 |
| topics[2].field.display_name | Medicine |
| topics[2].score | 0.9758999943733215 |
| topics[2].domain.id | https://openalex.org/domains/4 |
| topics[2].domain.display_name | Health Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2713 |
| topics[2].subfield.display_name | Epidemiology |
| topics[2].display_name | Data-Driven Disease Surveillance |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C148609458 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8538668155670166 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7021281 |
| concepts[0].display_name | Nexus (standard) |
| concepts[1].id | https://openalex.org/C77350462 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7243086099624634 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1125472 |
| concepts[1].display_name | Confounding |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6548054814338684 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C177264268 |
| concepts[3].level | 2 |
| concepts[3].score | 0.546829104423523 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[3].display_name | Set (abstract data type) |
| concepts[4].id | https://openalex.org/C510870499 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4864904284477234 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q47607 |
| concepts[4].display_name | SQL |
| concepts[5].id | https://openalex.org/C149782125 |
| concepts[5].level | 1 |
| concepts[5].score | 0.368634968996048 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[5].display_name | Econometrics |
| concepts[6].id | https://openalex.org/C124101348 |
| concepts[6].level | 1 |
| concepts[6].score | 0.32635802030563354 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[6].display_name | Data mining |
| concepts[7].id | https://openalex.org/C105795698 |
| concepts[7].level | 1 |
| concepts[7].score | 0.2308470904827118 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[7].display_name | Statistics |
| concepts[8].id | https://openalex.org/C77088390 |
| concepts[8].level | 1 |
| concepts[8].score | 0.17892968654632568 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[8].display_name | Database |
| concepts[9].id | https://openalex.org/C33923547 |
| concepts[9].level | 0 |
| concepts[9].score | 0.11198928952217102 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[9].display_name | Mathematics |
| concepts[10].id | https://openalex.org/C149635348 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q193040 |
| concepts[10].display_name | Embedded system |
| concepts[11].id | https://openalex.org/C199360897 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[11].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/nexus |
| keywords[0].score | 0.8538668155670166 |
| keywords[0].display_name | Nexus (standard) |
| keywords[1].id | https://openalex.org/keywords/confounding |
| keywords[1].score | 0.7243086099624634 |
| keywords[1].display_name | Confounding |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6548054814338684 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/set |
| keywords[3].score | 0.546829104423523 |
| keywords[3].display_name | Set (abstract data type) |
| keywords[4].id | https://openalex.org/keywords/sql |
| keywords[4].score | 0.4864904284477234 |
| keywords[4].display_name | SQL |
| keywords[5].id | https://openalex.org/keywords/econometrics |
| keywords[5].score | 0.368634968996048 |
| keywords[5].display_name | Econometrics |
| keywords[6].id | https://openalex.org/keywords/data-mining |
| keywords[6].score | 0.32635802030563354 |
| keywords[6].display_name | Data mining |
| keywords[7].id | https://openalex.org/keywords/statistics |
| keywords[7].score | 0.2308470904827118 |
| keywords[7].display_name | Statistics |
| keywords[8].id | https://openalex.org/keywords/database |
| keywords[8].score | 0.17892968654632568 |
| keywords[8].display_name | Database |
| keywords[9].id | https://openalex.org/keywords/mathematics |
| keywords[9].score | 0.11198928952217102 |
| keywords[9].display_name | Mathematics |
| language | en |
| locations[0].id | doi:10.1145/3555041.3589728 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | |
| locations[0].pdf_url | https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Companion of the 2023 International Conference on Management of Data |
| locations[0].landing_page_url | https://doi.org/10.1145/3555041.3589728 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5026215174 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0031-5550 |
| authorships[0].author.display_name | Brit Youngmann |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I126820664 |
| authorships[0].affiliations[0].raw_affiliation_string | CSAIL MIT, Cambridge, MA, USA |
| authorships[0].institutions[0].id | https://openalex.org/I126820664 |
| authorships[0].institutions[0].ror | https://ror.org/022x6qg61 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I126820664 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Vassar College |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Brit Youngmann |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | CSAIL MIT, Cambridge, MA, USA |
| authorships[1].author.id | https://openalex.org/A5039133265 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6122-0590 |
| authorships[1].author.display_name | Michael Cafarella |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I126820664 |
| authorships[1].affiliations[0].raw_affiliation_string | CSAIL MIT, Cambridge, MA, USA |
| authorships[1].institutions[0].id | https://openalex.org/I126820664 |
| authorships[1].institutions[0].ror | https://ror.org/022x6qg61 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I126820664 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Vassar College |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Michael Cafarella |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | CSAIL MIT, Cambridge, MA, USA |
| authorships[2].author.id | https://openalex.org/A5005553562 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0325-7392 |
| authorships[2].author.display_name | Yuval Moskovitch |
| authorships[2].countries | IL |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I124227911 |
| authorships[2].affiliations[0].raw_affiliation_string | Ben Gurion University of the Negev, Beer Sheva, Israel |
| authorships[2].institutions[0].id | https://openalex.org/I124227911 |
| authorships[2].institutions[0].ror | https://ror.org/05tkyf982 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I124227911 |
| authorships[2].institutions[0].country_code | IL |
| authorships[2].institutions[0].display_name | Ben-Gurion University of the Negev |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yuval Moskovitch |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Ben Gurion University of the Negev, Beer Sheva, Israel |
| authorships[3].author.id | https://openalex.org/A5103209063 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2485-9533 |
| authorships[3].author.display_name | Babak Salimi |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I36258959 |
| authorships[3].affiliations[0].raw_affiliation_string | University of California, San Diego, San Diego, CA, USA |
| authorships[3].institutions[0].id | https://openalex.org/I36258959 |
| authorships[3].institutions[0].ror | https://ror.org/0168r3w48 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I36258959 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | University of California, San Diego |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Babak Salimi |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | University of California, San Diego, San Diego, CA, USA |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | NEXUS: On Explaining Confounding Bias |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11719 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.9876000285148621 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Data Quality and Management |
| related_works | https://openalex.org/W3126909309, https://openalex.org/W4256360871, https://openalex.org/W2747100754, https://openalex.org/W3117832639, https://openalex.org/W2885513359, https://openalex.org/W2802787844, https://openalex.org/W1578170453, https://openalex.org/W2965101536, https://openalex.org/W2891178753, https://openalex.org/W3121214617 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1145/3555041.3589728 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | proceedings-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Companion of the 2023 International Conference on Management of Data |
| best_oa_location.landing_page_url | https://doi.org/10.1145/3555041.3589728 |
| primary_location.id | doi:10.1145/3555041.3589728 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | |
| primary_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/3555041.3589728 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Companion of the 2023 International Conference on Management of Data |
| primary_location.landing_page_url | https://doi.org/10.1145/3555041.3589728 |
| publication_date | 2023-06-04 |
| publication_year | 2023 |
| referenced_works | https://openalex.org/W3173871096, https://openalex.org/W2798682670, https://openalex.org/W3139909695 |
| referenced_works_count | 3 |
| abstract_inverted_index.A | 33 |
| abstract_inverted_index.a | 57, 71, 87, 95, 108 |
| abstract_inverted_index.In | 18, 81 |
| abstract_inverted_index.We | 160 |
| abstract_inverted_index.an | 50 |
| abstract_inverted_index.as | 152, 181 |
| abstract_inverted_index.by | 15, 172 |
| abstract_inverted_index.in | 8, 30, 66, 92, 107, 119, 129 |
| abstract_inverted_index.is | 43 |
| abstract_inverted_index.of | 40, 94, 97, 165 |
| abstract_inverted_index.on | 23 |
| abstract_inverted_index.to | 49 |
| abstract_inverted_index.we | 21, 84 |
| abstract_inverted_index.For | 55, 133 |
| abstract_inverted_index.SQL | 25, 58 |
| abstract_inverted_index.act | 180 |
| abstract_inverted_index.and | 77, 146, 157 |
| abstract_inverted_index.are | 5, 125 |
| abstract_inverted_index.can | 47 |
| abstract_inverted_index.for | 11, 167 |
| abstract_inverted_index.may | 69 |
| abstract_inverted_index.not | 126 |
| abstract_inverted_index.set | 96 |
| abstract_inverted_index.the | 9, 31, 38, 61, 75, 78, 103, 123, 130, 142, 147, 163, 175 |
| abstract_inverted_index.who | 178 |
| abstract_inverted_index.When | 0 |
| abstract_inverted_index.data | 138, 182 |
| abstract_inverted_index.each | 67 |
| abstract_inverted_index.from | 115 |
| abstract_inverted_index.lead | 48 |
| abstract_inverted_index.many | 120 |
| abstract_inverted_index.rate | 65 |
| abstract_inverted_index.such | 41, 151 |
| abstract_inverted_index.that | 27, 36, 89, 101 |
| abstract_inverted_index.this | 19, 82 |
| abstract_inverted_index.will | 161, 179 |
| abstract_inverted_index.with | 174 |
| abstract_inverted_index.NEXUS | 110, 135, 166 |
| abstract_inverted_index.about | 139, 154 |
| abstract_inverted_index.bias, | 45 |
| abstract_inverted_index.data. | 32, 132 |
| abstract_inverted_index.death | 64, 79, 149 |
| abstract_inverted_index.focus | 22 |
| abstract_inverted_index.input | 131 |
| abstract_inverted_index.large | 2 |
| abstract_inverted_index.major | 34 |
| abstract_inverted_index.might | 136 |
| abstract_inverted_index.mines | 111 |
| abstract_inverted_index.often | 6 |
| abstract_inverted_index.query | 59, 170 |
| abstract_inverted_index.rate, | 150 |
| abstract_inverted_index.rate. | 80 |
| abstract_inverted_index.terms | 93 |
| abstract_inverted_index.their | 16 |
| abstract_inverted_index.which | 46 |
| abstract_inverted_index.work, | 20, 83 |
| abstract_inverted_index.NEXUS, | 86 |
| abstract_inverted_index.expose | 28, 70 |
| abstract_inverted_index.health | 158 |
| abstract_inverted_index.query. | 109 |
| abstract_inverted_index.since, | 118 |
| abstract_inverted_index.solely | 127 |
| abstract_inverted_index.system | 88 |
| abstract_inverted_index.average | 62 |
| abstract_inverted_index.between | 53, 74, 144 |
| abstract_inverted_index.country | 76 |
| abstract_inverted_index.explain | 102 |
| abstract_inverted_index.extract | 137 |
| abstract_inverted_index.factors | 140 |
| abstract_inverted_index.hinders | 37 |
| abstract_inverted_index.queries | 26, 42 |
| abstract_inverted_index.results | 13, 171 |
| abstract_inverted_index.sources | 117 |
| abstract_inverted_index.utility | 164 |
| abstract_inverted_index.Covid-19 | 63, 148 |
| abstract_inverted_index.analysts | 4 |
| abstract_inverted_index.computes | 60 |
| abstract_inverted_index.country, | 68 |
| abstract_inverted_index.example, | 56 |
| abstract_inverted_index.external | 116 |
| abstract_inverted_index.observed | 106 |
| abstract_inverted_index.produced | 14 |
| abstract_inverted_index.puzzling | 72 |
| abstract_inverted_index.queries. | 17 |
| abstract_inverted_index.SIGMOD'23 | 176 |
| abstract_inverted_index.aggregate | 24 |
| abstract_inverted_index.analysts. | 183 |
| abstract_inverted_index.analyzing | 1 |
| abstract_inverted_index.candidate | 112 |
| abstract_inverted_index.challenge | 35 |
| abstract_inverted_index.contained | 128 |
| abstract_inverted_index.countries | 145 |
| abstract_inverted_index.datasets, | 3 |
| abstract_inverted_index.economies | 156 |
| abstract_inverted_index.generates | 90 |
| abstract_inverted_index.instance, | 134 |
| abstract_inverted_index.outcomes. | 159 |
| abstract_inverted_index.potential | 98 |
| abstract_inverted_index.real-life | 121 |
| abstract_inverted_index.variables | 100, 114 |
| abstract_inverted_index.countries' | 155 |
| abstract_inverted_index.explaining | 141 |
| abstract_inverted_index.interested | 7 |
| abstract_inverted_index.scenarios, | 122 |
| abstract_inverted_index.unexpected | 12, 51, 104, 169 |
| abstract_inverted_index.variables. | 54 |
| abstract_inverted_index.association | 52, 143 |
| abstract_inverted_index.confounding | 44, 99, 113 |
| abstract_inverted_index.correlation | 73, 105 |
| abstract_inverted_index.demonstrate | 85, 162 |
| abstract_inverted_index.information | 153 |
| abstract_inverted_index.interacting | 173 |
| abstract_inverted_index.correlations | 29 |
| abstract_inverted_index.explanations | 10, 91, 124 |
| abstract_inverted_index.investigating | 168 |
| abstract_inverted_index.participants, | 177 |
| abstract_inverted_index.interpretation | 39 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 90 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.57527371 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
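The `abstract_inverted_index.*` rows in the payload table store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text follows; the function name `rebuild_abstract` is an assumption for illustration, and `sample` holds only the first seven tokens, copied from the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(token_at[i] for i in sorted(token_at))


# First seven tokens of the abstract, as listed in the payload table.
sample = {
    "When": [0],
    "analyzing": [1],
    "large": [2],
    "datasets,": [3],
    "analysts": [4],
    "are": [5],
    "often": [6],
}

print(rebuild_abstract(sample))
# prints: When analyzing large datasets, analysts are often
```

Punctuation is part of each token, so joining on single spaces is enough to recover the wording of the original text.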