Indexing Join Inputs for Fast Queries and Maintenance Article Swipe
YOU?
·
· 2025
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2502.10874
In database systems, joins are often expensive despite many years of research producing numerous join algorithms. Precomputed and materialized join views deliver the best query performance, whereas traditional indexes, used as pre-sorted inputs for merge joins, permit very efficient maintenance. Neither traditional indexes nor materialized join views require blocking phases, in contrast to query-time sorting and transient indexes, e.g., hash tables in hash joins, that impose high memory requirements and possibly spill to temporary storage. Here, we introduce a hybrid of traditional indexing and materialized join views. The *merged index* can be implemented with traditional b-trees, permits high-bandwidth maintenance using log-structured merge-forests, supports all join types (inner joins, all outer joins, all semi joins), and enables non-blocking query processing. Experiments across a wide range of scenarios confirm its query performance comparable to materialized join views and maintenance efficiency comparable to traditional indexes.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2502.10874
- https://arxiv.org/pdf/2502.10874
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4407686151
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4407686151Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2502.10874Digital Object Identifier
- Title
-
Indexing Join Inputs for Fast Queries and MaintenanceWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2025Year of publication
- Publication date
-
2025-02-15Full publication date if available
- Authors
-
Wei Lyu, Goetz GraefeList of authors in order
- Landing page
-
https://arxiv.org/abs/2502.10874Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2502.10874Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2502.10874Direct OA link when available
- Concepts
-
Join (topology), Search engine indexing, Computer science, Database, Information retrieval, Parallel computing, Mathematics, CombinatoricsTop concepts (fields/topics) attached by OpenAlex
- Cited by
-
0Total citation count in OpenAlex
- Related works (count)
-
10Other works algorithmically related by OpenAlex
Full payload
| id | https://openalex.org/W4407686151 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.10874 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.10874 |
| ids.openalex | https://openalex.org/W4407686151 |
| fwci | |
| type | preprint |
| title | Indexing Join Inputs for Fast Queries and Maintenance |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10317 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.980400025844574 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1705 |
| topics[0].subfield.display_name | Computer Networks and Communications |
| topics[0].display_name | Advanced Database Systems and Queries |
| topics[1].id | https://openalex.org/T11719 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9298999905586243 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Data Quality and Management |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2776124973 |
| concepts[0].level | 2 |
| concepts[0].score | 0.872483491897583 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3183033 |
| concepts[0].display_name | Join (topology) |
| concepts[1].id | https://openalex.org/C75165309 |
| concepts[1].level | 2 |
| concepts[1].score | 0.791415810585022 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2258979 |
| concepts[1].display_name | Search engine indexing |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.6429405212402344 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C77088390 |
| concepts[3].level | 1 |
| concepts[3].score | 0.46684014797210693 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[3].display_name | Database |
| concepts[4].id | https://openalex.org/C23123220 |
| concepts[4].level | 1 |
| concepts[4].score | 0.43766552209854126 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[4].display_name | Information retrieval |
| concepts[5].id | https://openalex.org/C173608175 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3788984417915344 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[5].display_name | Parallel computing |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.15879759192466736 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C114614502 |
| concepts[7].level | 1 |
| concepts[7].score | 0.08079478144645691 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[7].display_name | Combinatorics |
| keywords[0].id | https://openalex.org/keywords/join |
| keywords[0].score | 0.872483491897583 |
| keywords[0].display_name | Join (topology) |
| keywords[1].id | https://openalex.org/keywords/search-engine-indexing |
| keywords[1].score | 0.791415810585022 |
| keywords[1].display_name | Search engine indexing |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.6429405212402344 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/database |
| keywords[3].score | 0.46684014797210693 |
| keywords[3].display_name | Database |
| keywords[4].id | https://openalex.org/keywords/information-retrieval |
| keywords[4].score | 0.43766552209854126 |
| keywords[4].display_name | Information retrieval |
| keywords[5].id | https://openalex.org/keywords/parallel-computing |
| keywords[5].score | 0.3788984417915344 |
| keywords[5].display_name | Parallel computing |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.15879759192466736 |
| keywords[6].display_name | Mathematics |
| keywords[7].id | https://openalex.org/keywords/combinatorics |
| keywords[7].score | 0.08079478144645691 |
| keywords[7].display_name | Combinatorics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.10874 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.10874 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.10874 |
| locations[1].id | doi:10.48550/arxiv.2502.10874 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.10874 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5091188210 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7441-9961 |
| authorships[0].author.display_name | Wei Lyu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lyu, Wenhui |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5041862671 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0194-6466 |
| authorships[1].author.display_name | Goetz Graefe |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Graefe, Goetz |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.10874 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Indexing Join Inputs for Fast Queries and Maintenance |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10317 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.980400025844574 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1705 |
| primary_topic.subfield.display_name | Computer Networks and Communications |
| primary_topic.display_name | Advanced Database Systems and Queries |
| related_works | https://openalex.org/W4205996836, https://openalex.org/W2151692181, https://openalex.org/W4392498349, https://openalex.org/W2093960938, https://openalex.org/W3214148052, https://openalex.org/W4392216655, https://openalex.org/W2807741550, https://openalex.org/W794462722, https://openalex.org/W2188505374, https://openalex.org/W1558342070 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.10874 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.10874 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.10874 |
| primary_location.id | pmh:oai:arXiv.org:2502.10874 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.10874 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.10874 |
| publication_date | 2025-02-15 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 78, 121 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.as | 30 |
| abstract_inverted_index.be | 91 |
| abstract_inverted_index.in | 50, 61 |
| abstract_inverted_index.of | 10, 80, 124 |
| abstract_inverted_index.to | 52, 72, 131, 139 |
| abstract_inverted_index.we | 76 |
| abstract_inverted_index.The | 87 |
| abstract_inverted_index.all | 103, 108, 111 |
| abstract_inverted_index.and | 17, 55, 69, 83, 114, 135 |
| abstract_inverted_index.are | 4 |
| abstract_inverted_index.can | 90 |
| abstract_inverted_index.for | 33 |
| abstract_inverted_index.its | 127 |
| abstract_inverted_index.nor | 43 |
| abstract_inverted_index.the | 22 |
| abstract_inverted_index.best | 23 |
| abstract_inverted_index.hash | 59, 62 |
| abstract_inverted_index.high | 66 |
| abstract_inverted_index.join | 14, 19, 45, 85, 104, 133 |
| abstract_inverted_index.many | 8 |
| abstract_inverted_index.semi | 112 |
| abstract_inverted_index.that | 64 |
| abstract_inverted_index.used | 29 |
| abstract_inverted_index.very | 37 |
| abstract_inverted_index.wide | 122 |
| abstract_inverted_index.with | 93 |
| abstract_inverted_index.Here, | 75 |
| abstract_inverted_index.e.g., | 58 |
| abstract_inverted_index.joins | 3 |
| abstract_inverted_index.merge | 34 |
| abstract_inverted_index.often | 5 |
| abstract_inverted_index.outer | 109 |
| abstract_inverted_index.query | 24, 117, 128 |
| abstract_inverted_index.range | 123 |
| abstract_inverted_index.spill | 71 |
| abstract_inverted_index.types | 105 |
| abstract_inverted_index.using | 99 |
| abstract_inverted_index.views | 20, 46, 134 |
| abstract_inverted_index.years | 9 |
| abstract_inverted_index.(inner | 106 |
| abstract_inverted_index.across | 120 |
| abstract_inverted_index.hybrid | 79 |
| abstract_inverted_index.impose | 65 |
| abstract_inverted_index.index* | 89 |
| abstract_inverted_index.inputs | 32 |
| abstract_inverted_index.joins, | 35, 63, 107, 110 |
| abstract_inverted_index.memory | 67 |
| abstract_inverted_index.permit | 36 |
| abstract_inverted_index.tables | 60 |
| abstract_inverted_index.views. | 86 |
| abstract_inverted_index.*merged | 88 |
| abstract_inverted_index.Neither | 40 |
| abstract_inverted_index.confirm | 126 |
| abstract_inverted_index.deliver | 21 |
| abstract_inverted_index.despite | 7 |
| abstract_inverted_index.enables | 115 |
| abstract_inverted_index.indexes | 42 |
| abstract_inverted_index.joins), | 113 |
| abstract_inverted_index.permits | 96 |
| abstract_inverted_index.phases, | 49 |
| abstract_inverted_index.require | 47 |
| abstract_inverted_index.sorting | 54 |
| abstract_inverted_index.whereas | 26 |
| abstract_inverted_index.b-trees, | 95 |
| abstract_inverted_index.blocking | 48 |
| abstract_inverted_index.contrast | 51 |
| abstract_inverted_index.database | 1 |
| abstract_inverted_index.indexes, | 28, 57 |
| abstract_inverted_index.indexes. | 141 |
| abstract_inverted_index.indexing | 82 |
| abstract_inverted_index.numerous | 13 |
| abstract_inverted_index.possibly | 70 |
| abstract_inverted_index.research | 11 |
| abstract_inverted_index.storage. | 74 |
| abstract_inverted_index.supports | 102 |
| abstract_inverted_index.systems, | 2 |
| abstract_inverted_index.efficient | 38 |
| abstract_inverted_index.expensive | 6 |
| abstract_inverted_index.introduce | 77 |
| abstract_inverted_index.producing | 12 |
| abstract_inverted_index.scenarios | 125 |
| abstract_inverted_index.temporary | 73 |
| abstract_inverted_index.transient | 56 |
| abstract_inverted_index.comparable | 130, 138 |
| abstract_inverted_index.efficiency | 137 |
| abstract_inverted_index.pre-sorted | 31 |
| abstract_inverted_index.query-time | 53 |
| abstract_inverted_index.Experiments | 119 |
| abstract_inverted_index.Precomputed | 16 |
| abstract_inverted_index.algorithms. | 15 |
| abstract_inverted_index.implemented | 92 |
| abstract_inverted_index.maintenance | 98, 136 |
| abstract_inverted_index.performance | 129 |
| abstract_inverted_index.processing. | 118 |
| abstract_inverted_index.traditional | 27, 41, 81, 94, 140 |
| abstract_inverted_index.maintenance. | 39 |
| abstract_inverted_index.materialized | 18, 44, 84, 132 |
| abstract_inverted_index.non-blocking | 116 |
| abstract_inverted_index.performance, | 25 |
| abstract_inverted_index.requirements | 68 |
| abstract_inverted_index.high-bandwidth | 97 |
| abstract_inverted_index.log-structured | 100 |
| abstract_inverted_index.merge-forests, | 101 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile |