Efficient warp execution in presence of divergence with collaborative context collection
2015 · Open Access · DOI: https://doi.org/10.1145/2830772.2830796
The GPU's SIMD architecture is a double-edged sword when confronting parallel tasks with control-flow divergence. On the one hand, it provides a high-performance yet power-efficient platform for accelerating applications via massive parallelism; on the other hand, irregularities induce inefficiencies because a warp traverses all diverging execution paths in lockstep. In this work, we present a software (compiler) technique named Collaborative Context Collection (CCC) that increases warp execution efficiency when faced with thread divergence incurred either by different intra-warp task assignments or by intra-warp load imbalance. CCC collects the relevant registers of divergent threads in a warp-specific stack allocated in fast shared memory, and restores them only when full utilization of the warp lanes becomes feasible. We propose code transformations that make CCC applicable to a variety of program segments with thread divergence. We also introduce optimizations that reduce the cost of CCC and avoid limiting device occupancy or causing memory divergence. We have developed a framework that automates the application of CCC to CUDA-generated intermediate PTX code. We evaluated CCC on real-world applications and on multiple scenarios using synthetic programs. CCC improves the warp execution efficiency of real-world benchmarks by up to 56% and achieves an average speedup of 1.69x (maximum 3.08x).
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1145/2830772.2830796 (PDF: https://dl.acm.org/doi/pdf/10.1145/2830772.2830796)
- OA Status: gold
- Cited By: 34
- References: 51
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2236252626
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2236252626 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1145/2830772.2830796 (Digital Object Identifier)
- Title: Efficient warp execution in presence of divergence with collaborative context collection (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2015 (year of publication)
- Publication date: 2015-12-05 (full publication date if available)
- Authors: Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan (list of authors in order)
- Landing page: https://doi.org/10.1145/2830772.2830796 (publisher landing page)
- PDF URL: https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 (direct OA link when available)
- Concepts: Computer science, Parallel computing, Thread (computing), Speedup, Compiler, Control flow, SIMD, Context switch, Instruction set, Multithreading, Execution model, CUDA, Operating system, Programming language (top fields/topics attached by OpenAlex)
- Cited by: 34 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 2, 2022: 3, 2021: 2, 2020: 3, 2019: 4 (per-year citation counts for the five most recent years with citations)
- References (count): 51 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2236252626 |
|---|---|
| doi | https://doi.org/10.1145/2830772.2830796 |
| ids.doi | https://doi.org/10.1145/2830772.2830796 |
| ids.mag | 2236252626 |
| ids.openalex | https://openalex.org/W2236252626 |
| fwci | 6.01265622 |
| type | article |
| title | Efficient warp execution in presence of divergence with collaborative context collection |
| awards[0].id | https://openalex.org/G8002895364 |
| awards[0].funder_id | https://openalex.org/F4320306076 |
| awards[0].display_name | |
| awards[0].funder_award_id | CCF-0905509,CNS-1157377,CCF-1318103,CCF-1524852,CCF-1423108 |
| awards[0].funder_display_name | National Science Foundation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 215 |
| biblio.first_page | 204 |
| topics[0].id | https://openalex.org/T10054 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1708 |
| topics[0].subfield.display_name | Hardware and Architecture |
| topics[0].display_name | Parallel Computing and Optimization Techniques |
| topics[1].id | https://openalex.org/T11181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9993000030517578 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Advanced Data Storage Technologies |
| topics[2].id | https://openalex.org/T10715 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9980999827384949 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Distributed and Parallel Computing Systems |
| funders[0].id | https://openalex.org/F4320306076 |
| funders[0].ror | https://ror.org/021nxhr62 |
| funders[0].display_name | National Science Foundation |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8674681186676025 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C173608175 |
| concepts[1].level | 1 |
| concepts[1].score | 0.7893720865249634 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[1].display_name | Parallel computing |
| concepts[2].id | https://openalex.org/C138101251 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6750968098640442 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q213092 |
| concepts[2].display_name | Thread (computing) |
| concepts[3].id | https://openalex.org/C68339613 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5920443534851074 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1549489 |
| concepts[3].display_name | Speedup |
| concepts[4].id | https://openalex.org/C169590947 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5575736165046692 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q47506 |
| concepts[4].display_name | Compiler |
| concepts[5].id | https://openalex.org/C160191386 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5255511999130249 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q868299 |
| concepts[5].display_name | Control flow |
| concepts[6].id | https://openalex.org/C150552126 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4732019603252411 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q339387 |
| concepts[6].display_name | SIMD |
| concepts[7].id | https://openalex.org/C53833338 |
| concepts[7].level | 2 |
| concepts[7].score | 0.44418448209762573 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1061424 |
| concepts[7].display_name | Context switch |
| concepts[8].id | https://openalex.org/C202491316 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4283791184425354 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q272683 |
| concepts[8].display_name | Instruction set |
| concepts[9].id | https://openalex.org/C201410400 |
| concepts[9].level | 3 |
| concepts[9].score | 0.4264149069786072 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1064412 |
| concepts[9].display_name | Multithreading |
| concepts[10].id | https://openalex.org/C2776834041 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4212344288825989 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q25346349 |
| concepts[10].display_name | Execution model |
| concepts[11].id | https://openalex.org/C2778119891 |
| concepts[11].level | 2 |
| concepts[11].score | 0.4197118282318115 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q477690 |
| concepts[11].display_name | CUDA |
| concepts[12].id | https://openalex.org/C111919701 |
| concepts[12].level | 1 |
| concepts[12].score | 0.323000431060791 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[12].display_name | Operating system |
| concepts[13].id | https://openalex.org/C199360897 |
| concepts[13].level | 1 |
| concepts[13].score | 0.3218782842159271 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[13].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8674681186676025 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/parallel-computing |
| keywords[1].score | 0.7893720865249634 |
| keywords[1].display_name | Parallel computing |
| keywords[2].id | https://openalex.org/keywords/thread |
| keywords[2].score | 0.6750968098640442 |
| keywords[2].display_name | Thread (computing) |
| keywords[3].id | https://openalex.org/keywords/speedup |
| keywords[3].score | 0.5920443534851074 |
| keywords[3].display_name | Speedup |
| keywords[4].id | https://openalex.org/keywords/compiler |
| keywords[4].score | 0.5575736165046692 |
| keywords[4].display_name | Compiler |
| keywords[5].id | https://openalex.org/keywords/control-flow |
| keywords[5].score | 0.5255511999130249 |
| keywords[5].display_name | Control flow |
| keywords[6].id | https://openalex.org/keywords/simd |
| keywords[6].score | 0.4732019603252411 |
| keywords[6].display_name | SIMD |
| keywords[7].id | https://openalex.org/keywords/context-switch |
| keywords[7].score | 0.44418448209762573 |
| keywords[7].display_name | Context switch |
| keywords[8].id | https://openalex.org/keywords/instruction-set |
| keywords[8].score | 0.4283791184425354 |
| keywords[8].display_name | Instruction set |
| keywords[9].id | https://openalex.org/keywords/multithreading |
| keywords[9].score | 0.4264149069786072 |
| keywords[9].display_name | Multithreading |
| keywords[10].id | https://openalex.org/keywords/execution-model |
| keywords[10].score | 0.4212344288825989 |
| keywords[10].display_name | Execution model |
| keywords[11].id | https://openalex.org/keywords/cuda |
| keywords[11].score | 0.4197118282318115 |
| keywords[11].display_name | CUDA |
| keywords[12].id | https://openalex.org/keywords/operating-system |
| keywords[12].score | 0.323000431060791 |
| keywords[12].display_name | Operating system |
| keywords[13].id | https://openalex.org/keywords/programming-language |
| keywords[13].score | 0.3218782842159271 |
| keywords[13].display_name | Programming language |
| language | en |
| locations[0].id | doi:10.1145/2830772.2830796 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Proceedings of the 48th International Symposium on Microarchitecture |
| locations[0].landing_page_url | https://doi.org/10.1145/2830772.2830796 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5036836220 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Farzad Khorasani |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I103635307 |
| authorships[0].affiliations[0].raw_affiliation_string | University of California, Riverside, CA |
| authorships[0].institutions[0].id | https://openalex.org/I103635307 |
| authorships[0].institutions[0].ror | https://ror.org/03nawhv43 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I103635307 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | University of California, Riverside |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Farzad Khorasani |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | University of California, Riverside, CA |
| authorships[1].author.id | https://openalex.org/A5100699251 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9348-3974 |
| authorships[1].author.display_name | Rajiv Gupta |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I103635307 |
| authorships[1].affiliations[0].raw_affiliation_string | University of California, Riverside, CA |
| authorships[1].institutions[0].id | https://openalex.org/I103635307 |
| authorships[1].institutions[0].ror | https://ror.org/03nawhv43 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I103635307 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | University of California, Riverside |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Rajiv Gupta |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | University of California, Riverside, CA |
| authorships[2].author.id | https://openalex.org/A5048949780 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8759-0458 |
| authorships[2].author.display_name | Laxmi N. Bhuyan |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I103635307 |
| authorships[2].affiliations[0].raw_affiliation_string | University of California, Riverside, CA |
| authorships[2].institutions[0].id | https://openalex.org/I103635307 |
| authorships[2].institutions[0].ror | https://ror.org/03nawhv43 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I103635307 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | University of California, Riverside |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Laxmi N. Bhuyan |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | University of California, Riverside, CA |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Efficient warp execution in presence of divergence with collaborative context collection |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10054 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1708 |
| primary_topic.subfield.display_name | Hardware and Architecture |
| primary_topic.display_name | Parallel Computing and Optimization Techniques |
| related_works | https://openalex.org/W1995705225, https://openalex.org/W4248655967, https://openalex.org/W2138520521, https://openalex.org/W2184902834, https://openalex.org/W2110105483, https://openalex.org/W2107831078, https://openalex.org/W4248145683, https://openalex.org/W1672168401, https://openalex.org/W2100579514, https://openalex.org/W2156983793 |
| cited_by_count | 34 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 3 |
| counts_by_year[2].year | 2021 |
| counts_by_year[2].cited_by_count | 2 |
| counts_by_year[3].year | 2020 |
| counts_by_year[3].cited_by_count | 3 |
| counts_by_year[4].year | 2019 |
| counts_by_year[4].cited_by_count | 4 |
| counts_by_year[5].year | 2018 |
| counts_by_year[5].cited_by_count | 8 |
| counts_by_year[6].year | 2017 |
| counts_by_year[6].cited_by_count | 9 |
| counts_by_year[7].year | 2016 |
| counts_by_year[7].cited_by_count | 3 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1145/2830772.2830796 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | proceedings-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Proceedings of the 48th International Symposium on Microarchitecture |
| best_oa_location.landing_page_url | https://doi.org/10.1145/2830772.2830796 |
| primary_location.id | doi:10.1145/2830772.2830796 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://dl.acm.org/doi/pdf/10.1145/2830772.2830796 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Proceedings of the 48th International Symposium on Microarchitecture |
| primary_location.landing_page_url | https://doi.org/10.1145/2830772.2830796 |
| publication_date | 2015-12-05 |
| publication_year | 2015 |
| referenced_works | https://openalex.org/W2169880332, https://openalex.org/W2156831150, https://openalex.org/W1971997351, https://openalex.org/W2155568054, https://openalex.org/W3146509083, https://openalex.org/W1973538724, https://openalex.org/W2167675119, https://openalex.org/W2151686327, https://openalex.org/W2013247896, https://openalex.org/W2016706026, https://openalex.org/W2156180003, https://openalex.org/W1504291959, https://openalex.org/W4300125772, https://openalex.org/W2067313328, https://openalex.org/W27773700, https://openalex.org/W981516807, https://openalex.org/W2432978112, https://openalex.org/W2094945791, https://openalex.org/W2146591355, https://openalex.org/W2090584832, https://openalex.org/W1978155891, https://openalex.org/W2168921806, https://openalex.org/W2125979435, https://openalex.org/W2098505406, https://openalex.org/W2135947393, https://openalex.org/W2132598270, https://openalex.org/W2171399035, https://openalex.org/W1985291160, https://openalex.org/W2012630996, https://openalex.org/W4236883517, https://openalex.org/W2061313045, https://openalex.org/W2043420024, https://openalex.org/W1965061255, https://openalex.org/W1994316441, https://openalex.org/W2295329047, https://openalex.org/W6640247899, https://openalex.org/W2144061463, https://openalex.org/W1970815868, https://openalex.org/W2492496105, https://openalex.org/W2123440268, https://openalex.org/W1994688997, https://openalex.org/W3013490664, https://openalex.org/W2090495704, https://openalex.org/W2148443481, https://openalex.org/W2075739954, https://openalex.org/W164384110, https://openalex.org/W2748306984, https://openalex.org/W142653777, https://openalex.org/W3006582303, https://openalex.org/W579519726, https://openalex.org/W1919570435 |
| referenced_works_count | 51 |
| cited_by_percentile_year.max | 99 |
| cited_by_percentile_year.min | 93 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.97312717 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |