Data layout oriented compilation techniques in vectorization for multi-/many-cores
Single instruction, multiple data (SIMD) architectures are widely adopted in both general-purpose processors and graphics processing units (GPUs) to exploit data-level parallelism. Writing high-performance code that makes full use of the SIMD execution units on either platform is tedious and error-prone, so programmers often rely on automatic code generation in compilers. However, it is not trivial for a compiler to generate high-performance code without considering the layout of the data used in the computation. Data layout determines data access patterns, which in turn have a great impact on the memory performance of automatically generated code for both CPUs and GPUs.

In this thesis, we demonstrate several data layout oriented compilation techniques for efficient vectorization. We put forward a semi-automatic data layout transformation that helps users easily change their programs and exploit the data layout best suited to vectorization. Our proposed vectorization based on hyper loop parallelism provides a way to take advantage of the relationship between data layout and computation structure. Experimental results demonstrate that this vectorization technique can yield significant performance gains. In addition, we show that this technique is of great use in boosting memory performance on CUDA GPUs.

We also present pioneering work that uses loop vectorization techniques to handle nested thread-level parallelism (TLP) on CUDA GPUs. Because loop vectorization prioritizes loops with contiguous data accesses, it helps to achieve an efficient mapping strategy for nested TLP on CUDA GPUs.

Our new bitslice vector computing for customizable arithmetic precision on general-purpose processors with SIMD extensions not only breaks the limit of hardware arithmetic precision but also achieves high performance. It also shows the power of logic optimization, widely used in hardware synthesis, for optimizing C/C++ code that contains a large number of logic operations.
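A small address model makes concrete the claim that data layout determines access patterns. The Python sketch below is a generic illustration, not code from the thesis. It assumes a record of three 4-byte `float32` fields and treats a 128-byte memory segment as the unit of a coalesced GPU transaction. All names (`aos_address`, `soa_address`, `segments_touched`, and so on) are hypothetical. The sketch computes the byte offsets read when consecutive SIMD lanes or GPU threads load the same field, under an array-of-structures (AoS) layout and under a structure-of-arrays (SoA) layout.

```python
"""Sketch: how AoS vs SoA layout changes the addresses a SIMD/warp access touches.

Assumptions (illustrative only, not taken from the thesis): records of three
float32 fields (x, y, z), 4-byte elements, and a 128-byte segment as the
granularity of one coalesced GPU memory transaction.
"""
from typing import Callable

FIELDS = ("x", "y", "z")  # hypothetical record fields
ELEM_BYTES = 4            # float32
SEGMENT_BYTES = 128       # assumed transaction granularity

Layout = Callable[[int, str, int], int]


def aos_address(i: int, field: str, n: int) -> int:
    """Byte offset of record i's field in an array of structures.

    `n` is unused here; it is kept so both layout functions share one signature.
    """
    record_bytes = len(FIELDS) * ELEM_BYTES
    return i * record_bytes + FIELDS.index(field) * ELEM_BYTES


def soa_address(i: int, field: str, n: int) -> int:
    """Byte offset of record i's field in a structure of arrays (one array per field)."""
    return FIELDS.index(field) * n * ELEM_BYTES + i * ELEM_BYTES


def lane_addresses(layout: Layout, field: str, first: int, lanes: int, n: int) -> list[int]:
    """Addresses touched when `lanes` consecutive iterations/threads read `field`."""
    return [layout(first + k, field, n) for k in range(lanes)]


def stride(addrs: list[int]) -> int:
    """Distance in bytes between neighbouring lanes (assumes a constant stride)."""
    return addrs[1] - addrs[0]


def segments_touched(addrs: list[int]) -> int:
    """Number of distinct SEGMENT_BYTES-aligned segments covering the accesses."""
    return len({a // SEGMENT_BYTES for a in addrs})


if __name__ == "__main__":
    n = 1024
    for name, layout in (("AoS", aos_address), ("SoA", soa_address)):
        simd = lane_addresses(layout, "x", 0, 8, n)   # e.g. one 8-lane SIMD load
        warp = lane_addresses(layout, "x", 0, 32, n)  # e.g. one 32-thread warp
        print(name, "stride:", stride(simd), "bytes; segments per warp:",
              segments_touched(warp))
```

Under these assumptions, the AoS layout gives a 12-byte stride between lanes and spreads one 32-thread read of `x` over three segments. The SoA layout gives a unit (4-byte) stride and needs a single segment. Converting AoS to SoA is one common example of a data layout transformation. Whether it pays off for a given program depends on how the computation accesses the fields.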
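The abstract only names bitslice vector computing, so what follows is a generic sketch of the bit-slicing idea rather than the thesis's implementation. It assumes Python integers stand in for 64-bit SIMD registers; `LANES` and the helper names are hypothetical. Each bit-plane holds one bit position of every lane. Addition is a ripple-carry chain of full adders built only from AND, OR, and XOR, so the arithmetic precision is a software parameter (the number of planes) rather than a hardware one.

```python
"""Sketch of bitsliced (bit-plane) vector addition with arbitrary precision.

Assumptions: a generic illustration of bit-slicing, not the thesis's code
generator. Python ints model SIMD registers of LANES bits; the helper names
(to_bitslice, from_bitslice, bitslice_add) are hypothetical.
"""

LANES = 64                 # assumed register width: one bit per lane in each plane
MASK = (1 << LANES) - 1


def to_bitslice(values: list[int], bits: int) -> list[int]:
    """Transpose `values` (one per lane) into `bits` bit-planes, LSB first.

    Bits of a value above position `bits - 1` are ignored (truncated).
    """
    assert len(values) <= LANES
    planes = []
    for j in range(bits):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> j) & 1) << lane
        planes.append(plane)
    return planes


def from_bitslice(planes: list[int], count: int) -> list[int]:
    """Inverse transpose: rebuild `count` lane values from the bit-planes."""
    return [sum(((planes[j] >> lane) & 1) << j for j in range(len(planes)))
            for lane in range(count)]


def bitslice_add(a: list[int], b: list[int]) -> list[int]:
    """Ripple-carry addition on bit-planes, modulo 2**len(a) in every lane.

    Each loop iteration is one full adder made only of AND/OR/XOR, applied to
    all LANES lanes at once; precision is set by len(a), not by the hardware.
    """
    assert len(a) == len(b)
    carry = 0
    out = []
    for pa, pb in zip(a, b):
        t = pa ^ pb
        out.append((t ^ carry) & MASK)             # sum bit = a ^ b ^ carry
        carry = ((pa & pb) | (t & carry)) & MASK   # carry-out of the full adder
    return out  # final carry is dropped: wrap-around arithmetic


if __name__ == "__main__":
    bits = 13  # a precision with no native hardware integer type
    xs = [0, 1, 4095, 8191, 1234]
    ys = [5, 8191, 4097, 1, 4321]
    planes = bitslice_add(to_bitslice(xs, bits), to_bitslice(ys, bits))
    print(from_bitslice(planes, len(xs)))  # each lane is (x + y) mod 2**13
```

Each full adder is a handful of bitwise operations on whole planes, so the cost of one addition grows linearly with the chosen precision. Long straight-line sequences of such logic operations are the kind of code that the logic optimization mentioned above would presumably operate on.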
- Type: dissertation
- Language: en
- Landing page: http://hdl.handle.net/2262/81727
- OA status: gold
- References: 84
- Related works: 20
- OpenAlex ID: https://openalex.org/W2753037357
- Authors: Shixiong Xu
- Publication year: 2017
- Publication date: 2017-01-01
- PDF URL: https://hdl.handle.net/2262/81727
- Open access: Yes
- OA URL: https://hdl.handle.net/2262/81727
- Concepts: Vectorization (mathematics), Computer science, Parallel computing, Programming language
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2753037357 |
| doi | |
| ids.mag | 2753037357 |
| ids.openalex | https://openalex.org/W2753037357 |
| fwci | |
| type | dissertation |
| title | Data layout oriented compilation techniques in vectorization for multi-/many-cores |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10054 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1708 |
| topics[0].subfield.display_name | Hardware and Architecture |
| topics[0].display_name | Parallel Computing and Optimization Techniques |
| topics[1].id | https://openalex.org/T11181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9991999864578247 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1705 |
| topics[1].subfield.display_name | Computer Networks and Communications |
| topics[1].display_name | Advanced Data Storage Technologies |
| topics[2].id | https://openalex.org/T10829 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9987999796867371 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1705 |
| topics[2].subfield.display_name | Computer Networks and Communications |
| topics[2].display_name | Interconnection Networks and Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41681595 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7753552198410034 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q7917855 |
| concepts[0].display_name | Vectorization (mathematics) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.6194130182266235 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C173608175 |
| concepts[2].level | 1 |
| concepts[2].score | 0.41864538192749023 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[2].display_name | Parallel computing |
| concepts[3].id | https://openalex.org/C199360897 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3700724244117737 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[3].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/vectorization |
| keywords[0].score | 0.7753552198410034 |
| keywords[0].display_name | Vectorization (mathematics) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.6194130182266235 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/parallel-computing |
| keywords[2].score | 0.41864538192749023 |
| keywords[2].display_name | Parallel computing |
| keywords[3].id | https://openalex.org/keywords/programming-language |
| keywords[3].score | 0.3700724244117737 |
| keywords[3].display_name | Programming language |
| language | en |
| locations[0].id | pmh:http://www.rian.ie/140880/ |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | http://hdl.handle.net/2262/81727 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | Doctoral thesis |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | XU, SHIXIONG, Data layout oriented compilation techniques in vectorization for multi-/many-cores, Trinity College Dublin.School of Computer Science & Statistics.COMPUTER SYSTEMS, 2017 |
| locations[0].landing_page_url | http://hdl.handle.net/2262/81727 |
| locations[1].id | mag:2753037357 |
| locations[1].is_oa | False |
| locations[1].source | |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://hgpu.org/?p=17538 |
| authorships[0].author.id | https://openalex.org/A5103446380 |
| authorships[0].author.orcid | https://orcid.org/0009-0007-1818-7773 |
| authorships[0].author.display_name | Shixiong Xu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Shixiong Xu |
| authorships[0].is_corresponding | True |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | http://hdl.handle.net/2262/81727 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Data layout oriented compilation techniques in vectorization for multi-/many-cores |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T04:12:42.849631 |
| primary_topic.id | https://openalex.org/T10054 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1708 |
| primary_topic.subfield.display_name | Hardware and Architecture |
| primary_topic.display_name | Parallel Computing and Optimization Techniques |
| related_works | https://openalex.org/W2611071575, https://openalex.org/W2760795167, https://openalex.org/W2616361539, https://openalex.org/W2261770525, https://openalex.org/W2601822775, https://openalex.org/W2728976187, https://openalex.org/W2403211121, https://openalex.org/W2902462985, https://openalex.org/W2269565628, https://openalex.org/W2623213245, https://openalex.org/W2612512256, https://openalex.org/W2618701744, https://openalex.org/W2599336268, https://openalex.org/W2116285913, https://openalex.org/W2624209381, https://openalex.org/W2953557002, https://openalex.org/W3114833843, https://openalex.org/W2257965963, https://openalex.org/W2256539127, https://openalex.org/W2644132259 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:http://www.rian.ie/140880/ |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | http://hdl.handle.net/2262/81727 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | Doctoral thesis |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | XU, SHIXIONG, Data layout oriented compilation techniques in vectorization for multi-/many-cores, Trinity College Dublin.School of Computer Science & Statistics.COMPUTER SYSTEMS, 2017 |
| best_oa_location.landing_page_url | http://hdl.handle.net/2262/81727 |
| primary_location.id | pmh:http://www.rian.ie/140880/ |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | http://hdl.handle.net/2262/81727 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | Doctoral thesis |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | XU, SHIXIONG, Data layout oriented compilation techniques in vectorization for multi-/many-cores, Trinity College Dublin.School of Computer Science & Statistics.COMPUTER SYSTEMS, 2017 |
| primary_location.landing_page_url | http://hdl.handle.net/2262/81727 |
| publication_date | 2017-01-01 |
| publication_year | 2017 |
| referenced_works | https://openalex.org/W2159211021, https://openalex.org/W2094969361, https://openalex.org/W2171399035, https://openalex.org/W2077143534, https://openalex.org/W2073748734, https://openalex.org/W2109473404, https://openalex.org/W2121176848, https://openalex.org/W2123440268, https://openalex.org/W2010030599, https://openalex.org/W2158950986, https://openalex.org/W579519726, https://openalex.org/W2100465945, https://openalex.org/W2016158659, https://openalex.org/W2036055954, https://openalex.org/W2119299853, https://openalex.org/W2099404643, https://openalex.org/W2292182532, https://openalex.org/W2096331908, https://openalex.org/W2183609308, https://openalex.org/W2158365276, https://openalex.org/W2141683244, https://openalex.org/W2112121929, https://openalex.org/W2134705409, https://openalex.org/W1991009705, https://openalex.org/W2050165578, https://openalex.org/W2024016913, https://openalex.org/W2143708379, https://openalex.org/W2395045096, https://openalex.org/W2061313045, https://openalex.org/W2071110673, https://openalex.org/W2149573529, https://openalex.org/W2022711417, https://openalex.org/W2147193503, https://openalex.org/W1975594295, https://openalex.org/W2137958758, https://openalex.org/W2019143817, https://openalex.org/W2017113046, https://openalex.org/W2100579514, https://openalex.org/W2147654959, https://openalex.org/W2104378884, https://openalex.org/W2049890071, https://openalex.org/W1548516269, https://openalex.org/W2055312318, https://openalex.org/W2282651986, https://openalex.org/W1547830536, https://openalex.org/W2111394443, https://openalex.org/W2080592089, https://openalex.org/W2061055971, https://openalex.org/W308720167, https://openalex.org/W2016352575, https://openalex.org/W1969280207, https://openalex.org/W2035561773, https://openalex.org/W2168093120, https://openalex.org/W2124484927, https://openalex.org/W1511688816, https://openalex.org/W2037929850, https://openalex.org/W2008117760, https://openalex.org/W2916844179, https://openalex.org/W2523389475, https://openalex.org/W2139935536, https://openalex.org/W2085612442, https://openalex.org/W1965082421, https://openalex.org/W2090268225, https://openalex.org/W2126026097, https://openalex.org/W2076485607, https://openalex.org/W2138767187, https://openalex.org/W2118031182, https://openalex.org/W605824955, https://openalex.org/W2286348849, https://openalex.org/W2157273783, https://openalex.org/W1492601037, https://openalex.org/W2098688018, https://openalex.org/W2128125098, https://openalex.org/W1583453248, https://openalex.org/W2170634604, https://openalex.org/W2013156670, https://openalex.org/W2160875256, https://openalex.org/W2030898836, https://openalex.org/W2167639788, https://openalex.org/W1494930385, https://openalex.org/W2011393414, https://openalex.org/W2114067856, https://openalex.org/W2042587486, https://openalex.org/W2119241866 |
| referenced_works_count | 84 |
| cited_by_percentile_year | |
| corresponding_author_ids | https://openalex.org/A5103446380 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 1 |
| citation_normalized_percentile | |