Daniel Lemire
Faster Positional‐Population Counts for AVX2, AVX‐512, and ASIMD
The positional population count operation pospopcnt counts, for an array of w‐bit words, how often each of the w bits was set. Various applications in bioinformatics, database engineering, and digital processing exist. Building on earlier work …
Scanning HTML at Tens of Gigabytes Per Second on ARM Processors
Background: Modern processors feature Single Instruction, Multiple Data (SIMD) instructions capable of processing 16 bytes or more simultaneously, enabling significant performance enhancements in data‐intensive tasks. Two major Web browser …
Faster Positional-Population Counts for AVX2, AVX-512, and ASIMD
The positional population count operation pospopcnt() counts for an array of w-bit words how often each of the w bits was set. Various applications in bioinformatics, database engineering, and digital processing exist. Building on earlier …
Parsing Millions of DNS Records Per Second
Objectives: To enhance the throughput of DNS parsing by addressing the computational expense of processing large plain text DNS zone files. To specifically increase the speed of parsing DNS zone files compared to existing state‐of‐the‐art p…
Parsing Millions of DNS Records per Second
The Domain Name System (DNS) plays a critical role in the functioning of the Internet. It provides a hierarchical name space for locating resources. Data is typically stored in plain text files, possibly spanning gigabytes. Frequent parsin…
Batched Ranged Random Integer Generation
Pseudorandom values are often generated as 64-bit binary words. These random words need to be converted into ranged values without statistical bias. We present an efficient algorithm to generate multiple independent uniformly-random bounded …
Batched ranged random integer generation
Summary: Pseudorandom values are often generated as 64‐bit binary words. These random words need to be converted into ranged values without statistical bias. We present an efficient algorithm to generate multiple independent uniformly‐rando…
Will AI Flood Us with Irrelevant Papers?
Raising questions about the future of citation metrics and the effectiveness of peer review in a world where authorship may not solely reside with humans.
On‐demand JSON: A better way to parse documents?
Summary: JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree‐based data structure—sometimes …
On-Demand JSON: A Better Way to Parse Documents?
JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree-based data structure -- sometimes calle…
Parsing millions of URLs per second
URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard‐compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e…
Parsing Millions of URLs per Second
URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e…
Transcoding unicode characters with AVX‐512 instructions
Intel includes in its recent processors a powerful set of instructions capable of processing 512‐bit registers with a single instruction (AVX‐512). Some of these instructions have no equivalent in earlier instruction sets. We leverage thes…
Parsing Millions of URLs per Second
URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following WHATWG URL standard (e…
Exact Short Products From Truncated Multipliers
We sometimes need to compute the most significant digits of the product of small integers with a multiplier requiring much storage: e.g., a large integer (e.g., $5^{100}$) or an irrational number ($\pi$). We only need to access the most sign…
Fast number parsing without fallback
Summary: In recent work, Lemire (2021) presented a fast algorithm to convert number strings into binary floating‐point numbers. The algorithm has been adopted by several important systems: for example, it is part of the runtime libraries of…
On-Demand JSON: A Better Way to Parse Documents?
JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree-based data structure—sometimes called a…
Fast Number Parsing Without Fallback
In recent work, Lemire (2021) presented a fast algorithm to convert number strings into binary floating-point numbers. The algorithm has been adopted by several important systems: e.g., it is part of the runtime libraries of GCC 12, Rust 1…
Transcoding Unicode Characters with AVX-512 Instructions
Intel includes on its recent processors a powerful set of instructions capable of processing 512-bit registers with a single instruction (AVX-512). Some of these instructions have no equivalent in earlier instruction sets. We leverage thes…
Transcoding Unicode Characters with AVX-512 Instructions
Intel includes in its recent processors a powerful set of instructions capable of processing 512-bit registers with a single instruction (AVX-512). Some of these instructions have no equivalent in earlier instruction sets. We leverage thes…
3D Geophysical Predictive Modeling by Spectral Feature Subset Selection in Mineral Exploration
Several technical challenges are related to data collection, inverse modeling, model fusion, and integrated interpretations in the exploration of geophysics. A fundamental problem in integrated geophysical interpretation is the proper geol…
Binary Fuse Filters: Fast and Smaller Than Xor Filters
Bloom and cuckoo filters provide fast approximate set membership while using little memory. Engineers use them to avoid expensive disk and network accesses. The recently introduced xor filters can be faster and smaller than Bloom and cucko…
Transcoding billions of Unicode characters per second with SIMD instructions
In software, text is often represented using Unicode formats (UTF‐8 and UTF‐16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state‐of‐the‐art …
Transcoding Billions of Unicode Characters per Second with SIMD Instructions
In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art …