author | Thomas Voss <mail@thomasvoss.com> | 2024-08-25 00:24:25 +0200 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-08-25 00:24:25 +0200 |
commit | 3db6b4427de43ec3ab54f6cec3e1a014780d6890 (patch) | |
tree | f279489169b16f60066743fc61152029f1a081de /c/simd-unicode/README | |
parent | 696555b12c73974f27e68c1c2c022dc9e802a48d (diff) |
Add simd-unicode
Diffstat (limited to 'c/simd-unicode/README')
-rw-r--r-- | c/simd-unicode/README | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/c/simd-unicode/README b/c/simd-unicode/README
new file mode 100644
index 0000000..e072046
--- /dev/null
+++ b/c/simd-unicode/README
@@ -0,0 +1,18 @@
+This is an experiment in using AVX-512 instructions to efficiently lookup
+Unicode properties for runes. It turns out after benchmarking that this
+is actually slower than a generic non-SIMD approach. My hypothesis is
+that it’s slower due to the large latency of AVX-512 gather instructions
+which are required to index the Unicode lookup tables.
+
+UPDATE 1: After replacing the gather instructions with loads/stores and manual
+for-loops to index data, the performance of the AVX-512 approach actually beats
+the generic approach by a small margin (just under 10% or so).
+
+UPDATE 2: Due to the fact that the Unicode tables take up a very large
+amount of space, it’s ideal that you use the smallest datatypes possible.
+After changing both the stage₁ and stage₂ tables to be arrays of bytes as
+opposed to arrays of dwords, the AVX-512 implementation using gather
+instructions no longer works however the variation using manual loops
+with loads/stores does work, albeit slower. On a 27 MiB file the generic
+implementation takes on average 82ms while the AVX-512 implementation
+takes on average 77ms.
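The README mentions stage₁ and stage₂ tables stored as arrays of bytes. For readers unfamiliar with that layout, here is a minimal sketch of a generic (non-SIMD) two-stage lookup. It is not the code from this commit: the block size of 256, the names `stage1`, `stage2`, `tables_init`, `rune_prop` and `rune_props`, and the toy property data are all assumptions made for illustration. A real table would be generated from the Unicode Character Database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BITS 8                     /* assumed block size: 256 runes */
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define MAX_RUNE   0x10FFFFu

/* stage1 maps the high bits of a rune to a block number; stage2 holds the
   deduplicated blocks of property bytes.  Both are byte arrays, so at most
   256 distinct blocks fit.  The contents below are toy data. */
static uint8_t stage1[(MAX_RUNE + 1) >> BLOCK_BITS];
static uint8_t stage2[3 * BLOCK_SIZE];

/* Fill the tables with a made-up property: 1 for 'A'..'Z' and for the
   arbitrary range U+0391..U+03A9, 0 for everything else. */
static void tables_init(void)
{
    /* Block 0 stays all-zero and is shared by every other stage1 entry. */
    stage1[0x00] = 1;                    /* U+0000..U+00FF -> block 1 */
    stage1[0x03] = 2;                    /* U+0300..U+03FF -> block 2 */
    for (uint32_t r = 'A'; r <= 'Z'; r++)
        stage2[1 * BLOCK_SIZE + r] = 1;
    for (uint32_t r = 0x0391; r <= 0x03A9; r++)
        stage2[2 * BLOCK_SIZE + (r & (BLOCK_SIZE - 1))] = 1;
}

/* Look up one rune: the high bits select a block, the low bits select the
   entry within that block. */
static uint8_t rune_prop(uint32_t r)
{
    if (r > MAX_RUNE)
        return 0;
    size_t block = stage1[r >> BLOCK_BITS];
    return stage2[block * BLOCK_SIZE + (r & (BLOCK_SIZE - 1))];
}

/* Look up a batch of runes with a plain loop of loads and stores. */
static void rune_props(const uint32_t *runes, uint8_t *out, size_t n)
{
    assert(runes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = rune_prop(runes[i]);
}
```

After calling `tables_init()`, `rune_prop('A')` yields 1 and `rune_prop('a')` yields 0 under this toy data. Each lookup makes two dependent memory accesses, which is the indexing step the README says was first done with gather instructions and later with manual loops. AVX-512 gathers load 32-bit or 64-bit elements rather than bytes, which may be related to the README's remark that the gather version stopped working with byte tables; the README itself does not give the reason.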