diff options
Diffstat (limited to 'c/simd-unicode/README')
-rw-r--r-- | c/simd-unicode/README | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/c/simd-unicode/README b/c/simd-unicode/README new file mode 100644 index 0000000..e072046 --- /dev/null +++ b/c/simd-unicode/README @@ -0,0 +1,18 @@ +This is an experiment in using AVX-512 instructions to efficiently lookup +Unicode properties for runes. It turns out after benchmarking that this +is actually slower than a generic non-SIMD approach. My hypothesis is +that it’s slower due to the large latency of AVX-512 gather instructions +which are required to index the Unicode lookup tables. + +UPDATE 1: After replacing the gather instructions with loads/stores and manual +for-loops to index data, the performance of the AVX-512 approach actually beats +the generic approach by a small margin (just under 10% or so). + +UPDATE 2: Due to the fact that the Unicode tables take up a very large +amount of space, it’s ideal that you use the smallest datatypes possible. +After changing both the stage₁ and stage₂ tables to be arrays of bytes as +opposed to arrays of dwords, the AVX-512 implementation using gather +instructions no longer works however the variation using manual loops +with loads/stores does work, albeit slower. On a 27 MiB file the generic +implementation takes on average 82ms while the AVX-512 implementation +takes on average 77ms. |