Diffstat (limited to 'c/simd-unicode/README')
-rw-r--r-- c/simd-unicode/README 18
1 files changed, 18 insertions, 0 deletions
diff --git a/c/simd-unicode/README b/c/simd-unicode/README
new file mode 100644
index 0000000..e072046
--- /dev/null
+++ b/c/simd-unicode/README
@@ -0,0 +1,18 @@
+This is an experiment in using AVX-512 instructions to efficiently look up
+Unicode properties for runes. Benchmarking shows that this is actually
+slower than a generic non-SIMD approach. My hypothesis is that the
+slowdown comes from the high latency of the AVX-512 gather instructions
+that are required to index the Unicode lookup tables.
+
+UPDATE 1: After replacing the gather instructions with loads/stores and manual
+for-loops to index data, the performance of the AVX-512 approach actually beats
+the generic approach by a small margin (just under 10% or so).
+
+UPDATE 2: Because the Unicode tables take up a very large amount of
+space, it’s best to use the smallest datatypes possible. After changing
+both the stage₁ and stage₂ tables from arrays of dwords to arrays of
+bytes, the AVX-512 implementation using gather instructions no longer
+works; the variation using manual loops with loads/stores does work,
+albeit slower. On a 27 MiB file the generic implementation takes on
+average 82 ms while the AVX-512 implementation takes on average
+77 ms.
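
For readers unfamiliar with the two-stage layout the README refers to, the following is a minimal sketch of a byte-table lookup and of the "manual loop" batching idea. It is not the code from this commit. The names (`stage1`, `stage2`, `BLOCK_BITS`, `LANES`, the `PROP_*` values) and the toy table contents are assumptions made for illustration, and plain C stands in for the AVX-512 loads/stores so that it runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: runes are split into 256-entry blocks. */
enum { BLOCK_BITS = 8, BLOCK_SIZE = 1 << BLOCK_BITS, MAX_RUNE = 0x10FFFF };
enum { LANES = 16 }; /* number of 32-bit runes in one 512-bit vector */
enum { PROP_NONE = 0, PROP_DIGIT = 1, PROP_ALPHA = 2 }; /* toy properties */

/* stage1 maps the high bits of a rune to a block number; stage2 holds
 * one property byte per rune within each distinct block. Both tables
 * are arrays of bytes, as in UPDATE 2. */
uint8_t stage1[(MAX_RUNE >> BLOCK_BITS) + 1];
uint8_t stage2[2 * BLOCK_SIZE];

/* Fill the toy tables: block 0 is all PROP_NONE and is shared by every
 * range except U+0000..U+00FF, which uses block 1. */
void init_tables(void)
{
    stage1[0] = 1;
    for (int c = '0'; c <= '9'; c++) stage2[BLOCK_SIZE + c] = PROP_DIGIT;
    for (int c = 'A'; c <= 'Z'; c++) stage2[BLOCK_SIZE + c] = PROP_ALPHA;
    for (int c = 'a'; c <= 'z'; c++) stage2[BLOCK_SIZE + c] = PROP_ALPHA;
}

/* Generic scalar lookup: two dependent byte loads per rune. */
uint8_t prop_lookup(uint32_t rune)
{
    assert(rune <= MAX_RUNE);
    size_t block = stage1[rune >> BLOCK_BITS];
    return stage2[(block << BLOCK_BITS) | (rune & (BLOCK_SIZE - 1))];
}

/* Batched lookup: the runes of one vector are assumed to have been
 * stored to memory, each stage is indexed with an ordinary loop, and
 * the results land in an array that could be loaded back into a
 * vector register. */
void prop_lookup_batch(const uint32_t runes[LANES], uint8_t out[LANES])
{
    size_t idx[LANES];
    for (int i = 0; i < LANES; i++) {
        assert(runes[i] <= MAX_RUNE);
        idx[i] = ((size_t)stage1[runes[i] >> BLOCK_BITS] << BLOCK_BITS)
               | (runes[i] & (BLOCK_SIZE - 1));
    }
    for (int i = 0; i < LANES; i++)
        out[i] = stage2[idx[i]];
}
```

With these toy tables, calling `init_tables()` and then `prop_lookup('7')` yields `PROP_DIGIT`, while a rune outside U+0000..U+00FF such as U+4E2D yields `PROP_NONE`. Real tables would be generated from the Unicode Character Database and contain many more distinct blocks.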