Add simd-unicode

author: Thomas Voss <mail@thomasvoss.com> 2024-08-25 00:24:25 +0200
committer: Thomas Voss <mail@thomasvoss.com> 2024-08-25 00:24:25 +0200
commit: 3db6b4427de43ec3ab54f6cec3e1a014780d6890 (patch)
tree: f279489169b16f60066743fc61152029f1a081de /c/simd-unicode/README
parent: 696555b12c73974f27e68c1c2c022dc9e802a48d (diff)
1 files changed, 18 insertions, 0 deletions
diff --git a/c/simd-unicode/README b/c/simd-unicode/README
new file mode 100644
index 0000000..e072046
--- /dev/null
+++ b/c/simd-unicode/README
@@ -0,0 +1,18 @@
+This is an experiment in using AVX-512 instructions to efficiently look up
+Unicode properties for runes.  It turns out after benchmarking that this
+is actually slower than a generic non-SIMD approach.  My hypothesis is
+that it’s slower due to the large latency of AVX-512 gather instructions
+which are required to index the Unicode lookup tables.
+
+UPDATE 1:  After replacing the gather instructions with loads/stores and manual
+for-loops to index data, the performance of the AVX-512 approach actually beats
+the generic approach by a small margin (just under 10% or so).
+
+UPDATE 2:  Due to the fact that the Unicode tables take up a very large
+amount of space, it’s ideal that you use the smallest datatypes possible.
+After changing both the stage₁ and stage₂ tables to be arrays of bytes as
+opposed to arrays of dwords, the AVX-512 implementation using gather
+instructions no longer works; however, the variation using manual loops
+with loads/stores does work, albeit slower.  On a 27 MiB file the generic
+implementation takes on average 82 ms while the AVX-512 implementation
+takes on average 77 ms.
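
The two-stage lookup that the README refers to can be sketched roughly as follows.  This is a minimal illustration and not the code from this commit: the block size, the toy tables (which only cover U+0000 to U+01FF), the "is an ASCII uppercase letter" property, and the names init_tables, prop_lookup and prop_lookup16 are all assumptions made for the example.  The batched function shows the general idea behind UPDATE 1: the index arithmetic sits in a simple loop that a compiler can vectorize, while the table accesses are ordinary indexed byte loads rather than gather instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 128 code points per block, 4 blocks in total. */
enum { BLOCK_BITS = 7, BLOCK_SIZE = 1 << BLOCK_BITS, NBLOCKS = 4 };

/* stage1 maps a block number to the index of a deduplicated stage2 block. */
static uint8_t stage1[NBLOCKS];
/* stage2 holds the property bytes; block 0 is all zeros, block 1 is ASCII. */
static uint8_t stage2[2][BLOCK_SIZE];

/* Fill the toy tables: the property is 1 for 'A'..'Z' and 0 elsewhere. */
static void
init_tables(void)
{
	for (size_t i = 0; i < NBLOCKS; i++)
		stage1[i] = 0;
	stage1[0] = 1; /* U+0000..U+007F uses the ASCII block */
	for (size_t i = 0; i < BLOCK_SIZE; i++) {
		stage2[0][i] = 0;
		stage2[1][i] = (i >= 'A' && i <= 'Z');
	}
}

/* Generic scalar lookup; runes outside the toy tables report 0. */
static uint8_t
prop_lookup(uint32_t r)
{
	if (r >= (uint32_t)NBLOCKS * BLOCK_SIZE)
		return 0;
	return stage2[stage1[r >> BLOCK_BITS]][r & (BLOCK_SIZE - 1)];
}

/* Batched lookup of 16 runes (one 512-bit vector of dwords) using plain
   loops and indexed loads instead of gather instructions. */
static void
prop_lookup16(const uint32_t in[16], uint8_t out[16])
{
	uint32_t hi[16], lo[16];

	/* Pure arithmetic on 16 lanes; a candidate for auto-vectorization. */
	for (int i = 0; i < 16; i++) {
		hi[i] = in[i] >> BLOCK_BITS;
		lo[i] = in[i] & (BLOCK_SIZE - 1);
	}

	/* Scalar byte loads stand in for the gather. */
	for (int i = 0; i < 16; i++)
		out[i] = hi[i] < NBLOCKS ? stage2[stage1[hi[i]]][lo[i]] : 0;
}
```

Call init_tables() once before either lookup function.  Because both tables are byte arrays, as in UPDATE 2, each indexed load reads only the byte it needs.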