From 3db6b4427de43ec3ab54f6cec3e1a014780d6890 Mon Sep 17 00:00:00 2001
From: Thomas Voss <mail@thomasvoss.com>
Date: Sun, 25 Aug 2024 00:24:25 +0200
Subject: Add simd-unicode

---
 c/simd-unicode/README | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 c/simd-unicode/README

diff --git a/c/simd-unicode/README b/c/simd-unicode/README
new file mode 100644
index 0000000..e072046
--- /dev/null
+++ b/c/simd-unicode/README
@@ -0,0 +1,18 @@
+This is an experiment in using AVX-512 instructions to efficiently look up
+Unicode properties for runes.  It turns out after benchmarking that this
+is actually slower than a generic non-SIMD approach.  My hypothesis is
+that it’s slower due to the high latency of AVX-512 gather instructions,
+which are required to index the Unicode lookup tables.
+
+UPDATE 1:  After replacing the gather instructions with loads/stores and manual
+for-loops to index data, the performance of the AVX-512 approach actually beats
+the generic approach by a small margin (just under 10%).
+
+UPDATE 2:  Because the Unicode tables take up a very large amount of
+space, it’s best to use the smallest datatypes possible.  After changing
+both the stage₁ and stage₂ tables to be arrays of bytes as opposed to
+arrays of dwords, the AVX-512 implementation using gather instructions
+no longer works; however, the variation using manual loops with
+loads/stores does work, albeit slower.  On a 27 MiB file the generic
+implementation takes on average 82 ms while the AVX-512 implementation
+takes on average 77 ms.
-- 
cgit v1.2.3