From 3db6b4427de43ec3ab54f6cec3e1a014780d6890 Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Sun, 25 Aug 2024 00:24:25 +0200
Subject: Add simd-unicode

---
 c/simd-unicode/README | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 c/simd-unicode/README

diff --git a/c/simd-unicode/README b/c/simd-unicode/README
new file mode 100644
index 0000000..e072046
--- /dev/null
+++ b/c/simd-unicode/README
@@ -0,0 +1,18 @@
+This is an experiment in using AVX-512 instructions to efficiently look up
+Unicode properties for runes. It turns out after benchmarking that this
+is actually slower than a generic non-SIMD approach. My hypothesis is
+that it’s slower due to the large latency of AVX-512 gather instructions
+which are required to index the Unicode lookup tables.
+
+UPDATE 1: After replacing the gather instructions with loads/stores and manual
+for-loops to index data, the performance of the AVX-512 approach actually beats
+the generic approach by a small margin (just under 10% or so).
+
+UPDATE 2: Due to the fact that the Unicode tables take up a very large
+amount of space, it’s ideal that you use the smallest datatypes possible.
+After changing both the stage₁ and stage₂ tables to be arrays of bytes as
+opposed to arrays of dwords, the AVX-512 implementation using gather
+instructions no longer works; however, the variation using manual loops
+with loads/stores does work, albeit slower. On a 27 MiB file the generic
+implementation takes on average 82 ms while the AVX-512 implementation
+takes on average 77 ms.
-- 
cgit v1.2.3
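
For readers unfamiliar with the stage₁/stage₂ layout the README mentions,
the following is a minimal scalar sketch of a two-stage lookup with byte
tables. It is not code from this commit: the table contents (a toy "is a
digit" property covering only ASCII and Arabic-Indic digits), the block
size of 256, and the names tables_init, rune_prop and rune_prop_batch are
all assumptions made for illustration. Real tables would be generated from
the Unicode Character Database. The batch function indexes the tables with
a plain for-loop, in the spirit of the loads/stores variation described in
UPDATE 1, but it uses no SIMD intrinsics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 256 code points per block, 3 distinct blocks. */
enum { BLOCK = 256, NBLOCKS = 0x110000 / BLOCK, NUNIQ = 3 };

/* stage1 maps a block number to a row of stage2.  Storing it as bytes
   only works while there are at most 256 distinct rows. */
static uint8_t stage1[NBLOCKS];
static uint8_t stage2[NUNIQ][BLOCK];

/* Fill the toy tables; row 0 is the shared all-zero block. */
static void
tables_init(void)
{
	memset(stage1, 0, sizeof(stage1));
	memset(stage2, 0, sizeof(stage2));

	stage1[0x0000 / BLOCK] = 1;        /* U+0000..U+00FF */
	for (int c = '0'; c <= '9'; c++)
		stage2[1][c] = 1;

	stage1[0x0600 / BLOCK] = 2;        /* U+0600..U+06FF */
	for (int c = 0x60; c <= 0x69; c++) /* U+0660..U+0669 */
		stage2[2][c] = 1;
}

/* Two dependent byte loads per rune; out-of-range input yields 0. */
static uint8_t
rune_prop(uint32_t r)
{
	if (r >= 0x110000)
		return 0;
	return stage2[stage1[r / BLOCK]][r % BLOCK];
}

/* Look up n runes with a manual loop instead of a gather. */
static void
rune_prop_batch(const uint32_t *rs, uint8_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = rune_prop(rs[i]);
}
```

With these assumed sizes stage₁ occupies 4352 bytes (0x110000 / 256 entries
of one byte each), a quarter of what the same table would need with dword
entries, which is the space saving UPDATE 2 is after.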