This is an experiment in using AVX-512 instructions to efficiently lookup
Unicode properties for runes.  It turns out after benchmarking that this
is actually slower than a generic non-SIMD approach.  My hypothesis is
that it’s slower due to the large latency of AVX-512 gather instructions
which are required to index the Unicode lookup tables.

UPDATE 1:  After replacing the gather instructions with loads/stores and manual
for-loops to index data, the performance of the AVX-512 approach actually beats
the generic approach by a small margin (just under 10% or so).

UPDATE 2:  Due to the fact that the Unicode tables take up a very large
amount of space, it’s ideal that you use the smallest datatypes possible.
After changing both the stage₁ and stage₂ tables to be arrays of bytes as
opposed to arrays of dwords, the AVX-512 implementation using gather
instructions no longer works however the variation using manual loops
with loads/stores does work, albeit slower.  On a 27 MiB file the generic
implementation takes on average 82ms while the AVX-512 implementation
takes on average 77ms.