From 80547771dec846193f8ed963e93290baeae6fa5d Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Sat, 4 May 2024 01:06:05 +0200
Subject: Add tests for u8lower() and fix ‘ς’ bug
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 test/data/LowercaseTest | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 test/data/LowercaseTest

(limited to 'test/data')

diff --git a/test/data/LowercaseTest b/test/data/LowercaseTest
new file mode 100644
index 0000000..0deb9c4
--- /dev/null
+++ b/test/data/LowercaseTest
@@ -0,0 +1,35 @@
+# Empty input
+;;
+
+# Latin alphabet
+LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT.;lorem ipsum dolor sit amet, consectetur adipiscing elit.;
+
+# Greek alphabet; handle sigma properly
+Σ;ς;
+ΤΟ ΓΡΆΜΜΑ ΣΊΓΜΑ ΈΧΕΙ ΔΎΟ ΠΕΖΟΎΣ ΤΎΠΟΥΣ;το γράμμα σίγμα έχει δύο πεζούς τύπους;
+
+# Cyrillic alphabet
+СЛАВА УКРАЇНІ ПРОТИ РОСІЙСЬКОЇ АГРЕСІЇ!;слава україні проти російської агресії!;
+
+# In lithuanian we need to retain the dot above ‘i’ and ‘j’ when there’s an
+# accent above the uppercased variant. Also test with both single-codepoint
+# variants (i.e. U+00CC LATIN CAPITAL I WITH GRAVE) and variants that use
+# combining-characters.
+Į̃;į̃;
+Į̃;į̇̃;LT
+J́;j́;
+J́;j̇́;LT
+Į̃J́;į̃j́;
+Į̃J́;į̇̃j̇́;LT
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, kìlo;
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, ki̇̀lo;LT
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, kìlo;
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, ki̇̀lo;LT
+
+# Add U+0307 COMBINING DOT ABOVE after ‘i’ when lowercasing ‘İ’ in non-Azeri and
+# -Turkish locales
+İSTANBUL’LUYUM;i̇stanbul’luyum;
+İSTANBUL’LUYUM;istanbul’luyum;AZ
+
+# Uncased language
+안녕하세요, 월드!;안녕하세요, 월드!;
-- 
cgit v1.2.3
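
The records in the test data above appear to follow an `input;expected;locale` layout, with `#` starting a comment line and an empty third field meaning the default locale. The harness that reads the file is not part of this diff, so the following is only a minimal sketch of how one such line could be split. The names `struct lowercase_record` and `parse_record` are hypothetical and do not come from the library. It assumes no field contains a literal `;`, which holds for every record in this file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One record of the (assumed) LowercaseTest layout: input;expected;locale */
struct lowercase_record {
	const char *input;    /* text handed to the lowercasing routine */
	const char *expected; /* expected lowercased result */
	const char *locale;   /* "" for the default locale, else e.g. "LT", "AZ" */
};

/* Split LINE in place on its first two ';' separators.  Returns false for
   blank lines, comment lines (starting with '#'), and lines that have fewer
   than two separators; REC is left untouched in those cases.  Splitting on
   the byte ';' is safe for UTF-8 because 0x3B never occurs inside a
   multi-byte sequence. */
bool
parse_record(char *line, struct lowercase_record *rec)
{
	assert(line != NULL && rec != NULL);

	line[strcspn(line, "\r\n")] = '\0'; /* drop a trailing newline, if any */
	if (line[0] == '\0' || line[0] == '#')
		return false;

	char *first = strchr(line, ';');
	if (first == NULL)
		return false;
	char *second = strchr(first + 1, ';');
	if (second == NULL)
		return false;

	*first = '\0';
	*second = '\0';
	rec->input = line;
	rec->expected = first + 1;
	rec->locale = second + 1;
	return true;
}
```

A caller would read the file line by line (for example with `fgets`), skip every line for which `parse_record` returns false, and compare the library's output for `input` under `locale` against `expected`. The fields point into the caller's buffer, so the buffer must outlive the record.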