author | Thomas Voss <mail@thomasvoss.com> | 2024-05-04 01:06:05 +0200 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-05-04 01:06:05 +0200 |
commit | 80547771dec846193f8ed963e93290baeae6fa5d (patch) | |
tree | 7e55fa818acb5c97af2cd8671d2a8b09b73a6492 /test/data/LowercaseTest | |
parent | a81d691fe639b2db877c915fbb60ce326307aab9 (diff) |
Add tests for u8lower() and fix ‘ς’ bug
Diffstat (limited to 'test/data/LowercaseTest')
-rw-r--r-- | test/data/LowercaseTest | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/test/data/LowercaseTest b/test/data/LowercaseTest
new file mode 100644
index 0000000..0deb9c4
--- /dev/null
+++ b/test/data/LowercaseTest
@@ -0,0 +1,35 @@
+# Empty input
+;;
+
+# Latin alphabet
+LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT.;lorem ipsum dolor sit amet, consectetur adipiscing elit.;
+
+# Greek alphabet; handle sigma properly
+Σ;ς;
+ΤΟ ΓΡΆΜΜΑ ΣΊΓΜΑ ΈΧΕΙ ΔΎΟ ΠΕΖΟΎΣ ΤΎΠΟΥΣ;το γράμμα σίγμα έχει δύο πεζούς τύπους;
+
+# Cyrillic alphabet
+СЛАВА УКРАЇНІ ПРОТИ РОСІЙСЬКОЇ АГРЕСІЇ!;слава україні проти російської агресії!;
+
+# In lithuanian we need to retain the dot above ‘i’ and ‘j’ when there’s an
+# accent above the uppercased variant. Also test with both single-codepoint
+# variants (i.e. U+00CC LATIN CAPITAL I WITH GRAVE) and variants that use
+# combining-characters.
+Į̃;į̃;
+Į̃;į̇̃;LT
+J́;j́;
+J́;j̇́;LT
+Į̃J́;į̃j́;
+Į̃J́;į̇̃j̇́;LT
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, kìlo;
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, ki̇̀lo;LT
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, kìlo;
+RÀSTI, MÈSTI, KÌLO;ràsti, mèsti, ki̇̀lo;LT
+
+# Add U+0307 COMBINING DOT ABOVE after ‘i’ when lowercasing ‘İ’ in non-Azeri and
+# -Turkish locales
+İSTANBUL’LUYUM;i̇stanbul’luyum;
+İSTANBUL’LUYUM;istanbul’luyum;AZ
+
+# Uncased language
+안녕하세요, 월드!;안녕하세요, 월드!;
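
Note on the data format: judging only from the lines added above, each test case appears to be three semicolon-separated fields (input, expected lowercase output, and an optional locale tag such as LT or AZ), with `#` lines as comments and blank lines as separators. The Python sketch below reads the file under that assumption. The names `Case` and `parse_cases` are hypothetical and are not taken from the repository's actual test harness.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """One test case; the field names are illustrative, not from the repository."""
    before: str   # text passed to the lowercasing routine
    after: str    # expected lowercased result
    locale: str   # locale tag such as "LT" or "AZ"; empty when the field is blank


def parse_cases(text: str) -> list[Case]:
    """Parse test data in the assumed `input;expected;locale` layout."""
    cases = []
    for line in text.split("\n"):
        # Skip blank separator lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumes neither the input nor the expected output contains ';'.
        before, after, locale = line.split(";")
        cases.append(Case(before, after, locale))
    return cases


if __name__ == "__main__":
    sample = "# Empty input\n;;\n\nABC;abc;\n\u0130;i;AZ\n"
    for case in parse_cases(sample):
        print(case)
```

Under this reading, an empty third field presumably means that no locale-specific tailoring is requested, which is why most inputs appear twice: once with a blank locale and once with LT or AZ.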