From ba56b5fa8344847077b49268d2ab215f3e73d10e Mon Sep 17 00:00:00 2001
From: Thomas Voss <mail@thomasvoss.com>
Date: Sat, 4 May 2024 01:50:09 +0200
Subject: Add tests for u8title() and fix ‘ς’ bug
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 test/data/TitlecaseTest | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 test/data/TitlecaseTest

(limited to 'test/data')
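
Note for readers of this patch: the commit does not document the layout of
TitlecaseTest. Judging from the data alone, each record appears to be three
';'-separated fields (input, expected titlecased output, and an optional flag
such as a language tag or 'ẞ'), with '#' lines as comments and blank lines as
separators. The Python sketch below reads the file under that assumption. The
names `TitlecaseCase` and `parse_cases` are hypothetical and are not part of
the project; the project's own test driver is not shown in this diff.

```python
from typing import NamedTuple


class TitlecaseCase(NamedTuple):
    source: str    # text handed to the titlecasing routine
    expected: str  # expected titlecased result
    flag: str      # "" or a flag such as "LT", "NL", "AZ", "ẞ" (meaning assumed)


def parse_cases(text: str) -> list[TitlecaseCase]:
    """Parse records under the assumed 'input;expected;flag' layout."""
    cases = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        # Blank lines separate groups and '#' starts a comment line.
        if not line or line.startswith("#"):
            continue
        # Assumption: ';' never occurs inside a field, so a plain split is enough.
        fields = line.split(";")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        cases.append(TitlecaseCase(*fields))
    return cases


if __name__ == "__main__":
    sample = "# Short input\na;A;\n\n# Dutch\nijsberg;IJsberg;NL\n"
    for case in parse_cases(sample):
        print(case)
```

A record such as `;;` parses to three empty strings, which matches the "Empty
input" case in the data. A real driver would pass `source` and `flag` to the
function under test and compare the result with `expected`.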

diff --git a/test/data/TitlecaseTest b/test/data/TitlecaseTest
new file mode 100644
index 0000000..24256a5
--- /dev/null
+++ b/test/data/TitlecaseTest
@@ -0,0 +1,48 @@
+# Empty input
+;;
+
+# Short input
+a;A;
+
+# If CF_ẞ gets passed for whatever reason… don’t turn ‘ß’ into ‘ẞ’
+ß;Ss;ẞ
+
+# Latin alphabet
+LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT.;Lorem Ipsum Dolor Sit Amet, Consectetur Adipiscing Elit.;
+lorem ipsum dolor sit amet, consectetur adipiscing elit.;Lorem Ipsum Dolor Sit Amet, Consectetur Adipiscing Elit.;
+
+# Random punctuation and numbers
+COMPLEX-LANGUAGE AND -SCRIPT;Complex-Language And -Script;
+complex-language and -script;Complex-Language And -Script;
+
+# Greek alphabet; handle sigma properly
+ΤΟ ΓΡΆΜΜΑ ΣΊΓΜΑ ΈΧΕΙ ΔΎΟ ΠΕΖΟΎΣ ΤΎΠΟΥΣ;Το Γράμμα Σίγμα Έχει Δύο Πεζούς Τύπους;
+το γράμμα σίγμα έχει δύο πεζούς τύπους;Το Γράμμα Σίγμα Έχει Δύο Πεζούς Τύπους;
+
+# Cyrillic alphabet
+СЛАВА УКРАЇНІ ПРОТИ РОСІЙСЬКОЇ АГРЕСІЇ!;Слава Україні Проти Російської Агресії!;
+слава україні проти російської агресії!;Слава Україні Проти Російської Агресії!;
+
+# In Lithuanian we need to retain the dot above ‘i’ and ‘j’ when there’s an
+# accent above the uppercased variant.  Also test with both single-codepoint
+# variants (e.g. U+00CC LATIN CAPITAL LETTER I WITH GRAVE) and variants that
+# use combining characters.
+i̇̀;İ̀;
+i̇̀;Ì;LT
+RÀSTI, MÈSTI, KÌLO;Ràsti, Mèsti, Kìlo;
+RÀSTI, MÈSTI, KÌLO;Ràsti, Mèsti, Ki̇̀lo;LT
+
+# Croatian digraphs have 3 cases (upper, title, lower)
+LJUDEVIT GAJ;Ljudevit Gaj;
+ljudevit gaj;Ljudevit Gaj;
+
+# Dutch IJ needs special handling
+ijsberg en onderzeeër in de ijssel;Ijsberg En Onderzeeër In De Ijssel;
+ijsberg en onderzeeër in de ijssel;IJsberg En Onderzeeër In De IJssel;NL
+
+# Uppercase ‘i’ to ‘İ’ in Azeri/Turkish
+istanbul’luyum;Istanbul’luyum;
+istanbul’luyum;İstanbul’luyum;AZ
+
+# Uncased language
+안녕하세요, 월드!;안녕하세요, 월드!;
-- 
cgit v1.2.3