diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-01-21 03:03:58 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-01-21 03:03:58 +0100 |
commit | 4f93f935dc7a981ca073a322425c3f5929ffb644 (patch) | |
tree | 4460586408ec7fdfcecf3ba4584f0435067125a6 /vendor/librune/man/u8len.3 | |
parent | 72ea25a4d73e3e026366d4165f5bc4ec9e7418cb (diff) |
Support line- & column-based match locations
Diffstat (limited to 'vendor/librune/man/u8len.3')
-rw-r--r-- | vendor/librune/man/u8len.3 | 70 |
1 files changed, 70 insertions, 0 deletions
diff --git a/vendor/librune/man/u8len.3 b/vendor/librune/man/u8len.3 new file mode 100644 index 0000000..4e58e14 --- /dev/null +++ b/vendor/librune/man/u8len.3 @@ -0,0 +1,70 @@ +.Dd January 15 2024 +.Dt U8LEN 3 +.Os +.Sh NAME +.Nm u8len +.Nd count Unicode codepoints +.Sh LIBRARY +.Lb librune +.Sh SYNOPSIS +.In utf8.h +.Ft size_t +.Fn u8len "const char8_t *s" "size_t n" +.Sh DESCRIPTION +The +.Fn u8len +function returns the number of UTF-8 encoded Unicode codepoints in the +buffer +.Fa s +of length +.Fa n +bytes. +.Pp +This function assumes that +.Fa s +contains only valid UTF-8. +.Sh RETURN VALUES +The +.Fn u8len +function returns the number of codepoints in the buffer +.Fa s . +.Sh EXAMPLES +The following call to +.Fn u8len +will return 17 while the call to +.Fn strlen +will return 22 as a result of use of multibyte-characters in +.Fa s . +.Bd -literal -offset indent +char8_t s[] = u8\(dq„Der Große Duden“\(dq; +size_t blen, cplen; + +blen = strlen((char *)s); +cplen = u8len(s, sizeof(s) - 1); +.Ed +.Sh SEE ALSO +.Xr u8glen 3 , +.Xr u8wdth 3 , +.Xr unicode 7 , +.Xr utf\-8 7 +.Sh STANDARDS +.Rs +.%A F. Yergeau +.%D November 2003 +.%R RFC 3629 +.%T UTF-8, a transformation format of ISO 10646 +.Re +.Sh AUTHORS +.An Thomas Voss Aq Mt mail@thomasvoss.com +.Sh CAVEATS +The return value of +.Fn u8len +does not necessarily represent the number of human-preceived characters +in the given buffer; +multiple codepoints may combine to form one human-preceived character +that spans a single column. +To count user-preceived codepoints +.Pq also known as graphemes , +you may want to use the +.Xr u8glen 3 +function. |