Diffstat (limited to 'man/u8len.3')
-rw-r--r-- | man/u8len.3 | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/man/u8len.3 b/man/u8len.3
new file mode 100644
index 0000000..a2968e7
--- /dev/null
+++ b/man/u8len.3
@@ -0,0 +1,66 @@
+.Dd March 10, 2024
+.Dt U8LEN 3
+.Os
+.Sh NAME
+.Nm u8len
+.Nd count Unicode codepoints
+.Sh LIBRARY
+.Lb mlib
+.Sh SYNOPSIS
+.In mbstring.h
+.Ft size_t
+.Fn u8len "const char8_t *s" "size_t n"
+.Sh DESCRIPTION
+The
+.Fn u8len
+function returns the number of UTF-8 encoded Unicode codepoints in the
+buffer
+.Fa s
+of length
+.Fa n
+bytes.
+.Pp
+Invalid bytes are interpreted as having a length of 1 byte.
+.Sh RETURN VALUES
+The
+.Fn u8len
+function returns the number of codepoints in the buffer
+.Fa s .
+.Sh EXAMPLES
+The following call to
+.Fn u8len
+will return 17 while the call to
+.Fn strlen
+will return 22 because of the multibyte characters in
+.Fa s .
+.Bd -literal -offset indent
+struct u8view sv = U8V(u8\(dq„Der Große Duden“\(dq);
+size_t blen = strlen((char *)sv.p);
+size_t cplen = u8len(U8_ARGS(sv));
+.Ed
+.Sh SEE ALSO
+.Xr u8glen 3 ,
+.Xr U8V 3 ,
+.Xr unicode 7 ,
+.Xr utf\-8 7
+.Sh STANDARDS
+.Rs
+.%A F. Yergeau
+.%D November 2003
+.%R RFC 3629
+.%T UTF-8, a transformation format of ISO 10646
+.Re
+.Sh AUTHORS
+.An Thomas Voss Aq Mt mail@thomasvoss.com
+.Sh CAVEATS
+The return value of
+.Fn u8len
+does not necessarily represent the number of human-perceived characters
+in the given buffer;
+multiple codepoints may combine to form one human-perceived character
+that spans a single column.
+To count user-perceived characters
+.Pq also known as graphemes ,
+you may want to use the
+.Xr u8glen 3
+function.
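The counting rule in the DESCRIPTION section (one codepoint per well-formed UTF-8 sequence, one per invalid byte) can be illustrated with a short C sketch. This is not mlib's implementation: `sketch_u8len` and `seq_len` are hypothetical names, the buffer is typed as `unsigned char` rather than `char8_t` so that it compiles before C23, and validation is simplified (overlong and surrogate encodings are not rejected).

```c
#include <stddef.h>

/* Hypothetical helper, not part of mlib: expected length of the sequence
 * introduced by lead byte b, or 0 if b cannot start a sequence. */
static size_t
seq_len(unsigned char b)
{
	if (b < 0x80)
		return 1;	/* ASCII */
	if (b >= 0xC2 && b <= 0xDF)
		return 2;
	if (b >= 0xE0 && b <= 0xEF)
		return 3;
	if (b >= 0xF0 && b <= 0xF4)
		return 4;
	return 0;	/* continuation byte or invalid lead byte */
}

/* Hypothetical stand-in for u8len(): count codepoints in the n-byte
 * buffer s, treating every invalid byte as one unit of length 1. */
size_t
sketch_u8len(const unsigned char *s, size_t n)
{
	size_t count = 0;
	size_t i = 0;

	while (i < n) {
		size_t len = seq_len(s[i]);
		/* The whole sequence must fit in the remaining bytes. */
		int ok = len != 0 && len <= n - i;

		/* Every trailing byte must look like 10xxxxxx. */
		for (size_t j = 1; ok && j < len; j++)
			ok = (s[i + j] & 0xC0) == 0x80;

		i += ok ? len : 1;	/* skip the sequence, or one bad byte */
		count++;
	}
	return count;
}
```

For the string from the EXAMPLES section, whose UTF-8 encoding is 22 bytes long, this sketch counts 17 codepoints: the two quotation marks take 3 bytes each and the ß takes 2. A lone 0xFF byte followed by an ASCII letter counts as 2.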