Diffstat (limited to 'man/u8len.3')
-rw-r--r-- man/u8len.3 66
1 files changed, 66 insertions, 0 deletions
diff --git a/man/u8len.3 b/man/u8len.3
new file mode 100644
index 0000000..a2968e7
--- /dev/null
+++ b/man/u8len.3
@@ -0,0 +1,66 @@
+.Dd March 10, 2024
+.Dt U8LEN 3
+.Os
+.Sh NAME
+.Nm u8len
+.Nd count Unicode codepoints
+.Sh LIBRARY
+.Lb mlib
+.Sh SYNOPSIS
+.In mbstring.h
+.Ft size_t
+.Fn u8len "const char8_t *s" "size_t n"
+.Sh DESCRIPTION
+The
+.Fn u8len
+function returns the number of UTF-8 encoded Unicode codepoints in the
+buffer
+.Fa s
+of length
+.Fa n
+bytes.
+.Pp
+Each invalid byte is interpreted as a sequence with a length of 1 byte.
+.Sh RETURN VALUES
+The
+.Fn u8len
+function returns the number of codepoints in the buffer
+.Fa s .
+.Sh EXAMPLES
+The following call to
+.Fn u8len
+returns 17, while the call to
+.Fn strlen
+returns 22, because of the multibyte characters in
+.Fa s .
+.Bd -literal -offset indent
+struct u8view sv = U8V(u8\(dq„Der Große Duden“\(dq);
+size_t blen = strlen((char *)sv.p);
+size_t cplen = u8len(U8_ARGS(sv));
+.Ed
+.Sh SEE ALSO
+.Xr u8glen 3 ,
+.Xr U8V 3 ,
+.Xr unicode 7 ,
+.Xr utf\-8 7
+.Sh STANDARDS
+.Rs
+.%A F. Yergeau
+.%D November 2003
+.%R RFC 3629
+.%T UTF-8, a transformation format of ISO 10646
+.Re
+.Sh AUTHORS
+.An Thomas Voss Aq Mt mail@thomasvoss.com
+.Sh CAVEATS
+The return value of
+.Fn u8len
+does not necessarily represent the number of human-perceived characters
+in the given buffer;
+multiple codepoints may combine to form one human-perceived character
+that spans a single column.
+To count user-perceived characters
+.Pq also known as graphemes ,
+you may want to use the
+.Xr u8glen 3
+function.
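
The counting rule documented above can be sketched in plain C. This is not the mlib implementation, only an illustration of the described behaviour under stated assumptions: the names seq_len and sketch_u8len are hypothetical, unsigned char stands in for char8_t so the snippet builds without C23 support, and validity is judged by the byte ranges in RFC 3629. Any byte that does not begin a valid sequence is counted as a sequence of length 1.

```c
#include <stddef.h>

/* Hypothetical helper (not part of mlib): length in bytes of the UTF-8
   sequence starting at s, or 1 if the bytes there do not form a valid
   sequence per RFC 3629.  The caller guarantees n >= 1. */
static size_t
seq_len(const unsigned char *s, size_t n)
{
	unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
	size_t len;

	if (s[0] < 0x80)
		return 1; /* ASCII */
	else if (s[0] >= 0xC2 && s[0] <= 0xDF)
		len = 2;
	else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
		len = 3;
		if (s[0] == 0xE0)
			lo = 0xA0; /* reject overlong encodings */
		if (s[0] == 0xED)
			hi = 0x9F; /* reject UTF-16 surrogates */
	} else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
		len = 4;
		if (s[0] == 0xF0)
			lo = 0x90; /* reject overlong encodings */
		if (s[0] == 0xF4)
			hi = 0x8F; /* reject values above U+10FFFF */
	} else
		return 1; /* stray continuation byte or invalid lead byte */

	/* A truncated sequence or a bad second byte makes the lead byte invalid */
	if (len > n || s[1] < lo || s[1] > hi)
		return 1;
	for (size_t i = 2; i < len; i++) {
		if (s[i] < 0x80 || s[i] > 0xBF)
			return 1;
	}
	return len;
}

/* Hypothetical stand-in for u8len(): count the sequences in s[0..n) */
static size_t
sketch_u8len(const unsigned char *s, size_t n)
{
	size_t count = 0;

	while (n > 0) {
		size_t len = seq_len(s, n);
		s += len;
		n -= len;
		count++;
	}
	return count;
}
```

Under these assumptions, passing the 22 bytes of the string from the EXAMPLES section yields 17, and a buffer holding two 0xFF bytes followed by an ASCII letter yields 3.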