aboutsummaryrefslogtreecommitdiff
path: root/man/u8len.3
blob: 886bf6e91918444c53b64160cd8e0057d22aa77b (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
.Dd 4 May 2024
.Dt U8LEN 3
.Os
.Sh NAME
.Nm u8len
.Nd count Unicode codepoints
.Sh LIBRARY
.Lb mlib
.Sh SYNOPSIS
.In mbstring.h
.Ft size_t
.Fn u8len "u8view_t sv"
.Sh DESCRIPTION
The
.Fn u8len
function returns the number of UTF-8 encoded Unicode codepoints in the
string view
.Fa sv .
.Pp
Invalid bytes are interpreted as having a length of 1 byte.
.Sh RETURN VALUES
The
.Fn u8len
function returns the number of codepoints in the string view
.Fa sv .
.Sh EXAMPLES
The following call to
.Fn u8len
will return 17 while the call to
.Fn strlen
will return 22 as a result of use of multibyte-characters in
.Fa sv .
.Bd -literal -offset indent
size_t n;
u8view_t sv = U8(\(dq„Der Große Duden“\(dq);

n = u8len(sv);            /* 17 */
n = strlen((char *)sv.p); /* 22 */
.Ed
.Sh SEE ALSO
.Xr U8 3 ,
.Xr u8gcnt 3 ,
.Xr u8view 3 ,
.Xr unicode 7 ,
.Xr utf\-8 7
.Sh STANDARDS
.Rs
.%A F. Yergeau
.%D November 2003
.%R RFC 3629
.%T UTF-8, a transformation format of ISO 10646
.Re
.Sh AUTHORS
.An Thomas Voss Aq Mt mail@thomasvoss.com
.Sh CAVEATS
The return value of
.Fn u8len
does not necessarily represent the number of human-preceived characters
in the given string view;
multiple codepoints may combine to form one human-preceived character.
These human-preceived characters may even take up multiple columns in a
monospaced-environment such as in a terminal emulator.
To count user-preceived characters
.Pq also known as graphemes ,
you may want to use the
.Xr u8gcnt 3
function.