.Dd January 16, 2024
.Dt U8WDTH 3
.Os
.Sh NAME
.Nm u8wdth
.Nd UTF-8 encoded width of a Unicode codepoint
.Sh LIBRARY
.Lb librune
.Sh SYNOPSIS
.In mbstring.h
.Ft int
.Fn u8wdth "rune ch"
.Sh DESCRIPTION
The
.Fn u8wdth
function returns the number of bytes that would be occupied by the
Unicode codepoint
.Fa ch
if it were encoded as UTF-8.
If
.Fa ch
is greater than
.Dv RUNE_MAX ,
a width of 0 is returned.
.Pp
If the exact UTF-8 encoded size of a codepoint is not relevant and you
simply wish to allocate a buffer capable of holding a given number of
UTF-8 encoded codepoints,
the
.Dv U8_LEN_MAX
macro may be preferable.
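.Pp
A worst-case allocation of that kind might be sketched as follows.
This is an illustration under stated assumptions, not part of the library:
the helper
.Fn alloc_worst_case
is hypothetical, the fallback value of 4 for
.Dv U8_LEN_MAX
is an assumption based on the longest sequence permitted by RFC 3629, and
.Vt unsigned char
stands in for
.Vt char8_t
so that the sketch does not depend on C23.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdlib.h>

/* Assumption: stand-in for builds without mbstring.h; 4 is the
   longest UTF-8 sequence permitted by RFC 3629. */
#ifndef U8_LEN_MAX
#	define U8_LEN_MAX 4
#endif

/* Hypothetical helper: returns a buffer large enough for n encoded
   codepoints, or NULL if n is 0, the size would overflow, or
   malloc fails. */
static unsigned char *
alloc_worst_case(size_t n)
{
	if (n == 0 || n > (size_t)-1 / U8_LEN_MAX)
		return NULL;
	return malloc(n * U8_LEN_MAX);
}
.Ed
.Pp
Such a buffer may be larger than necessary, but it avoids a pass over the
input to sum the individual widths.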
.Pp
This function treats invalid codepoints smaller than
.Dv RUNE_MAX ,
such as UTF-16 surrogates, as valid.
.Sh RETURN VALUES
The
.Fn u8wdth
function returns the number of bytes required to UTF-8 encode the
codepoint
.Fa ch ,
or 0 if
.Fa ch
is greater than
.Dv RUNE_MAX .
.Sh EXAMPLES
The following example allocates a buffer that is exactly large enough to
hold the given UTF-32 string, excluding its terminating null character,
once it is converted to UTF-8.
.Bd -literal -offset indent
#define lengthof(a) (sizeof(a) / sizeof(*(a)))
size_t bufsiz = 0;
char8_t *buf;
char32_t s[] = U\(dqĲsselmeer\(dq; /* ‘Ĳ’ takes 2 bytes */
for (size_t i = 0; i < lengthof(s) - 1; i++)
	bufsiz += u8wdth(s[i]);
buf = malloc(bufsiz);
.Ed
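.Pp
The following sketch shows one way the returned width could be derived
from the codepoint ranges defined by RFC 3629.
It is an illustration only and is not taken from the library source:
the function
.Fn width_sketch
and the macro
.Dv SKETCH_RUNE_MAX
are hypothetical names, and the value 0x10FFFF used in place of
.Dv RUNE_MAX
is an assumption.
.Bd -literal -offset indent
#include <stdint.h>

/* Assumption: the largest valid codepoint, mirroring RUNE_MAX. */
#define SKETCH_RUNE_MAX UINT32_C(0x10FFFF)

/* Hypothetical re-implementation of the documented behaviour. */
static int
width_sketch(uint_least32_t ch)
{
	if (ch <= 0x7F)
		return 1;	/* U+0000 to U+007F */
	if (ch <= 0x7FF)
		return 2;	/* U+0080 to U+07FF */
	if (ch <= 0xFFFF)
		return 3;	/* U+0800 to U+FFFF, surrogates included */
	if (ch <= SKETCH_RUNE_MAX)
		return 4;	/* U+10000 to U+10FFFF */
	return 0;		/* beyond the assumed RUNE_MAX */
}
.Ed
.Pp
Under these assumptions a surrogate such as U+D800 yields a width of 3,
matching the lenient treatment of invalid codepoints described above.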
.Sh SEE ALSO
.Xr u8glen 3 ,
.Xr u8len 3 ,
.Xr unicode 7 ,
.Xr utf-8 7
.Sh STANDARDS
.Rs
.%A F. Yergeau
.%D November 2003
.%R RFC 3629
.%T UTF-8, a transformation format of ISO 10646
.Re
.Sh AUTHORS
.An Thomas Voss Aq Mt mail@thomasvoss.com