.Dd 4 May 2024
.Dt U8NEXT 3
.Os
.Sh NAME
.Nm u8next ,
.Nm u8prev
.Nd iterate over Unicode codepoints
.Sh LIBRARY
.Lb mlib
.Sh SYNOPSIS
.In mbstring.h
.Ft int
.Fn u8next "rune *ch" "struct u8view sv"
.Ft int
.Fn u8prev "rune *ch" "const char8_t **s" "const char8_t *start"
.Sh DESCRIPTION
The
.Fn u8next
function decodes the first rune in the UTF-8 encoded string view
.Fa sv
and stores the result in
.Fa ch .
It then shrinks
.Fa sv
so that the decoded rune is removed.
.Pp
The
.Fn u8prev
function takes a pointer
.Fa start
which points to the start of the string and updates
.Fa s
to point to the previous codepoint in the buffer.
The rune
.Fa ch
is set to UTF-8 codepoint pointed to by
.Fa s
after iteration.
.Pp
Both of these functions set
.Va *ch
to
.Dv RUNE_ERROR
in the case of invalid UTF-8.
.Sh RETURN VALUES
The
.Fn u8next
and
.Fn u8prev
functions return the length of the UTF-8-encoded rune iterated over in
bytes,
or 0 at the end of iteration.
.Sh EXAMPLES
The following calls to
.Fn u8next
iterate over and print all the codepoints in
.Va sv .
.Bd -literal -offset indent
#include <rune.h> /* For PRIXRUNE; see rune(3) */

int w;
rune ch;
struct u8view sv = U8("Ta’ Ħaġrat");

while (w = u8next(&ch, &sv))
	printf("U+%04" PRIXRUNE ": ‘%.*s’\en", ch, w, sv.p - w);
.Ed
.Pp
The following example is the same as the previous,
but it uses the
.Fn u8prev
function to iterate backwards.
.Bd -literal -offset indent
#include <rune.h> /* For PRIXRUNE; see rune(3) */

int w;
rune ch;
struct u8view sv = U8("Ta’ Ħaġrat");
const char8_t *s = sv.p + sv.len;

while (w = u8prev(&ch, &s, sv.p))
	printf("U+%04" PRIXRUNE ": ‘%.*s’\en", ch, w, s);
.Ed
.Sh SEE ALSO
.Xr rune 3 ,
.Xr U8 3 ,
.Xr u8gnext 3 ,
.Xr u8tor 3 ,
.Xr u8view 3type ,
.Xr RUNE_ERROR 3const ,
.Xr unicode 7 ,
.Xr utf\-8 7
.Sh STANDARDS
.Rs
.%A F. Yergeau
.%D November 2003
.%R RFC 3629
.%T UTF-8, a transformation format of ISO 10646
.Re
.Sh AUTHORS
.An Thomas Voss Aq Mt mail@thomasvoss.com