Garry's Mod Wiki

utf8

The utf8 library provides basic support for UTF-8 encoding. This library does not provide any support for Unicode other than the handling of the encoding. Any operation that needs the meaning of a character, such as character classification, is outside its scope.

Unless stated otherwise, all functions that expect a byte position as a parameter assume that the given position is either the start of a byte sequence or one plus the length of the subject string. As in the string library, negative indices count from the end of the string.

Fields

string utf8.charpattern
This is not a function but a pattern (a string) that matches exactly one UTF-8 byte sequence, assuming that the subject is a valid UTF-8 string.
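A minimal sketch, assuming the subject string is valid UTF-8 (the sample string is made up for illustration). Passing utf8.charpattern to string.gmatch yields one byte sequence per match, so counting matches counts characters rather than bytes.

```lua
-- "h", "é" (2 bytes) and "€" (3 bytes), written as decimal byte escapes
local s = "h\195\169\226\130\172"

local chars = {}
for ch in string.gmatch(s, utf8.charpattern) do
	-- each match is one complete UTF-8 byte sequence
	chars[#chars + 1] = ch
end

print(#s)     -- 6 (bytes)
print(#chars) -- 3 (characters)
```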

Methods

string utf8.char( vararg codepoints )
Receives zero or more integers, converts each one to its corresponding UTF-8 byte sequence and returns a string with the concatenation of all these sequences.
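A short illustration with arbitrary codepoints. The resulting byte count follows from UTF-8 encoding U+0048 in one byte, U+00E9 in two and U+20AC in three.

```lua
-- U+0048 "H", U+00E9 "é" and U+20AC "€"
local s = utf8.char(72, 233, 8364)
print(s)      -- Hé€
print(#s)     -- 6 (1 + 2 + 3 bytes)

-- with no arguments the result is an empty string
local empty = utf8.char()
print(#empty) -- 0
```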
vararg utf8.codepoint( string string, number startPos = 1, number endPos = startPos )
Returns the codepoints (as numbers) of all characters in the given string that start between byte positions startPos and endPos (both inclusive). It raises an error if it meets any invalid byte sequence. This function works similarly to string.byte.
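A sketch contrasting utf8.codepoint with string.byte on a sample string (the string and variable names are illustrative). The last call starts in the middle of a multi-byte character, so it is wrapped in pcall to catch the resulting error.

```lua
local s = "H\195\169\226\130\172" -- "Hé€" in UTF-8 (6 bytes)

-- with the default range, only the character starting at byte 1 is decoded
local first = utf8.codepoint(s)
print(first)  -- 72

-- codepoints of every character that starts between byte 1 and the last byte
local cps = { utf8.codepoint(s, 1, -1) }
print(#cps)   -- 3 (72, 233 and 8364)

-- string.byte works on raw bytes instead, so it returns 6 values
local bytes = { string.byte(s, 1, -1) }
print(#bytes) -- 6

-- byte 3 is the second byte of "é": decoding from there raises an error
local ok = pcall(utf8.codepoint, s, 3)
print(ok)     -- false
```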
function utf8.codes( string string )
Returns an iterator (like string.gmatch) which returns both the position and codepoint of each UTF-8 character in the string. It raises an error if it meets any invalid byte sequence.
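A minimal loop over a sample string. The positions are byte positions, so they skip ahead by the length of each multi-byte character.

```lua
local s = "a\195\169b" -- "aéb": "é" occupies bytes 2 and 3

local positions, codepoints = {}, {}
for pos, cp in utf8.codes(s) do
	-- pos: byte position where the character starts; cp: its codepoint
	positions[#positions + 1] = pos
	codepoints[#codepoints + 1] = cp
	print(pos, cp)
end
-- prints 1 97, then 2 233, then 4 98
```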
string utf8.force( string string )
Forces a string to contain only valid UTF-8 data. Invalid sequences are replaced with U+FFFD (the Unicode replacement character). This is a quick way to sanitize a string before passing it to functions such as utf8.codepoint or utf8.codes, which raise an error on invalid byte sequences.
string utf8.GetChar( string str, number index )
A UTF-8 compatible version of string.GetChar.
number, number utf8.len( string string, number startPos = 1, number endPos = -1 )
Returns the number of UTF-8 sequences in the given string between positions startPos and endPos (both inclusive). If it finds any invalid UTF-8 byte sequence, returns false as well as the position of the first invalid byte.
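A sketch of counting characters and detecting invalid input on made-up strings. The failure value is tested with `not` because GMod returns false here, while stock Lua 5.3 and later return nil, and `not` handles both.

```lua
local s = "h\195\169llo" -- "héllo": 6 bytes, 5 characters
print(#s)                -- 6
print(utf8.len(s))       -- 5

-- bytes 2 to 3 hold the single character "é"
print(utf8.len(s, 2, 3)) -- 1

-- byte 255 never occurs in valid UTF-8, so counting fails at byte 3
local bad = "ab\255c"
local n, badPos = utf8.len(bad)
if not n then
	print("invalid UTF-8 at byte " .. badPos) -- invalid UTF-8 at byte 3
end
```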
number utf8.offset( string string, number n, number startPos = 1 when n>=0, -1 otherwise )
Returns the byte position where the n-th UTF-8 character, counting from startPos, begins (nil if there is no such character). A negative n counts characters backwards from startPos. startPos defaults to 1 when n is non-negative and -1 when n is negative. If n is zero, this function instead returns the byte position of the start of the UTF-8 character that startPos lies within.
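A few calls on a sample string whose characters are 1, 2, 3 and 1 bytes long, showing a positive n, the one-past-the-end case, a character that does not exist, and n = 0. Negative n is left out because it depends on the startPos default described above.

```lua
local s = "a\195\169\226\130\172b" -- "aé€b": bytes 1, 2-3, 4-6 and 7

print(utf8.offset(s, 3))    -- 4   (the 3rd character, "€", starts at byte 4)
print(utf8.offset(s, 5))    -- 8   (one past the last character: #s + 1)
print(utf8.offset(s, 6))    -- nil (no such character)
print(utf8.offset(s, 0, 5)) -- 4   (byte 5 lies inside "€", which starts at byte 4)
```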
string utf8.sub( string string, number startPos, number endPos = nil )
A UTF-8 compatible version of string.sub. Avoid calling this function on large strings every tick or frame, as it can cause noticeable lag.
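To illustrate the difference between byte positions and character positions, here is a hypothetical helper, subChars, built only on utf8.offset and string.sub. It is not GMod's implementation of utf8.sub and handles positive indices only. In this sketch each utf8.offset call walks the string byte by byte, so its cost grows with the string's length.

```lua
-- Hypothetical helper (not part of the utf8 library): returns the characters
-- from character index i to character index j (inclusive); positive indices only.
local function subChars(str, i, j)
	local startByte = utf8.offset(str, i)
	if not startByte then
		return "" -- i is past the end of the string
	end
	-- character j ends on the byte just before character j + 1 starts
	local afterEnd = utf8.offset(str, j + 1)
	if not afterEnd then
		return string.sub(str, startByte) -- j is past the end: take the rest
	end
	return string.sub(str, startByte, afterEnd - 1)
end

local s = "h\195\169llo" -- "héllo"
local bytePart = string.sub(s, 1, 2) -- "h" plus only the first byte of "é"
local charPart = subChars(s, 1, 2)   -- "hé" (3 bytes)
print(utf8.len(charPart))            -- 2
```

Slicing by bytes with string.sub can cut a character in half, which leaves invalid UTF-8 behind; slicing by characters avoids that.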