XFT Unicode Functions

Notes:

As mentioned, XFT accepts only Unicode-encoded strings. The first question in the minds of developers is “which Unicode encoding is required?” As you can see here, the developers of XFT thoughtfully allow you to use whichever Unicode transformation format you want.

For many projects on Linux, the de facto standard is UTF-8. Linux is based on Unix, which has historically used byte-based data streams. The design and subsequent adoption of UTF-8 on Unix allowed many Unix utilities to handle Unicode data streams with minimal changes to the existing byte-oriented code. UTF-8 continues to be the preferred Unicode transformation format on Linux today for numerous reasons[1], which tend to simplify things for the Linux programmer. Thus, many projects on Linux make use of XftDrawStringUtf8(), which accepts an ordinary ANSI C string containing UTF-8 text, together with its length in bytes. However, if you need to, you can use XftDrawString16() to process strings of 16-bit Unicode characters, or XftDrawString32() to process UCS-4 strings.
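One practical detail when calling XftDrawStringUtf8() is that its length argument counts bytes, not characters. The following is a minimal sketch, using only the C standard library, of how the two quantities differ for a UTF-8 string. The helper names utf8_code_points() and utf8_byte_length() are hypothetical, chosen for illustration, and the counting routine assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string by counting every byte that is NOT a continuation byte
 * (bit pattern 10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: the byte length of the same string. This value,
 * not the code point count, is what the length argument of
 * XftDrawStringUtf8() expects. */
size_t utf8_byte_length(const char *s)
{
    return strlen(s);
}
```

For example, the word "naïve" encoded as "na\xC3\xAF" "ve" occupies 6 bytes but contains 5 code points. Assuming draw, color, and font have already been created elsewhere, a call would then look something like XftDrawStringUtf8(draw, &color, font, x, y, (const FcChar8 *)text, (int)strlen(text)).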

1. Advantages of UTF-8 include, but are not limited to: ① ASCII characters are transported transparently in UTF-8 streams, ② regardless of whether you begin parsing at the start, middle, or end of a UTF-8 string, you can always quickly locate the lead bytes, which serve as synchronization points in a string, ③ comparing UTF-8 strings byte by byte using ANSI C strcmp() yields the same code point ordering as comparing the equivalent UCS-4 strings using wcscmp(), ④ no byte order mark (BOM) is required. One can enumerate additional arguments in favor of UTF-8 for Linux and Unix platforms: see Google's cached copy of Roman Czyborra's page on Unicode transformation formats, http://64.233.167.104/search?q=cache:Kw7QqNNqjaUJ:czyborra.com/utf/.
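Points ② and ③ can be illustrated with a short sketch in standard C. The function names utf8_sync_back() and utf8_order() are hypothetical, and the code assumes well-formed UTF-8 input. The first function backs up from an arbitrary byte offset to the lead byte of the character containing it; the second simply reports the sign of strcmp(), which compares bytes as unsigned values and therefore orders valid UTF-8 strings by code point.

```c
#include <stddef.h>
#include <string.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: given any byte offset into a valid UTF-8 string,
 * step backwards until a lead byte (or ASCII byte) is found, and return
 * the offset of that byte. */
size_t utf8_sync_back(const char *s, size_t pos)
{
    while (pos > 0 && is_continuation((unsigned char)s[pos]))
        pos--;
    return pos;
}

/* Hypothetical helper: return -1, 0, or 1 according to the byte-wise
 * order of two UTF-8 strings. For valid UTF-8 this matches the order
 * of the corresponding code points. */
int utf8_order(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

As a usage example, in the string "a\xE2\x82\xACz" (the letters a and z around a euro sign, U+20AC), utf8_sync_back() maps offsets 2 and 3 back to offset 1, where the three-byte character begins. Likewise, utf8_order() places "z" (U+007A) before "\xC3\xA9" (U+00E9), which in turn sorts before "\xE2\x82\xAC" (U+20AC), matching their code point order.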