A Dedicated OpenType Platform ID for the FLOSS Community
A few days ago I started to "scratch an itch" that I have had for a long time ...
I was once again thinking about font management issues on the Free Desktop. Back in September of 2005 I had written a little manifesto on what I thought a font dialog on the Free Desktop ought to look like. My ideas have since matured but remain principally unchanged.
In October, 2006, I revisited these ideas and added some additional annotations to that manifesto in preparation for the Gnome Live! 2006 Text Layout Summit (see Andreas Vox's blog for a nice summary of that summit). So, in a way, a number of ideas related to fonts and font management on the Free Desktop had already resurfaced in my mind --and have more or less remained there-- since the October Summit. And now it is January 2007.
All of us are familiar with the old adage:
“If you want something done, do it yourself; otherwise it'll never get done.”
This is attributed to Alicia Figgs on ThinkExist.com. Who is Alicia Figgs?
Linus Torvalds, a character perhaps more familiar to the Free/Libre Open Source Software (FLOSS) community, has been quoted in a similar but more amusing vein:
... the Linux philosophy is “laugh in the face of danger”. Oops. Wrong one. “Do it yourself”. That's it.
So, I decided to take Linus' advice and start writing some foundational classes that I hope will eventually become the basis for a font management system for the Free Desktop, or a font dialog along the lines of what I have suggested previously, or both. But, as I am only just beginning, I am keeping that under wraps for now ... heh, heh, heh :-). The only hint I'll give away is that the system will support arbitrary virtual font collections and arbitrarily nested collections of collections in a highly efficient manner.
While it is too early to reveal any other tidbits of my nascent software project, there is an idea that occurred to me that I would like to share with the Free Desktop and FLOSS font development communities. This idea occurred to me as I was happily coding away with the OpenType Naming Table spec in front of me. This is part of the OpenType 1.4 specification. And by the way, I double-checked Part 22 of the ISO/IEC JTC 1/SC 29 N 6929 proposal regarding the Open Font Format and it is almost word-for-word identical to OpenType 1.4, except that now the name of the format is the Open Font Format Specification (OFFS). The spec lists the following “platform IDs”: 0 (Unicode), 1 (Macintosh), 2 (ISO, deprecated), 3 (Microsoft), and 4 (Custom).
Hey, wait a minute! Where is the platform ID for Linux, an OS whose market is growing eight times faster than the server market overall? [1] Where is the ID for FreeBSD, and for the wider Open Source, Free Desktop, and FLOSS font development communities? Where is the ID for what is --when these multitudinous communities are combined together-- arguably the most important collaborative community platform of our time?
I'm not kidding! Take a look at these visitation statistics by operating system for this site for the first ten days of January, 2007:
These statistics merely confirm what we already know: Linux and the Free Desktop in general already have greater market share than Apple OS X (although the release of Vista by Microsoft and new toys like Apple's iPhone may mix things up in the coming months). Either way, hardware vendors sell over $1 billion of Linux stuff every quarter. [2] Linux is a major operating system and the poster child of the FLOSS movement. But no OpenType platform ID for the FLOSS community platform? This is just wrong.
I became indignant about this when I was hacking away on my new software. I opened a random font file and tried to read the Unicode strings associated with the Microsoft platform (ID 3) and discovered that the strings were encoded in UTF-16. Well, of course, I should have known that! But as I normally do not code on the Windows platform, my UTF8String class did not have any methods for dealing with UTF-16. I had never had to deal with it before. So I had never spent the extra time to write any UTF-16 conversion method.
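To make this step concrete, here is a minimal Python sketch of reading the name records (it is not the classes described above). It assumes the raw bytes of a format 0 'name' table are already in memory. The function names `read_name_records` and `build_demo_table` are made up for illustration, and the demo table is synthetic rather than taken from a real font.

```python
import struct

def read_name_records(table: bytes):
    """Yield (platformID, encodingID, languageID, nameID, raw_bytes) tuples.

    Assumes `table` holds a complete format 0 'name' table.
    """
    # Header: format, record count, offset to string storage (big-endian USHORTs).
    _format, count, string_offset = struct.unpack_from(">HHH", table, 0)
    for i in range(count):
        # Each NameRecord is six big-endian USHORT fields (12 bytes).
        platform_id, encoding_id, language_id, name_id, length, offset = \
            struct.unpack_from(">HHHHHH", table, 6 + 12 * i)
        start = string_offset + offset
        yield (platform_id, encoding_id, language_id, name_id,
               table[start:start + length])

def build_demo_table() -> bytes:
    """Build a tiny synthetic table holding one Microsoft (platform 3) record."""
    text = "Demo".encode("utf-16-be")  # platform 3 strings are big-endian UTF-16
    header = struct.pack(">HHH", 0, 1, 6 + 12)  # strings start after one record
    record = struct.pack(">HHHHHH", 3, 1, 0x0409, 1, len(text), 0)
    return header + record + text

for pid, eid, lid, nid, raw in read_name_records(build_demo_table()):
    if pid == 3:
        # Decoding as UTF-16 is exactly the step a UTF-8-only class lacks.
        print(nid, raw.decode("utf-16-be"))
```

Here Python's built-in codec does the decoding; a string class that only knows UTF-8 has to do that conversion itself.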
It doesn't take long to write a method to convert UTF-16 to UTF-8 if one already has a method to convert UTF-32 to UTF-8, which I did. I had only to add a check for surrogate pairs. And of course the Unicode Consortium provides example code so I didn't have to use my brain too much.
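The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Unicode Consortium's example code and not the method mentioned above: the input is assumed to be big-endian UTF-16, the function names are hypothetical, and unpaired surrogates are passed through without validation.

```python
def utf16be_to_code_points(data: bytes):
    """Decode big-endian UTF-16 bytes into a list of Unicode code points."""
    units = [(data[i] << 8) | data[i + 1] for i in range(0, len(data) - 1, 2)]
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        # The surrogate-pair check: a high surrogate followed by a low
        # surrogate encodes a single code point above U+FFFF.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)  # BMP code point (no validation here)
            i += 1
    return code_points

def code_point_to_utf8(cp: int) -> bytes:
    """Encode one code point (the UTF-32 value) as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def utf16be_to_utf8(data: bytes) -> bytes:
    """Convert big-endian UTF-16 bytes to UTF-8 by way of code points."""
    return b"".join(code_point_to_utf8(cp) for cp in utf16be_to_code_points(data))

# "A", e-acute, the euro sign, and a musical symbol outside the BMP.
sample = "A\u00e9\u20ac\U0001D11E"
print(utf16be_to_utf8(sample.encode("utf-16-be")) == sample.encode("utf-8"))
```

The last character in the sample lies outside the Basic Multilingual Plane, so it exercises the surrogate-pair branch.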
But in the back of my mind, I was thinking that the FLOSS community should have a platform ID for themselves. A platform ID that would really represent a platform agnostic ID.
This has nothing to do with the technical aspects of things. Yes, of course we can always write software to read the Windows and Mac strings. But that's not the point. No, this is, at the end of the day, more about recognizing that if the FLOSS development community had come out with the OpenType spec in the beginning, it would not look like it currently does. If OpenType had been created by the FLOSS community as a collaborative effort, there would have been a single, shared standard and no need for separate "platform IDs" or other platform-specific idiosyncrasies. And the strings would probably use a UTF-8 encoding instead of UTF-16, which suffers from endian uncertainty and only came into existence because of historical shortsightedness regarding the required size of the Unicode code space.
So, having said all that, I'd like to propose the following:
Proposal To Add A Platform ID To The OpenType Spec
To Cover The FLOSS Community Platforms
Article 1. Platform Identifier
The platform ID for the Open Source / Software Libre community shall be the number 75. This number is chosen because when you turn it upside-down, it looks very much like "SL" which of course stands for "Software Libre".
And in the wider sense, Software Libre stands for Liberty, Equality, Fraternity, Community, Choice, Democracy, Love, Peace, and the Pursuit of Software Happiness.
Note: OpenType defines platform ID 0 for Unicode. This would be the most obvious existing option for use by the FLOSS community. However, no language IDs are defined for the Unicode platform ID. The standard says:
There are currently no language IDs defined for the Unicode platform. This means that it can be used for encodings in the 'cmap' table but not for strings in the 'name' table.
OK ... but that's not very useful, is it? In practice, it therefore appears that platform ID 0 is restricted to ASCII strings if it is used at all.
The only other existing option would be platform ID 4 Custom since platform ID 2 ISO is deprecated. "Custom" suggests that this is for private use, similar to the private use area in Unicode, so this also does not seem like a viable choice.
Article 2. Transformation Format for String Encoding
The strings for platform 75 are to use the UTF-8 transformation format.
Note: UTF-8 is unambiguous across architectures differing in endianness. Because the initial 128 code points are identical with ASCII and the format can be used to store any character from any plane of the Universal Character Set (UCS) of Unicode, UTF-8 has quickly become the preferred encoding format in the FLOSS world and beyond.
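Both points in this note are easy to check. The following is a small Python illustration using only the built-in codecs; the sample strings are arbitrary.

```python
text = "A\u00e9"  # "A" followed by e-acute (U+00E9)

utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
utf8 = text.encode("utf-8")

# UTF-16 has two possible byte orders for the same text ...
print(utf16_be.hex())  # 004100e9
print(utf16_le.hex())  # 4100e900
# ... while UTF-8 has exactly one byte sequence.
print(utf8.hex())      # 41c3a9

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_name = "Arial"
print(ascii_name.encode("ascii") == ascii_name.encode("utf-8"))  # True
```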
Article 3. Language IDs for Strings
The FLOSS community platform will follow the recommendations of IETF RFC 3066 and use language IDs based on ISO-639-2, which is a recognized international standard (instead of using arbitrary vendor-specific language IDs). When available, the shorter two-letter ISO-639-1 codes can be used. The three-letter ISO-639-2 codes can be used whenever a two-letter code does not exist.
Note: Here we avoid using arbitrary vendor-specific (i.e., Microsoft) language IDs. The current OFFS draft only provides a reference to the Microsoft platform-specific language IDs. Not even a reference to the Macintosh "encoding ID"s remains in the OFFS draft. No mention is made of either ISO-639-1 or ISO-639-2.
A question remains, however. The languageID field of a NameRecord is only a USHORT. In other words, it is not big enough to hold a three-letter ISO-639-2 code without some kind of mapping. One possibility would be to simply enumerate the ISO-639-2 letter codes from 1 to n. But that's not a very good solution, as additional language codes are likely to be added to ISO-639-2 over time.
There is another option which I like much better. And that is to borrow the USHORT encodingID field so that we have two extra bytes to use along with the two bytes from the languageID field. We can get away with this because we don't need the encodingID for anything else, since the only encoding ever used will be UTF-8. So this plan is set out in Article 4 below:
Article 4. Special Use of the Encoding ID Field with Platform 75
When the platformID is 75, the encodingID field is commandeered to serve as a container for the 3rd letter of an ISO-639-2 code. If in the future a 4th letter is required, this will also be stored in the remaining byte of the encodingID field.
Assuming a hypothetical 4-letter code ABCD to illustrate the storage graphically, we would have the following:
| languageID            | encodingID            |
| high bits | low bits  | high bits | low bits  |
| A         | B         | C         | D         |
This nicely solves the problem of how to store ISO-639-2 language codes within the existing OpenType nameRecord framework. As these codes would only be used for Platform 75, they would be ignored by software which had only been designed to look at Macintosh (ID 1) or Windows (ID 3) strings.
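The storage scheme of Article 4 can be sketched in Python as follows. The proposal does not pin down every detail, so several things here are assumptions: the first two letters go in the high and low bytes of languageID, the 3rd letter goes in the high byte of encodingID, a possible 4th letter goes in its low byte, unused bytes are zero, and letters are stored as lowercase ASCII. The function names `pack_language` and `unpack_language` are hypothetical.

```python
def pack_language(code: str):
    """Return (languageID, encodingID) for an ISO-639 code of 2 to 4 letters.

    Assumed layout: letters 1-2 in languageID, letters 3-4 in encodingID,
    high byte first, with unused bytes set to zero.
    """
    if not (2 <= len(code) <= 4 and code.isascii() and code.isalpha()):
        raise ValueError("expected 2 to 4 ASCII letters, got %r" % code)
    padded = code.lower().encode("ascii").ljust(4, b"\x00")
    language_id = (padded[0] << 8) | padded[1]
    encoding_id = (padded[2] << 8) | padded[3]
    return language_id, encoding_id

def unpack_language(language_id: int, encoding_id: int) -> str:
    """Recover the letter code from the two USHORT fields."""
    raw = bytes([language_id >> 8, language_id & 0xFF,
                 encoding_id >> 8, encoding_id & 0xFF])
    return raw.rstrip(b"\x00").decode("ascii")

for code in ("en", "eng", "haw"):
    lang_id, enc_id = pack_language(code)
    print(code, hex(lang_id), hex(enc_id), unpack_language(lang_id, enc_id))
```

With this layout a two-letter ISO-639-1 code leaves encodingID at zero, so two-letter and three-letter codes can coexist without a separate flag.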
Suggestions for Moving Forward
STEP ZERO: Discuss this on the mailing lists. Email me if you think I'm crazy.
STEP ONE: Add support for Platform 75 to George Williams' FontForge.
STEP TWO: Devise a set of guidelines on how Open Font developers should fill in the OpenType name fields. For example, it would be helpful to have guidelines on how to fill in the License field.
Thanks for reading and a Happy New Year to all!
-- Ed Trager, 2007.01.11
1. Wikinomics by Don Tapscott and Anthony D. Williams. Portfolio Books, New York. 2006. p. 66.
2. ibid., p. 66.