Fontaine

What is Fontaine?

Fontaine is a command-line utility that displays key meta information about font files, including but not limited to font name, style, weight, glyph count, character count, copyright, license information and orthographic coverage.

GPL License

Fontaine is an Open Source program I wrote initially for the Open Font Library project. The software is released under the GNU General Public License (GPL) v. 2 or any later version.

Getting Fontaine

Fontaine is now a project on SourceForge:

http://sourceforge.net/projects/fontaine/

Anyone may obtain the source code for Fontaine from the SVN repository:

svn checkout svn://svn.code.sf.net/p/fontaine/code/trunk fontaine-code

Building Fontaine

Fontaine uses the cross-platform cmake-based build system:

cd fontaine-code
cmake .
make
sudo make install

On systems without sudo, run su -c "make install" instead of the last command.

Fontaine is a new program and, as such, has not yet been tested on a large number of platforms. The software is known to build and run successfully on Linux and OS X.

Usage

fontaine <option(s)> <font file(s)>

Typical usage is:

fontaine --text some_font.ttf
fontaine --xml --hide-missing some_other_font.otf
fontaine --text --hide-fragmentary --show-missing another_font.ttf

Options

Command-line options are as follows:

Long	Short	Description
--fxhtml	-Y	Produce output report in FANCY XHTML format.
--help	-h	Print help and exit
--hide-fragmentary	-r	Don't report orthographies for which the font provides only fragmentary support.
--hide-full	-f	Don't report orthographies for which the font provides full support
--hide-missing	-m	Don't report which Unicode values are missing from fragmentary and partially-supported orthographies.
--hide-partial	-p	Don't report orthographies for which the font provides only partial support
--json	-J	Produce output report in JSON format. (default)
--show-fragmentary	-R	Report orthographies for which the font provides only fragmentary support.
--show-full	-F	Report orthographies for which the font provides full support
--show-missing	-M	Report which Unicode values are missing from fragmentary and partially-supported orthographies. (default)
--show-partial	-P	Report orthographies for which the font provides only partial support
--text	-T	Produce output report in plain text format.
--version	-v	Print version and exit
--xhtml	-H	Produce output report in XHTML format.
--xml	-X	Produce output report in XML format.
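
For scripted use, the options above can be assembled programmatically. The following Python sketch is only an illustration: the helper names build_command and run_report are hypothetical, and it assumes that the fontaine executable is on the PATH and that the report is written to standard output.

```python
import json
import subprocess

# Output-format flags taken from the options table above.
FORMAT_FLAGS = {
    "json": "--json",
    "xml": "--xml",
    "xhtml": "--xhtml",
    "text": "--text",
}

def build_command(font_files, output_format="json", hide_missing=False):
    """Return the argument list for one fontaine invocation."""
    command = ["fontaine", FORMAT_FLAGS[output_format]]
    if hide_missing:
        command.append("--hide-missing")
    command.extend(font_files)
    return command

def run_report(font_files, hide_missing=False):
    """Run fontaine and parse its JSON report.

    Assumption: the report is printed to standard output.
    """
    result = subprocess.run(
        build_command(font_files, "json", hide_missing),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with font file names that contain spaces.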

Output

To facilitate different usage scenarios, Fontaine produces reports in JSON (default), XML, XHTML, and TEXT formats.

A typical output report in JSON format will look something like this:

{
"fonts":[
  {
    "commonName":"Inconsolata",
    "nativeName":"",
    "subFamily":"Medium",
    "style":"normal",
    "weight":"normal",
    "fixedWidth":"yes",
    "fixedSizes":"no",
    "copyright":"Created by Raph Levien using his own tools and FontForge. Copyright ...",
    "license":"OFL",
    "licenseUrl":"http://scripts.sil.org/OFL",
    "glyphCount":"295",
    "characterCount":"286",
    "orthographies":[
      {
        "commonName":"Basic Latin",
        "nativeName":"Basic Latin",
        "supportLevel":"full"
      },
      {
        "commonName":"Western European",
        "nativeName":"Western European",
        "supportLevel":"full"
      },
      {
        "commonName":"Euro",
        "nativeName":"Euro",
        "supportLevel":"full"
      },
      {
        "commonName":"Turkish",
        "nativeName":"Türkçe",
        "supportLevel":"full"
      },
      {
        "commonName":"Central European",
        "nativeName":"Central European",
        "supportLevel":"full"
      },
      {
        "commonName":"Pan African Latin",
        "nativeName":"Pan African Latin",
        "supportLevel":"fragmentary",
        "percentCoverage":"24"
      }
    ]
  }
]}
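
A report like the one above can be read with any JSON parser. The Python sketch below is one possible way to summarize it; the function name summarize is hypothetical, and the embedded report is an abbreviated copy of the sample. Note that numeric values appear as strings in the sample, so they are converted explicitly, and that "percentCoverage" is absent for fully supported orthographies.

```python
import json

# Abbreviated copy of the sample report shown above.
SAMPLE_REPORT = """
{
"fonts":[
  {
    "commonName":"Inconsolata",
    "glyphCount":"295",
    "characterCount":"286",
    "orthographies":[
      {"commonName":"Basic Latin","nativeName":"Basic Latin",
       "supportLevel":"full"},
      {"commonName":"Pan African Latin","nativeName":"Pan African Latin",
       "supportLevel":"fragmentary","percentCoverage":"24"}
    ]
  }
]}
"""

def summarize(report_text):
    """Return {font name: {orthography name: (level, percent)}}."""
    report = json.loads(report_text)
    summary = {}
    for font in report["fonts"]:
        coverage = {}
        for orth in font.get("orthographies", []):
            level = orth["supportLevel"]
            if "percentCoverage" in orth:
                percent = int(orth["percentCoverage"])
            else:
                # The sample omits the percentage for full support.
                percent = 100 if level == "full" else None
            coverage[orth["commonName"]] = (level, percent)
        summary[font["commonName"]] = coverage
    return summary

for font_name, coverage in summarize(SAMPLE_REPORT).items():
    for orth_name, (level, percent) in coverage.items():
        print(font_name, orth_name, level, percent)
```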

Typical output in TEXT format will look something like this:

Fonts:
   Font:
      Common name: id-asobi_LightOT
      Native name: id-懐遊体Light OT
      Sub family: Regular
      Style: normal
      Weight: normal
      Fixed width: no
      Fixed sizes: no
      Copyright: 井上　優　(　ｉｄｆｏｎｔ・井上デザイン　）
      License: Unknown or Proprietary License
      Glyph count: 9354
      Character count: 8207
      Orthographies:
         Orthography:
            Common name: Basic Latin
            Native name: Basic Latin
            Support level: full

         Orthography:
            Common name: Western European
            Native name: Western European
            Support level: full

         Orthography:
            Common name: Pan African Latin
            Native name: Pan African Latin
            Support level: fragmentary
            Percent coverage: 22

         Orthography:
            Common name: Basic Greek
            Native name: Ελληνικό αλφάβητο
            Support level: fragmentary
            Percent coverage: 69

         Orthography:
            Common name: Basic Cyrillic
            Native name: Кириллица
            Support level: full

         Orthography:
            Common name: Traditional Chinese
            Native name: 中文正體字
            Support level: partial
            Percent coverage: 90

         Orthography:
            Common name: Kana
            Native name: 仮名
            Support level: partial
            Percent coverage: 98

         Orthography:
            Common name: Joyo
            Native name: 日本常用漢字
            Support level: full

         Orthography:
            Common name: Japanese Jinmeiyo
            Native name: 日本人名用漢字
            Support level: partial
            Percent coverage: 99

         Orthography:
            Common name: Japanese Kokuji
            Native name: 日本国字
            Support level: partial
            Percent coverage: 88

         Orthography:
            Common name: Mathematical Operators
            Native name: Mathematical Operators
            Support level: fragmentary
            Percent coverage: 16

When you want to know what is missing when coverage is less than full, just omit “--hide-missing” or else explicitly use the “--show-missing” option. Output will look something like the following:

         Orthography:
            Common name: Japanese Kokuji
            Native name: 日本国字
            Support level: partial
            Percent coverage: 88
            Missing values: U+4e44 (乄), U+6318 (挘), U+685b (桛), U+68bb (梻)
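
If the list of missing values needs further processing, the code points can be extracted from the TEXT report with a regular expression. This is a minimal sketch that assumes the line layout shown in the sample above; the function name parse_missing is hypothetical.

```python
import re

# Matches code points written as U+XXXX (4 to 6 hexadecimal digits).
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")

def parse_missing(line):
    """Return the code points on a 'Missing values:' line as integers."""
    label, _, values = line.partition(":")
    if label.strip() != "Missing values":
        return []
    return [int(digits, 16) for digits in CODE_POINT.findall(values)]

line = "            Missing values: U+4e44 (乄), U+6318 (挘), U+685b (桛), U+68bb (梻)"
missing = parse_missing(line)
print([f"U+{code_point:04X}" for code_point in missing])
# prints ['U+4E44', 'U+6318', 'U+685B', 'U+68BB']
```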

In the source code tree, there is a base "MLR" ("markup language report") class from which specific reporting classes like JSON and XML are derived. This architecture should make it easy to create additional report formats if needed.
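
The general idea of a base report class with format-specific subclasses can be illustrated as follows. This Python sketch is not the Fontaine source code: the class and method names (Report, JsonReport, TextReport, add, render) are hypothetical and only show how a new format would plug in by overriding a single rendering method.

```python
import json
from abc import ABC, abstractmethod

class Report(ABC):
    """Hypothetical base class: collects key/value pairs for one font."""

    def __init__(self):
        self.fields = []

    def add(self, key, value):
        self.fields.append((key, value))

    @abstractmethod
    def render(self):
        """Return the collected fields as a string in a specific format."""

class JsonReport(Report):
    def render(self):
        return json.dumps(dict(self.fields), ensure_ascii=False)

class TextReport(Report):
    def render(self):
        return "\n".join(f"{key}: {value}" for key, value in self.fields)

def build(report):
    """Fill any report object with the same data, then render it."""
    report.add("Common name", "Inconsolata")
    report.add("Glyph count", "295")
    return report.render()

print(build(TextReport()))
print(build(JsonReport()))
```

Because the code that gathers font data only calls add and render, supporting another format in this sketch means writing one more subclass.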

Orthography Groups

What do we mean when we say a font provides coverage for “Western European” languages? Of course we expect that such a font will cover the Latin-based orthographies of the “big” languages of Western Europe, such as English, French, Spanish, German, and so on. But does that font also provide coverage for the orthographies of minority languages spoken (and presumably also written) in Western Europe? It may be difficult to say, since it clearly depends on which minority languages we include.

Let's look at a perhaps less familiar case. What does it mean for a font to “provide coverage for Chinese”? Some Chinese dictionaries include well over 40,000 characters, but a modern educated Chinese person need only know perhaps 3,500 of those to be considered a fluent reader of his or her language. So, in classifying a Chinese font, should we require that the font cover 40,000 characters, or just the most common 3,500?

I asked a lot of questions like these as I began sorting out orthographic coverage categories for Fontaine. I wanted to create categories that would be meaningful to people looking for fonts to meet their needs. Since I myself can't keep ISO-8859-3 vs. ISO-8859-4 straight, it seemed obvious that a first step was to avoid the jargon commonly used by standardization bodies.

Another problem is that adoption rates for scripts vary greatly. Some scripts, like Latin, are now used to write hundreds, perhaps thousands of languages. It seemed evident that creating an orthographic category for every language written in Latin might leave users "drowning" in long reports about hundreds of languages that would be largely meaningless to them. The only reasonable answer for pervasive scripts like Latin is to create orthographic groups, but these groups have to be given sensible names like Western European (instead of ISO-8859-1) and Pan African Latin. Such names provide even uninformed users with a pretty good sense of what sorts of languages might be included without burdening them with hundreds of language listings.

The orthography work thus required striking a careful balance between two opposing forces: simplicity and specificity. Those forces operate differently on different scripts. For pervasive scripts like Latin, one has to tend toward generality at the expense of specificity. For non-pervasive scripts like Japanese, one is at liberty to provide users with a little more detail, such as how well a font covers the indigenously invented Kokuji (国字, “national characters”). For scripts that fall in the middle of the pack, like Arabic, it seemed only prudent to report coverage of the major languages that use extended versions of the script. Thus all Arabic fonts are tested for their coverage of Farsi, Urdu, Pashto, Uighur, and Sindhi, inter alia.

Orthographic Coverage Levels

Fontaine divides coverage into four levels:

Level	Coverage
Full	100%
Partial	At least 80% but less than 100%
Fragmentary	Greater than 0% but less than 80%
None	0% or missing the key character

Fontaine first looks for a key character (such as the letter A in Latin) that is always expected to be present in a given orthography. If the key character is missing, the program skips additional checks for that orthography. This can theoretically lead to false negatives in rare cases, but the occurrence of such a false negative almost guarantees that something is amiss with the font anyway. If the key character is present, a full check is made. If fewer than 80% of the characters needed for an orthography are present, coverage is classified as fragmentary. Incomplete coverage greater than or equal to 80% is called partial.
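
The classification rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Fontaine's actual implementation: the function name classify is hypothetical, and rounding the reported percentage down is an assumption.

```python
def classify(required, present, key_char):
    """Return (level, percent) for one orthography.

    required: characters the orthography needs
    present:  characters the font provides
    key_char: character that must be present before a full check is made
    """
    required = set(required)
    present = set(present)
    # Skip the full check when the key character is missing.
    if key_char not in present or not required:
        return ("none", 0)
    covered = len(required & present)
    if covered == 0:
        return ("none", 0)
    if covered == len(required):
        return ("full", 100)
    percent = 100 * covered // len(required)  # rounding down is an assumption
    # Compare exact counts so that rounding cannot change the level.
    if 100 * covered >= 80 * len(required):
        return ("partial", percent)
    return ("fragmentary", percent)

print(classify("ABCDE", "ABCD", "A"))  # 4 of 5 characters present
print(classify("ABCDE", "BCDE", "A"))  # key character missing
```

The second call shows the false-negative case mentioned above: four of the five characters are present, but the orthography is reported as unsupported because the key character is missing.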

Sample Sentences

Fontaine's orthography database includes "sample sentences" and "sample characters" for each orthography. These can be used to create sample font specimens for given orthographies.

For alphabets, pangrams provide a compact and clever way to present all the letters in an orthography. I have provided instances of pangrams for such orthographies where I have been able to locate them.
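
Whether a candidate sentence really is a pangram for a given alphabet is easy to verify mechanically. The following Python sketch uses a hypothetical helper, missing_letters, and defaults to the basic Latin alphabet; for other orthographies the alphabet would be passed in explicitly.

```python
import string

def missing_letters(sentence, alphabet=string.ascii_lowercase):
    """Return the letters of the alphabet that do not occur in the sentence."""
    seen = set(sentence.lower())
    return [letter for letter in alphabet if letter not in seen]

print(missing_letters("The quick brown fox jumps over the lazy dog"))  # prints []
```

An empty result means that the sentence contains every letter of the alphabet at least once.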

For many other orthographies, it may be difficult or impossible to locate suitable pangrams. In these cases, the approach I have taken is to provide a representative sentence borrowed from a well-known work of literature, poetry, or some other work of important cultural value. For example, for the Thai language I included a sentence from a stele attributed to King Ramkhamhaeng (พ่อขุนรามคำแหงมหาราช) of the Sukhothai period. King Ramkhamhaeng is credited with inventing the Thai alphabet. Although the language used on the stele is archaic, every schoolchild in Thailand is familiar with it.

Orthographies such as Chinese and Japanese by their very nature don't support pangrams. For these orthographies, we can once again borrow an excerpt or two from a work of important literary value. For example, for Chinese I have used the first few phrases of the famous Thousand Character Classic (千字文) attributed to Zhou Xingsi (周興嗣) of the Liang Dynasty.

As of this writing, the selection of sample sentences remains incomplete. I hope that future contributions from the community will be valuable in expanding and vetting the orthography data that I have compiled so far.

Orthography References

The following sources were referenced when compiling the orthography data for Fontaine:

Characters needed for African orthographies in Latin writing system by Denis Jacquerye. http://www.africanlocalisation.net/content/characters-needed-african-orthographies-latin-writing-system

Eesti Keele Instituut (Institute of the Estonian Language) Letter Database. http://www.eki.ee/letter/

Hong Kong Supplementary Character Set 香港增補字符集, Office of the Government Chief Information Officer, Government of the Hong Kong Special Administrative Region. http://www.ogcio.gov.hk/ccli/eng/hkscs/introduction.html

Japanese Jinmeiyō kanji 人名用漢字, Wikipedia. http://en.wikipedia.org/wiki/Jinmeiyō_kanji

Japanese Jōyō 常用漢字 character list. http://www.aozora.gr.jp/kanji_table/

Japanese Kokuji 国字 national characters. http://www.sljfaq.org/w/kokuji

Language Geek. http://www.languagegeek.com

List of Pangrams. Wikipedia. http://en.wikipedia.org/wiki/List_of_pangrams

Omniglot, A guide to the languages, alphabets, syllabaries and other writing systems of the world. http://www.omniglot.com/

Syriac Peshitta originally at http://www.aifoundations.org/peshitta/; now being superseded by Comprehensive Aramaic Lexicon at http://cal1.cn.huc.edu/

Unicode Code Charts. http://www.unicode.org/charts/

UnicodeSet Demo. http://unicode.org/cldr/utility/list-unicodeset.jsp

Unicode Font Guide For Free/Libre Open Source Operating Systems. http://unifont.org/fontguide/

Wikipedia. http://wikipedia.org/

Bugs

Please refer to the Fontaine project pages on SourceForge for current information on bugs.

Currently known bugs and feature omissions:

Date	Description	Status
2009-03-17	i18n: gettext-based localized string replacement not working when last checked ...	open
2009-03-17	FXHTML: Fancy XHTML report is not yet fully implemented ...	open
2009-03-17	Reporting: Sample sentences and sample characters are not reported yet ...	open
2009-03-17	Orthographies: Still missing many orthography files, especially for Unicode 5x ...	open
2009-03-17	Orthographies: Still missing good sample sentences for many orthographies ...	open
