Fontaine
What is Fontaine?
Fontaine is a command-line utility that displays key meta information about font files, including but not limited to font name, style, weight, glyph count, character count, copyright, license information and orthographic coverage.
Fontaine is copyright © 2009 by Edward H. Trager.
GPL License
Fontaine is an Open Source program I wrote initially for the Open Font Library project. The software is released under the GNU General Public License (GPL) v. 2 or any later version.
Getting Fontaine
Fontaine is now a project on SourceForge:
http://sourceforge.net/projects/fontaine/
Anyone may obtain the source code for Fontaine from the SVN repository:
svn checkout svn://svn.code.sf.net/p/fontaine/code/trunk fontaine-code
Building Fontaine
Fontaine uses the cross-platform cmake-based build system:
cd fontaine-code
cmake .
make
su -c "make install"    # or: sudo make install
Fontaine is a new program and has not yet been tested on a large number of platforms. The software is known to build and run successfully on Linux and OS X.
Usage
fontaine <option(s)> <font file(s)>
Typical usage is:
fontaine --text some_font.ttf
fontaine --xml --hide-missing some_other_font.otf
fontaine --text --hide-fragmentary --show-missing another_font.ttf
Options
Command-line options are as follows:
Long | Short | Description |
--fxhtml | -Y | Produce output report in FANCY XHTML format. |
--help | -h | Print help and exit |
--hide-fragmentary | -r | Don't report orthographies for which the font provides only fragmentary support. |
--hide-full | -f | Don't report orthographies for which the font provides full support |
--hide-missing | -m | Don't report which Unicode values are missing from fragmentary and partially-supported orthographies. |
--hide-partial | -p | Don't report orthographies for which the font provides only partial support |
--json | -J | Produce output report in JSON format. (default) |
--show-fragmentary | -R | Report orthographies for which the font provides only fragmentary support. |
--show-full | -F | Report orthographies for which the font provides full support |
--show-missing | -M | Report which Unicode values are missing from fragmentary and partially-supported orthographies. (default) |
--show-partial | -P | Report orthographies for which the font provides only partial support |
--text | -T | Produce output report in plain text format. |
--version | -v | Print version and exit |
--xhtml | -H | Produce output report in XHTML format. |
--xml | -X | Produce output report in XML format. |
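When Fontaine is driven from a script, the options in the table above can be assembled programmatically. The following is a minimal sketch in Python; the helper name build_command and its parameters are hypothetical and not part of Fontaine, and only the long option names come from the table. It builds an argument list and does not run the program.

```python
# Long options for report formats, taken from the options table.
FORMAT_FLAGS = {
    "json": "--json",
    "xml": "--xml",
    "xhtml": "--xhtml",
    "fxhtml": "--fxhtml",
    "text": "--text",
}

# Report categories that have matching --hide-* and --show-* options.
CATEGORIES = {"fragmentary", "full", "missing", "partial"}


def build_command(font_files, fmt="json", hide=(), show=()):
    """Return an argv list for a fontaine invocation (hypothetical helper)."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError("unknown report format: " + fmt)
    for name in list(hide) + list(show):
        if name not in CATEGORIES:
            raise ValueError("unknown report category: " + name)

    argv = ["fontaine", FORMAT_FLAGS[fmt]]
    argv += ["--hide-" + name for name in hide]
    argv += ["--show-" + name for name in show]
    argv += list(font_files)
    return argv


cmd = build_command(["some_other_font.otf"], fmt="xml", hide=["missing"])
print(" ".join(cmd))
```

The resulting list could be handed to Python's subprocess.run, assuming the fontaine binary is installed and on the PATH.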
Output
To facilitate different usage scenarios, Fontaine produces reports in JSON (default), XML, XHTML, and TEXT formats.
A typical output report in JSON format will look something like this:
{
  "fonts":[
    {
      "commonName":"Inconsolata",
      "nativeName":"",
      "subFamily":"Medium",
      "style":"normal",
      "weight":"normal",
      "fixedWidth":"yes",
      "fixedSizes":"no",
      "copyright":"Created by Raph Levien using his own tools and FontForge. Copyright ...",
      "license":"OFL",
      "licenseUrl":"http://scripts.sil.org/OFL",
      "glyphCount":"295",
      "characterCount":"286",
      "orthographies":[
        { "commonName":"Basic Latin", "nativeName":"Basic Latin", "supportLevel":"full" },
        { "commonName":"Western European", "nativeName":"Western European", "supportLevel":"full" },
        { "commonName":"Euro", "nativeName":"Euro", "supportLevel":"full" },
        { "commonName":"Turkish", "nativeName":"Türkçe", "supportLevel":"full" },
        { "commonName":"Central European", "nativeName":"Central European", "supportLevel":"full" },
        { "commonName":"Pan African Latin", "nativeName":"Pan African Latin", "supportLevel":"fragmentary", "percentCoverage":"24" }
      ]
    }
  ]
}
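A JSON report like the sample above can be read with any JSON parser. The following Python sketch groups orthographies by support level; the function name summarize is hypothetical, and the embedded report is an abbreviated copy of the sample rather than real program output. Note that numeric fields such as glyphCount appear as strings in the sample, so they need converting before arithmetic.

```python
import json

# Abbreviated copy of the sample JSON report.
report_text = """
{ "fonts":[ {
    "commonName":"Inconsolata",
    "glyphCount":"295",
    "characterCount":"286",
    "orthographies":[
      { "commonName":"Basic Latin", "nativeName":"Basic Latin", "supportLevel":"full" },
      { "commonName":"Turkish", "nativeName":"Türkçe", "supportLevel":"full" },
      { "commonName":"Pan African Latin", "nativeName":"Pan African Latin",
        "supportLevel":"fragmentary", "percentCoverage":"24" }
    ] } ] }
"""


def summarize(report):
    """Map each font's common name to {support level: [orthography names]}."""
    summary = {}
    for font in report["fonts"]:
        levels = {}
        for orth in font["orthographies"]:
            levels.setdefault(orth["supportLevel"], []).append(orth["commonName"])
        summary[font["commonName"]] = levels
    return summary


report = json.loads(report_text)
summary = summarize(report)

# Counts are quoted strings in the sample report, so convert explicitly.
glyph_count = int(report["fonts"][0]["glyphCount"])

print(summary)
print(glyph_count)
```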
Typical output in TEXT format will look something like this:
Fonts:
  Font:
    Common name: id-asobi_LightOT
    Native name: id-懐遊体Light OT
    Sub family: Regular
    Style: normal
    Weight: normal
    Fixed width: no
    Fixed sizes: no
    Copyright: 井上 優 ( idfont・井上デザイン )
    License: Unknown or Proprietary License
    Glyph count: 9354
    Character count: 8207
    Orthographies:
      Orthography:
        Common name: Basic Latin
        Native name: Basic Latin
        Support level: full
      Orthography:
        Common name: Western European
        Native name: Western European
        Support level: full
      Orthography:
        Common name: Pan African Latin
        Native name: Pan African Latin
        Support level: fragmentary
        Percent coverage: 22
      Orthography:
        Common name: Basic Greek
        Native name: Ελληνικό αλφάβητο
        Support level: fragmentary
        Percent coverage: 69
      Orthography:
        Common name: Basic Cyrillic
        Native name: Кириллица
        Support level: full
      Orthography:
        Common name: Traditional Chinese
        Native name: 中文正體字
        Support level: partial
        Percent coverage: 90
      Orthography:
        Common name: Kana
        Native name: 仮名
        Support level: partial
        Percent coverage: 98
      Orthography:
        Common name: Joyo
        Native name: 日本常用漢字
        Support level: full
      Orthography:
        Common name: Japanese Jinmeiyo
        Native name: 日本人名用漢字
        Support level: partial
        Percent coverage: 99
      Orthography:
        Common name: Japanese Kokuji
        Native name: 日本国字
        Support level: partial
        Percent coverage: 88
      Orthography:
        Common name: Mathematical Operators
        Native name: Mathematical Operators
        Support level: fragmentary
        Percent coverage: 16
When you want to know what is missing when coverage is less than full, just omit “--hide-missing” or else explicitly use the “--show-missing” option. Output will look something like the following:
Orthography:
  Common name: Japanese Kokuji
  Native name: 日本国字
  Support level: partial
  Percent coverage: 88
  Missing values: U+4e44 (乄), U+6318 (挘), U+685b (桛), U+68bb (梻)
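The missing values are listed as U+XXXX tokens, which makes them easy to pick out of a text report. As a small sketch in Python (the function name parse_missing is hypothetical, and the input line is copied from the sample above), a regular expression can recover the code points:

```python
import re

# The "Missing values" line as it appears in the sample text report.
line = "Missing values: U+4e44 (乄), U+6318 (挘), U+685b (桛), U+68bb (梻)"


def parse_missing(text):
    """Return the integer code points named by U+XXXX tokens in a report line."""
    return [int(digits, 16) for digits in re.findall(r"U\+([0-9A-Fa-f]{4,6})", text)]


code_points = parse_missing(line)
# Convert the code points back to characters, e.g. to build a specimen string.
characters = "".join(chr(cp) for cp in code_points)
print(code_points, characters)
```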
In the source code tree, there is a base "MLR" ("markup language report") class from which specific reporting classes like JSON and XML are derived. This architecture should make it easy to create additional report formats if needed.
Orthography Groups
What do we mean when we say a font provides coverage for “Western European” languages? Of course we expect that such a font will cover the Latin-based orthographies of the “big” languages of Western Europe, such as English, French, Spanish, German, and so on. But does that font also provide coverage for the orthographies of minority languages spoken (and presumably also written) in Western Europe? It may be difficult to say; it clearly depends on which minority languages we include.
Let's look at a perhaps less familiar case. What does it mean for a font to “provide coverage for Chinese”? Some Chinese dictionaries include well over 40,000 characters, but a modern educated Chinese person need only know perhaps 3,500 of those to be considered a fluent reader of his or her language. So, in classifying a Chinese font, should we require that the font cover 40,000 characters, or just the most common 3,500?
I asked a lot of questions like these as I began sorting out orthographic coverage categories for Fontaine. I wanted to create categories that would be meaningful to people looking for fonts to meet their needs. Since I myself can't keep ISO-8859-3 vs. ISO-8859-4 straight, it seemed obvious that a first step required avoiding jargon commonly used by standardization bodies.
Another problem is that adoption rates for scripts vary greatly. Some scripts, like Latin, are now used to write hundreds, perhaps thousands of languages. It seemed evident that creating an orthographic category for every language written in Latin might leave users "drowning" in long reports about hundreds of languages that would be largely meaningless to them. The only reasonable answer for pervasive scripts like Latin is to create orthographic groups, but these groups have to be given sensible names like Western European (instead of ISO-8859-1) and Pan African Latin. Such names provide even uninformed users with a pretty good sense of what sorts of languages might be included without burdening them with hundreds of language listings.
The orthography work thus required striking a careful balance between opposing forces: simplicity versus specificity. Those forces operate differently on different scripts. For pervasive scripts like Latin, one has to tend toward generality at the expense of specificity. For non-pervasive scripts like Japanese, one is at liberty to provide users with a little more detail, such as how well a font covers the indigenously-invented Kokuji (国字 “national”) characters. For scripts that fall in the middle of the pack, like Arabic, providing specific coverage on major languages using extended versions of the script seemed only prudent. Thus all Arabic fonts are tested for their coverage of Farsi, Urdu, Pashto, Uighur, and Sindhi, inter alia.
Orthographic Coverage Levels
Fontaine divides coverage into four levels:
Level | Coverage |
Full | 100% |
Partial | Greater than or equal to 80%, but less than 100% |
Fragmentary | Greater than 0%, but less than 80% |
None | 0% or missing the key character |
Fontaine first looks for a key character (such as the letter A in Latin) that is always expected to be present in a given orthography. If the key character is missing, the program skips additional checks for that orthography. This can theoretically lead to false negatives in rare cases, but the occurrence of such a false negative almost guarantees that something is amiss with the font anyway. If the key character is present, a full check is made. If fewer than 80% of the characters needed for an orthography are present, coverage is classified as fragmentary. Incomplete coverage greater than or equal to 80% is called partial.
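The classification rules just described can be sketched in Python as follows. This is an illustration of the stated thresholds, not Fontaine's actual source code: the function name classify_coverage, the use of character sets, and the rounding down of the percentage are all assumptions.

```python
def classify_coverage(required, supported, key_char):
    """Return (level, percent) for one orthography.

    required  -- set of characters the orthography needs (assumed non-empty)
    supported -- set of characters the font provides
    key_char  -- character always expected to be present in this orthography
    """
    # Key-character shortcut: skip the full check when the key is missing.
    if key_char not in supported:
        return ("none", 0)

    present = len(required & supported)
    if present == 0:
        return ("none", 0)
    if present == len(required):
        return ("full", 100)

    # Assumption: round down, so incomplete coverage never reads as 100.
    percent = present * 100 // len(required)
    # Compare exactly against the 80% threshold using integer arithmetic.
    if present * 100 >= 80 * len(required):
        return ("partial", percent)
    return ("fragmentary", percent)


letters = set("ABCDEFGHIJ")  # toy ten-character orthography
print(classify_coverage(letters, set("ABCDEFGH"), "A"))
```

With the toy ten-character orthography, eight supported characters sit exactly on the 80% boundary and therefore count as partial rather than fragmentary.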
Sample Sentences
Fontaine's orthography database includes "sample sentences" and "sample characters" for each orthography. These can be used to create sample font specimens for given orthographies.
For alphabets, pangrams provide a compact and clever way to present all the letters in an orthography. I have provided instances of pangrams for such orthographies where I have been able to locate them.
For many other orthographies, it may be difficult or impossible to locate suitable pangrams. In these cases, the approach I have taken is to provide a representative sentence borrowed from a well-known work of literature, poetry, or some other work of important cultural value. For example, for the Thai language I included a sentence from a stele attributed to King Ramkamhaeng (พ่อขุนรามคำแหงมหาราช) of the Sukhothai period. King Ramkamhaeng is credited with inventing the Thai alphabet. Although the language used on the stele is archaic, every schoolchild in Thailand is familiar with it.
Orthographies such as Chinese and Japanese by their very nature don't support pangrams. For these orthographies, we can once again borrow an excerpt or two from a work of important literary value. For example, for Chinese I have used the first few phrases of the famous Thousand Character Classic (千字文) attributed to Zhou Xingsi (周興嗣) of the Liang Dynasty.
As of this writing, the selection of sample sentences remains incomplete. I hope that future contributions from the community will be valuable in expanding and vetting the orthography data that I have compiled so far.
Orthography References
The following sources were referenced when compiling the orthography data for Fontaine:
Characters needed for African orthographies in Latin writing system by Denis Jacquerye. http://www.africanlocalisation.net/content/characters-needed-african-orthographies-latin-writing-system
Eesti Keele Institute Letter Database. http://www.eki.ee/letter/
Frequency and Stroke Counts of Chinese Characters. Copyright © 1996-2006 by Chih-Hao Tsai. http://technology.chtsai.org/charfreq/
Hong Kong Supplementary Character Set 香港增補字符集, Office of the Government Chief Information Officer, Government of Hong Kong Special Administrative Region. http://www.ogcio.gov.hk/ccli/eng/hkscs/introduction.html
Japanese Jinmeiyō kanji 人名用漢字, Wikipedia. http://en.wikipedia.org/wiki/Jinmeiyō_kanji
Japanese Jōyō 常用漢字 character list. http://www.aozora.gr.jp/kanji_table/
Japanese Kokuji 国字 national characters. http://www.sljfaq.org/w/kokuji
Language Geek. http://www.languagegeek.com
List of Pangrams. Wikipedia. http://en.wikipedia.org/wiki/List_of_pangrams
Omniglot, A guide to the languages, alphabets, syllabaries and other writing systems of the world. http://www.omniglot.com/
Syriac Peshitta originally at http://www.aifoundations.org/peshitta/; now being superseded by Comprehensive Aramaic Lexicon at http://cal1.cn.huc.edu/
Systèmes alphabétiques des langues africaines copyright © 2006 by C. Chanard, LLACAN. http://sumale.vjf.cnrs.fr/phono/PhonologieN.php
Unicode Code Charts. http://www.unicode.org/charts/
UnicodeSet Demo. http://unicode.org/cldr/utility/list-unicodeset.jsp
Unicode Font Guide For Free/Libre Open Source Operating Systems, http://unifont.org/fontguide/
Wikipedia, http://wikipedia.org/
Bugs
Please refer to the online resources for the Fontaine project on Sourceforge.net for current information on bugs.
Currently known bugs and feature omissions:
Date | Description | Status |
2009-03-17 | i18n: gettext-based localized string replacement not working when last checked ... | open |
2009-03-17 | FXHTML: Fancy XHTML report is not yet fully implemented ... | open |
2009-03-17 | Reporting: Sample sentences and sample characters are not reported yet ... | open |
2009-03-17 | Orthographies: Still missing many orthography files, especially for Unicode 5x ... | open |
2009-03-17 | Orthographies: Still missing good sample sentences for many orthographies ... | open |