“简体字”不简单 Jiǎn tǐ zì bù jiǎn dān: The Complexity of Simplified Chinese - Part II
In Part I of this essay we examined how the new reality of online culture has exacerbated the problem of having two orthographies for Chinese. Readers educated in mainland China, where simplified characters are now taught in schools, are naturally most comfortable reading texts written in simplified Chinese. Older readers educated in the mainland before the orthographic reforms, and younger readers educated in Taiwan and Hong Kong, where traditional characters continue to be used, are naturally most comfortable reading texts written in traditional characters. Because in many cases a single simplified character represents two or more traditional characters, mapping between the two orthographies is non-trivial and requires parsing texts at the level of words, not individual characters.
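To make the one-to-many problem concrete, here is a minimal Python sketch. The tiny tables CHAR_MAP and WORD_MAP and both function names are hypothetical and hand-picked for illustration; a real converter needs a full lexicon, and the greedy longest-match lookup below only stands in for proper word segmentation. The simplified character 发 corresponds to both 發 (as in 發現, "discover") and 髮 (as in 頭髮, "hair"), so converting character by character gets "hair" wrong.

```python
# Hypothetical toy tables, for illustration only.
CHAR_MAP = {"发": "發", "头": "頭", "现": "現", "龄": "齡"}  # default per-character choice
WORD_MAP = {"头发": "頭髮", "发现": "發現"}                  # word-level overrides


def naive_to_traditional(text, char_map=CHAR_MAP):
    """Convert one character at a time, ignoring context."""
    return "".join(char_map.get(ch, ch) for ch in text)


def to_traditional(text, word_map=WORD_MAP, char_map=CHAR_MAP):
    """Try the longest known word at each position, then fall back to characters."""
    max_len = max(map(len, word_map), default=1)
    out = []
    i = 0
    while i < len(text):
        # Look for the longest multi-character word starting at position i.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in word_map:
                out.append(word_map[chunk])
                i += n
                break
        else:
            # No word matched: convert the single character.
            out.append(char_map.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(naive_to_traditional("头发"))  # 頭發 (wrong character for "hair")
print(to_traditional("头发"))        # 頭髮
print(to_traditional("蒲松龄"))      # 蒲松齡
```

The word table is consulted before the character table, so a known word always overrides the per-character default.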
How well are these new realities being handled on today's world-wide web? For example, how are semantically-equivalent queries written in one orthography or the other handled by search engines? Are both simplified and traditional results presented? And how do content providers handle the issue of orthographic conversion?
Although it is not possible to treat these questions exhaustively, in this part we will take a look at one representative example in order to get a rough idea of how things stand.
Let's Do Some Research ...
Suppose you are a Chinese college student. Your professor has asked you to write an essay about a famous Chinese writer of your choice. You like scary movies and ghost stories. You hear about a book called Strange Stories from a Chinese Studio (聊齋誌異 liáo zhāi zhì yì in Chinese) by Pu Songling 蒲松龄. It sounds like fun, so you decide to write your essay on this writer.
Like most young people today, your first choice is probably not running across campus to the library. Your first choice is more likely a little search on the internet. Today you decide to check out the Chinese edition of Wikipedia first.
Typing the writer's name into Wikipedia, “蒲松龄” in simplified characters, produces the following results:
蒲松龄 关联度：100.0% - -
蒲松齡 关联度：2.0% - -
蒙古人名 关联度：1.7% - -
泉城广场 关联度：1.4% - -
Oops: here's where we encounter the first problem. Look at the first two entries. The first entry is in simplified characters. The second entry is in traditional characters —only the third character has changed to a more complicated but recognizably similar character. The second entry is, in fact, the very same article. It differs only in being presented in traditional characters instead of simplified characters.
But take a look at the “relevancy” column —“关联度” in Chinese. The second entry —2%— is wrong. Not just a little wrong. Completely wrong. The traditional character article entitled “蒲松齡” is most definitely all about Pu Songling. As surely as the first article is. The two articles are word-for-word 100% identical. They only differ in orthography.
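One way a search tool could avoid scoring the two titles so differently is to fold both the query and the title into a single orthography before comparing them. The Python sketch below illustrates that idea only; it is not how Wikipedia's search works. The one-entry table T2S and the names fold and same_title are hypothetical, and a real table would need thousands of entries plus handling for characters whose simplification depends on the word.

```python
# Hypothetical one-entry table: traditional character -> simplified character.
T2S = {"齡": "龄"}


def fold(text, table=T2S):
    """Map every traditional character found in the table to its simplified form."""
    return "".join(table.get(ch, ch) for ch in text)


def same_title(query, title):
    """Compare two strings after folding both into simplified characters."""
    return fold(query) == fold(title)


print(same_title("蒲松龄", "蒲松齡"))    # True
print(same_title("蒲松龄", "蒙古人名"))  # False
```

With folding in place, the simplified query and the traditional title compare as equal, so both could receive the same relevance score.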
Well, that's the first problem. But, either way, we have now found an article about the writer. So let's take a look at it by clicking on the first link so we can look at the simplified character text.
Oops again. At the very top of the page, before we even get to the article itself, a notice:
I won't bother translating the whole thing, just the most interesting first part which says:
Our traditional-to-simplified Chinese conversion system currently exhibits exceptions. Some portion of the characters and words may have been converted incorrectly. We respectfully ask you to be aware of this ...
You got that right! Even the warning message itself is displayed almost completely in traditional characters.
And when we move on to the text, we see:
跳转到： 导航, 搜索
... a mix of traditional and simplified characters.
At this point, we notice however that there is a menu at the top of the page:
This menu conveniently provides orthography display choices. The items in this menu are:
Hong Kong Traditional
... and it appears that Don't Convert is the default selection. Is this the right default? I continue to ponder the reasoning behind this choice.
When we investigate the revision history for this article, it all begins to make sense: some contributors to this article have typed in traditional characters, and some have typed in simplified characters. The resulting document is therefore a mix of both orthographies. OK, at least now I understand how we got this mixed-orthography document.
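A mixed document of this kind can be spotted mechanically by checking whether a text contains characters that exist only in the simplified orthography alongside characters that exist only in the traditional one. The Python sketch below shows the idea under loud assumptions: the two small sets and the function name classify are hypothetical, and a usable detector would need complete character lists.

```python
# Hypothetical sample sets; real lists would be far larger.
SIMPLIFIED_ONLY = {"转", "导", "龄"}
TRADITIONAL_ONLY = {"轉", "導", "齡"}


def classify(text):
    """Report which orthography-specific characters appear in the text."""
    has_simplified = any(ch in SIMPLIFIED_ONLY for ch in text)
    has_traditional = any(ch in TRADITIONAL_ONLY for ch in text)
    if has_simplified and has_traditional:
        return "mixed"
    if has_simplified:
        return "simplified"
    if has_traditional:
        return "traditional"
    return "neutral"  # only characters shared by both orthographies


print(classify("蒲松龄"))          # simplified
print(classify("蒲松齡"))          # traditional
print(classify("蒲松龄與蒲松齡"))  # mixed
print(classify("搜索"))            # neutral
```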
And when we click on the “大陆简体” choice for Mainland Simplified, it does appear that we get a mainly simplified text, except for that message at the top, which curiously remains in traditional characters!
This Chinese Wikipedia “work in progress” is just one example among many that we might choose to gain at least a rough idea of the current state of affairs on the Chinese world-wide web. If nothing else, this single example from the very well-known Wikipedia project illustrates that opportunities to improve and facilitate the Chinese user's experience remain wide open.