Monotype


Character Versus Glyph

As a result of the rapidly maturing electronic publishing industry and its association with information technology, older typographic terms have taken on new meanings. For instance, it is now essential to distinguish between a 'character' and a 'glyph.' A character is an entity used in data interchange that only generically specifies a particular symbol. More specifically, when a character such as 'c' is transmitted, the way it is displayed at the receiving end is not strictly stipulated. It is simply sufficient that the character is recognized as a 'c.' On the other hand, a glyph is defined as the particular shape of a given character as it is displayed. We further define a character set as an ordered collection of characters, while a font is an ordered collection of glyphs. What's more, we refer to an ordered collection as an 'encoding.'

In general, it is understood that some interaction of hardware and software will translate character codes into glyphs. This interaction, which can differ substantially from one system to another, is loosely categorized under the term 'rendering process.' Though the boundaries of this term may be nebulous, the rendering process encompasses at least the following components:

operating system
locale and language settings
keyboard and display software
word processing software
type rasterizer
hardware for input and output.

[Figure: glyphs and sets]

Character Set or Font Encoding?

Even though Latin-1 was devised as a character set, the one-to-one matching of its characters to the corresponding glyphs has allowed it to serve equally well as a font encoding. The same holds true of other standards in the ISO 8859 series: Latin-2, -3, -4, -5, and even Latin-Russian and Latin-Greek. This convenient extrapolation from character set to font fails dramatically, however, when applied to ISO 8859-6, the Latin-Arabic standard. In this case, characters specify generic Arabic letters without referring to the particular graphic shape (the glyph) they take when rendered on paper or on-screen. Therefore, a font that conforms literally to the character set of ISO 8859-6 would not adequately render text encoded in that standard, because it would lack the several contextual shapes that each Arabic letter requires. Similar problems occur in the handling of Devanagari or Thai scripts.
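The gap between a generic Arabic character and the shapes a font must supply can be observed with Python's standard library. The sketch below is only an illustration, not a description of how any rendering process works: it uses the compatibility code points that Unicode assigns to the four contextual shapes of the letter BEH and shows that all of them fold back to the one generic character that ISO 8859-6 encodes as byte 0xC8.

```python
import unicodedata

# The generic letter BEH as ISO 8859-6 encodes it: a single byte, no shape information.
BEH = b"\xc8".decode("iso8859_6")

# Unicode compatibility code points for the isolated, final, initial,
# and medial shapes of BEH (used here purely for illustration).
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def generic_character(form: str) -> str:
    """Fold a presentation form back to its generic character via NFKC."""
    return unicodedata.normalize("NFKC", form)

for form in PRESENTATION_FORMS:
    generic = generic_character(form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> U+{ord(generic):04X}")

# Count how many distinct shapes correspond to the one character BEH.
shapes_for_beh = len({f for f in PRESENTATION_FORMS if generic_character(f) == BEH})
print(shapes_for_beh, "shapes for one character")
```

A font with exactly one glyph per ISO 8859-6 character would therefore have room for only one of these shapes.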

Even in the case of Latin script alone, character sets that allow a glyph to be composed from several elements, for example:

would force us to treat character sets and font encoding as distinct, though related, entities.
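One assumed instance of such composition is a base letter followed by a combining accent. The short sketch below, using only Python's standard unicodedata module, shows the same visible letter stored either as two characters or as one; the choice of "e" with an acute accent is an illustration, not taken from any particular character set discussed here.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: two characters in the text.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# NFD normalization splits the precomposed character back into its elements.
redecomposed = unicodedata.normalize("NFD", composed)

print(len(decomposed), [unicodedata.name(c) for c in decomposed])
print(len(composed), [unicodedata.name(c) for c in composed])
```

Both strings call for the same glyph on the page, which is why a count of characters says little about the count of glyphs a font must hold.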

Unicode Design Principles

The issues mentioned above, among others, were very much on the minds of Unicode's inventors from the earliest stages of its development. The following Unicode design principles are fundamental to any discussion of fonts.

A. Unicode is a character set for the basic interchange of plain text. It contains no attributes regarding language, display format, color, typeface, or any other details about rendering. In this respect, Unicode-encoded text is analogous to ASCII-encoded text.

B. Unicode characters are made visible through a distinct rendering process that maps characters to glyphs. While this principle applies extensively to Semitic and Indic scripts, it also remains applicable to Latin and other scripts. For example, if a text contains the sequence of characters

one would expect the word

to be rendered.

In this case, the single glyph was represented by two successive characters in the text.
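A character-to-glyph mapping of this kind can be sketched as a lookup in which a key may span more than one character. Everything in the sketch below is hypothetical: the glyph names, the tiny table, and the choice of an "fi" ligature as the two-character case are assumptions made for illustration, and a real rendering process involves far more than a table lookup.

```python
# Hypothetical glyph table: keys are character sequences, values are glyph names.
GLYPHS = {
    "fi": "f_i.liga",  # one glyph standing for two successive characters
    "f": "f",
    "i": "i",
    "n": "n",
    "d": "d",
}

def map_to_glyphs(text: str) -> list[str]:
    """Map characters to glyph names, preferring the longest matching sequence."""
    longest = max(len(key) for key in GLYPHS)
    glyphs = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate sequence first, then shorter ones.
        for length in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + length]
            if chunk in GLYPHS:
                glyphs.append(GLYPHS[chunk])
                pos += length
                break
        else:
            glyphs.append(".notdef")  # no glyph available for this character
            pos += 1
    return glyphs

print(map_to_glyphs("find"))  # ['f_i.liga', 'n', 'd']
```

Four characters go in and three glyphs come out, so the text and the font are indexed differently even for Latin script.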

C. In order to avoid duplication of characters, Unicode encodes text by script, not by language. For instance, the Latin A is used without distinction for text in Catalan, English, Indonesian, Swedish, or Swahili.

The fact that different languages and cultures may prefer different display forms for particular letters is relegated to the rendering process, which may have further information about style, language, locale, and other pertinent attributes.

To illustrate this important design principle, consider that Devanagari script is used for Hindi, Sanskrit, and Nepali; Hebrew script for Modern Hebrew, Biblical Hebrew, Ladino, and Yiddish; Cyrillic script for Russian, Ukrainian, Belorussian, Serbian, Bulgarian, and even Azeri and Uzbek; Arabic script for Arabic, Farsi, Urdu, Kurdish, and Ottoman Turkish; Greek script for Modern Greek, Classical Greek, and Coptic.

As these examples show, any given script can represent related and unrelated languages - or living and dead languages - alike. With such diversity, there are bound to be many differences in style. Nonetheless, those differences should be handled by the rendering software, not the underlying character code.

Mapping Standard to Font

Given the above principles, Unicode cannot be used directly as a font encoding scheme. What we're really after is a font - or set of fonts - that faithfully renders Unicode-encoded text. In Unicode terminology, such fonts would be called 'Unicode-conformant.' By definition, a process is considered conformant if it can correctly interpret and render a subset of Unicode without misinterpreting or disturbing any of the other subsets. Unicode conformance in a font indicates correctness of interpretation, but not necessarily breadth of coverage. Claims of Unicode conformance can, therefore, be misleading when there is no common understanding about the number of scripts and/or languages supported.

Following are some examples of Unicode conformance and non-conformance:

Software that correctly interprets only the Devanagari subset of Unicode by displaying the results in an appropriate font is Unicode-conformant.

Likewise, a font that properly renders the subset of Unicode corresponding to Latin-1 through Latin-6 is Unicode-conformant even though it may display all other characters as boxes.

On the other hand, a system that displays any random 256-character subset of Unicode through the same Latin-1 font would definitely not conform to the Unicode Standard.
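The contrast between the last two cases can be sketched in a few lines. The coverage range, the box character, and both function names below are hypothetical, and the "font" is reduced to a simple range test; the point is only the difference between showing a box for an unsupported character and showing the wrong Latin-1 glyph.

```python
# Hypothetical, simplified coverage of a font limited to the Latin-1 range.
LATIN1_COVERAGE = range(0x0020, 0x0100)
BOX = "\u25A1"  # WHITE SQUARE, standing in for a "missing glyph" box

def render_conformant(text: str) -> str:
    """Show covered characters as themselves and all others as a box."""
    return "".join(c if ord(c) in LATIN1_COVERAGE else BOX for c in text)

def render_nonconformant(text: str) -> str:
    """Force every character through the Latin-1 font by keeping only its low byte."""
    return "".join(chr(ord(c) & 0xFF) for c in text)

# Latin-1 text followed by DEVANAGARI LETTER NA (U+0928).
sample = "caf\u00e9 \u0928"

print(render_conformant(sample))     # the Devanagari letter appears as a box
print(render_nonconformant(sample))  # the Devanagari letter appears as "("
```

The first function leaves the unsupported character visibly unrendered; the second silently misinterprets it, which is the behavior the third example rules out.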

Because of size and other practical constraints, one could gather a group of complementary Unicode-conformant fonts into a 'family' of fonts, rather than packing all the data into a single font. Monotype's WorldType is a Unicode-conformant font and will support a character set well beyond Latin-1. It is reasonable to expect support for the following scripts/languages in a base-level Unicode-conformant font: Pan-European Latin, Cyrillic, Greek, Hebrew, and Arabic. Support for additional scripts can be added modularly as necessary. This approach leads us to naturally ask which scripts should be added, and in which order. Many factors must be taken into consideration in this regard, including the following:

customer demand
size of population that utilizes the script
economic, cultural, or technological importance of nations using that script
existence of related national and/or regional computer-related standards and the extent of their use.

Other criteria may apply to non-linguistic scripts such as mathematical or technical symbols.

For More Information:
To help determine your needs for multilingual and Unicode-compliant fonts, or to receive printed literature on Monotype's Unicode font solutions, please contact the Monotype OEM Sales Department.

© Copyright 2007 Localisation Research Centre (LRC). All rights reserved.