
Character Versus Glyph
As a result of the rapidly maturing electronic
publishing industry and its association with information
technology, older typographic terms have taken on new
meanings. For instance, it is now essential to
distinguish between a 'character' and a 'glyph.' A
character is an entity used in data interchange that only
generically specifies a particular symbol. More
specifically, when a character such as 'c' is
transmitted, the way it is displayed at the receiving end
is not strictly stipulated. It is simply sufficient that
the character is recognized as a 'c.' On the other hand,
a glyph is defined as the particular shape of a given
character as it is displayed. We further define a
character set as an ordered collection of characters,
while a font is an ordered collection of glyphs. What's
more, we refer to an ordered collection as an 'encoding.'
In general, it is understood that some interaction of
hardware and software will translate character codes into
glyphs. This interaction, which can differ substantially
from one system to another, is loosely categorized under
the term 'rendering process.' Though the boundaries of
this term may be nebulous, the rendering process
encompasses at least the following components:
- operating system
- locale and language settings
- keyboard and display software
- word processing software
- type rasterizer
- hardware for input and output.
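The character/glyph distinction described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real rendering system: the two "fonts" and their glyph names (`c.serif`, `c.italic`) are hypothetical, invented only to show that one character code can map to different shapes.

```python
import unicodedata

# A character is identified by its code, not by its appearance.
ch = "c"
code_point = ord(ch)            # numeric code used in data interchange
name = unicodedata.name(ch)     # generic identity of the character

# Hypothetical toy fonts: each maps the same character code to a
# different glyph name. Real fonts store outlines, not strings.
FONT_SERIF = {0x63: "c.serif"}
FONT_ITALIC = {0x63: "c.italic"}

def glyph_for(font, character):
    """Look up the glyph a toy font assigns to a character."""
    return font[ord(character)]

print(f"U+{code_point:04X} {name}")
print(glyph_for(FONT_SERIF, ch), glyph_for(FONT_ITALIC, ch))
```

The transmitted data is only the code for 'c'; which of the two glyphs appears is decided at the receiving end.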


Character Set or Font Encoding?
Even though Latin-1 was devised as a character set,
the one-to-one matching of its characters to the
corresponding glyphs has allowed it to serve equally well
as a font encoding. The same holds true of other
standards in the ISO 8859 series: Latin-2, -3, -4, -5,
and even Latin-Russian and Latin-Greek. This convenient extrapolation from character
set to font fails dramatically, however, when applied to
ISO 8859-6, the Latin-Arabic standard. In this case,
characters specify generic Arabic letters without
referring to the particular graphic shape (the glyph)
they take when rendered on paper or on-screen. Therefore,
a font that conforms literally to the character set of
ISO 8859-6 would not adequately render the text encoded
in that standard because it would not have all the
necessary shapes for each Arabic letter. Similar problems
occur in the handling of Devanagari or Thai scripts.
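The Arabic case can be made concrete with Python's standard library. The sketch below decodes one ISO 8859-6 byte into its generic letter, then uses Unicode's compatibility "presentation forms" block (an assumption of this example, not something the text above relies on) to list the contextual shapes that a literal one-glyph-per-character font would lack.

```python
import unicodedata

# Byte 0xC8 in ISO 8859-6 is the generic Arabic letter BEH.
beh = bytes([0xC8]).decode("iso8859_6")

# U+FE8F..U+FE92 are the isolated, final, initial, and medial
# presentation forms of BEH. Each one decomposes back to the same
# generic character, so one character needs several glyphs.
shapes = {}
for cp in range(0xFE8F, 0xFE93):
    tag, base = unicodedata.decomposition(chr(cp)).split()
    shapes[tag.strip("<>")] = chr(int(base, 16))

print(unicodedata.name(beh))
print(sorted(shapes))
```

All four shapes map back to the single character that ISO 8859-6 encodes, which is why the character set alone cannot double as a font encoding here.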
Even in the case of Latin script alone, character sets
that allow a glyph to be composed from several elements,
for example:
would force us to treat character sets and font
encoding as distinct, though related, entities.
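As an assumed illustration of composition in Latin script (not necessarily the example the article originally showed), consider a base letter followed by a combining accent. Two characters in the data correspond to what a reader sees as one accented letter.

```python
import unicodedata

# Two characters: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# NFC normalization yields the single precomposed character, which
# shows that both spellings denote the same visible letter.
precomposed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(precomposed))
print(unicodedata.name(precomposed))
```

A font indexed strictly by character code would hold separate entries for the letter and the accent, yet the rendering process must still produce one combined shape.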


Unicode Design Principles
The issues mentioned above, among others, were very
much on the minds of Unicode's inventors since the early
stages of its inception. Following are some of Unicode's
design principles that are fundamental to any discussion
of fonts.
A. Unicode is a character set for the basic interchange of plain text. It
contains no attributes regarding language, display
format, color, typeface, or any other details about
rendering. In this respect, Unicode-encoded text is
analogous to ASCII-encoded text.
B. Unicode characters are made visible through a distinct
rendering process that maps characters to glyphs. While
this principle applies extensively to Semitic and Indic
scripts, it also remains applicable to Latin and other
scripts. For example, if a text contains the sequence of
characters
one would expect the word
to be rendered.
In this case, the single glyph
was represented by two successive characters
in the text.
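A many-to-one mapping from characters to glyphs can be sketched as a simple substitution pass. The ligature pair and the glyph name `fi_ligature` below are assumptions chosen for illustration; a real rendering process consults tables inside the font and is considerably more involved.

```python
# Hypothetical substitution table: a pair of characters that the
# (imaginary) font draws as one glyph.
LIGATURES = {("f", "i"): "fi_ligature"}

def to_glyphs(text):
    """Map a character sequence to glyph names, merging known pairs."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in LIGATURES:
            glyphs.append(LIGATURES[pair])
            i += 2              # two characters consumed, one glyph emitted
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs

print(to_glyphs("find"))
```

The text still contains four characters; only the glyph sequence is shorter.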
C. In order to avoid duplication of characters, Unicode encodes text by
script, not by language. For instance, the Latin A is
used without distinction for text in Catalan, English,
Indonesian, Swedish, or Swahili.
The fact that different languages and cultures may
prefer different display forms for particular letters is
relegated to the rendering process, which may have
further information about style, language, locale, and
other pertinent attributes.
To illustrate this important design principle,
consider that Devanagari script is used for Hindi,
Sanskrit, and Nepali; Hebrew script for Modern Hebrew,
Biblical Hebrew, Ladino, and Yiddish; Cyrillic script for
Russian, Ukrainian, Belorussian, Serbian, Bulgarian, and
even Azeri and Uzbek; Arabic script for Arabic, Farsi,
Urdu, Kurdish, and Ottoman Turkish; Greek script for
Modern Greek, Classical Greek, and Coptic.
As these examples show, any given script can represent
related and unrelated languages - or living and dead
languages - alike. With such diversity, there are bound
to be many differences in style. Nonetheless, those
differences should be handled by the rendering software,
not the underlying character code.
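The script-not-language principle can be observed in the character names themselves. In this sketch the `script_of` helper is an approximation introduced for the example: it takes the first word of the Unicode character name, since Python's standard library does not expose the script property directly. The language lists are taken from the examples above.

```python
import unicodedata

# One code point per letter, shared by every language written in
# that script.
samples = {
    "A": ["Catalan", "English", "Indonesian", "Swedish", "Swahili"],
    "\u0915": ["Hindi", "Sanskrit", "Nepali"],
    "\u0434": ["Russian", "Ukrainian", "Bulgarian"],
}

def script_of(ch):
    """Approximate the script by the first word of the character name."""
    return unicodedata.name(ch).split()[0]

for ch, languages in samples.items():
    print(f"U+{ord(ch):04X} {script_of(ch):<10} shared by {', '.join(languages)}")
```

Nothing in the code point records which language the text is in; that information, if needed, has to reach the rendering software by other means.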


Mapping Standard to Font
Given the above principles, Unicode cannot be used directly
as a font encoding scheme. What we're really after
is a font - or set of fonts - that faithfully renders
Unicode-encoded text. In Unicode terminology, such fonts
would be called 'Unicode-conformant.' By definition, a
process is considered conformant if it can correctly
interpret and render a subset of Unicode without
misinterpreting or disturbing any of the other subsets.
Unicode conformance in a font indicates correctness of
interpretation, but not necessarily breadth of coverage.
Claims of Unicode conformance can, therefore, be
misleading when there is no common understanding about
the number of scripts and/or languages supported.
Following are some examples of Unicode conformance and
non-conformance:
- Software that correctly interprets only the Devanagari subset of
  Unicode by displaying the results in an appropriate font is
  Unicode-conformant.
- Likewise, a font that properly renders the subset of Unicode
  corresponding to Latin-1 through Latin-6 is Unicode-conformant even
  though it may display all other characters as boxes.
- On the other hand, a system that displays any random 256-character
  subset of Unicode through the same Latin-1 font would definitely
  not conform to the Unicode Standard.
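The contrast between the last two cases can be sketched as two toy renderers. Both functions and the `covered` helper are hypothetical; the sketch assumes a font that covers only the Latin-1 range and represents "glyphs" as ordinary characters for readability.

```python
# Hypothetical font coverage: only U+0000..U+00FF (Latin-1).
def covered(ch):
    return ord(ch) <= 0xFF

def render_conformant(text):
    """Show covered characters; show a box for everything else."""
    return "".join(ch if covered(ch) else "\u25A1" for ch in text)

def render_nonconformant(text):
    """Force every code into the 256-glyph font by discarding high bits."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)

# "café" followed by CYRILLIC SMALL LETTER ES (U+0441).
text = "caf\u00e9 \u0441"
print(render_conformant(text))
print(render_nonconformant(text))
```

The first renderer admits that it cannot display the Cyrillic letter. The second silently shows it as a Latin "A", which is exactly the kind of misinterpretation that conformance rules out.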
Because of size and other practical constraints, one
could gather a group of complementary Unicode-conformant
fonts into a 'family' of fonts, rather than packing all
the data into a single font. Monotype's WorldType is a
Unicode-conformant font and will support a character set
well beyond Latin-1. It is reasonable to expect support
for the following scripts/languages in a base-level
Unicode-conformant font: Pan-European Latin, Cyrillic,
Greek, Hebrew, and Arabic. Support for additional scripts
can be added modularly as necessary. This approach leads
us to naturally ask which scripts should be added, and in
which order. Many factors must be taken into
consideration in this regard, including the following: