A computer text-processing system inputs keystrokes and outputs
glyphs, small pictures that are assembled on paper or on a
computer screen. Keystrokes and glyphs do not, in general, coincide:
for example, if the system generates ligatures, then the two
keystrokes <f><i> will typically correspond to a
single glyph. Similarly, if the system shapes Arabic glyphs in a
reasonable manner, then multiple different glyphs may correspond to
a single keystroke.
The complex transformation rules from keystrokes to glyphs are usually factored into two simpler transformations, going through the intermediary of characters. You may want to think of characters as the basic unit of data that is stored e.g. in the buffer of your text editor. While the definition of a character is intrinsically application-specific, a number of standardised collections of characters have been defined.
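The two-stage factoring can be sketched with a toy example. Everything in it is hypothetical: the function names, the glyph name `f_i', and the single ligature rule are made up for illustration, and a real input method and shaping engine are far more involved.

```python
# Toy model of the keystrokes -> characters -> glyphs pipeline.
# All names and the ligature table are illustrative assumptions.

LIGATURES = {("f", "i"): "f_i"}  # hypothetical glyph name for the fi ligature


def keystrokes_to_characters(keystrokes):
    """Stage 1: in this sketch each keystroke yields exactly one character."""
    return list(keystrokes)


def characters_to_glyphs(characters):
    """Stage 2: map characters to glyph names, merging known ligatures."""
    glyphs = []
    i = 0
    while i < len(characters):
        pair = tuple(characters[i:i + 2])
        if pair in LIGATURES:
            # Two characters collapse into a single glyph.
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            # Otherwise the character stands for itself as a glyph name.
            glyphs.append(characters[i])
            i += 1
    return glyphs


chars = keystrokes_to_characters("fin")
glyphs = characters_to_glyphs(chars)
print(chars)   # three characters
print(glyphs)  # two glyphs, since f and i were merged
```

The characters in the middle are what an editor would store; the ligature only appears in the second stage.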
A coded character set is a set of characters together with a mapping from integer codes --- known as codepoints --- to characters. Examples of coded character sets include US-ASCII, ISO 8859-1, KOI8-R, and JIS X 0208(1990).
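As a small illustration, the same codepoint denotes different characters in different coded character sets. The sketch below uses Python's built-in codecs for ISO 8859-1, KOI8-R and US-ASCII; the choice of the code 0xE9 is arbitrary.

```python
# The same integer code maps to different characters depending on
# which coded character set is used to interpret it.
code = 0xE9

latin1_char = bytes([code]).decode("iso-8859-1")
koi8r_char = bytes([code]).decode("koi8-r")

print(repr(latin1_char))  # a Latin letter
print(repr(koi8r_char))   # a Cyrillic letter

# US-ASCII only assigns codes 0 to 127, so 0xE9 has no meaning there,
# but a code such as 0x41 does.
ascii_char = bytes([0x41]).decode("ascii")
print(repr(ascii_char))
```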
A coded character set need not use 8-bit integers to index characters. Many early mainframes used 6-bit character sets, while 16-bit (or more) character sets are necessary for ideographic writing systems.
Traditionally, typographers speak about typefaces and founts. A typeface is a particular style or design, such as Times Italic, while a fount is a molten-lead incarnation of a given typeface at a given size.
Digital fonts come in font files. A font file contains all the information necessary for generating glyphs of a given typeface, and applications using font files may access glyph information in an arbitrary order.
Digital fonts may consist of bitmap data, in which case they are said to be bitmap fonts. They may also consist of a mathematical description of glyph shapes, in which case they are said to be scalable fonts. Common formats for scalable font files are Type 1 (sometimes incorrectly called ATM fonts or PostScript fonts), Speedo and TrueType.
The glyph data in a digital font needs to be indexed somehow. How this is done depends on the font file format. In the case of Type 1 fonts, glyphs are identified by glyph names. In the case of TrueType fonts, glyphs are indexed by integers corresponding to one of a number of indexing schemes (usually Unicode --- see below).
The X11 system uses the data in font files to generate font instances, which are collections of glyphs at a given size indexed according to a given encoding.
X11 font instances are usually specified using a notation known as the
X Logical Font Description (XLFD). An XLFD starts with a dash
`-', and consists of fourteen fields separated by dashes.
Of particular interest are the last two fields, for example
`iso8859-1', which specify the font instance's encoding.
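The structure of an XLFD can be sketched as follows. The field names follow the usual XLFD conventions, but the helper functions `parse_xlfd` and `xlfd_encoding` and the sample font name are assumptions made for this illustration; the sketch only splits on dashes and does no further validation.

```python
# Minimal sketch: split an XLFD into its fourteen fields.
# The sample XLFD below is illustrative, not taken from a real server.
XLFD_FIELDS = [
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
]


def parse_xlfd(xlfd):
    """Return a dict mapping field names to the (string) field values."""
    if not xlfd.startswith("-"):
        raise ValueError("an XLFD must start with a dash")
    values = xlfd[1:].split("-")
    if len(values) != len(XLFD_FIELDS):
        raise ValueError("expected 14 fields, got %d" % len(values))
    return dict(zip(XLFD_FIELDS, values))


def xlfd_encoding(xlfd):
    """Join the last two fields, which together name the encoding."""
    fields = parse_xlfd(xlfd)
    return fields["charset_registry"] + "-" + fields["charset_encoding"]


sample = "-adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1"
print(parse_xlfd(sample)["family_name"])
print(xlfd_encoding(sample))
```

Note that an empty field, such as the one between the two consecutive dashes in the sample, still counts as one of the fourteen.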
X11 font instances may also be specified by short name. Unlike an
XLFD, a short name has no structure and is simply a conventional name
for a font instance. Two short names are of particular interest, as
they are handled specially by the server, and the server will not
start if font instances with these names cannot be opened. These are
`fixed', which specifies the fallback font to use when the
requested font cannot be opened, and `cursor', which specifies
the set of glyphs to be used by the mouse pointer.
Short names are usually implemented as aliases to XLFDs; the
`fixed' and `cursor' aliases are defined in
Unicode (http://www.unicode.org) is a coded character set with the goal of uniquely identifying all characters for all scripts, current and historical. While Unicode was explicitly not designed as a glyph encoding scheme, it is often possible to use it as such.
Unicode is an open character set, meaning that codepoint assignments may be added to Unicode at any time (once specified, though, an assignment can never be changed). For this reason, a Unicode font will be sparse, and only define glyphs for a subset of the character registry of Unicode.
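What sparseness means in practice can be sketched with a toy font table. The table contents, the glyph indices, and the use of index 0 for missing glyphs are all assumptions made for this example; real font formats define their own conventions.

```python
# A toy "Unicode font": a sparse table from codepoints to glyph indices.
# Only three codepoints are covered; everything else is missing.
TOY_FONT = {
    0x0041: 1,  # LATIN CAPITAL LETTER A
    0x00E9: 2,  # LATIN SMALL LETTER E WITH ACUTE
    0x0418: 3,  # CYRILLIC CAPITAL LETTER I
}
MISSING_GLYPH = 0  # assumed index of a placeholder glyph


def glyph_for(char):
    """Look up the glyph index for a character, or the placeholder."""
    return TOY_FONT.get(ord(char), MISSING_GLYPH)


def uncovered(text):
    """List the characters of text for which the font has no glyph."""
    return [c for c in text if ord(c) not in TOY_FONT]


print(glyph_for("A"))
print(glyph_for("\u4e00"))       # an ideograph the toy font lacks
print(uncovered("A\u00e9\u4e00"))
```

An application that needs full coverage of a text would typically combine several such sparse fonts.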
The Unicode standard is defined in parallel with the international standard ISO 10646. Assignments in the two standards are always equivalent, and this document uses the terms Unicode and ISO 10646 interchangeably.
When used in X11, Unicode-encoded fonts should have the last two
fields of their XLFD set to `