
Character Sets and Code Sets

The practical difficulty in internationalization (I18N) is that languages build their linguistic elements in different ways. In particular, ideographic languages, which can contain thousands of individual glyphs, cannot be fully represented in a standard 8-bit code set, because 8 bits provide only 256 distinct values.
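To make the 8-bit limit concrete, here is a minimal Python sketch. It assumes Python 3's built-in `iso8859_1` and `euc_jp` codecs, which the text above does not mention, and uses the ideograph 日 purely as an illustration. The helper `encoded_length` is a hypothetical name introduced for this example.

```python
# An 8-bit code set has at most 2**8 distinct code values.
MAX_8BIT_VALUES = 2 ** 8

def encoded_length(text, codec):
    """Return the number of bytes `text` occupies in `codec`,
    or None if the codec cannot represent it (hypothetical helper)."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None

ideograph = "\u65e5"  # the ideograph 日, chosen only as an example

latin1_length = encoded_length(ideograph, "iso8859_1")  # 8-bit code set
eucjp_length = encoded_length(ideograph, "euc_jp")      # multibyte encoding

print(MAX_8BIT_VALUES, latin1_length, eucjp_length)
```

The 8-bit code set has no value to spare for the ideograph, so encoding fails, while the multibyte encoding represents it with more than one byte.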

Character set

A character set is the set of all the characters required to represent the words of a language.

Code set

A code set is the set of binary values assigned to the characters of a language's character set, one value per character.

ISO8859-1 / Latin-1

ISO8859-1 (also called Latin-1) contains the ASCII codes as its first 128 values. It is the standard code set for English and for several other Western European alphabetic languages.
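The three definitions above can be sketched in Python. This is an illustration only: the sample characters and the use of Python's built-in `iso8859_1` and `ascii` codecs are assumptions, not part of the original text. The character set is a collection of characters, the code set assigns each one a binary value, and every ASCII code value keeps its meaning in ISO8859-1.

```python
# A character set is a collection of characters (sample chosen for illustration).
character_set = ["A", "z", "\u00e9"]  # "é" is in Latin-1 but not in ASCII

# A code set assigns each character a binary value; here we ask ISO8859-1.
code_values = {ch: ch.encode("iso8859_1")[0] for ch in character_set}

# Every ASCII code value (0-127) decodes to the same character in ISO8859-1.
ascii_bytes = bytes(range(128))
ascii_is_subset = ascii_bytes.decode("ascii") == ascii_bytes.decode("iso8859_1")

for ch, value in code_values.items():
    print(f"{ch!r} -> 0x{value:02X}")
print("ASCII contained in ISO8859-1:", ascii_is_subset)
```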

Code sets correspond to various languages or areas. For example:

Language or Area          Code Set
English, Western Europe   ISO8859-1
Eastern Europe            ISO8859-2
Southern Europe           ISO8859-3
Northern Europe           ISO8859-4
Cyrillic                  ISO8859-5
Arabic                    ISO8859-6
Greek                     ISO8859-7
Hebrew                    ISO8859-8
Turkish                   ISO8859-9
Japan                     JIS X 0201, JIS X 0208, JIS X 0212
Korea                     KSC5601.1987-0
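One consequence of having a separate code set per language or area is that a byte value has no meaning until the code set is known. The following Python sketch illustrates this under stated assumptions: it relies on Python's built-in `iso8859_1`, `iso8859_5`, and `iso8859_7` codecs, and the byte value 0xE1 is an arbitrary choice.

```python
raw = bytes([0xE1])  # one arbitrary 8-bit code value

# Interpret the same byte under three ISO 8859 code sets.
decoded = {codec: raw.decode(codec)
           for codec in ("iso8859_1", "iso8859_5", "iso8859_7")}

for codec, ch in decoded.items():
    print(f"{codec}: U+{ord(ch):04X} {ch}")
```

The same byte yields a Latin, a Cyrillic, and a Greek letter in turn, which is why text data must always travel with, or be assumed to use, a particular code set.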
