NLS(7) NetBSD Miscellaneous Information Manual NLS(7)
NAME
NLS -- Native Language Support Overview
DESCRIPTION
Native Language Support (NLS) provides commands for a single worldwide
operating system base. An internationalized system has no built-in
assumptions or dependencies on language-specific or cultural-specific
conventions such as:
· Character classifications
· Character comparison rules
· Character collation order
· Numeric and monetary formatting
· Date and time formatting
· Message-text language
· Character sets
All information pertaining to cultural conventions and language is
obtained at program run time.
``Internationalization'' (often abbreviated ``i18n'') refers to the
process by which system software is developed to support multiple
culture-specific and language-specific conventions. This is a
generalization process by which the system is freed from relying only on
English strings or other English-specific conventions. ``Localization''
(often abbreviated ``l10n'') refers to the process by which the user
environment is customized to handle its input and output appropriately
for specific language and cultural conventions. This is a specialization
process, by which generic methods already implemented in an
internationalized system are used in specific ways. The formal
description of the cultural conventions of a country, together with all
associated translations into the native language, is called the
``locale''.
NetBSD provides extensive support to programmers and system developers to
enable internationalized software to be developed. NetBSD also supplies
a large variety of locales for system localization.
Localization of Information
All locale information is accessible to programs at run time so that data
is processed and displayed correctly for specific cultural conventions
and language.
A locale is divided into categories. A category is a group of language-
specific and culture-specific conventions as outlined in the list above.
The following six standard categories, five specified by ISO C and
LC_MESSAGES specified by POSIX, are supported by NetBSD:
LC_COLLATE     string-collation order information
LC_CTYPE       character classification, case conversion, and other
               character attributes
LC_MESSAGES    the format for affirmative and negative responses
LC_MONETARY    rules and symbols for formatting monetary numeric
               information
LC_NUMERIC     rules and symbols for formatting nonmonetary numeric
               information
LC_TIME        rules and symbols for formatting time and date information
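As a minimal sketch of how one category can be selected independently of
the others, the following C fragment sets only LC_NUMERIC and reads back
the decimal point through localeconv(3). The helper names
use_numeric_locale and current_decimal_point are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Select a locale for the LC_NUMERIC category only; the other
 * categories keep whatever locale they already had.
 * Returns nonzero on success.  (Hypothetical helper.)
 */
static int use_numeric_locale(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_NUMERIC, name) != NULL;
}

/*
 * Return the decimal-point string of the current LC_NUMERIC locale.
 * The pointer refers to storage owned by the C library.
 */
static const char *current_decimal_point(void)
{
    const struct lconv *lc = localeconv();
    return lc->decimal_point;
}
```

A caller might invoke use_numeric_locale("") to adopt the user's
environment setting and then consult current_decimal_point() when
formatting numbers by hand.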
Localization of the system is achieved by setting appropriate values in
environment variables to identify which locale should be used. The envi-
ronment variables have the same names as their respective locale cate-
gories. Additionally, the LANG, LC_ALL, and NLSPATH environment vari-
ables are used. The NLSPATH environment variable specifies a colon-sepa-
rated list of directory names where the message catalog files of the NLS
database are located. The LC_ALL and LANG environment variables also
determine the current locale.
The values of these environment variables are strings of the form:
language[_territory][.codeset][@modifier]
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages. Some common language
codes are:
Language Name     Code  Language Family
ABKHAZIAN         AB    IBERO-CAUCASIAN
AFAN (OROMO)      OM    HAMITIC
AFAR              AA    HAMITIC
AFRIKAANS         AF    GERMANIC
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
ASSAMESE          AS    INDIAN
AYMARA            AY    AMERINDIAN
AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
BASQUE            EU    BASQUE
BENGALI           BN    INDIAN
BHUTANI           DZ    ASIAN
BIHARI            BH    INDIAN
BISLAMA           BI
BRETON            BR    CELTIC
BULGARIAN         BG    SLAVIC
BURMESE           MY    ASIAN
BYELORUSSIAN      BE    SLAVIC
CAMBODIAN         KM    ASIAN
CATALAN           CA    ROMANCE
CHINESE           ZH    ASIAN
CORSICAN          CO    ROMANCE
CROATIAN          HR    SLAVIC
CZECH             CS    SLAVIC
DANISH            DA    GERMANIC
DUTCH             NL    GERMANIC
ENGLISH           EN    GERMANIC
ESPERANTO         EO    INTERNATIONAL AUX.
ESTONIAN          ET    FINNO-UGRIC
FAROESE           FO    GERMANIC
FIJI              FJ    OCEANIC/INDONESIAN
FINNISH           FI    FINNO-UGRIC
FRENCH            FR    ROMANCE
FRISIAN           FY    GERMANIC
GALICIAN          GL    ROMANCE
GEORGIAN          KA    IBERO-CAUCASIAN
GERMAN            DE    GERMANIC
GREEK             EL    LATIN/GREEK
GREENLANDIC       KL    ESKIMO
GUARANI           GN    AMERINDIAN
GUJARATI          GU    INDIAN
HAUSA             HA    NEGRO-AFRICAN
HEBREW            HE    SEMITIC
HINDI             HI    INDIAN
HUNGARIAN         HU    FINNO-UGRIC
ICELANDIC         IS    GERMANIC
INDONESIAN        ID    OCEANIC/INDONESIAN
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
INUKTITUT         IU
INUPIAK           IK    ESKIMO
IRISH             GA    CELTIC
ITALIAN           IT    ROMANCE
JAPANESE          JA    ASIAN
JAVANESE          JV    OCEANIC/INDONESIAN
KANNADA           KN    DRAVIDIAN
KASHMIRI          KS    INDIAN
KAZAKH            KK    TURKIC/ALTAIC
KINYARWANDA       RW    NEGRO-AFRICAN
KIRGHIZ           KY    TURKIC/ALTAIC
KIRUNDI           RN    NEGRO-AFRICAN
KOREAN            KO    ASIAN
KURDISH           KU    IRANIAN
LAOTHIAN          LO    ASIAN
LATIN             LA    LATIN/GREEK
LATVIAN           LV    BALTIC
LINGALA           LN    NEGRO-AFRICAN
LITHUANIAN        LT    BALTIC
MACEDONIAN        MK    SLAVIC
MALAGASY          MG    OCEANIC/INDONESIAN
MALAY             MS    OCEANIC/INDONESIAN
MALAYALAM         ML    DRAVIDIAN
MALTESE           MT    SEMITIC
MAORI             MI    OCEANIC/INDONESIAN
MARATHI           MR    INDIAN
MOLDAVIAN         MO    ROMANCE
MONGOLIAN         MN
NAURU             NA
NEPALI            NE    INDIAN
NORWEGIAN         NO    GERMANIC
OCCITAN           OC    ROMANCE
ORIYA             OR    INDIAN
PASHTO            PS    IRANIAN
PERSIAN (farsi)   FA    IRANIAN
POLISH            PL    SLAVIC
PORTUGUESE        PT    ROMANCE
PUNJABI           PA    INDIAN
QUECHUA           QU    AMERINDIAN
RHAETO-ROMANCE    RM    ROMANCE
ROMANIAN          RO    ROMANCE
RUSSIAN           RU    SLAVIC
SAMOAN            SM    OCEANIC/INDONESIAN
SANGHO            SG    NEGRO-AFRICAN
SANSKRIT          SA    INDIAN
SCOTS GAELIC      GD    CELTIC
SERBIAN           SR    SLAVIC
SERBO-CROATIAN    SH    SLAVIC
SESOTHO           ST    NEGRO-AFRICAN
SETSWANA          TN    NEGRO-AFRICAN
SHONA             SN    NEGRO-AFRICAN
SINDHI            SD    INDIAN
SINGHALESE        SI    INDIAN
SISWATI           SS    NEGRO-AFRICAN
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
SOMALI            SO    HAMITIC
SPANISH           ES    ROMANCE
SUNDANESE         SU    OCEANIC/INDONESIAN
SWAHILI           SW    NEGRO-AFRICAN
SWEDISH           SV    GERMANIC
TAGALOG           TL    OCEANIC/INDONESIAN
TAJIK             TG    IRANIAN
TAMIL             TA    DRAVIDIAN
TATAR             TT    TURKIC/ALTAIC
TELUGU            TE    DRAVIDIAN
THAI              TH    ASIAN
TIBETAN           BO    ASIAN
TIGRINYA          TI    SEMITIC
TONGA             TO    OCEANIC/INDONESIAN
TSONGA            TS    NEGRO-AFRICAN
TURKISH           TR    TURKIC/ALTAIC
TURKMEN           TK    TURKIC/ALTAIC
TWI               TW    NEGRO-AFRICAN
UIGUR             UG
UKRAINIAN         UK    SLAVIC
URDU              UR    INDIAN
UZBEK             UZ    TURKIC/ALTAIC
VIETNAMESE        VI    ASIAN
VOLAPUK           VO    INTERNATIONAL AUX.
WELSH             CY    CELTIC
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YIDDISH           YI    GERMANIC
YORUBA            YO    NEGRO-AFRICAN
ZHUANG            ZA
ZULU              ZU    NEGRO-AFRICAN
For example, the locale for the Danish language spoken in Denmark using
the ISO 8859-1 character set is da_DK.ISO8859-1. The da stands for the
Danish language and the DK stands for Denmark. The short form of da_DK
is sufficient to indicate this locale.
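The language[_territory][.codeset][@modifier] format can be illustrated
with a small parser. This is only a sketch: the structure locale_parts,
the helper names, and the fixed field sizes are assumptions made for the
example and are not part of any NetBSD interface. Fields longer than the
buffers are silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy at most dstsize-1 bytes of s[0..len) into dst and terminate it. */
static void copy_field(char *dst, size_t dstsize, const char *s, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, s, len);
    dst[len] = '\0';
}

/* Split "language[_territory][.codeset][@modifier]" into its fields. */
static void parse_locale_name(const char *name, struct locale_parts *out)
{
    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    size_t total = strlen(name);
    size_t at = strcspn(name, "@");    /* index of '@', or end of string */
    size_t dot = strcspn(name, ".@");  /* index of '.', else '@' or end */
    size_t us = strcspn(name, "_.@");  /* index of '_', else a later mark */

    copy_field(out->language, sizeof(out->language), name, us);
    if (name[us] == '_')
        copy_field(out->territory, sizeof(out->territory),
                   name + us + 1, dot - us - 1);
    if (name[dot] == '.')
        copy_field(out->codeset, sizeof(out->codeset),
                   name + dot + 1, at - dot - 1);
    if (name[at] == '@')
        copy_field(out->modifier, sizeof(out->modifier),
                   name + at + 1, total - at - 1);
}
```

Applied to da_DK.ISO8859-1, the sketch yields the language da, the
territory DK, the codeset ISO8859-1, and an empty modifier.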
The environment variable settings are queried by their priority level in
the following manner:
· If the LC_ALL environment variable is set, all six categories use the
locale it specifies.
· If the LC_ALL environment variable is not set, each individual cate-
gory uses the locale specified by its corresponding environment vari-
able.
· If the LC_ALL environment variable is not set, and a value for a par-
ticular LC_* environment variable is not set, the value of the LANG
environment variable specifies the default locale for all categories.
Only the LANG environment variable should be set in /etc/profile,
since this makes it easiest for the user to override the system
default using the individual LC_* variables.
· If the LC_ALL environment variable is not set, a value for a particu-
lar LC_* environment variable is not set, and the value of the LANG
environment variable is not set, the locale for that specific cate-
gory defaults to the C locale. The C or POSIX locale assumes the
ASCII character set and defines information for the six categories.
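The priority order above can be sketched in C as follows. Programs
normally get this behaviour simply by calling setlocale(3) with an empty
locale string; the fragment only makes the lookup order explicit. The
function names pick_locale and effective_locale are hypothetical, and
treating an empty value like an unset variable is an assumption taken
from POSIX rather than from this page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Choose the locale name for one category from the three candidate
 * values, in priority order: LC_ALL, the category's own variable,
 * LANG, and finally the "C" locale.  NULL or empty values are skipped.
 */
static const char *pick_locale(const char *lc_all, const char *lc_category,
                               const char *lang)
{
    const char *candidates[3];
    candidates[0] = lc_all;
    candidates[1] = lc_category;
    candidates[2] = lang;

    for (size_t i = 0; i < 3; i++) {
        if (candidates[i] != NULL && candidates[i][0] != '\0')
            return candidates[i];
    }
    return "C";
}

/*
 * Look the values up in the environment.  category_var is the name of
 * the category's variable, for example "LC_TIME".
 */
static const char *effective_locale(const char *category_var)
{
    assert(category_var != NULL);
    return pick_locale(getenv("LC_ALL"), getenv(category_var),
                       getenv("LANG"));
}
```

For instance, with LANG set to da_DK and LC_TIME set to sv_SE,
effective_locale("LC_TIME") selects sv_SE while
effective_locale("LC_NUMERIC") selects da_DK.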
Character Sets
A character is any symbol used for the organization, control, or repre-
sentation of data. A group of such symbols used to describe a particular
language makes up a character set. It is the encoding values in a
character set that provide the interface between the system and its input
and output devices.
The following character sets are supported in NetBSD:
ASCII              The American Standard Code for Information Interchange
                   (ASCII) standard specifies 128 Roman characters and
                   control codes, encoded in a 7-bit character encoding
                   scheme.
ISO 8859 family    Industry-standard character sets specified by the
                   ISO/IEC 8859 standard. The standard is divided into 15
                   numbered parts, each covering a group of broadly
                   similar scripts. Examples include Western European,
                   Central European, Arabic, Cyrillic, Hebrew, Greek, and
                   Turkish. The character sets use an 8-bit character
                   encoding scheme which is compatible with the ASCII
                   character set.
Unicode            The Unicode character set is the full set of known
                   abstract characters of all real-world scripts. It can
                   be used in environments where multiple scripts must be
                   processed simultaneously. Unicode is compatible with
                   ISO 8859-1 (Western European) and ASCII. Many character
                   encoding schemes are available for Unicode, including
                   UTF-8, UTF-16 and UTF-32. These encoding schemes are
                   multi-byte encodings. The UTF-8 encoding scheme uses
                   8-bit, variable-width encodings and is compatible with
                   ASCII. The UTF-16 encoding scheme uses 16-bit,
                   variable-width encodings. The UTF-32 encoding scheme
                   uses 32-bit, fixed-width encodings.
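To illustrate what ``8-bit, variable-width'' means for UTF-8, the
following sketch encodes a single Unicode code point into one to four
bytes. The function name utf8_encode is hypothetical; real programs would
normally rely on the C library's multibyte conversion functions in a
UTF-8 locale instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is
 * not a valid scalar value (a surrogate or beyond U+10FFFF).
 */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* ASCII: encoded unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* UTF-16 surrogates are invalid */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Code points below 0x80 come out as a single byte identical to their ASCII
value, which is the ASCII compatibility mentioned above.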
Font Sets
A font set contains the glyphs to be displayed on the screen for a corre-
sponding character in a character set. A display must support a suitable
font to display a character set. If suitable fonts are available to the
X server, then X clients can include support for different character
sets. xterm(1) includes support for Unicode with UTF-8 encoding. xfd(1)
is useful for displaying all the characters in an X font.
The NetBSD wscons(4) console provides support for loading fonts using the
wsfontload(8) utility. Currently, only fonts for the ISO8859-1 family of
character sets are supported.
Internationalization for Programmers
To facilitate translations of messages into various languages and to make
the translated messages available to the program based on a user's
locale, it is necessary to keep messages separate from the programs and
provide them in the form of message catalogs that a program can access at
run time.
Access to locale information is provided through the setlocale(3) and
nl_langinfo(3) interfaces. See their respective man pages for further
information.
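A minimal sketch of the usual start-up sequence is shown here: the
program adopts the locale named by the environment and then asks
nl_langinfo(3) for the codeset in use. The function name
init_locale_and_codeset is hypothetical, and falling back to the C locale
when the environment names an unknown locale is a design choice of this
example rather than required behaviour.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Adopt the locale selected by the environment variables, falling back
 * to the "C" locale if that locale is not available, and return the
 * name of the character encoding for the resulting locale.
 * The returned string is owned by the C library.
 */
static const char *init_locale_and_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        const char *fallback = setlocale(LC_ALL, "C");
        assert(fallback != NULL);   /* the "C" locale always exists */
    }
    return nl_langinfo(CODESET);
}
```

The exact codeset name reported for a given locale differs between
systems, so callers should compare it only against names they have
verified on their platform.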
Message source files containing application messages are created by the
programmer and converted to message catalogs. These catalogs are used by
the application to retrieve and display messages, as needed.
NetBSD supports two message catalog interfaces: the X/Open catgets(3)
interface and the Uniforum gettext(3) interface. The catgets(3)
interface has the advantage that it belongs to a standard which is well
supported. Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult. The implementation also does
not support different character sets. The gettext(3) interface has not
been standardized yet; however, it is supported by an increasing number
of systems. It also provides many additional tools which make
programming and catalog maintenance much easier.
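The catgets(3) interface can be sketched as follows. The set and message
numbers and the helper name lookup_message are hypothetical; a real
program would use the numbering of its own message source file, compiled
with gencat(1). The fallback string is returned when the catalog or the
message cannot be found, so the program still works without any catalog
installed.

```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical numbering used by this example's message source file. */
#define MSG_SET   1
#define MSG_HELLO 1

/*
 * Copy message msg_id from the named catalog into buf, or copy the
 * fallback text if the catalog or the message is unavailable.
 * The text is copied because the pointer returned by catgets() may
 * not remain valid after catclose().
 */
static void lookup_message(const char *catalog, int msg_id,
                           const char *fallback, char *buf, size_t bufsize)
{
    assert(catalog != NULL && fallback != NULL);
    assert(buf != NULL && bufsize > 0);

    const char *msg = fallback;
    nl_catd cat = catopen(catalog, NL_CAT_LOCALE);

    if (cat != (nl_catd)-1)
        msg = catgets(cat, MSG_SET, msg_id, fallback);

    snprintf(buf, bufsize, "%s", msg);

    if (cat != (nl_catd)-1)
        catclose(cat);
}
```

A caller would pass its catalog name, for example
lookup_message("myprog", MSG_HELLO, "Hello, world", buf, sizeof(buf)),
where myprog is a placeholder catalog name.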
Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent). ISO C
specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly. The behaviour of these functions is
affected by the LC_CTYPE category of the current locale.
A wide character is specified in ISO C as being a fixed number of bits
wide and is stateless. There are two types for wide characters: wchar_t
and wint_t. wchar_t is a type which can contain one wide character and
is used in the same way that the 'char' type is used for one character.
wint_t can contain one wide character or WEOF (wide EOF).
There are functions that operate on wchar_t, and substitute for functions
operating on 'char'. See wmemchr(3) and towlower(3) for details. There
are some additional functions that operate on wchar_t. See wctype(3) and
wctrans(3) for details.
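As a small sketch of the wide-character counterparts of the 'char'
functions, the fragment below lower-cases a wide string in place with
towlower(3) and counts its alphabetic characters with iswalpha(3). The
function name lower_and_count_alpha is hypothetical. The results depend
on the LC_CTYPE category of the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Convert the wide string s to lower case in place and return the
 * number of alphabetic wide characters it contains.
 */
static size_t lower_and_count_alpha(wchar_t *s)
{
    assert(s != NULL);

    size_t alpha = 0;
    for (; *s != L'\0'; s++) {
        if (iswalpha((wint_t)*s))
            alpha++;
        *s = (wchar_t)towlower((wint_t)*s);
    }
    return alpha;
}
```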
Wide characters should be used for all I/O processing which may rely on
locale-specific strings. The two primary ways of using wide characters
for I/O are:
· All I/O is performed using multibyte characters. Input data is
converted into wide characters immediately after reading and
data for output is converted from wide characters to multi-byte
encoding immediately before writing. Conversion is controlled
by the mbstowcs(3), mbsrtowcs(3), wcstombs(3), wcsrtombs(3),
mblen(3), mbrlen(3), and mbsinit(3) functions.
· Wide characters are used directly for I/O, using getwchar(3),
fgetwc(3), getwc(3), ungetwc(3), fgetws(3), putwchar(3),
fputwc(3), putwc(3), and fputws(3). They are also used by the
formatted I/O functions for wide characters such as fwscanf(3),
wscanf(3), swscanf(3), fwprintf(3), wprintf(3), swprintf(3),
vfwprintf(3), vwprintf(3), and vswprintf(3), and by the wide
character conversion specifiers %lc, %C, %ls, and %S of the
conventional formatted I/O functions.
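The first approach, converting input to wide characters right after
reading, might look like the following sketch built on mbsrtowcs(3). The
function name mb_to_wide is hypothetical. The interpretation of the input
bytes depends on the LC_CTYPE category of the current locale, so a
program would normally call setlocale(3) before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert a multibyte string to a newly allocated wide string.
 * Returns NULL if the input contains an invalid multibyte sequence
 * or if memory cannot be allocated.  The caller frees the result.
 */
static wchar_t *mb_to_wide(const char *mbs)
{
    assert(mbs != NULL);

    mbstate_t state;
    const char *src = mbs;

    /* First pass: with a NULL destination, only count the characters. */
    memset(&state, 0, sizeof(state));
    size_t n = mbsrtowcs(NULL, &src, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *wide = malloc((n + 1) * sizeof(*wide));
    if (wide == NULL)
        return NULL;

    /* Second pass: convert, including the terminating null character. */
    memset(&state, 0, sizeof(state));
    src = mbs;
    if (mbsrtowcs(wide, &src, n + 1, &state) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

The reverse direction, performed immediately before writing, follows the
same two-pass pattern with wcsrtombs(3).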
SEE ALSO
gencat(1), xfd(1), xterm(1), catgets(3), gettext(3), nl_langinfo(3),
setlocale(3), wsfontload(8)
BUGS
This man page is incomplete.
NetBSD 9.2 February 21, 2007 NetBSD 9.2