Gentoo Wiki




There are many reasons to build a UTF-8-enabled system. Perhaps reason enough is that UTF-8 is the encoding of the future -- everybody is switching to UTF-8, and for good reason: with UTF-8, it's possible to represent truly every language in the world in a single charset. Operating systems are beginning to standardize on UTF-8 -- all versions of Mac OS X use UTF-8 internally, for example. To compatibly share text files with Mac OS X, you need to be using UTF-8, and we expect other operating systems to follow suit.

English-speaking users may not immediately see the benefit of UTF-8, as they may think they have no need for the vast array of special characters UTF-8 offers. But in most every other language, special characters are used frequently. And in today's highly international world, you may very well want to be able to represent, say, a Japanese word -- a song title, perhaps, or the name of a television show or movie. Or maybe you'd like for a person whose name contains non-Latin characters -- perhaps a Polish or Russian name -- to be able to represent their name properly. Just a thought. Even those using purely the English language will benefit from switching to UTF-8, since with it you can represent many additional wonderful and useful typographical symbols that are not available in the ASCII or ISO-8859-1 charsets.

The locales a user can choose from are built by glibc. By default all available locales are installed, from aa_DJ (Afar locale for Djibouti) through en_US (English locale for the USA) to zu_ZA.utf8 (Zulu locale for South Africa). Unless you work at the UN and administer a central server for all member states, it is hard to see why you would need a system with all of these locales installed.

Preparing the locales

As a first step, you must know exactly which locales you need. All available locale definitions can be found in /usr/share/i18n/locales, and the supported locale and charmap combinations are listed in /usr/share/i18n/SUPPORTED.

Usually, the locale name is the language code (e.g. en), followed by an underscore (_) and the uppercase country code (e.g. US). For example: en_US.

For European countries that use the euro, the @euro modifier must be appended to the locale ID. Unless stated otherwise, "locale ID" below means the one without the @euro modifier.

For European countries that use the euro, you should add two different locales: first the one without the @euro modifier, then the one with it. The UTF-8 charmap supports the euro symbol out of the box, so there should be no need to define it separately. This is not true for non-UTF-8 charmaps, however, as the de_DE@euro entry in the example below shows.

File: /etc/locale.gen
# /etc/locale.gen: list all of the locales you want to have on your system
# The format of each line:
# <locale> <charmap>
# Where <locale> is a locale located in /usr/share/i18n/locales/ and
# where <charmap> is a charmap located in /usr/share/i18n/charmaps/.
# All blank lines and lines starting with # are ignored.
# For the default list of supported combinations, see the file:
# /usr/share/i18n/SUPPORTED
# Whenever glibc is emerged, the locales listed here will be automatically
# rebuilt for you.  After updating this file, you can simply run `locale-gen`
# yourself instead of re-emerging glibc.

en_US ISO-8859-1
en_US.UTF-8 UTF-8

en_GB ISO-8859-1
en_GB.UTF-8 UTF-8

de_DE ISO-8859-1
de_DE@euro ISO-8859-15
de_DE.UTF-8 UTF-8
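
Before running locale-gen it can be worth checking that every pair in the file is a supported combination. The following is only a minimal sketch, not part of glibc or Portage: check_locale_gen is a hypothetical helper name, and it assumes that /usr/share/i18n/SUPPORTED uses the same "<locale> <charmap>" line format as /etc/locale.gen, without trailing whitespace.

```shell
# check_locale_gen GEN_FILE SUPPORTED_FILE   (hypothetical helper)
# Prints every "<locale> <charmap>" entry of GEN_FILE that has no
# identical line in SUPPORTED_FILE.
check_locale_gen() {
    gen_file=$1
    supported_file=$2
    # Drop comment lines and blank lines, then compare entry by entry.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$gen_file" |
    while read -r locale charmap; do
        if ! grep -q -x -F "$locale $charmap" "$supported_file"; then
            echo "not in SUPPORTED: $locale $charmap"
        fi
    done
}

# Typical call on a Gentoo system:
# check_locale_gen /etc/locale.gen /usr/share/i18n/SUPPORTED
```

Both paths are passed as arguments so that the check can be tried on copies of the files first.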

Create the locales

The system locales come with the glibc package. By default almost all possible locales are installed, though you can choose to install only the locales you need. You can delete /etc/ after upgrading to the latest version of glibc.

Whenever glibc is emerged, the locales listed in /etc/locale.gen will be automatically rebuilt for you. After updating this file, you can simply run locale-gen yourself instead of re-emerging glibc.

An alternative to locale-gen is to manually create the *.UTF-8 locales. This can be done by using localedef.

English (US): localedef -c -f UTF-8 -i en_US en_US.UTF-8
English (GB): localedef -c -f UTF-8 -i en_GB en_GB.UTF-8
German (DE): localedef -c -f UTF-8 -i de_DE de_DE.UTF-8

locale -a will give you a list of all installed locales. Note that in its output the codeset is spelled utf8, which is treated the same as UTF-8.
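
Because of the differing spellings, a plain grep for en_US.UTF-8 in the output of locale -a may find nothing even though the locale is installed. A minimal sketch of a more tolerant check follows; normalize_locale_name and locale_installed are hypothetical helper names, and the normalization (lowercase everything, drop hyphens) is a simplification chosen for this example, not glibc's exact matching rule.

```shell
# Lowercase a locale name and drop hyphens, so that en_US.UTF-8 and
# en_US.utf8 compare equal. (Simplified rule, assumed for this sketch.)
normalize_locale_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeeds if the given locale shows up in `locale -a` after normalization.
locale_installed() {
    wanted=$(normalize_locale_name "$1")
    locale -a 2>/dev/null |
        tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
        grep -q -x -F "$wanted"
}

# Typical use:
# locale_installed en_US.UTF-8 && echo "en_US.UTF-8 is available"
```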

At this point you can add the userlocales USE flag to your global USE flags (or to the flags for sys-libs/glibc only) and re-emerge glibc. This applies only to glibc versions older than 2.3.6-r4 (see below).

Setting up locales

To get a list of all UTF-8 supported locales, check the output of:

 grep UTF-8 /usr/share/i18n/SUPPORTED

Lines in /usr/share/i18n/SUPPORTED have the format:

 <locale> <charmap>

You only need the <locale> part for your /etc/env.d/02locale.
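
The two steps above (filter for UTF-8, keep only the locale part) can be combined. This is a small sketch assuming the two-column "<locale> <charmap>" format described above; utf8_locales is a hypothetical helper name.

```shell
# utf8_locales FILE   (hypothetical helper)
# Prints the <locale> column of every line whose <charmap> is UTF-8.
utf8_locales() {
    awk '$2 == "UTF-8" { print $1 }' "$1"
}

# Typical call:
# utf8_locales /usr/share/i18n/SUPPORTED
```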

Now set the LANG and LC_ALL variables in /etc/env.d/02locale to specify the locales to use. Note that the values in /etc/env.d/02locale are case-sensitive.

File: /etc/env.d/02locale
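
A minimal sketch of what this file might contain, assuming a German UTF-8 locale that has been generated as described earlier. The two values are illustrative assumptions and should be replaced with a locale that is installed on your system.

```shell
# /etc/env.d/02locale (sketch): German language and German conventions
LANG="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
```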

In this case both the language and the locale conventions (number formats, currency, and so on) are set to German.

Please note that you must use the same charset encoding for LANG and LC_* values.

You can't use LC_ALL if you want a different LC_MONETARY, because LC_ALL overrides LC_MONETARY regardless of whether LC_MONETARY is specified before or after it. But some programs complain (see [1]) if LC_ALL isn't set.

For European countries that use the euro, you must change this. Theoretically you should use something like it_IT.UTF-8@euro, but doing this makes X complain. To work around this, you can use this:
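
A sketch of such a workaround, assuming Italian as the language and assuming that a locale named it_IT.UTF-8@euro has actually been generated on the system. Both values are illustrative assumptions based on the name mentioned above.

```shell
# /etc/env.d/02locale (sketch): plain UTF-8 locale everywhere,
# euro variant for monetary data only. LC_ALL is deliberately not set,
# because it would override LC_MONETARY.
LANG="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8@euro"
```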


This way you use the added euro support only for monetary data, which is the only thing that changed, and X will not complain that it cannot find the locale.

If you only want to use the German charset (for umlauts and special characters) but want all messages displayed in English, you can change /etc/env.d/02locale as follows:

File: /etc/env.d/02locale
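
A minimal sketch of such a file, under the assumption that only LC_CTYPE needs to be set and that de_DE.UTF-8 is installed. With LANG and LC_ALL left unset, messages stay in the untranslated default, which is normally English.

```shell
# /etc/env.d/02locale (sketch): German character handling, English messages.
# LANG and LC_ALL are intentionally left unset.
LC_CTYPE="de_DE.UTF-8"
```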

Now run env-update && source /etc/profile, or restart the system (or log out, log back in, and restart every service), and you'll be using UTF-8 for everything. If you are in an X session, log out of it, restart X with Ctrl-Alt-Backspace, and log back in.
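
To confirm that the new settings are active in a shell, the character map reported by the locale command can be checked. This is a small sketch; is_utf8_session is a hypothetical helper name.

```shell
# Succeeds if the current environment selects a UTF-8 character map.
# `locale charmap` prints the codeset of the active LC_CTYPE setting.
is_utf8_session() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# Typical use after logging in again:
# is_utf8_session && echo "UTF-8 is active" || locale
```

If the check fails, running locale without arguments shows which of the LANG and LC_* variables are in effect.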

Setting the beginning of the week

When I look at a calendar I like the weeks to start on a Monday. GNOME has a neat feature: if you click on the Clock Applet in the Panel, it shows the current month's calendar with all your events from Evolution. However, it shows the week starting on Saturday, and there is no option in the preferences to make it start on a different day.

Why does the Clock Applet have no setting for "First day of the Week"?

There used to be a setting in the Clock Applet, but some releases ago it was removed, and the applet was modified to pick up the user's preference from the system-wide locale setting. Let us pass over the wisdom of this decision; it happened. Too bad. The unfortunate fact is that the default locale setting has the first day of the week as a Saturday, apparently due to a bug in the locale files supplied with glibc (see more below).

The way to change the first day of the week in the Clock's calendar view is therefore to set the correct system locale. Too bad if you don't like the "correct" choice!

Why is the Clock Applet still not right?

After you do that you can check what locale you are running with the "locale" command. Assuming it says what you set it to above, then everything has worked fine. However, you will find that the Clock's calendar view now shows weeks starting on a Sunday. This is a marginal improvement on starting on a Saturday, and if you like them like that well good for you, and you can stop now. If, like me, you emphatically hate them like that, read on.

After some research it became clear that the poor old GNOME Clock Applet was not at fault any more, but that the default en_GB locale file supplied with glibc really does set the first day of the week to be a Sunday. Let's explore how it does that, then see how to change it.

LC_TIME settings

One of the sub-sections of your locale is for settings that affect "strftime" and other utilities that format dates and times. This subsection is called LC_TIME. If you do locale -k LC_TIME on your system you will see what LC_TIME controls. For our purposes the following settings are the important ones:

Code: locale -k LC_TIME
   day           "Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday"
   % below: ndays;1stday;1stweek
   week           7;19971130;4
   first_weekday  1

This gives the names of the seven days of the week, says (redundantly) that there are seven of them, and then that the first one listed corresponds to the name of the day of the week of 1997-11-30 (which was a Sunday). Finally and crucially, the first day of the week is defined to be the first entry in the list of day names given.

For the UK, this is wrong! In the UK weeks start on Monday, following the ISO calendar standards.

There are two ways to express what it should be: either rearrange the day names, or set first_weekday to 2. Either of these will do the trick.

Rearranging the day names:

  day           "Monday; Tuesday; Wednesday; Thursday; Friday; Saturday; Sunday"
  % below: ndays;1stday;1stweek
  week           7;19971201;4
  first_weekday  1

Setting first_weekday to 2:

  day           "Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday"
  % below: ndays;1stday;1stweek
  week           7;19971130;4
  first_weekday  2

The second one involves less editing, so we'll use that. Now, how do you make this change? Read on.
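
The relationship between the reference date in the week line and first_weekday can be checked with a little date arithmetic. The sketch below follows the interpretation given in this section (the reference date names the first entry of the day list, and first_weekday counts from there). first_day_name is a hypothetical helper, and date -d is a GNU coreutils extension.

```shell
# first_day_name REFERENCE_DATE FIRST_WEEKDAY   (hypothetical helper)
# Prints the English name of the day on which the week starts.
first_day_name() {
    # LC_ALL=C keeps the day name in English regardless of the current locale.
    LC_ALL=C date -d "$1 +$(( $2 - 1 )) days" +%A
}

first_day_name 19971130 1   # shipped en_GB settings: Sunday
first_day_name 19971130 2   # first_weekday set to 2: Monday
first_day_name 19971201 1   # rearranged day list:    Monday
```

The values that the current locale actually uses can be listed with locale -k LC_TIME.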

Note: The 'cal' command does not seem to take this into account (week start may be forced with -m or -s).

Create the locale

The tool you need is the rather nasty-to-use "localedef". Proceed as follows.

cd ~ # go to your home dir or somewhere else you can write a temp file
cp /usr/share/i18n/locales/en_GB en_GB_modified

Then edit "en_GB_modified". The locale(5) man page explains the format of the file.

1. Update the identification section if you like.

2. Update all the sections except LC_TIME to be like this:

copy "en_GB"

etc. This ensures that you don't lose anything in the other sections.

3. Add the following lines to the LC_TIME section

week            7;19971130;5
first_weekday   2
first_workday   2

This assumes that you are going to leave the list of day names alone. The "5" at the end of the week line says that the first week of the year is the one containing the first occurrence of the fifth day in the list (Thursday); this is the ISO rule. The other two lines set the first day of the week to the second day in the list (Monday) and, incidentally, the first working day to Monday too.

4. Create a new directory in your home directory for the output of localedef.

cd ~ && mkdir locale 

5. Run

localedef -c -i en_GB_modified -f UTF-8 locale/en_GB.utf8@iso

6. (as root) copy the resulting "en_GB.utf8@iso" directory from your own "locale" directory to "/usr/lib/locale"

cp -r locale/en_GB.utf8\@iso /usr/lib/locale/

7. (still as root) Update /etc/env.d/02locale to point at the new "en_GB.utf8@iso" locale that you have created.


and then run

env-update && source /etc/profile

Then log out of your X session, restart X with Ctrl-Alt-Backspace, and log back into X again. If you are lucky then everything is now working, and your GNOME Clock Applet shows weeks starting on Monday.
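
The locale compiled in step 5 can also be checked without root access, before anything is copied into /usr/lib/locale. The sketch below relies on glibc's LOCPATH environment variable, which points locale lookups at a private directory. check_private_locale is a hypothetical helper name, and the directory and locale name in the usage line are the ones used in the steps above.

```shell
# check_private_locale DIR NAME   (hypothetical helper)
# Prints the first_weekday value of locale NAME, looked up under DIR
# instead of the system locale directory.
check_private_locale() {
    LOCPATH="$1" LC_ALL="$2" locale first_weekday
}

# Typical call after step 5; a result of 2 would indicate Monday:
# check_private_locale "$HOME/locale" en_GB.utf8@iso
```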


There are always a few issues. One known problem with all this is that programs may print warnings like these:

Gdk-WARNING **: locale not supported by Xlib
Gdk-WARNING **: cannot set locale modifiers

These are irritating and potentially harmful. To avoid them, you can create your modified locale under the standard name instead. You can do this with:

localedef -c -i en_GB_modified -f UTF-8 locale/en_GB.utf8

or (if you don't want Unicode)

localedef -c -i en_GB_modified -f ISO-8859-1 locale/en_GB

Additional tools


An interesting tool is app-admin/localepurge, which cleans out installed man pages and info files in languages you do not need on your system. Read the localepurge man page before running it, and configure the languages you intend to keep in /etc/locale.nopurge.


glibc 2.3.6 r4 and above

Since glibc-2.3.6-r4 the userlocales flag has been removed and /etc/ has been replaced with /etc/locale.gen. All you need to do is specify which locales you want in /etc/locale.gen and then run locale-gen.

The format is basically the same as that of the old file, except that the / is replaced with a space. The following example shows a British system with the ISO-8859-1 and UTF-8 character sets enabled.

File: /etc/locale.gen
en_GB ISO-8859-1
en_GB.UTF-8 UTF-8

If you are using a glibc older than 2.3.6-r4, add the userlocales USE flag to the glibc package. You can do this by adding it to /etc/portage/package.use with: echo "sys-libs/glibc userlocales" >> /etc/portage/package.use

Now specify the locales you want to be able to use in /etc/ as in the example below.

File: /etc/

The entries in the file are, as the header of the file suggests, in this format:

 <locale>/<charmap>

<locale> is a locale from the /usr/share/i18n/locales directory and <charmap> is the name of one of the files in /usr/share/i18n/charmaps/.


Undefined locales

If you need to install a locale not yet defined in /usr/share/i18n/locales, please refer to the Gentoo official UTF-8 documentation.

See also

For further information about locale-handling read the Gentoo Linux Localization Guide.


Last modified: Sun, 07 Sep 2008 02:20:00 +0000