Search:  
Gentoo Wiki

TIP_Convert_latin1_to_UTF-8_in_MySQL

latin1 to UTF-8 in MySQL

Since MySQL 4.1, UTF-8 is the default charset. If you have an old database, containing data encoded in latin1, and you want upgrade to a newer MySQL server, than you have to do the following:

MySQL dump

First of all, we need to dump the old data into a file.


Code: Create a MySQL dump
 $ mysqldump -h host.com --user=frog -p --default-character-set=latin1 -c \
 --insert-ignore --skip-set-charset dbname > dump.sql


Please mention, that you have to replace the user, the host and the dbname, otherwise it will result in an error :)


Convert dump

Next thing to do is, converting the characters in the MySQL dump from latin1 to UTF-8


Code: Convert dump
 $ iconv -f ISO-8859-1 -t UTF-8 dump.sql > dump_utf8.sql
 $ perl -pi -w -e 's/CHARSET=latin1/CHARSET=utf8/g;' dump_utf8.sql


If you have another source charset, you need to replace the -f option with your local character set.


Drop and create

Now it's time to drop the old database and create a new one with UTF-8 support.


Code: Drop and Create
 $ mysql --user=frog -p --execute="DROP DATABASE dbname; 
 CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"

(MySql seems to recommend utf8_unicode_ci over utf8_general_ci for 5.1+, see http://dev.mysql.com/doc/refman/5.1/de/charset-unicode-sets.html)

Import dump to databse

Last but not least, we need to import the converted data back to the new database.


Code: Import dump
 $ mysql --user=frog --max_allowed_packet=16M -p --default-character-set=utf8 dbname < dump_utf8.sql


The max_allowed_packet-option is sometimes important. If your import ends up with a "ERROR 1153 at line 42: Got a packet bigger than 'max_allowed_packet'", you need to increase the packet size. Please mention, that you also need to update /etc/mysql/my.cnf and set max_allowed_packet=16M under the [mysqld] directive

Retrieved from "http://www.gentoo-wiki.info/TIP_Convert_latin1_to_UTF-8_in_MySQL"

Last modified: Fri, 05 Sep 2008 05:49:00 +0000 Hits: 19,064