Cry MySQL How to...
Change the charset and collation order for a MySQL database
The default charset for MySQL is "Latin1", which is mostly sufficient for most Western European languages (and the default collation sequence is "latin1_swedish_ci").
I encountered an issue where a live MySQL database was using the default charset and collation, but problems were being encountered because it should have been created using the "utf8" charset and "utf8_general_ci" collation. For this reason these notes show how to set or change the charset to "utf8" and the collation to "utf8_general_ci" - if this is now what you want then you should be able to simply substitute instead your desired charset and collation order.
These notes cover how to change the charset and collation order for a single MySQL database. Unfortunately whilst changing the charset and collation order for a single table is straight forward, its a little more difficult to change it globally for an entire database but a script is provided for this.
These notes cover:
- How to see the default charset and collation order MySQL will use for a newly created table.
- How to set the default charset and collation order for MySQL.
- How to see what charset and collation order a given table is using.
- How to set the charset and collation order for a single table.
- How to set the charset and collation order for all tables.
How to see the default charset and collation order MySQL will use for a newly created table
To see the default charset that MySQL will use for newly created tables use:
show variables like "character_set_database";
To see the default collation sequence that MySQL will use for newly created tables use:
show variables like "collation_database";
How to set the default charset and collation order for MySQL
To explicity set the default charset and collation order when first creating a database use:
create database db_name character set utf8 collate utf8_general_ci;
For an existing database the default charset and collation order is changed using:
alter database db_name character set utf8 collate utf8_general_ci;
Be aware that changing the default charset and collation order only affects tables that are subsequently created. It does not affect any of the existing tables.
How to see what charset and collation order a given table is using
To see what charset and collation a given table is using, use:
select character_set_name, collation_name
from
information_schema.tables T
inner join
information_schema.collation_character_set_applicability C on (C.collation_name=T.table_collation)
where table_schema=Database() and table_name='table_name'
being sure to substitute the name of the table you are interested in for "table_name" in the above.
If you are interested in a table which is in a database other
than the current database then change: table_schema=Database()
to
table_schema='db-name'
in the above, substituting the name of the
database for 'db-name'.
Whilst its not used very often, be aware that each field can have its own charset and collation order. To view the charset and collation order of a table field use:
select character_set_name, collation_name from
information_schema.`COLUMNS` C
where table_schema = Database()
and table_name = 'table_name'
and column_name = 'field_name'
Be sure to substitute the name of the table and field in
the above. The above assumes the current database, so replace the
call to Database()
with the name of the database if the table is in
a different database.
How to set the charset and collation order for a single table
To change the character set and collation order for a single table use:
alter table db_name.table_name convert to character set utf8 collate utf8_general_ci
Needless to say, whilst this is fine for a table or two its not very practical converting a table at a time if you have a large number of tables!
How to set the charset and collation order for all tables
The following SQL creates a procedure to reset the charset and collation status for all tables that need it. The last line of the provided SQL calls the procedure.
-- Copyright (c) 2011 www.cryer.co.uk
-- Script is free to use provided this
copyright header is included.
drop procedure if exists
ResetCharsetAndCollation;
delimiter //
create procedure
ResetCharsetAndCollation(
IN dbName tinytext,
IN chrSet
tinytext,
IN colOdr tinytext)
begin
-- Cycle through all the
tables which don't have the
-- specified charset or collation
sequence.
DECLARE done int default 0;
DECLARE tableName
tinytext;
DECLARE table_cursor CURSOR FOR
select table_name
from information_schema.tables T
inner join
information_schema.collation_character_set_applicability C
on (C.collation_name=T.table_collation)
where table_schema=dbName
and (character_set_name<>chrSet or
collation_name<>colOdr);
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET done=1;
OPEN table_cursor;
FETCH table_cursor INTO
tableName;
while not done do
--
Reset the charset and collation sequence for this table.
set @ddl=CONCAT('alter
table ',dbName,'.',tableName,
' convert to character set ',chrSet,
' collate ',colOdr);
prepare stmt from @ddl;
execute stmt;
deallocate prepare stmt;
FETCH table_cursor INTO tableName;
end while;
CLOSE table_cursor;
end;
//
delimiter ;
call ResetCharsetAndCollation(Database(),'utf8','utf8_general_ci');
Change the last line to specify a different charset or collation order.
Be aware that this may take a while to run, depending on the number (and size) of tables to be updated.
This SQL does not change the default charset and collation order for the database, so you might want to do that as a separate step.
These notes have been tested against MySQL version 5.1.
About the author: Brian Cryer is a dedicated software developer and webmaster. For his day job he develops websites and desktop applications as well as providing IT services. He moonlights as a technical author and consultant.