Wednesday, 28 October 2009

UTF-8 hell

Localisation is a common requirement. So why is it so nightmarish to actually get it all working? I'm developing an app with Spring MVC and MySQL via Hibernate. You'd think that this should be quite straightforward, right? Think again. I've not got time to lay this all out nicely (as usual), but here's a quick list of things you need to do:

  1. First, make sure your tables are created with UTF-8 encoding, e.g.:
    CREATE TABLE `aTable` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `name` varchar(100) DEFAULT NULL,
      `value` varchar(255) DEFAULT NULL,
      `category` varchar(45) DEFAULT NULL,
      `description` text,
      PRIMARY KEY (`id`),
      UNIQUE KEY `ind_setting_key` (`name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  2. Ensure your web pages are served as UTF-8:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  3. Set content type in JSPs:
    <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

  4. Add servlet filter to web.xml to set character encoding of request:
    <filter>
      <filter-name>charsetFilter</filter-name>
      <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
      <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
    </filter>

    <filter-mapping>
      <filter-name>charsetFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
  5. Ensure your forms submit UTF-8:
    <form action="targetpage.html" method="post" enctype="application/x-www-form-urlencoded" accept-charset="UTF-8">
  6. Set JDBC connection character set:
    jdbc:mysql://localhost:3306/db?characterEncoding=utf8
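  7. If you still see garbled characters when you read from the DB, issue SET NAMES 'utf8'; on the connection. (With Connector/J, the characterEncoding setting from step 6 should normally cover this; as far as I know the driver's documentation advises against issuing SET NAMES by hand, because the driver doesn't notice the change.)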
I think that's it...
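
For what it's worth, here's a minimal sketch of why every layer in the list above has to agree on UTF-8. It isn't code from the app: the class name EncodingMismatchDemo and the roundTrip helper are made up for illustration. It encodes a string with one charset and decodes the bytes with another, which is roughly what happens when, say, the browser submits UTF-8 but the request is read as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Spring, Hibernate or MySQL.
public class EncodingMismatchDemo {

    // Encode text with one charset and decode the resulting bytes with
    // another, mimicking two layers that disagree about the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; the escape
        // keeps the demo independent of the source file's own encoding.
        String original = "caf\u00e9";

        String matched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Both sides agree: the text survives unchanged.
        System.out.println("matched intact:    " + matched.equals(original));
        // Sides disagree: the two UTF-8 bytes of the accented e are read
        // as two separate Latin-1 characters, so the string grows by one.
        System.out.println("mismatched intact: " + mismatched.equals(original));
        System.out.println("mismatched length: " + mismatched.length());
    }
}
```

A single mismatched hop is enough to garble the text, which is why skipping any one of the steps tends to break things even when all the others are right.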
