Pablo Viquez Blog

My life and related things

How to use json_encode with ISO-8859-1 data – Part2

31 July, 2009 (12:04) | iso 8859-1, PHP, utf-8, Web Development | By: Pablo Viquez

This post continues the previous one.

Problem: The json_encode function does not support ISO-8859-1 encoded data; it expects UTF-8 strings.

One solution I used to preserve the character set was to encode the data before
passing it to json_encode, so that it contains only plain ASCII characters
(A-Z, a-z, 0-9, plus "+", "/" and "=") instead of accented letters or other non-ASCII symbols.

One encoding that fits this scheme perfectly is the Base64 Content-Transfer-Encoding (see the Base64 explanation below).

This leads me to the solution: encode the ISO-8859-1 data using Base64 on the server and decode it on the client using JavaScript.
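To illustrate why Base64 text is safe to hand to json_encode, here is a minimal sketch that checks whether a string uses only the Base64 alphabet. The function name isBase64Text is hypothetical and is not part of the demo files.

```javascript
// Hypothetical helper (not part of the demo files): checks that a string
// uses only the Base64 alphabet, which is plain ASCII.
function isBase64Text(text) {
    // Groups of four alphabet characters; the last group may end in "=" padding.
    return /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/.test(text);
}

console.log(isBase64Text("UGFibG8="));     // true: Base64 of the ASCII text "Pablo"
console.log(isBase64Text("V\u00EDquez"));  // false: contains "í", which is outside the alphabet
```

Any string that passes this check contains no accented characters, so the character set of the original data no longer matters to the JSON encoder.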

Now, this led me to another issue: what is the algorithm for implementing the base64_decode function in JavaScript?

Luckily, I had been looking at http://phpjs.org/, a site that has ported most of the PHP functions to JavaScript, and problem solved!

Download the demo files here

Solution

So my solution will look like this (using the demo files from my previous post):

prueba_ansi.php

<?php
$id =
    base64_encode(
        utf8_encode('iso-8859-1'));
$name =
    base64_encode(
        utf8_encode('Pablo Víquez - This is an ISO-8859-1 encoded text'));
$notes =
    base64_encode(
        utf8_encode('Test with JSON encoding. á é í ó ú, ñ, Ñ.'));

$customer =
    array(
        'id'    => $id,
        'name'  => $name,
        'notes' => $notes
);

echo json_encode($customer);

As you can see, I used two functions to encode the data: base64_encode and utf8_encode.

I used both mostly because the JavaScript implementation of base64_decode assumes
that the data is UTF-8 encoded. By calling base64_encode(utf8_encode('iso-8859-1')),
I first convert the data to UTF-8 and then encode the result using Base64.

base64_encode
Encodes data with MIME base64, and has the following description:

string base64_encode ( string $data )

utf8_encode
Encodes an ISO-8859-1 string to UTF-8, and has the following description:

string utf8_encode ( string $data )
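To make the two-step encoding concrete, the following sketch mirrors base64_encode(utf8_encode($text)) in JavaScript. It is an illustration only: the function names latin1ToUtf8Bytes and bytesToBase64 are hypothetical, and the sketch assumes every character of the input is in the ISO-8859-1 range (code points 0 to 255).

```javascript
// Illustration only: mirrors base64_encode(utf8_encode($text)) in JavaScript.
// Both function names are hypothetical, not phpjs.org functions.

// Turns a string of ISO-8859-1 characters (code points 0-255) into UTF-8 bytes.
function latin1ToUtf8Bytes(text) {
    var bytes = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code < 128) {
            bytes.push(code);                 // ASCII stays a single byte
        } else {
            bytes.push(0xC0 | (code >> 6));   // lead byte: 110xxxxx
            bytes.push(0x80 | (code & 0x3F)); // continuation byte: 10xxxxxx
        }
    }
    return bytes;
}

// Packs every three octets into four Base64 characters, padding with "=".
function bytesToBase64(bytes) {
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var out = "";
    for (var i = 0; i < bytes.length; i += 3) {
        var hasO2 = i + 1 < bytes.length;
        var hasO3 = i + 2 < bytes.length;
        var o2 = hasO2 ? bytes[i + 1] : 0;
        var o3 = hasO3 ? bytes[i + 2] : 0;
        var bits = (bytes[i] << 16) | (o2 << 8) | o3;
        out += b64.charAt((bits >> 18) & 63);
        out += b64.charAt((bits >> 12) & 63);
        out += hasO2 ? b64.charAt((bits >> 6) & 63) : "=";
        out += hasO3 ? b64.charAt(bits & 63) : "=";
    }
    return out;
}

// "í" is byte ED in ISO-8859-1, bytes C3 AD in UTF-8, and "w60=" in Base64.
console.log(bytesToBase64(latin1ToUtf8Bytes("\u00ED"))); // "w60="
```

The single ISO-8859-1 byte becomes two UTF-8 bytes, and those two bytes become four ASCII characters, which is what the client receives inside the JSON response.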

Client Side

I used jQuery to retrieve the JSON data, and the phpjs.org implementations of the
base64_decode and utf8_decode functions.

index.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>JSON Consumer</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    $("#query_data_iso").click(
        function(){
            getIsoData();
        }
    );

    $("#query_data_utf8").click(
        function(){
            getUtfData();
        }
    );
});

function getUtfData() {
    $.ajax({
        type:       "GET",
        dataType:   "json",
        url:        "prueba_utf8.php",
        success:    function(response){
            $("#customer_data_id").html(response.id);
            $("#customer_data_name").html(response.name);
            $("#customer_data_notes").html(response.notes);
        }
    });
}

/**
 * Gets a JSON response from the server.
 * Expects a response with id, name and notes, all base64 encoded.
 */
function getIsoData() {
    $.ajax({
        type:       "GET",
        dataType:   "json",
        url:        "prueba_ansi.php",
        success:    function(response){
            $("#customer_data_id").html(base64_decode(response.id));
            $("#customer_data_name").html(base64_decode(response.name));
            $("#customer_data_notes").html(base64_decode(response.notes));
        }
    });
}

/**
 * Decodes string using MIME base64 algorithm
 * @see http://phpjs.org/functions/base64_decode
 */
function base64_decode( data ) {
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var o1, o2, o3, h1, h2, h3, h4, bits, i = 0, ac = 0, dec = "", tmp_arr = [];

    if (!data) {
        return data;
    }

    data += '';

    do {  // unpack four hexets into three octets using index points in b64
        h1 = b64.indexOf(data.charAt(i++));
        h2 = b64.indexOf(data.charAt(i++));
        h3 = b64.indexOf(data.charAt(i++));
        h4 = b64.indexOf(data.charAt(i++));

        bits = h1<<18 | h2<<12 | h3<<6 | h4;

        o1 = bits>>16 & 0xff;
        o2 = bits>>8 & 0xff;
        o3 = bits & 0xff;

        if (h3 == 64) {
            tmp_arr[ac++] = String.fromCharCode(o1);
        } else if (h4 == 64) {
            tmp_arr[ac++] = String.fromCharCode(o1, o2);
        } else {
            tmp_arr[ac++] = String.fromCharCode(o1, o2, o3);
        }
    } while (i < data.length);

    dec = tmp_arr.join('');
    dec = utf8_decode(dec);
    return dec;
}

/**
 * Converts a UTF-8 encoded string to ISO-8859-1
 * @see http://phpjs.org/functions/utf8_decode
 */
function utf8_decode ( str_data ) {
    var tmp_arr = [], i = 0, ac = 0, c1 = 0, c2 = 0, c3 = 0;

    str_data += '';

    while ( i < str_data.length ) {
        c1 = str_data.charCodeAt(i);
        if (c1 < 128) {
            tmp_arr[ac++] = String.fromCharCode(c1);
            i++;
        } else if ((c1 > 191) && (c1 < 224)) {
            c2 = str_data.charCodeAt(i+1);
            tmp_arr[ac++] = String.fromCharCode(((c1 & 31) << 6) | (c2 & 63));
            i += 2;
        } else {
            c2 = str_data.charCodeAt(i+1);
            c3 = str_data.charCodeAt(i+2);
            tmp_arr[ac++] = String.fromCharCode(((c1 & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
            i += 3;
        }
    }

    return tmp_arr.join('');
}
</script>

</head>

<body>
    <h1>ISO 8859-1, JSON consumer</h1>

    <p>
        <a href="#" id="query_data_iso">Query for ISO-8859-1 data</a> |
        <a href="#" id="query_data_utf8">Query for UTF-8</a>
    </p>
    <table border="1">
        <tr>
            <td>Id</td>
            <td>Name</td>
            <td>Notes</td>
        </tr>
        <tr>
            <td><div id="customer_data_id"></div></td>
            <td><div id="customer_data_name"></div></td>
            <td><div id="customer_data_notes"></div></td>
        </tr>
    </table>
</body>
</html>
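As an alternative to the ported functions, the same decoding step can be sketched with built-ins, assuming an environment that provides atob and TextDecoder (current browsers and recent Node.js versions do; the browsers this demo originally targeted may not). The name decodeBase64Utf8 is hypothetical. This is a different technique from the phpjs.org port shown above, not a replacement the original demo used.

```javascript
// Sketch using built-ins; decodeBase64Utf8 is a hypothetical name,
// not a phpjs.org function.
function decodeBase64Utf8(data) {
    var binary = atob(data);                     // one character per decoded byte
    var bytes = new Uint8Array(binary.length);
    for (var i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);         // copy each byte value
    }
    return new TextDecoder("utf-8").decode(bytes); // interpret the bytes as UTF-8
}

// "w60=" is the Base64 form of the UTF-8 bytes C3 AD, i.e. the letter "í".
console.log(decodeBase64Utf8("w60="));
```

In the success callbacks above, a call such as decodeBase64Utf8(response.name) would take the place of base64_decode(response.name).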

Downside

The downside of this solution is the size of the encoded transfer: Base64 produces four output characters for every three input bytes, so the encoded data takes about 33% more space than the original data.
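The overhead can be worked out directly. The sketch below uses a hypothetical helper, base64Length, that returns the length of the Base64 text for a given number of input bytes, padding included.

```javascript
// Hypothetical helper: length of the Base64 text (with "=" padding)
// produced for a given number of input bytes.
function base64Length(byteCount) {
    return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(3));   // 4
console.log(base64Length(300)); // 400, about 33% more than 300
console.log(base64Length(2));   // 4, because the last group is padded
```

Keep in mind that utf8_encode runs first in this solution, so every accented ISO-8859-1 character already takes two bytes before the Base64 step is applied.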

References

Base64 Content-Transfer-Encoding

http://tools.ietf.org/html/rfc2045

Normally, many media types which could be usefully transported via email are
represented, in their “natural” format, as 8bit character or binary data.
Such data cannot be transmitted over some transfer protocols.

One example is email: SMTP (Simple Mail Transfer Protocol) restricts mail
messages to 7bit US-ASCII data with lines no longer than 1000 characters
including any trailing CRLF line separator.

PHP JS

  • base64_decode – http://phpjs.org/functions/base64_decode
  • utf8_decode – http://phpjs.org/functions/utf8_decode