How to use json_encode with ISO-8859-1 data – Part 2
This post continues from the previous one.
Problem: The json_encode function does not support ISO-8859-1 encoded data; it expects UTF-8 strings.
One solution I found that preserves the character set is to encode the
data before calling json_encode, so that the payload contains only ASCII characters
(A-Z, a-z, 0-9, plus "+", "/" and "=") instead of text with accents or symbols.
One encoding that fits this scheme perfectly is the Base64 Content-Transfer-Encoding (see the Base64 explanation below).
This leads to the solution: encode the ISO-8859-1 data using Base64 on the server and decode it on the client using JavaScript.
That raised another issue: what is the algorithm for implementing the base64_decode function in JavaScript?
Luckily, I had been looking at http://phpjs.org/, a site that has ported most of the PHP functions to JavaScript, and that solved the problem.
Solution
My solution looks like this (using the demo files from my previous post):
prueba_ansi.php
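<?php
// Assumes this file is saved as ISO-8859-1, so the string literals hold ISO-8859-1 bytes.
// Each value is converted to UTF-8 first and then Base64-encoded.
$id    = base64_encode(utf8_encode('iso-8859-1'));
$name  = base64_encode(utf8_encode('Pablo Víquez - This is an ISO-8859-1 encoded text'));
$notes = base64_encode(utf8_encode('Test with JSON encoding. á é í ó ú, ñ, Ñ.'));

// The array now contains only ASCII characters, so json_encode can handle it.
$customer = array(
    'id'    => $id,
    'name'  => $name,
    'notes' => $notes
);

echo json_encode($customer);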
As you can see, I used two functions to encode the data: base64_encode and utf8_encode.
I used both mainly because the JavaScript implementation of base64_decode assumes
that the data is UTF-8 encoded. By calling base64_encode(utf8_encode('iso-8859-1')),
I first convert the data to UTF-8 and then encode the result using Base64.
base64_encode
Encodes data with MIME Base64. Its signature is:
string base64_encode ( string $data )
utf8_encode
Converts an ISO-8859-1 string to UTF-8. Its signature is:
string utf8_encode ( string $data )
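To make the double encoding concrete, here is a minimal JavaScript sketch of what base64_encode(utf8_encode(...)) produces for a single accented character. The helper names latin1ToUtf8Bytes and base64FromBytes are hypothetical and are not part of PHP or phpjs.org. The sketch assumes an environment that provides the standard btoa function (current browsers and recent Node.js versions).

```javascript
// Hypothetical helper: mimics PHP's utf8_encode on an array of ISO-8859-1 byte values.
function latin1ToUtf8Bytes(latin1Bytes) {
  const out = [];
  for (const b of latin1Bytes) {
    if (b < 0x80) {
      out.push(b); // ASCII bytes are unchanged in UTF-8
    } else {
      // Bytes 0x80-0xFF become a two-byte UTF-8 sequence
      out.push(0xc0 | (b >> 6), 0x80 | (b & 0x3f));
    }
  }
  return out;
}

// Hypothetical helper: Base64-encodes an array of byte values with the built-in btoa.
function base64FromBytes(bytes) {
  return btoa(String.fromCharCode(...bytes));
}

// "á" is the single byte 0xE1 in ISO-8859-1 and the two bytes 0xC3 0xA1 in UTF-8.
const utf8Bytes = latin1ToUtf8Bytes([0xe1]);
const encoded = base64FromBytes(utf8Bytes);
console.log(utf8Bytes, encoded);
```

The client has to reverse both steps in the opposite order: Base64-decode first, then UTF-8-decode.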
Client Side
I used jQuery to retrieve the JSON data, and the phpjs.org implementations of the
base64_decode and utf8_decode functions to decode it.
index.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>JSON Consumer</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
$("#query_data_iso").click(
function(){
getIsoData();
}
);
$("#query_data_utf8").click(
function(){
getUtfData();
}
);
});
function getUtfData() {
$.ajax({
type: "GET",
dataType: "json",
url: "prueba_utf8.php",
success: function(response){
$("#customer_data_id").html(response.id);
$("#customer_data_name").html(response.name);
$("#customer_data_notes").html(response.notes);
}
});
}
/**
* Gets a JSON response from the server.
* Expects a JSON response whose id, name and notes fields are all base64 encoded.
*/
function getIsoData() {
$.ajax({
type: "GET",
dataType: "json",
url: "prueba_ansi.php",
success: function(response){
$("#customer_data_id").html(base64_decode(response.id));
$("#customer_data_name").html(base64_decode(response.name));
$("#customer_data_notes").html(base64_decode(response.notes));
}
});
}
/**
* Decodes string using MIME base64 algorithm
* @see http://phpjs.org/functions/base64_decode
*/
function base64_decode( data ) {
var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
var o1, o2, o3, h1, h2, h3, h4, bits, i = 0, ac = 0, dec = "", tmp_arr = [];
if (!data) {
return data;
}
data += '';
do { // unpack four sextets into three octets using index points in b64
h1 = b64.indexOf(data.charAt(i++));
h2 = b64.indexOf(data.charAt(i++));
h3 = b64.indexOf(data.charAt(i++));
h4 = b64.indexOf(data.charAt(i++));
bits = h1<<18 | h2<<12 | h3<<6 | h4;
o1 = bits>>16 & 0xff;
o2 = bits>>8 & 0xff;
o3 = bits & 0xff;
if (h3 == 64) {
tmp_arr[ac++] = String.fromCharCode(o1);
} else if (h4 == 64) {
tmp_arr[ac++] = String.fromCharCode(o1, o2);
} else {
tmp_arr[ac++] = String.fromCharCode(o1, o2, o3);
}
} while (i < data.length);
dec = tmp_arr.join('');
dec = utf8_decode(dec);
return dec;
}
/**
* Converts a UTF-8 encoded string to ISO-8859-1
* @see http://phpjs.org/functions/utf8_decode
*/
function utf8_decode ( str_data ) {
var tmp_arr = [], i = 0, ac = 0, c1 = 0, c2 = 0, c3 = 0;
str_data += '';
while ( i < str_data.length ) {
c1 = str_data.charCodeAt(i);
if (c1 < 128) {
tmp_arr[ac++] = String.fromCharCode(c1);
i++;
} else if ((c1 > 191) && (c1 < 224)) {
c2 = str_data.charCodeAt(i+1);
tmp_arr[ac++] = String.fromCharCode(((c1 & 31) << 6) | (c2 & 63));
i += 2;
} else {
c2 = str_data.charCodeAt(i+1);
c3 = str_data.charCodeAt(i+2);
tmp_arr[ac++] = String.fromCharCode(((c1 & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return tmp_arr.join('');
}
</script>
</head>
<body>
<h1>ISO 8859-1, JSON consumer</h1>
<p>
<a href="#" id="query_data_iso">Query for ISO-8859-1 data</a> |
<a href="#" id="query_data_utf8">Query for UTF-8</a>
</p>
<table border="1">
<tr>
<td>Id</td>
<td>Name</td>
<td>Notes</td>
</tr>
<tr>
<td><div id="customer_data_id"></div></td>
<td><div id="customer_data_name"></div></td>
<td><div id="customer_data_notes"></div></td>
</tr>
</table>
</body>
</html>
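In current browsers, the same two decoding steps can also be sketched with built-in functions instead of the phpjs.org ports. This is an alternative, not what the page above uses. It assumes the standard atob function and the TextDecoder API are available, and decodeBase64Utf8 is a hypothetical helper name.

```javascript
// Hypothetical helper: reverses base64_encode(utf8_encode(...)) using built-ins.
function decodeBase64Utf8(b64) {
  // Step 1: Base64 -> "binary string" in which each character holds one byte value
  const binary = atob(b64);
  // Step 2: copy those byte values into a typed array
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Step 3: interpret the bytes as UTF-8 text
  return new TextDecoder("utf-8").decode(bytes);
}

const decodedName = decodeBase64Utf8("UGFibG8=");
const decodedAccent = decodeBase64Utf8("w6E=");
console.log(decodedName, decodedAccent);
```

With a helper like this, the success callback in getIsoData could call decodeBase64Utf8(response.name) in place of base64_decode(response.name).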
Downside
The downside of this solution is the size of the encoded transfer, because Base64-encoded data takes about 33% more space than the original data.
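The overhead follows from Base64 turning every 3 input bytes into 4 output characters. A small sketch of that arithmetic (base64Length is a hypothetical helper, and the byte count is illustrative only):

```javascript
// Hypothetical helper: length of the padded Base64 output for n input bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

const inputBytes = 300; // illustrative payload size
const outputChars = base64Length(inputBytes);
const overhead = (outputChars - inputBytes) / inputBytes;
console.log(outputChars, overhead);
```

Because utf8_encode runs first, every ISO-8859-1 character above 0x7F has already grown from one byte to two before Base64 is applied, so accented text ends up somewhat larger than the 33% figure alone suggests.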
References
Base64 Content-Transfer-Encoding
http://tools.ietf.org/html/rfc2045
Many media types that could usefully be transported via email are
represented, in their "natural" format, as 8-bit character or binary data.
Such data cannot be transmitted over some transfer protocols.
For example, SMTP (Simple Mail Transfer Protocol) restricts mail
messages to 7-bit US-ASCII data with lines no longer than 1000 characters,
including any trailing CRLF line separator.
PHP JS
- base64_decode – http://phpjs.org/functions/base64_decode
- utf8_decode – http://phpjs.org/functions/utf8_decode
Comments
Comment from Charles
Time August 3, 2009 at 11:36 am
Why go through all this when you can just properly re-encode the string?
header('Content-type: text/javascript;charset=utf-8');
$good_result = json_encode(iconv('ISO-8859-1', 'UTF-8', $string));
Comment from Eric
Time August 5, 2009 at 2:54 am
Isn't there a simpler solution?
What I did before moving my whole site to UTF-8 (which has advantages in the long term) was:
- Use PHP's utf8_encode() function to encode my strings before calling json_encode()
- Decode them on the JavaScript side using something like:
var decodedValue; eval('decodedValue = "' + jsonObject.encodedValue + '";');
Eric
Comment from Peter
Time December 13, 2009 at 5:50 am
Very very bad solution !!!