XML encoding (utf-8, ascii)

July 31, 2011

XML is a markup language similar to HTML. It was designed to transport data. Once data has been encoded as XML, it can be read easily by many different systems. As a result, it is widely used in web services to transfer data.

Recently, I was working on a web service which required us to parse the data from an XML feed and store it in the database. Normally this is a simple task which can be achieved by using PHP's SimpleXML extension to parse the data. However, if the document has not been encoded properly, SimpleXML will generate an XML parsing error.
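When SimpleXML rejects a document, the underlying libxml errors can be collected and inspected instead of being emitted as PHP warnings. The following is only a minimal sketch, not the code from the project described here; the helper name `parse_xml_or_report` and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: parse an XML string and return the parsed
// document (or null on failure) together with any libxml error messages.
function parse_xml_or_report(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['doc' => $doc === false ? null : $doc, 'errors' => $messages];
}

// Sample input (an assumption for illustration): labeled UTF-8, but it
// contains a lone 0xE9 byte, which is not valid UTF-8.
$bad = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><job><title>Caf\xE9</title></job>";
$result = parse_xml_or_report($bad);

foreach ($result['errors'] as $message) {
    echo $message, PHP_EOL;
}
```

Logging these messages before deciding how to repair the feed makes it easier to tell an encoding problem apart from malformed markup.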

Whenever an XML document is encoded, the encoding used should be declared in the document.
If the document was encoded with a Unicode encoding, for example UTF-8, the following would be the first line of the document:
<?xml version="1.0" encoding="UTF-8"?>
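Whether a document's bytes actually match its declared encoding can be checked before parsing. The sketch below is an illustration under stated assumptions: the function names `declared_encoding` and `is_valid_utf8` are hypothetical, the declaration is read with a simple regular expression rather than a full XML parser, and UTF-8 validity is tested with PCRE's `u` modifier, which makes `preg_match()` return false on malformed UTF-8.

```php
<?php
// Hypothetical helper: return the encoding named in the XML declaration
// (upper-cased), or null when no encoding is declared.
function declared_encoding(string $xml): ?string
{
    $pattern = '/^<\?xml\s[^>]*?encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';
    if (preg_match($pattern, $xml, $m) === 1) {
        return strtoupper($m[2]);
    }
    return null;
}

// Hypothetical helper: true when the bytes form valid UTF-8.
// The empty pattern always matches valid input; with the "u" modifier,
// preg_match() returns false when the subject is malformed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Sample input (an assumption for illustration): labeled UTF-8, but the
// byte 0xE9 is not valid UTF-8 on its own.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><job>Caf\xE9</job>";
$mislabeled = declared_encoding($xml) === 'UTF-8' && !is_valid_utf8($xml);

var_dump($mislabeled);
```

A check like this only reports that the label and the bytes disagree; it cannot say which encoding the bytes were really written in.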

In my case, the XML document was labeled as UTF-8; however, it was an ASCII document which contained non-ASCII characters. This created a major problem with the parser. The quick solution is to strip the non-ASCII characters from the document. Note that this discards those characters rather than converting them.
This can be achieved with the following PHP code:

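// Read the feed, then remove every byte outside the range 0x20-0x7F.
// Note: this also removes tabs and newlines, which fall below 0x20.
$jobs = file_get_contents('/home/mydir/doc.xml');
$jobs = preg_replace('/[^\x20-\x7F]+/', '', $jobs);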
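Stripping loses data such as accented letters. If the stray bytes are known to come from a single-byte encoding, converting them to UTF-8 keeps the characters instead. The sketch below is an alternative to the approach above, not what was used in the project; it assumes the feed was really written in ISO-8859-1 (a guess, since the post does not say where the bytes came from) and that PHP's iconv extension is available. The helper name `to_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: return UTF-8 text, converting only when the input
// is not already valid UTF-8. $assumedSource is a guess about the feed's
// real encoding; the XML declaration cannot confirm it.
function to_utf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    // With the "u" modifier, preg_match() returns false on malformed UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes; // already valid UTF-8, leave untouched
    }

    $converted = iconv($assumedSource, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedSource");
    }
    return $converted;
}

// Sample input (an assumption for illustration): 0xE9 is "é" in ISO-8859-1.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><job><title>Caf\xE9</title></job>";
$fixed = to_utf8($raw);
```

If the guess about the source encoding is wrong, the output will still be valid UTF-8 but may show the wrong characters, so this is worth spot-checking against known values in the feed.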

Ryan Wright

Ryan is a PHP/MySQL Developer. As a high school intern, he worked on applications for NASA's bird migration project at City College of New York, where he learned the more intricate details of software development. After studying Computer Engineering at Polytechnic University, Ryan has worked on numerous web applications ranging from simple sites to more advanced E-Commerce solutions and Social Networking sites.

One Response

King Vaillancourt says:

January 27, 2012 at 3:30 pm

I had been looking for this info for some time. After 6 hours of continuous Googling, I finally found it on your web site. I wonder why Google does not rank informative web sites like this closer to the top. Normally the top sites are full of garbage.
