Posts with the “xml” tag

Using Microsoft MapPoint with PHP

The MapPoint service is a commercial offering by Microsoft which gives developers access to a wide variety of mapping functionality through a web service interface. Well, that's how it used to be anyway. For a while MapPoint was just a web service and a technology that powered products like Autoroute; then it became Microsoft Virtual Earth, which did nothing apart from change what appeared on the map images loaded from the servers. Then Microsoft launched their Bing extravaganza, which means it's now called Bing Maps. Well, it is when one logs into the control panel, but the service itself is still called MapPoint. It's highly confusing and makes it intensely difficult to find what one wants on the Microsoft site, especially when plumbing the depths of MSDN. For the purposes of this diatribe, however, MapPoint is a web service that uses the SOAP protocol.

the process could fail catastrophically, mostly due to the abject bloody-mindedness of the MapPoint service

SOAP and PHP have not always been the most accommodating of bedfellows. It wasn't until version five that PHP had the language constructs to support SOAP (namely XML and objects), and even now they're not exactly seamlessly integrated. Version four of PHP relied upon pure-PHP code to manage SOAP, while five introduced a dedicated extension and related classes. The classes themselves are basic, and lamentably the Zend Framework concentrates more on serving SOAP content than consuming it. In short, talking to the MapPoint service using PHP is a pain and fraught with problems. The biggest of these is that MapPoint is a service built with .NET in mind (indeed the service was originally called MapPoint .NET); PHP just happens to be supported through the open-standard nature of the protocol.


Deconstruction part 2

Attacking those "random" files a couple of days ago provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec'ing out previously unexplored file formats. It turned out that the files had already been mapped and successfully decompressed, and the only thing left to do was build an unpacker, which was already in the pipeline. It seemed my work wasn't exactly fruitless, but other, probably smarter, people had everything under control. I wasn't about to let that stop me though.

Note (2008-01-11): The full (official?) SDK for this file format has been located which includes both a packer and an unpacker as well as other tools I'm sure are useful for working on the file format. The full name of the file format is "Yaneurao" with the SDK going by the nomenclature of "yaneSDK" which is the stem for the file format signature of "yanepkDx". There is already a .NET version of the SDK so if you're interested in my deconstruction process then read on, otherwise I would recommend using the official/fully-featured SDKs.

Then, in that moment of lucid elation, I realised exactly what was going wrong.

The compression format was identified as LZSS, and reading through several sites revealed that some of the data I had initially spotted but attributed to SHIFT JIS (or at one point a Unicode byte order mark, perfect for a non-Unicode file) were the tell-tale signatures of LZSS. The gradual degradation into junk data was also typical of the algorithm: the further into the file the stream progresses, the more back references are present.

yanePkDX
While I hadn't heard of LZSS, it came as no surprise that it was a modified version of LZ77, which I had come across before though never toyed with. Having to dig through a dense PDF was not my idea of fun, and my university days had proven that reading academic proofs rarely led to workable implementations for me, so I searched for a ready-made PHP version which (for reasons which will soon become glaringly apparent) didn't prove fruitful. After coming up against dead-ends with other languages I settled on the de facto C version, which most other versions I found seemed to be based on.
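For anyone who hasn't met it, the decoding side of LZSS is short enough to sketch. What follows is a minimal Python rendering (Python rather than PHP purely for brevity) that assumes the parameters commonly seen in C implementations: a 4096-byte ring buffer pre-filled with spaces, matches of 3 to 18 bytes, and a flag byte whose bits (least significant first) mark each following item as either a literal byte or a two-byte back reference. Whether the .dat files use exactly these parameters is an assumption, not something established here, and the function name is purely illustrative.

```python
WINDOW_SIZE = 4096   # assumed ring buffer size (12-bit offsets)
MAX_MATCH = 18       # assumed longest back reference
THRESHOLD = 2        # matches this short or shorter are stored as literals


def lzss_decode(data: bytes) -> bytes:
    """Decode an LZSS stream using the assumed parameters above."""
    window = bytearray(b" " * WINDOW_SIZE)
    pos = WINDOW_SIZE - MAX_MATCH      # conventional starting write position
    out = bytearray()
    i, n = 0, len(data)

    while i < n:
        flags = data[i]
        i += 1
        for bit in range(8):
            if i >= n:
                return bytes(out)
            if flags & (1 << bit):
                # Flag bit set: the next byte is a literal.
                byte = data[i]
                i += 1
                out.append(byte)
                window[pos] = byte
                pos = (pos + 1) % WINDOW_SIZE
            else:
                # Flag bit clear: two bytes encode (offset, length).
                if i + 1 >= n:
                    return bytes(out)   # truncated reference, stop cleanly
                lo, hi = data[i], data[i + 1]
                i += 2
                offset = lo | ((hi & 0xF0) << 4)
                length = (hi & 0x0F) + THRESHOLD + 1
                # Copy byte by byte so overlapping references work.
                for k in range(length):
                    byte = window[(offset + k) % WINDOW_SIZE]
                    out.append(byte)
                    window[pos] = byte
                    pos = (pos + 1) % WINDOW_SIZE
    return bytes(out)
```

The byte-by-byte copy matters: a back reference is allowed to overlap the bytes it is currently writing, which is how short repeating runs get encoded. It also shows why a compressed stream looks readable at the start and increasingly like junk later on, since literals give way to two-byte references as the window fills up.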


Deconstruction

Out of curiosity and as a favour to someone, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser-used PHP functions.

Sample File 1, Sample File 2, Sample File 3

All screenshots are taken from data1.dat (sample file 1); the window is resized for the most appropriate screenshot rather than general workability.

so garbled that it sent a few hundred bell tones to my computer speaker

The first thing I did was to crank open the lovely XVI32 hex editor and have a look at the sample files provided. Their .dat extension more or less indicated they were a proprietary format and were unlikely to relinquish their secrets easily. What was known was that the files contained a header portion, a bundle of XML files in a contiguous stream and a lot of junk data. The XML files could be seen and their encoding was stated as SHIFT JIS and, after cursing its existence, I attributed the junk data to that, which seemed like a good place to start.

The first eight bytes seemed to be a file signature, but Google searches for all or parts of the signature were fruitless which meant it was time to pick things apart.
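Pulling a candidate signature out of a file is a two-liner in most languages. As a rough sketch in Python (the function names are made up for illustration, and the sample bytes are the "yanepkDx" signature quoted in the follow-up post rather than anything read from the real files):

```python
import io


def read_signature(stream, length=8):
    """Read the first `length` bytes of a binary stream as the signature."""
    signature = stream.read(length)
    if len(signature) != length:
        raise ValueError("file is too short to contain a signature")
    return signature


def describe(signature):
    """Return the signature as (ASCII text, space-separated hex)."""
    return signature.decode("ascii", errors="replace"), signature.hex(" ")


# Stand-in for an opened .dat file; only the first eight bytes matter here.
sample = io.BytesIO(b"yanepkDx" + b"\x00\x00\x00\x00")
text, hex_view = describe(read_signature(sample))
print(text, "|", hex_view)
```

Having both views side by side is handy when searching, since a signature may be indexed either as text or as a run of hex bytes.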

The next four bytes were different for each file. At first I thought they were part of the block format that made up the header part of the file, but the section repetition for the header block didn't match up. So, after converting them to a variety of different number formats (I'm no hex wizard and I originally thought it was only a two-byte short rather than a four-byte integer or long), I assumed it was an unsigned long (32 bits) in little-endian order.
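The trial-and-error over number formats can be sketched as follows. This is an illustrative Python version (the `interpret` helper is hypothetical) that reads the same four bytes as 16-bit and 32-bit unsigned integers in both byte orders; in PHP, the little-endian 32-bit case corresponds to `unpack('V', ...)`.

```python
import struct


def interpret(raw: bytes) -> dict:
    """Interpret four bytes under several candidate unsigned integer formats."""
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes")
    return {
        "uint32_le": struct.unpack("<I", raw)[0],      # 32-bit, little-endian
        "uint32_be": struct.unpack(">I", raw)[0],      # 32-bit, big-endian
        "uint16_le": struct.unpack("<H", raw[:2])[0],  # first two bytes only
        "uint16_be": struct.unpack(">H", raw[:2])[0],
    }


# Made-up sample bytes, not taken from the real files.
for name, value in interpret(bytes([0x2A, 0x00, 0x00, 0x00])).items():
    print(f"{name}: {value}")
```

With small values the little-endian readings come out as modest, plausible numbers while the big-endian ones balloon, which is often the quickest hint as to which byte order a format uses.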
