Posts with the “data” tag

An open letter to (anime) database owners

Dear database owner,

I’m not a huge fan of the open letter format - its use for pithy snark has long since diluted whatever potency it once had - so I’ll get right to the point: I want access to your data.

Not in a creepy stalker way: I don’t want to know your three sizes, nor do I want whatever user data you choose to collect. No, I’m talking about the data you have on anime and its minutiae - characters, staff, companies and all the tidbits in between. Whether you call it a database, an encyclopaedia, a list, a planet or otherwise, I’m interested.

But why?


Deconstruction

Out of curiosity, and as a favour to someone, I decided to take a look at some random .dat files that were ripe for translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser-used PHP functions.

Sample File 1, Sample File 2, Sample File 3

All screenshots are taken from data1.dat (sample file 1), with the window resized to frame each screenshot rather than for general workability.

so garbled that it sent a few hundred bell tones to my computer speaker

The first thing I did was crank open the lovely XVI32 hex editor and have a look at the sample files provided. Their .dat extension more or less indicated a proprietary format that was unlikely to relinquish its secrets easily. What was known was that the files contained a header portion, a bundle of XML files in a contiguous stream and a lot of junk data. The XML files were visible and declared their encoding as Shift JIS; after cursing its existence, I attributed the junk data to that, which seemed like a good place to start.
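For anyone wanting to poke at a similar file without a hex editor, here is a minimal Python sketch of pulling embedded XML out of a binary stream and decoding it with whatever encoding each file declares. It assumes every embedded file starts with an `<?xml ... encoding="..."?>` declaration and that each file runs until the next declaration; the function name `split_xml_chunks` is hypothetical, not part of any tool mentioned above.

```python
import re

# Matches an XML declaration and captures its encoding attribute.
# Assumption: every embedded file begins with such a declaration.
XML_DECL = re.compile(rb"""<\?xml[^>]*?encoding=["']([A-Za-z0-9_.-]+)["']""")


def split_xml_chunks(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, decoded_text) for each embedded XML file in data.

    Each chunk runs from its declaration to the next declaration (or the
    end of the data), so any non-XML bytes that follow a file stay
    attached to that file's chunk.
    """
    matches = list(XML_DECL.finditer(data))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)
        encoding = m.group(1).decode("ascii")
        # Python's codec lookup is case-insensitive, so "Shift_JIS",
        # "SHIFT_JIS" and "shift_jis" all resolve to the same codec.
        text = data[m.start():end].decode(encoding, errors="replace")
        chunks.append((m.start(), text))
    return chunks
```

Decoding with the declared encoding is what turns the "junk" back into text: Shift JIS stores most Japanese characters as two-byte pairs whose trail byte often falls in the printable ASCII range, so viewed byte-by-byte they look like stray punctuation and high-bit garbage.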

01
The first eight bytes seemed to be a file signature, but Google searches for all or parts of the signature were fruitless, which meant it was time to pick things apart.
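One way to check the "signature" hunch is to dump the leading bytes of each sample side by side and count how many of them are identical across files. The sketch below is a hypothetical helper pair, not the author's code; the actual signature bytes aren't given here, so nothing is hard-coded. `hex_row` mimics a single hex-editor row and masks non-printable bytes, which also keeps control codes such as the 0x07 bell character from reaching the terminal.

```python
def hex_row(data: bytes, start: int = 0, length: int = 8) -> str:
    """Render bytes as 'offset: hex  ascii', like one row of a hex editor."""
    chunk = data[start:start + length]
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    # Non-printable bytes are shown as "." instead of being emitted raw.
    ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    return f"{start:08X}: {hex_part}  {ascii_part}"


def shared_prefix_length(samples: list[bytes]) -> int:
    """Return how many leading bytes are identical across all samples."""
    n = 0
    for column in zip(*samples):
        if len(set(column)) != 1:
            break
        n += 1
    return n
```

Running `hex_row` over the first 16 bytes of each sample file, and `shared_prefix_length` over the same slices, would show where a constant signature ends and the per-file bytes begin.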

02
The next four bytes were different for each file. At first I thought they were part of the block format that made up the file's header, but the repetition of the header's sections didn't match up. So I converted the value into a variety of number formats (I'm no hex wizard, and I originally took it for a two-byte short rather than a four-byte integer or long) and settled on an unsigned long (32 bits) in little-endian order.
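The "try several number formats" step can be sketched with Python's `struct` module, which reads raw bytes as typed integers. This assumes the field sits at offset 8, straight after the eight-byte signature; `interpretations` is a hypothetical helper name. PHP's own `unpack()` offers the same little-endian 32-bit reading through its `V` format code.

```python
import struct


def interpretations(data: bytes, offset: int = 8) -> dict[str, int]:
    """Read the bytes at offset as several unsigned integer types."""
    return {
        # "<" = little-endian, ">" = big-endian; H = 16-bit, I = 32-bit
        "uint16 little-endian": struct.unpack_from("<H", data, offset)[0],
        "uint16 big-endian": struct.unpack_from(">H", data, offset)[0],
        "uint32 little-endian": struct.unpack_from("<I", data, offset)[0],
        "uint32 big-endian": struct.unpack_from(">I", data, offset)[0],
    }
```

When the upper two bytes of a little-endian value are zero, the 16-bit and 32-bit readings agree, which is one way a four-byte field can be mistaken for a two-byte short until a sample with a larger value turns up.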
