<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>chaostangent &#187; Code</title>
	<atom:link href="http://chaostangent.com/category/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://chaostangent.com</link>
	<description>More squirrels than sense</description>
	<lastBuildDate>Tue, 17 Aug 2010 21:26:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<atom:link rel='hub' href='http://chaostangent.com/?pushpress=hub'/>
		<item>
		<title>Announcing n.adesi.co</title>
		<link>http://chaostangent.com/2010/07/announcing-n-adesi-co/</link>
		<comments>http://chaostangent.com/2010/07/announcing-n-adesi-co/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 22:10:15 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[build]]></category>
		<category><![CDATA[nadesico]]></category>
		<category><![CDATA[omoikane]]></category>
		<category><![CDATA[quick]]></category>
		<category><![CDATA[ruri]]></category>
		<category><![CDATA[shortener]]></category>
		<category><![CDATA[url]]></category>
		<category><![CDATA[url shortener]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=3543</guid>
		<description><![CDATA[Because the world needs another URL shortener: http://n.adesi.co/
Not as short as many others but anime related. At the moment it’s very lo-fi and doesn’t have an API or any integration points so it isn’t currently compatible with your favourite Twitter client — this will come soon enough along with stats and other paraphernalia. Built from [...]]]></description>
			<content:encoded><![CDATA[<p>Because the world needs another URL shortener: <a href="http://n.adesi.co/">http://n.adesi.co/</a></p>
<p>Not as short as many others but anime related. At the moment it’s very lo-fi and doesn’t have an API or any integration points so it isn’t currently compatible with your favourite Twitter client — this will come soon enough along with stats and other paraphernalia. Built from scratch in less than a couple of hours so any bugs <a href="http://chaostangent.com/2010/07/announcing-n-adesi-co/#responseForm">drop me a line</a>, likewise with feature requests, otherwise enjoy.</p>
<p><strong>Edit</strong>: Forgot to say, it is currently gathering data, so once a frontend is built existing links will reveal their stats. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2010/07/announcing-n-adesi-co/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Using Microsoft MapPoint with PHP</title>
		<link>http://chaostangent.com/2009/10/using-microsoft-mappoint-with-php/</link>
		<comments>http://chaostangent.com/2009/10/using-microsoft-mappoint-with-php/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 21:02:33 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[mappoint]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[soap]]></category>
		<category><![CDATA[web service]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[zend]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1301</guid>
		<description><![CDATA[The MapPoint service is a commercial offering by Microsoft which gives developers access to a wide variety of mapping functionality through a web service interface. Well that’s how it used to be anyway. For a while MapPoint was just a web service and a technology that powered products like Autoroute, then for a while it [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://msdn.microsoft.com/en-us/library/dd877970.aspx">MapPoint service</a> is a commercial offering by Microsoft which gives developers access to a wide variety of mapping functionality through a web service interface. Well that’s how it used to be anyway. For a while MapPoint was just a web service and a technology that powered products like <a href="http://www.microsoft.com/uk/homepc/autoroute/default.mspx">Autoroute</a>, then for a while it became <a href="http://www.microsoft.com/virtualearth/product/">Microsoft Virtual Earth</a> which did nothing apart from change what appeared on the map images that you loaded from the servers. Then Microsoft launched their <a href="http://www.bing.com/">Bing extravaganza</a> which meant that it’s now called <a href="http://www.bing.com/maps/">Bing Maps</a> — well it is when one logs into the control panel but the service is still called MapPoint. It’s highly confusing and makes it intensely difficult to find what one wants on the Microsoft site, especially on plumbing the depths of <a href="http://msdn.microsoft.com/">MSDN</a>. For the purposes of this diatribe however, MapPoint is a web service that uses the SOAP protocol.</p>
<blockquote class="pullout"><p>“the process could fail catastrophically, mostly due to the abject bloody-mindedness of the MapPoint service”</p>
</blockquote>
<p>SOAP and PHP have not always been the most accommodating of bedfellows: it wasn’t until version five that PHP had the language constructs to support SOAP (namely <a href="http://uk.php.net/manual/en/refs.xml.php">XML</a> and <a href="http://uk.php.net/manual/en/language.oop5.php">Objects</a>) and even now they’re not exactly seamlessly integrated. Version four of PHP relied upon <a href="http://sourceforge.net/projects/nusoap/">pure-PHP libraries to manage SOAP</a>, while five introduced a dedicated extension and related classes. The classes themselves are basic, and lamentably the <a href="http://framework.zend.com/manual/en/zend.soap.html">Zend Framework</a> concentrates more on serving SOAP content than consuming it. In short, talking to the MapPoint service using PHP is a pain and fraught with problems, chief among them that MapPoint is a service built with .NET in mind (indeed the service was originally called MapPoint .NET); PHP just happens to be supported through the open-standard nature of the protocol.<span id="more-1301"></span></p>
<p>Regardless, if you wanted to integrate with MapPoint, the only real option was <a href="http://www.phpclasses.org/browse/package/2638.html">a relatively complete API</a> by Carlos Jorge Machado Antunes which aimed to provide a complete replica of the web service in PHP. Upon first using it in 2006, things worked well: the API has a custom Soap client which extends the <a href="http://uk.php.net/manual/en/class.soapclient.php">built in SoapClient class</a> and does various things to ensure that everything runs smoothly. Unfortunately, it started running less smoothly as each successive version of PHP was released, and it got to the point where at times my development environment worked while testing and production didn’t, or vice versa. Eventually it became apparent that the SoapClient class was doing things which were no longer necessary or not doing things which were. The most prominent problem was the service’s refusal to accommodate references: to reduce the envelope size, a SOAP request would specify an object once and then reference it by ID at other points in the XML document. This is smart thinking but the MapPoint service is a fickle beast. The solution (as with most SOAP foibles) is to intercept the XML before it is sent (in the <a href="http://uk.php.net/manual/en/soapclient.dorequest.php">__doRequest()</a> method), create a full DOM object, execute an XPath query to nab all of the instances of a reference and then duplicate the referenced node in place. This is an immensely costly operation to do for each action and, coupled with the round trip time of the request, means each query is laboriously slow, especially at busy periods.</p>
<p>With one of the <a href="http://chaostangent.com/2009/10/work-related-little-chef-site-relaunch/">big sites that use MapPoint</a> under renovation, the chance to clean up and improve the way PHP can talk to MapPoint presented itself. Rather than risk reinventing the wheel, Carlos’ PHP API was used as a base: it already comes with all of the relevant classes and documentation built in, so there seemed little point in starting from scratch. With the aim to ultimately integrate the result into the Zend Framework, the first step was to organise the classes into some sort of hierarchy. With over 100 classes this was easier said than done; thankfully the organisation is already pre-determined by the class names, so obviously <a href="http://www.phpclasses.org/browse/file/11254.html">ArrayOfDouble</a> goes into an ArrayOf and so forth. Next was to namespace all of the classes. Despite the API’s completeness, all of the class names are loaded into the standard namespace, which isn’t too much of a problem for small applications or sites, but having common names like Address and Location as globals isn’t the best forward-thinking plan. Putting them all into a Mappoint namespace (with the aim to go into a <a href="http://framework.zend.com/manual/en/zend.service.html">Zend_Service_</a>Mappoint one in the future) meant prefixing all of the class names and any references, which was no small feat. Alongside this was a tidying up of some code foibles such as using the class name as the constructor name rather than the more standard magic function __construct(); likewise other ye olde PHP object idiosyncrasies such as having to call the parent of an extended class were removed or otherwise optimised.</p>
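<p>As a rough illustration of that tidy-up, the sketch below shows one class before and after. LatLong is a type the service uses, but the property names and default values here are assumptions for illustration rather than a copy of the real class.</p>
<pre class="brush: php">// Before: global class name and a PHP4-style constructor named after the class
// class LatLong {
//   function LatLong($latitude, $longitude) { ... }
// }

// After: prefixed pseudo-namespace and the __construct() magic method
class Mappoint_LatLong {
  public $Latitude;   // assumed property name
  public $Longitude;  // assumed property name

  public function __construct($latitude = 0.0, $longitude = 0.0) {
    $this-&gt;Latitude = (float) $latitude;
    $this-&gt;Longitude = (float) $longitude;
  }
}

$point = new Mappoint_LatLong("53.55", -1.48);
var_dump($point-&gt;Latitude);</pre>
<p>The prefix costs a few extra characters per reference but keeps common names such as Address and Location out of the global class list.</p>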
<p>While all of this was underway, a question as to how far a rebuild should go was raised. The MapPoint web service goes against common PHP programming style by using an uppercase first letter for function names e.g. FindNearRoute() rather than findNearRoute(); neither is incorrect, however the latter is preferable in context. However, changing the case of the function call would go against the documented API, so anybody using the PHP implementation would need to do a mental check before calling a function. This is a minor case but illustrates the question of what to change in the quest for betterment. For instance some classes contain what are, for all intents and purposes, constants, however in Carlos’ implementation they are static variables e.g. Class::$And; if they were to be converted to constants, common code style dictates they should be made entirely uppercase to denote them as constants e.g. Class::$And becomes Class::AND. This not only refers back to the “what to change” question, but also raises a problem in that what was once a valid variable name ($And) becomes an invalid constant name (AND) due to being a language construct and reserved keyword. Converting them to constants then becomes a question of what could be prefixed or suffixed to constants to make them practicable and consistent.</p>
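<p>One way to settle the naming question is a common prefix. The sketch below is only that: the class name, the OP_ prefix and the constant values are hypothetical and not taken from the API.</p>
<pre class="brush: php">class Mappoint_SearchOperator {
  // a static variable such as Class::$And can be reassigned at runtime;
  // a bare AND was not usable as a constant name, as described above,
  // so each constant carries a prefix instead
  const OP_AND = "And";
  const OP_OR = "Or";

  // the values keep the casing the service expects, only the PHP names change
  public static function isValid($operator) {
    return in_array($operator, array(self::OP_AND, self::OP_OR), true);
  }
}

var_dump(Mappoint_SearchOperator::isValid(Mappoint_SearchOperator::OP_AND));</pre>
<p>A prefix has the side benefit of grouping related constants together in documentation and autocomplete, though a suffix would work just as well provided it is applied consistently.</p>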
<p>As the rebuild progressed, it became readily apparent that not only were the previous eccentricities in place for a reason, but that the service itself is ill suited to PHP. For starters, the service requires the use of the antiquated SOAP version 1.1 and forgoes the <a href="http://www.w3.org/2003/06/soap11-soap12.html">improvements in 1.2</a>; this is not a huge issue but is just the start of a number of aggravations and problems. The previously mentioned optimisation issue is unable to be worked around conventionally and only a <a href="http://bugs.php.net/bug.php?id=42652">single bug report</a> on the PHP site mentions this as a problem for other users, which means the on-the-fly XML transforms are retained:</p>
<pre class="brush: php">// $request holds the outgoing SOAP envelope
// loadXML() is an instance method; calling it statically is an Error as of PHP 8
$dom = new DOMDocument();
$dom-&gt;loadXML($request);
$path = new DOMXPath($dom);

// find every element that points at another node by reference
$references = $path-&gt;query("//*[starts-with(@href, '#ref')]");
foreach($references AS $reference) {
  // strip the leading "#" to get the id of the referenced node
  $target = $path-&gt;query("//*[@id='".substr($reference-&gt;getAttribute('href'), 1)."']");
  if($target-&gt;length &gt; 0) {
    $t = $target-&gt;item(0);
    // if not the same node name then duplicate the children
    if($t-&gt;nodeName != $reference-&gt;nodeName) {
      for($j = 0; $j &lt; $t-&gt;childNodes-&gt;length; $j++) {
        $reference-&gt;appendChild($t-&gt;childNodes-&gt;item($j)-&gt;cloneNode(true));
      }
      $reference-&gt;removeAttribute('href');
    } else { // otherwise just clone the target node
      $x = $t-&gt;cloneNode(true);
      $x-&gt;removeAttribute("id");
      $reference-&gt;parentNode-&gt;replaceChild($x, $reference);
    }
  }
}

// hand the expanded envelope on to the parent implementation
$request = $dom-&gt;saveXML();
return parent::_doRequest($client, $request, $location, $action, $version, $oneWay);</pre>
<p>Benchmarking isn’t necessary to demonstrate this is an overhead that ideally would be omitted. Likewise there is the necessity to force the transformation of some objects into the <a href="http://uk3.php.net/manual/en/class.soapvar.php">SoapVar class</a> to enable them to retain their type and namespace identifiers — an odd and enigmatic problem that seems to stem from PHP itself and a process that I’m not entirely familiar with. This means that for some objects there is an additional operation required to enable the SOAP request to be accepted by the MapPoint service:</p>
<pre class="brush: php">// lifted from Carlos Jorge Machado Antunes' original API
public function transform() {
  $uri = Mappoint::NAMESPACE_URI;
  $view = new SoapVar($this, SOAP_ENC_OBJECT, "ViewByBoundingRectangle", $uri, "ViewByBoundingRectangle",$uri);
  $view-&gt;enc_value-&gt;BoundingRectangle = new SoapVar($view-&gt;enc_value-&gt;BoundingRectangle, SOAP_ENC_OBJECT, "LatLongRectangle", $uri, "LatLongRectangle", $uri);
  $southwest = new SoapVar($view-&gt;enc_value-&gt;BoundingRectangle-&gt;enc_value-&gt;Southwest, SOAP_ENC_OBJECT, "LatLong", $uri, "LatLong", $uri);
  $view-&gt;enc_value-&gt;BoundingRectangle-&gt;enc_value-&gt;Southwest = $southwest;
  $northeast = new SoapVar($view-&gt;enc_value-&gt;BoundingRectangle-&gt;enc_value-&gt;Northeast, SOAP_ENC_OBJECT, "LatLong", $uri, "LatLong", $uri);
  $view-&gt;enc_value-&gt;BoundingRectangle-&gt;enc_value-&gt;Northeast = $northeast;
  return $view;
}</pre>
<p>There seems to be little methodology as to why this is necessary and I’m loath to explore for reasons soon to be explained; however, the nature of SOAP means that any implementation problem is difficult to locate and debug. This is compounded by the depth of some of the objects, which can plumb eight, sometimes twelve levels deep and make simple var_dump output near impossible to decipher. To stand any chance of debugging the process, screeds of data were stored upon each request:</p>
<ul>
<li>The XML generated by PHP</li>
<li>The resultant XML after reference removal</li>
<li>Parameters passed</li>
<li>The desired action</li>
<li>The XML response from the server</li>
</ul>
<p>Everything was timestamped while every piece of XML was beautified to provide better navigation and stored in the application log directory — for a standard routing request this resulted in a maximum of six files:</p>
<ol>
<li>Start location search</li>
<li>End location search</li>
<li>Via location search (optional)</li>
<li>Route calculation</li>
<li>Corridor search for points of interest</li>
<li>Map render request</li>
</ol>
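<p>A minimal sketch of that kind of logging follows. The helper name, directory layout and file name format are assumptions; only the DOM calls used to beautify the XML are standard PHP.</p>
<pre class="brush: php">// Pretty-print a piece of XML and store it under a timestamped name.
// logSoapXml() and its file naming scheme are hypothetical.
function logSoapXml($directory, $label, $xml) {
  $dom = new DOMDocument();
  $dom-&gt;preserveWhiteSpace = false; // drop existing whitespace so re-indenting works
  $dom-&gt;formatOutput = true;        // indent nested elements on output

  if($xml === "" || !@$dom-&gt;loadXML($xml)) {
    $pretty = $xml;                 // empty or not well-formed: keep the raw text
  } else {
    $pretty = $dom-&gt;saveXML();
  }

  $file = rtrim($directory, "/")."/".date("Ymd-His")."-".$label.".xml";
  file_put_contents($file, $pretty);
  return $file;
}

echo logSoapXml(sys_get_temp_dir(), "request", "&lt;a&gt;&lt;b&gt;1&lt;/b&gt;&lt;/a&gt;"), "\n";</pre>
<p>Writing the raw text when parsing fails matters here, since a malformed envelope is exactly the kind of thing the log exists to catch.</p>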
<p>At any point the process could fail catastrophically, mostly due to the abject bloody-mindedness of the MapPoint service. Actions within FindService usually return a result object (e.g. FindResult) which consists of an ArrayOf object (e.g. ArrayOf_Location), which is a wrapper around a standard PHP array. The problem occurs when only a single result is returned, which silently transforms the ArrayOf array into an object. This means that every result needs branching logic to determine the disposition of what has been returned, and ugly, fragile code results.</p>
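<p>One way to contain the array-or-object problem is to normalise results in a single place. This is a sketch only: the helper name is hypothetical and the sample object is a stand-in for whatever the decoded response actually contains.</p>
<pre class="brush: php">// Always hand back a list, whether the decoded response held an array,
// a lone object or nothing at all. toList() is a hypothetical helper.
function toList($value) {
  if($value === null) {
    return array();
  }
  if(is_array($value)) {
    return array_values($value);
  }
  return array($value); // a single result arrived as a bare object
}

// stand-in for a response that contained exactly one location
$single = new stdClass();
$single-&gt;DisplayName = "Barnsley, England, United Kingdom";

foreach(toList($single) AS $location) {
  echo $location-&gt;DisplayName, "\n";
}</pre>
<p>PHP’s SoapClient also accepts a <code>features</code> option with the SOAP_SINGLE_ELEMENT_ARRAYS flag, which may remove the need for this; whether it behaves with the custom classes described here is untested.</p>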
<p>On their own these are small issues with what is evidently a complex and full-featured service, however cumulatively they present an API that feels more combative than accommodating and makes working with it tiresome and more infuriating than is necessary. The original idea to create a clean, manageable API implementation that was fit for modern and rapid usage was swiftly quelled by the realisation that MapPoint wears its heritage on its sleeve and was always designed for .NET environments and languages rather than PHP. Realistically the desire to create a PHP API with those characteristics may be moot when more modern successors exist such as <a href="http://www.microsoft.com/maps/">Bing Maps</a> or the <a href="http://www.microsoft.com/maps/multimap/">Multimap API</a> — creating for an obtuse and  superseded service lacks the utility and draw when put in perspective.</p>
<p><script src="http://static.chaostangent.com/blog/scripts/c.php?f=syntaxhighlighter/shCore.js,syntaxhighlighter/shBrushPhp.js,syntaxhighlighter.js" type="text/javascript"></script></p>
<p>In the end the golden land of an elegant, flawless PHP version of the MapPoint API was replaced by a robust and serviceable update of an existing implementation that provides a step between MapPoint and future technologies. In a way it is disappointing but the continuous disdain felt while working with the service is enough to mitigate that. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/10/using-microsoft-mappoint-with-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Work related: Little Chef site relaunch</title>
		<link>http://chaostangent.com/2009/10/work-related-little-chef-site-relaunch/</link>
		<comments>http://chaostangent.com/2009/10/work-related-little-chef-site-relaunch/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 15:55:25 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[campaign monitor]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[dwoo]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[internet explorer]]></category>
		<category><![CDATA[iris]]></category>
		<category><![CDATA[iris associates]]></category>
		<category><![CDATA[little chef]]></category>
		<category><![CDATA[mappoint]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php4]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[style sheets]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[work related]]></category>
		<category><![CDATA[zend]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1303</guid>
		<description><![CDATA[Examining the October 2009 relaunch of the Little Chef website which attempted to build upon the lessons learned from having worked on the site for over three years; the rebuild of course came with its own tribulations.]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-1306" title="Little Chef homepage" src="http://chaostangent.com/wp-content/uploads/2009/10/littlechef-homepage.jpg" alt="Little Chef homepage" width="540" height="347" /></p>
<p>The longest running and most high-profile website I have the pleasure of working on is for <a href="http://www.littlechef.co.uk">Little Chef</a>. With such a recognisable brand and in a period of increased company activity, the site is increasing its role as the primary communication with customers. With a recent aesthetic refresh which only select parts of the site were quick to follow, the remainder was still stoically in the old style; updating the rest was a deceptively large task and exposed the opportunity to rectify some of the niggling obstructions that had grown with the site. What on the surface was just a visual update was in fact a more far-reaching upgrade.</p>
<blockquote class="pullout"><p>“if these were the most complicated aspects of the site the rebuild would have been simpler and drastically more straightforward.”</p>
</blockquote>
<p>Rebuilding an existing site always starts with the best intentions, glassy eyed optimism seeing only improvements and never pitfalls, but experience has taught temperance rather than ambitious extravagance. Ambivalence is quick to set in: on the one hand there is a full and detailed specification available in the form of the currently used site, while on the other it soon becomes apparent that with history comes refinement that may not lend itself to rapid reconstruction. Striking a balance between reconstructing for improvement and the silent threat of feature creep is the key to a timely and successful project.<br />
<span id="more-1303"></span></p>
<h2>Styles</h2>
<p>With a visual change came the auspicious chance to update all of the archaic stylesheets. Most content-driven sites tend to have a central sheet with common styles used across different pages, usually consisting of header, footer, sidebar and default content styles. The Little Chef site doesn’t suit this kind of organisation, due in part to the uniquely designed sections: very few styles carry across different pages, which meant that when I originally created the styles, <a href="http://www.smashingmagazine.com/2007/07/27/css-specificity-things-you-should-know/">specificity rules</a> played a big part in crafting different sections. The biggest issue with this is that it’s near impossible to start with a “blank canvas”, as most common identifiers (<code>#content</code>, <code>#sidebarRight</code> etc.) carry some default baggage with them. Lesson learned, <a href="http://www.littlechef.co.uk/styles/c.php?f=style.css">the default stylesheet</a> now only defines the header and footer with the primary content area left up to <a href="http://www.littlechef.co.uk/styles/c.php?f=friends.css">section</a> <a href="http://www.littlechef.co.uk/styles/c.php?f=findalittlechef.css">specific</a> <a href="http://www.littlechef.co.uk/styles/c.php?f=aboutus.css">stylesheets</a> to define; this is not flawless, especially when dealing with Internet Explorer specific styles. Ordinarily only a single stylesheet for each browser should be needed across the site, however styles which were implicit in section specific sheets would need to be explicit in a site-wide sheet. For instance, if there are two section specific stylesheets, each one can use class identifiers or IDs without fear of clashing, however a site-wide one would need an extra layer of specificity for these elements to be targeted. The only recourse was an identifier on the BODY element for each section, which means the browser specific style rules always need to be prefixed (sometimes called “warts”), which can be tiresome:</p>
<pre class="brush: css">body#friends #sidebarRight { padding-top: 5.8em; }
body#friends #sidebarRight .login button { margin-right: -14px; }

/* news.css */
body#news .newsJump { height: 1%; width: 555px; }

/* feedback.css */
body#feedback .intro { height: 1%; width: 555px; }
body#feedback form legend { margin-left: -0.7em; }</pre>
<p>Internet Explorer 6 once again proved the most obtuse of browsers, primarily its <a href="http://sonspring.com/journal/ie6-multi-class-bug">idiosyncrasies surrounding the use of ID and class specifiers</a> on a single element e.g. <code>#content.sectionName</code>. Bizarrely the specific element cannot be styled beyond the first instance, however any child elements can. So if a <code>#content</code> rule exists, all <code>#content</code> rules regardless of any subsequent class specific rules will have the same styles, however if there is <code>#content .elementOne</code> and <code>#content.sectionName .elementOne</code>, the rules act as they should. It’s an infuriatingly vulgar arrangement but one that is not entirely unworkable — lamentably most of the “fixes” involve positioning elements absolutely so that IE6’s fast-and-loose interpretation of the box model is mitigated.</p>
<p>To match the new stylesheets, more or less every single page’s markup was re-examined. The history of the site  meant what had begun innocuously had mutated with repeated updates and innumerable additions and modifications; many sections were simplified or wholly recreated, <a href="http://www.littlechef.co.uk/menu">the Menu</a> being the largest example of the latter.</p>
<h2>Framework</h2>
<p>The biggest change by far though is the use of the <a href="http://framework.zend.com">Zend Framework</a> for the entirety of the site. More than just a gravitation towards the new and shiny, the site ideally would have been built with a framework from the outset had it not been for extenuating factors denoting otherwise. Originally conceived for a PHP4 host, the site originally used individual PHP scripts and an ensemble of different libraries and configurations; as is often the case, this worked but showed its fragility as the site grew — and most crucially — would become a hindrance as it grew further. Indeed, the plans for the Little Chef site are such that were it to remain as-is, the work required in the future would be even greater, especially to maintain any kind of quality.</p>
<p><a href="http://framework.zend.com/manual/en/zend.controller.html">Controller</a> is put to work with the <a href="http://framework.zend.com/manual/en/zend.controller.router.html">router</a> handling the forwarding of the old .php files to the relevant controllers — so whereas now a URL may be <a href="http://www.littlechef.co.uk/aboutus/wifi">aboutus/wifi</a>, the old URL <a href="http://www.littlechef.co.uk/wifi.php">wifi.php</a> and its shortened form <a href="http://www.littlechef.co.uk/wifi">wifi</a> will also work. <a href="http://www.littlechef.co.uk/feed/">RSS is now provided</a> through context switching while user authentication is handled using <a href="http://framework.zend.com/manual/en/zend.auth.html">Auth</a> and the action init() hook. Of course if these were the most complicated aspects of the site the rebuild would have been drastically simpler and more straightforward.</p>
<h2>Route finder</h2>
<p><img class="alignnone size-full wp-image-1307" title="Little Chef route finder" src="http://chaostangent.com/wp-content/uploads/2009/10/littlechef-routefinder.jpg" alt="Little Chef route finder" width="540" height="347" /></p>
<p>The largest amount of time was spent on the most popular area of the site, the “<a href="http://www.littlechef.co.uk/findalittlechef">Find a Little Chef</a>” service which integrates with the <a href="http://msdn.microsoft.com/en-us/library/dd877970.aspx">Microsoft MapPoint web service</a>. The full details of the implementation are for another time; suffice to say that it was neither quick nor routine to enable PHP to talk to the service in a manageable way. The most trying part of the process was the number of different states that are possible from only exposing two different pieces of functionality: find a route from one location to another with a corridor search for restaurants, and a radial search for restaurants around a specific point. Using the former for this example, there is the best case scenario where both locations are successfully found by the service, a route is calculated between them and restaurants are available and displayed appropriately; beyond this state however is a multitude of different cases, all of which require a disproportionate amount of attention. For instance, if one of the two locations is ambiguous, a screen should be displayed offering different choices, which is well explored behaviour present on a number of different mapping services. The problem arises however in how to do a subsequent search once a location is chosen.</p>
<p>Using the query “Barnsley” as a concrete example: there are three different “Barnsley”s in the UK. MapPoint helpfully provides a “DisplayName” property for a location, which means displaying a list of the different “Barnsley”s available is easy, however the only meaningful information returned by MapPoint that is subsequently useful is the latitude and longitude. The display name can’t be used to perform a search as one of the names returned is “Barnsley, England, United Kingdom” which is still ambiguous. An Entity ID is returned, however this is not usable as <a href="http://msdn.microsoft.com/en-us/library/cc534886.aspx">the primary MapPoint data sources do not allow entity ID searches</a>; only user data sources do. To do a route search, established locations must be used rather than just latitude and longitude points, which means that once a choice has been made and the latitude and longitude passed back to the service, a new search has to be done to <a href="http://msdn.microsoft.com/en-us/library/cc534892.aspx">find the closest route-able location</a> which, disappointingly enough, can fail despite the service returning a valid location not one query prior. It’s a startlingly backward way of doing things and just one of a catalogue of traumatically debilitating obstacles to implementation, compounded by the drought of quality documentation available, which means that alternatives and real-world implementations beyond the interminable FourthCoffee examples are non-existent.</p>
<h2>Friends</h2>
<p><img class="alignnone size-full wp-image-1308" title="Little Chef signup" src="http://chaostangent.com/wp-content/uploads/2009/10/littlechef-signup.jpg" alt="Little Chef signup" width="540" height="347" /></p>
<p>The Friends section, having seen over 36,000 sign ups since its inception, was given an equal amount of attention, most of it devoted to the <a href="http://www.littlechef.co.uk/friends/register">“Sign up” form</a>. <a href="http://framework.zend.com/manual/en/zend.form.html">Form</a> takes away a lot of the pain but does have some noticeable gaps. I don’t use the automatic <a href="http://framework.zend.com/manual/en/zend.form.forms.html#zend.form.forms.decorators">output decorators</a> of a form, primarily as past experience has shown that <em>every</em> form is unique in one way or another that automatic generation can’t and shouldn’t handle. The first of these was a combination field for dates which is reused and extended in the <a href="http://www.littlechef.co.uk/feedback">Feedback section</a> with a time implementation — this is only possible if one manages to stumble across the <a href="http://framework.zend.com/manual/en/zend.form.elements.html#zend.form.elements.validators">$context parameter</a> to a form field’s validation method. Either not available in earlier framework releases or otherwise more cleverly hidden away, $context is an array of all the other data passed to a form which allows for more capable validators and form elements. For a date field, outputting as three drop-down boxes or a JavaScript generated calendar control makes no difference as the validation method can gather any other fields and present back a single, valid date:</p>
<pre class="brush: php">class Lib_Form_Element_Date extends Zend_Form_Element_Xhtml {
	public function init() { $this-&gt;addValidator(new Lib_Validate_Date()); }

	public function isValid($value, $context = null) {
		if(is_array($value) &amp;&amp; array_key_exists("day", $value) &amp;&amp;
			array_key_exists("month", $value) &amp;&amp; array_key_exists("year", $value))
		{
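			// Build the ISO date string directly: mktime() rolls impossible dates (e.g. 31 February)
			// over into the following month, which would hide them from Lib_Validate_Date
			$value = sprintf("%04d-%02d-%02d", $value["year"], $value["month"], $value["day"]);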
			$this-&gt;setValue($value);
		}

		return parent::isValid($value, $context);
	}
}</pre>
<pre class="brush: php">class Lib_Validate_Date extends Zend_Validate_Abstract {
	const INVALID_DATE = "invalidDate";

	protected $_messageTemplates = array(
		self::INVALID_DATE =&gt; "\"%value%\" is not a correct date"
	);

	public function isValid($value, $context = null) {
		$this-&gt;_setValue($value);

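		if(is_string($value) &amp;&amp; (strpos($value, "-") !== false)) {
			// Pad to three parts so a partial value such as "2009-10" fails cleanly
			list($y, $m, $d) = array_pad(explode("-", $value), 3, 0);
			if(!checkdate(intval($m), intval($d), intval($y))) {
				// _error() is a method of Zend_Validate_Abstract which records the failure message
				$this-&gt;_error(self::INVALID_DATE);
				return false;
			}
		} else {
			if(!is_string($value) || strtotime($value) === false) {
				$this-&gt;_error(self::INVALID_DATE);
				return false;
			}
		}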

		return true;
	}
}</pre>
<p>This is perhaps the simplest use of the parameter but $context opens up other possibilities as well, namely validators which base their result on the values of other fields. The most common use would be a password field and its confirmation: without $context it’s impossible to provide a validator for the rule that both must be identical. This case can be made more generic with a FieldMatch validator which can match any number of fields to each other, should such a situation arise.</p>
<p>A more complex example is one where validators are applied only if the value of another field is as specified. The easiest way to think about this is the “Other” option for drop-downs: if this is selected then a user should fill in the free-text field provided. Again this case can be boiled down to a generic “Contingent” validator which is constructed with a list of fields, a list of trigger matches and a list of validators to apply in the event of a field match. Perhaps easier to demonstrate than to describe:</p>
<pre class="brush: php">$form-&gt;addElement("select", "title", array(
  "required" =&gt; true,
  "multiOptions" =&gt; array("Mr" =&gt; "Mr", "Mrs" =&gt; "Mrs", "Other" =&gt; "Other")
));
$form-&gt;addElement("text", "titleother", array(
  "allowEmpty" =&gt; false,
  "validators" =&gt; array(new Lib_Validate_Contingent("title", "Other", new Zend_Validate_NotEmpty()))
));</pre>
<pre class="brush: php">class Lib_Validate_Contingent extends Zend_Validate_Abstract {
	protected $_fields = array(), $_matches = array(), $_validators = array();

	public function isValid($value, $context = null) {
		foreach($this-&gt;_fields AS $k =&gt; $field) {
			if((is_array($this-&gt;_matches[$k]) &amp;&amp; in_array($context[$field], $this-&gt;_matches[$k])) ||
				$context[$field] == $this-&gt;_matches[$k])
			{
				foreach($this-&gt;_validators AS $validator) {
					if(!$validator-&gt;isValid($value, $context)) {
						$this-&gt;_messageTemplates = $validator-&gt;getMessageTemplates();
						$this-&gt;_messageVariables = $validator-&gt;getMessageVariables();
						$this-&gt;_value = $value;

						foreach($validator-&gt;getMessages() AS $k =&gt; $v) {
							$this-&gt;_error($k, $value);
						}

						return false;
					}
				}
			}
		}

		return true;
	}
}</pre>
<p>In this way the Contingent validator is like a trigger: it doesn’t provide any validation messages itself but feeds back those of the associated validator. It is important to note the “allowEmpty” setting for the field, which means the validator is triggered even if the field itself is empty; in most cases that is entirely the point. Originally this validator was attached to the other field (in the example above this would be “title”), however this was logically odd and perplexing in some situations as the validation messages would then be bound to that field rather than the target one. With this it’s entirely possible to create complex, cascading forms without forgoing the use of validators and relying on unwieldy control structures.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushCss.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
     SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Conclusion</h2>
<p>Both the rebuild itself and the launch, while not without their vexing moments, went better than I had expected and the wealth of lessons learned, components built and challenges encountered more than makes up for the teeth-grinding suffered. There are a myriad of other, smaller complications which don’t warrant a full exploration, most of them born out of the need for better debugging, especially when dealing with external services. The true measure of the work done now will be in the months and hopefully the years to come when the site is sure to mature into something perhaps drastically different from what it is now. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/10/work-related-little-chef-site-relaunch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Work related: National College site goes live</title>
		<link>http://chaostangent.com/2009/09/work-related-national-college-site-goes-live/</link>
		<comments>http://chaostangent.com/2009/09/work-related-national-college-site-goes-live/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 17:53:36 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[carousel]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[iris]]></category>
		<category><![CDATA[iris associates]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[js]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[national college]]></category>
		<category><![CDATA[ncsl]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[tabs]]></category>
		<category><![CDATA[tabulator]]></category>
		<category><![CDATA[tooltips]]></category>
		<category><![CDATA[tooltipsy]]></category>
		<category><![CDATA[work releated]]></category>
		<category><![CDATA[xhtml]]></category>
		<category><![CDATA[xhtml strict]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1291</guid>
		<description><![CDATA[An examination of some of the more interesting parts of the National College website which I worked on since November 2008 and was launched in September 2009. Includes a look at the full-fat Carousel script as well as tab and tooltip scripts and markup.]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-1293" title="ncsl-1" src="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-1.jpg" alt="ncsl-1" width="540" height="334" /><br />
One of the sites I’ve been working on for almost a year now has gone live: <a href="http://www.nationalcollege.org.uk/index/professional-development.htm">National College for Leadership of Schools and Children’s Services</a>. I was responsible for markup, styling and scripting, and the site went through numerous visual and requirement changes before the current layout and design were settled upon.</p>
<blockquote class="pullout"><p>“a benchmark for design, usability and accessibility, catering to a wide audience without compromising aesthetics”</p>
</blockquote>
<p>Starting in November 2008 with the then-named NCSL (National College for School Leadership), the number of templates could be counted on two hands and script components were non-existent; since then the brand and colour scheme have changed, a font replacement library has been implemented, upgraded then changed (<a href="http://wiki.novemberborn.net/sifr3/">sIFR</a> version two, then three, then <a href="http://wiki.github.com/sorccu/cufon/about">Cufon</a>), script components now include tabs, social bookmarking, tooltips and a fully-featured carousel, and the number of templates has ballooned to encompass a wide variety of pages including a number of <a href="https://www.nationalcollege.org.uk/index/login.html">member-only pages</a> most site visitors will never see.</p>
<p>The client had a very strict set of requirements regarding accessibility, usability and aesthetics, so the site was a great challenge and it’s brilliant to see it go live. With it now out in the wild, it’s a good opportunity to examine just some of the notable aspects of the project.<span id="more-1291"></span></p>
<h2>Carousel</h2>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-3-carousel.jpg"><img class="alignnone size-full wp-image-1295" title="ncsl-3-carousel" src="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-3-carousel.jpg" alt="ncsl-3-carousel" width="540" height="193" /></a></p>
<p>The site <a href="http://www.nationalcollege.org.uk/index/professional-development.htm">demonstrates the full version of the carousel</a> that <a href="#carousel">currently adorns chaostangent.com</a> and includes the quick-jump menu and some other tweaks. The quick-jump logic took a while to hammer down: it would have been easy to implement a simple, one-directional system but it needed to be smarter and choose the shortest path (either backwards or forwards) to the desired item. Hiccups included dealing with negative numbers (e.g. if you’re on item four and the target item is number one, the difference between them is negative three), which raised the question of whether to normalise that difference to a positive number and work from there or use the sign to hint at the direction the carousel should take (short answer: normalise). As an example of the logic, assume the carousel has eight items, the default and current state is the first item and a visitor has just clicked on item four. First, the difference between the two is taken: target − current = 4 − 1 = 3; this is then normalised in case of a negative number. The carousel assumes it is going to travel forward (right) to reach the target item, but first it checks whether the difference is greater than half of the total carousel items (four). In this case it isn’t, so the carousel moves right. In code, this is:</p>
<pre class="brush: javascript">var diff = (targetItem - this.currentItem);
var normalisedDiff = (diff &lt; 0) ? (diff * -1) : diff;
var newDiff = diff;

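// Compare against the exact half: rounding it up would send a carousel with an odd
// number of items the long way round (e.g. four steps right instead of three left of seven)
if(normalisedDiff &gt; (this.items / 2))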
{
    newDiff = normalisedDiff - this.items;
    newDiff *= (diff &lt; 0) ? -1 : 1;
}

if(newDiff &gt; 0)
{
    this.moveRight(null, newDiff);
}
else
{
    this.moveLeft(null, (newDiff * -1));
}</pre>
<p>Simple, but effective. The assumption of always moving right deals with the case when there would be an equal number of moves either right or left. The transition from one item to another is done step by step rather than as a smooth, no-break transition. The reason for this leads back to the <a href="http://chaostangent.com/2009/08/building-the-carousel/">looping nature of the carousel</a>: due to the way the elements are sequenced when the carousel moves, it would require a great deal of pre-calculation to produce a smooth animation with no real benefit to usability. User testing showed the staged transition helped identify which item a user was on at any time. Again, the carousel (like all scripts examined henceforth) is not released under a permissive license due to being part of a commercial project, so it is subject to standard intellectual property and copyright laws.</p>
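<p>The shortest-path decision described above can also be isolated as a small pure function. What follows is a minimal sketch and not the production carousel code: the name “shortestStep” and its parameters are hypothetical, and it swaps the normalise-and-compare approach for modular arithmetic, which always settles a tie by moving right.</p>
<pre class="brush: javascript">// Hypothetical helper: returns a signed step count for a looping carousel.
// A positive result means "move right", a negative result means "move left".
function shortestStep(current, target, total) {
  // Moves needed when only travelling right, always in the range 0 to total - 1
  var forward = (((target - current) % total) + total) % total;
  // Moves needed when only travelling left
  var backward = total - forward;
  // Prefer the right-hand route when it is shorter or when both routes are equal
  return (Math.min(forward, backward) === forward) ? forward : -backward;
}

console.log(shortestStep(1, 4, 8)); // item one to item four of eight
console.log(shortestStep(4, 1, 8)); // item four back to item one
console.log(shortestStep(1, 7, 8)); // wrapping left is the shorter route</pre>
<p>Under these assumptions the sign of the result would decide between moveRight and moveLeft and its magnitude would be the number of steps, so the caller never handles the negative difference itself.</p>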
<h2>Tooltips</h2>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-4-tooltips.png"><img class="alignnone size-full wp-image-1296" title="ncsl-4-tooltips" src="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-4-tooltips.png" alt="ncsl-4-tooltips" width="540" height="220" /></a></p>
<p>Another notable and suitably petite script concerns the tooltips for the <a href="http://www.nationalcollege.org.uk/index/professional-development.htm">“Tools” palette in the right column</a>, the script for which I’ve named “Tooltipsy”. The most vexing part of the script is keeping the tooltip within the viewport:</p>
<pre class="brush: javascript">var co = liElem.cumulativeOffset();
var dim = liElem.getDimensions();
var ttDim = tooltipElem.getDimensions();
var vpDim = document.viewport.getDimensions();

var ttPos = {
  top: (co.top - ttDim.height - 11),
  left: Math.round((co.left + (dim.width / 2)) - (ttDim.width / 2))
};
var offset = ((ttPos.left + ttDim.width) &gt; vpDim.width) ? (vpDim.width - (ttPos.left + ttDim.width)) : 0;

tooltipElem.setStyle({
  top: ttPos.top+"px",
  left: (ttPos.left + offset) +"px"
});

stemElem.setStyle({
  left: (Math.round(ttDim.width / 2) - 8 - offset)+"px"
});</pre>
<p>This code is slightly out of context (the LI element’s cumulative offset is calculated well before this segment) but the core of it is present. First the relevant dimensions and offsets are calculated, then an object is created to store the estimated position of the tooltip, ignoring the viewport for the time being. The top constant (11) has been hard-coded for speed, based upon the height of the tooltip plus a three pixel offset so the tooltip does not obscure the target content. The left position is calculated using the left cumulative offset plus half the width of the LI element, minus half the width of the tooltip; this should centre the tooltip on the LI element. Next an offset is calculated for the viewport: if the tooltip is positioned beyond the viewport dimensions it is shifted back so that it is entirely contained, i.e. a horizontal scroll bar does not appear. Note that this offset only takes into account the right case; the left, top and bottom cases are ignored. This is a shortcut, as the position of the tooltips is already known in this case, and top and bottom cases would need the ability to display the tips above <strong>and</strong> below the element rather than just above as is the case here. After the tooltip is positioned, the stem “pointer” element is placed taking into account the offset so that it is always centred on the element being pointed to rather than the tooltip. The result is a stout little script just over seventy-five lines long. If you’re making your own tooltip script rather than using a <a href="http://www.nickstakenburg.com/projects/prototip2/">publicly available library</a>, one aspect which can be vexing is the geometry calculations: the tooltip <em>must</em> be visible for these to be correct, as having the tooltips set as “display: none” means the dimensions will be zero or subtly incorrect.</p>
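<p>The positioning arithmetic above can be separated from the DOM, which makes it easier to check by hand. This is only a sketch under stated assumptions: “positionTooltip” and its plain-object arguments are hypothetical, the left-edge case is an addition that the original script deliberately skips, and the stem’s half width is passed in on the assumption that this is what the constant 8 represents.</p>
<pre class="brush: javascript">// Hypothetical helper: target is {left, top, width}, tooltip is {width, height}.
// gap is the vertical clearance (11 in the original), stemHalfWidth is assumed to be 8.
function positionTooltip(target, tooltip, viewportWidth, gap, stemHalfWidth) {
  // Place the tooltip above the target, centred horizontally on it
  var top = target.top - tooltip.height - gap;
  var left = Math.round(target.left + (target.width / 2) - (tooltip.width / 2));

  // How far the tooltip pokes out of each side of the viewport (zero when it fits)
  var overflowRight = Math.max(0, left + tooltip.width - viewportWidth);
  var overflowLeft = Math.max(0, 0 - left);

  // Negative offset pulls the tooltip left, positive offset pushes it right
  var offset = overflowLeft - overflowRight;

  return {
    top: top,
    left: left + offset,
    // Shift the stem the opposite way so it stays centred on the target
    stemLeft: Math.round(tooltip.width / 2) - stemHalfWidth - offset
  };
}

console.log(positionTooltip({ left: 900, top: 200, width: 40 }, { width: 200, height: 50 }, 1000, 11, 8));</pre>
<p>A tooltip wider than the viewport would overflow on both sides and is not handled by this sketch; as with the original, the element would still need to be visible when its dimensions are measured.</p>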
<h2>Tabs</h2>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-2-tabs.png"><img class="alignnone size-full wp-image-1294" title="ncsl-2-tabs" src="http://chaostangent.com/wp-content/uploads/2009/09/ncsl-2-tabs.png" alt="ncsl-2-tabs" width="540" height="153" /></a></p>
<p>The <a href="http://www.nationalcollege.org.uk/index/leadershiplibrary/leading-early-years/diversity-in-early-years.htm">tab system</a> which is used all over the site was a particularly tricky bit of scripting and markup to get right. Accessibility to non-JavaScript users was a must, but there were other constraints: the tabs had to remain visible at larger text sizes and flow onto two or more lines if they don’t fit within the available horizontal space. The former is relatively easy to satisfy with larger background images for the <a href="http://www.alistapart.com/articles/slidingdoors/">CSS Sliding Doors technique</a>; the latter not so much. At first blush, one would imagine floating the tabs to the left would do the trick, however this causes havoc with multiple lines and multiple words within the tab; the solution is to use “display: inline-block” for browsers which support it (<abbr title="Internet Explorer">IE</abbr>8, Firefox 3+, Opera, Safari) and to use “display: inline” for IE6 and IE7. This means that Firefox 2 users (of which there are enough to warrant concern) are out of luck as <abbr title="Firefox">FF</abbr>2 does not support inline-block and there is no way of targeting that browser version over FF3. Thankfully the user statistics for the site meant this was an acceptable compromise but the bottom line is that there is no fire-and-forget way of styling robust tabs. Regardless, using inline and inline-block means the tabs are solid, cross-browser compatible and able to be used in different contexts throughout the site.</p>
<p>The markup shows some slight redundancy with a superfluous and mildly solecistic SPAN element, however this is necessary for the background effect which, in a difficult twist, needs to overlay the keyline when a tab is selected.</p>
<pre class="brush: xml">&lt;ul class="tabList scripted"&gt;
  &lt;li class="anonymous_element_3 on"&gt;&lt;span&gt;Topics&lt;/span&gt;&lt;/li&gt;
  &lt;li class="anonymous_element_5"&gt;&lt;span&gt;Resources&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;</pre>
<pre class="brush: css">.tabList li {
  display: inline-block;
  background: url(tab-off-left.png) no-repeat left top;
  margin: 0 3px 1px 0;
  position: relative;
  cursor: pointer;
}

.tabList li.on {
  color: #FFF;
  background: url(tab-on-left.png) no-repeat left top;
  margin-bottom: 0;
}

.tabList li span {
  display: block;
  background: url(tab-off-right.png) no-repeat right top;
  padding: 0.3em 1em 0.2em;
}

.tabList li.on span {
  background: url(tab-on-right.png) no-repeat right top;
  padding-bottom: 0.27em; /* Most of the tab effect is about tweaking this value */
}</pre>
<p>The UL.tabList element is pre-built by the tab script (called “Tabulator”) based on the tabs that the script is fed. The class “anonymous_element_3” is generated by the script based upon a tab’s ID. In this case, while the background images are separate, it would be possible to combine them into a single image to reduce HTTP requests and loading time, although you would need to wrap your head around <a href="http://www.smashingmagazine.com/2009/04/27/the-mystery-of-css-sprites-techniques-tools-and-tutorials/">CSS sprites</a> as well as some transparency tomfoolery. The IE6 and IE7 specific style sheets perform some minor alterations to this styling:</p>
<pre class="brush: css">.tabList li { display: inline; }
.tabList li.on span { padding-top: 0.35em; padding-bottom: 0.24em; }</pre>
<p>If you’re unconcerned about using pixel values for padding then the numbers become easier to fathom and tweak.</p>
<p>The Tabulator script is short at just over 50 lines and doesn’t do anything wildly inventive. Most of the selection logic is done using <a href="http://www.prototypejs.org/">Prototype</a> niceties like passing the element ID (<a href="http://api.prototypejs.org/dom/element.html#identify-class_method">automatically generated</a> by Prototype if not explicitly set) of the tab to be selected to the event handler function. This could also be done by counting the number of previous siblings a tab in the tab listing has and selecting the tab content element based on that — potentially using <a href="http://www.w3.org/TR/css3-selectors/">CSS3 selectors</a> such as <a href="http://www.w3.org/TR/css3-selectors/#nth-child-pseudo">nth-child</a>; with <a href="http://www.prototypejs.org/2009/9/1/prototype-1-6-1-released">Prototype 1.6.1</a> it would even be possible to store the element ID alongside the element with their DOM-less storage system.</p>
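<p>The sibling-counting alternative mentioned above might look something like the following. It is a sketch and not part of Tabulator: “siblingIndex” and “showPanelFor” are hypothetical names, and it assumes the standard DOM property previousElementSibling is available, which is not the case in older versions of Internet Explorer.</p>
<pre class="brush: javascript">// Count how many element siblings come before a node in its parent
function siblingIndex(node) {
  var index = 0;
  while (node.previousElementSibling) {
    node = node.previousElementSibling;
    index += 1;
  }
  return index;
}

// Show only the content panel whose position matches the clicked tab.
// panels is assumed to be an array of elements in the same order as the tabs.
function showPanelFor(clickedTab, panels) {
  var selected = siblingIndex(clickedTab);
  panels.forEach(function (panel, i) {
    panel.style.display = (i === selected) ? "" : "none";
  });
  return selected;
}</pre>
<p>The trade-off is that tabs and panels must stay in the same order in the markup, whereas passing an element ID to the handler keeps working if either list is reordered.</p>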
<h2>Markup</h2>
<p>The markup was always going to be complex: a large number of templates and a high degree of fidelity with the designs. Targeting the uncompromising XHTML 1.0 Strict doctype, the result is highly semantic and a tour-de-force of cross-browser compatibility. I tended to steer clear of “modern” CSS niceties which I’ve used in more recent projects, such as direct descendant selectors (element1 &gt; element2), which make rounded corners easier with two nested DIV elements but mean the styling needs to be completely respecified in an IE6 stylesheet; I did use :first-child in places but the need for IE6 class specifiers mooted their usage somewhat.</p>
<p>As the project spanned a number of months and a wealth of additions and amends, a strict logging system was put in place to deal with changes that were being made to the templates on a weekly basis. Whenever a release was made to the client, a full report on the files modified, removed or added and the impact of the changes was necessary. Mostly these reports took shape according to the requests which had been sent through, which meant changes could be isolated according to their context. The project wasn’t placed under source control until late, too late to be of much use, which is when the <a href="http://winmerge.org/">wonderful WinMerge</a> comes into its own; even better, it is released free of charge.</p>
<p>Part of the release was an Online Visual Toolkit: essentially a collection of templates, components and best practice notes for usage of the templates and ancillary files. This acted as the main resource for demonstrating changes and providing updates and has been integrated into the launch of the site for all parties which are involved with working on the site.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushJScript.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushCss.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushXml.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
                     SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Overall</h2>
<p>Despite the length of the project, the result is a benchmark for design, usability and accessibility, catering to a wide audience without compromising aesthetics. There are elements of the site which haven’t been examined in detail above (use of Cufon as a font display library and the social bookmarking script, to name but a couple) but they can be counted as minor notes to the major ones above. At the time of writing the site is going through some initial teething troubles. Despite the templates originally validating against XHTML 1.0 Strict, the current pages do not for a variety of reasons, many to do with the templates being integrated with a bespoke content management system. The number of assets being loaded per page is also quite immense, especially with regard to JavaScript source files and CSS backgrounds; even my developer note recommending the usage of a script combiner and compressor and the Google AJAX libraries has gone unheeded. Combine these with the slightly souring note that other JS libraries have been used instead of the ones available (<a href="http://tetlaw.id.au/view/blog/fabtabulous-simple-tabs-using-prototype/">Fabtabulous</a> vs. Tabulator) and the launch isn’t everything I had hoped, but the core of the project remains and with polish and incremental improvement, the situation will no doubt improve. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/09/work-related-national-college-site-goes-live/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>50 frames of Life</title>
		<link>http://chaostangent.com/2009/09/50-frames-of-life/</link>
		<comments>http://chaostangent.com/2009/09/50-frames-of-life/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 07:16:09 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[avatars]]></category>
		<category><![CDATA[conways game of life]]></category>
		<category><![CDATA[game of life]]></category>
		<category><![CDATA[gravatars]]></category>
		<category><![CDATA[john horton conway]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1230</guid>
		<description><![CDATA[My Sunday afternoon project wasn’t something that I could just let lie and it didn’t take long for work to start on it again. Using the list of improvements I had identified, I began with the aesthetics and then moved on to other, more number intensive areas of research.
Before even touching the code I subsumed [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/">My Sunday afternoon project</a> wasn’t something that I could just let lie and it didn’t take long for work to start on it again. Using the list of improvements I had identified, I began with the aesthetics and then moved on to other, more number intensive areas of research.</p>
<p>Before even touching the code I subsumed everything into a Git repository; I’m a long time <a href="http://subversion.tigris.org/">Subversion</a> user but relatively new to <a href="http://git-scm.com/">Git</a> so I still regularly refer back to the “<a href="http://git-scm.com/course/svn.html">Git — SVN Crash Course</a>” which is pleasantly concise. With this done, I attacked the GIF output method first:</p>
<p class="thumbnails four"><img class="alignnone size-full wp-image-1231" title="gameoflife2-sample-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-1.gif" alt="gameoflife2-sample-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1232" title="gameoflife2-sample-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-2.gif" alt="gameoflife2-sample-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1233" title="gameoflife2-sample-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-3.gif" alt="gameoflife2-sample-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1234" title="gameoflife2-sample-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-4.gif" alt="gameoflife2-sample-4" width="64" height="64" /></p>
<blockquote class="pullout"><p>“cooked up in a few hours and wasn’t subject to any stringent mathematical basis”</p>
</blockquote>
<p>First was visibly increasing the size of the cells. I had originally used a multiplier of four for previous iterations but that made them very indistinct, and with only fifty generations it meant a large portion of the space wasn’t used. The result was an increase in cell size to seven with a one pixel border: this came from a happy accident while crafting the previous post and resulted in the introductory images, however the calculations for the edge cells were incorrect, which is why those animations don’t appear to “loop” at the edges as they should. This implementation fixed that and with a vastly smaller environment (only 8x8 with a 5x5 seed), each generation of cells and their progression is easier to see. Next was addressing the colour issue: generating both a background and foreground colour met with mixed results so, taking a leaf from <a href="http://scott.sherrillmix.com/blog/blogger/wp_identicon/">WP_Identicon’s book</a>, I kept the background colour constant and generated the foreground colour only:<span id="more-1230"></span></p>
<p class="thumbnails four"><img class="alignnone size-full wp-image-1235" title="gameoflife2-sample-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-5.gif" alt="gameoflife2-sample-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1236" title="gameoflife2-sample-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-6.gif" alt="gameoflife2-sample-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1237" title="gameoflife2-sample-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-7.gif" alt="gameoflife2-sample-7" width="64" height="64" /> <img class="alignnone size-full wp-image-1238" title="gameoflife2-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-8.gif" alt="gameoflife2-sample-8" width="64" height="64" /></p>
<p>The generation was done by using different parts of the e-mail address again: the full address for the red component, the local part for the green component and the domain part for the blue component. This method produced the first four images and had a definite tendency towards the green spectrum again, so I altered it slightly to compute the red component from the local part and green from the full address, which resulted in a pleasing orange colour. Next was to try to tone down the rapid movement: while a tenth of a second per frame is fine for debugging and tracking progression, it’s distracting when viewed for long periods of time, and the possibility of multiple avatars appearing at the same time meant slowing down the animation:</p>
<p class="thumbnails three"><img class="alignnone size-full wp-image-1238" title="gameoflife2-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-8.gif" alt="gameoflife2-sample-8" width="64" height="64" /> <img class="alignnone size-full wp-image-1239" title="gameoflife2-sample-9" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-9.gif" alt="gameoflife2-sample-9" width="64" height="64" /> <img class="alignnone size-full wp-image-1240" title="gameoflife2-sample-10" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-10.gif" alt="gameoflife2-sample-10" width="64" height="64" /></p>
<p>The first is the original 1/10 of a second per frame, the second is 1/2 and the third 3/4. Obviously when viewed next to quicker varieties the effect is somewhat lost, but at 3/4 of a second per frame a full fifty generations will last a little over half a minute (50 × 0.75 = 37.5 seconds) and still remain under 15 KB, which is more than acceptable for an avatar. Performing a full run on the original 1,300+ e-mail addresses produced a set of results which coincidentally matched my previous desire to be able to identify an e-mail address’s domain at a glance. For example, Hotmail tended towards greens and oranges:</p>
<p class="thumbnails seven"><img class="alignnone size-full wp-image-1241" title="gameoflife2-hotmail-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-1.gif" alt="gameoflife2-hotmail-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1242" title="gameoflife2-hotmail-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-2.gif" alt="gameoflife2-hotmail-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1243" title="gameoflife2-hotmail-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-3.gif" alt="gameoflife2-hotmail-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1244" title="gameoflife2-hotmail-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-4.gif" alt="gameoflife2-hotmail-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1245" title="gameoflife2-hotmail-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-5.gif" alt="gameoflife2-hotmail-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1246" title="gameoflife2-hotmail-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-6.gif" alt="gameoflife2-hotmail-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1247" title="gameoflife2-hotmail-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-7.gif" alt="gameoflife2-hotmail-7" width="64" height="64" /></p>
<p>while BT Internet tended more towards blues and pale yellows:</p>
<p class="thumbnails seven"><img class="alignnone size-full wp-image-1248" title="gameoflife2-btinternet-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-1.gif" alt="gameoflife2-btinternet-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1249" title="gameoflife2-btinternet-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-2.gif" alt="gameoflife2-btinternet-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1250" title="gameoflife2-btinternet-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-3.gif" alt="gameoflife2-btinternet-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1251" title="gameoflife2-btinternet-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-4.gif" alt="gameoflife2-btinternet-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1252" title="gameoflife2-btinternet-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-5.gif" alt="gameoflife2-btinternet-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1253" title="gameoflife2-btinternet-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-6.gif" alt="gameoflife2-btinternet-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1254" title="gameoflife2-btinternet-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-7.gif" alt="gameoflife2-btinternet-7" width="64" height="64" /></p>
<p>These are regardless of the local part of the address, and while they’re not the “one colour per domain” that I had hoped for, in aggregate their identity is clear.</p>
<p>In a sense, that was the project completed: I had created something that was unique, quirky and most of all aesthetically pleasing. I could spend hours tweaking and adjusting parameters to make for better avatars, perhaps slowing down the animation further or altering the colour generation methods, but for what it is, the result is better than I expected. This wasn’t the end of the work though. As you may notice, none of the above animations have any protracted dead states, whereby the animation is blank or static for more than a frame. The reason for this is that while I addressed the aesthetics first, I also implemented two heuristics to stop the algorithm in those cases:</p>
<pre class="brush: php">private function isInactive(array $state)
{
	foreach($state AS $row)
	{
		foreach($row AS $cell)
		{
			if($cell == 1)
			{
				return false;
			}
		}
	}

	return true;
}

private function isEqual(array $state1, array $state2)
{
	foreach($state1 AS $h =&gt; $row)
	{
		if($row !== $state2[$h])
		{
			return false;
		}
	}

	return true;
}</pre>
<p>Both are simple and are called after a new state has been generated and output but before it has been switched for the current state; that way the final state will still be shown but no further operations will take place. Together they cover the most common cases but ignore more specialised ones such as oscillators (patterns that repeat with a period of two or more), which will not be caught. The inactivity function could be dropped altogether in favour of the equality function, but this would mean outputting at least two generations of dead cells, which I deemed to be unacceptable. Both of these functions are optimisations meant to short-circuit generation and prevent needless calculation. However, I wasn’t entirely certain that the cost of iterating through the grid was worth it, so after storing away the timings for the original version of the algorithm I ran a number of other benchmarks to find out, and to get some general statistics on the process.</p>
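<p>As a hedged sketch of how that ordering and a check for oscillators might fit together: each emitted state is hashed and remembered, so a repeat of any earlier state, whatever the period, stops the run. This is not the code from the release; runGenerations(), its parameters and the history of hashes are all assumptions made for illustration.</p>
<pre class="brush: php">&lt;?php
// Hypothetical driver: $tick is any callable that maps a state to the next
// one, $emit is a callable that receives each state to be output as a frame.
function runGenerations(array $state, $limit, $tick, $emit)
{
	// hashes of every state seen so far, used to spot repeats of any period
	$seen = array(md5(serialize($state)) =&gt; true);
	call_user_func($emit, $state, 0);

	for($generation = 1; $generation &lt;= $limit; $generation++)
	{
		$next = call_user_func($tick, $state);
		// output first so that the final state is still shown
		call_user_func($emit, $next, $generation);

		$hash = md5(serialize($next));
		if(isset($seen[$hash]))
		{
			// static (period one) or oscillating (period two or more)
			return $generation;
		}

		$seen[$hash] = true;
		$state = $next;
	}

	return $limit;
}

// stand-in tick that flips every cell, so the grid oscillates with period two
$flip = function(array $state)
{
	foreach($state AS $h =&gt; $row)
	{
		foreach($row AS $w =&gt; $cell)
		{
			$state[$h][$w] = 1 - $cell;
		}
	}
	return $state;
};
$ignore = function(array $state, $generation) {};

echo runGenerations(array(array(0, 1), array(1, 0)), 50, $flip, $ignore); // 2</pre>
<p>A repeat can only be spotted once the repeated frame has been output, so a separate inactivity check would still be needed to stop on the very first blank frame.</p>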
<h2>Statistics</h2>
<p>The first run used the first version of the algorithm (no stop checks) on a 32x32 grid over 50 generations on 1382 sample e-mail addresses. The average time for only the game to run (i.e. not counting the seed or colour generation) including GIF generation was 1.2227 seconds; with a standard deviation of 0.0623 and variance of 0.0039, the numbers are fairly consistent. On an identical run but including stop checks the average time per run was 0.6297 seconds, while standard deviation and variance were 0.5097 and 0.2598 respectively. Obviously the shorter average time comes with a much wider spread of values: the quickest run finished in 0.0352 seconds while the slowest came out at 1.8953. I wanted the values to be as “real world” as possible, which is why I didn’t bother timing the processing of the whole sample set or taking more detailed timings from within the Life algorithm itself; these numbers represent values likely to be experienced if this was ever put into use.</p>
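<p>For reference, summary figures of this kind can be derived from a list of per-run timings along the following lines. This is a sketch rather than the benchmarking code actually used: summarise() is a hypothetical name, and the population variance is an assumption as the figures above don’t say which form was calculated.</p>
<pre class="brush: php">&lt;?php
// Summarise a list of timings (in seconds). Population variance is assumed;
// divide by (count - 1) instead for the sample variance.
function summarise(array $timings)
{
	$count = count($timings);
	$mean = array_sum($timings) / $count;

	$squares = 0.0;
	foreach($timings AS $timing)
	{
		$squares += ($timing - $mean) * ($timing - $mean);
	}
	$variance = $squares / $count;

	return array(
		'mean' =&gt; $mean,
		'variance' =&gt; $variance,
		'deviation' =&gt; sqrt($variance),
		'min' =&gt; min($timings),
		'max' =&gt; max($timings),
	);
}

// timing a single run; microtime(true) returns seconds as a float
$start = microtime(true);
usleep(1000); // stand-in for generating one avatar
$elapsed = microtime(true) - $start;

// a small worked example: mean 5, variance 4, deviation 2
print_r(summarise(array(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)));</pre>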
<p>This proved that pruning was worthwhile overall, so I continued on with some comparative testing. Using the same sample set and generation count, an 8x8 grid had an average time per run of 0.3078 seconds while a 64x64 grid averaged 1.4124 seconds, which is still only slightly worse than a 32x32 grid without any stop state checking. It’s difficult to tell with only three data points exactly how the run time grows with grid size, but the correlation, if it wasn’t obvious before, is backed up by the statistics.</p>
<p>With these checked, I moved on to the secondary aspects of the algorithm, namely the colour and seed generation. I was interested in both how long they took to run and the quality of the results they produced, specifically whether there were any collisions in the colours and seeds. For the original two colour generation from an e-mail address, the time taken was tied to the length of the string being operated on; even so, the average time taken was ~0.0001 seconds for strings up to 39 characters long, which meant this wasn’t a very computationally expensive operation. The most common string lengths were between 20 and 25 characters:</p>
<table border="0">
<thead>
<tr>
<th scope="col">String length</th>
<th scope="col">Average time (s)</th>
<th scope="col">Standard deviation (s)</th>
<th scope="col">Variance (s<sup>2</sup>)</th>
<th scope="col">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>9.30851E-05</td>
<td>1.44625E-05</td>
<td>2.09163E-10</td>
<td>94</td>
</tr>
<tr>
<td>21</td>
<td>9.37708E-05</td>
<td>1.56694E-05</td>
<td>2.45531E-10</td>
<td>96</td>
</tr>
<tr>
<td>22</td>
<td>0.000101972</td>
<td>1.91865E-05</td>
<td>3.68121E-10</td>
<td>107</td>
</tr>
<tr>
<td>23</td>
<td>0.000102602</td>
<td>1.78043E-05</td>
<td>3.16992E-10</td>
<td>133</td>
</tr>
<tr>
<td>24</td>
<td>0.00010361</td>
<td>1.86302E-05</td>
<td>3.47085E-10</td>
<td>118</td>
</tr>
<tr>
<td>25</td>
<td>0.000108533</td>
<td>2.64901E-05</td>
<td>7.01723E-10</td>
<td>137</td>
</tr>
</tbody>
</table>
<p>Not exactly thrilling reading, but it demonstrates that it was running much as I expected. More worrying was what I found when I checked for collisions for each colour (background and foreground): 43 foreground collisions, 83 background collisions and 43 full colour collisions, even though all of the e-mail addresses were unique. I was particularly concerned with the background colour, but even with such a small sample set (in comparison to the number of possible e-mail addresses) 3.1% of addresses ended up in a full collision, which is less than ideal. The one colour generation fared no better:</p>
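<p>One way such a collision count could be taken is sketched below. How the figures above were actually counted isn’t stated, so both the definition used here (every occurrence of a colour after its first counts as one collision) and the countCollisions() name are assumptions.</p>
<pre class="brush: php">&lt;?php
// Count how many entries share their colour with an earlier entry.
// $colours is a list of array(red, green, blue) triples, one per address.
function countCollisions(array $colours)
{
	$keys = array();
	foreach($colours AS $colour)
	{
		// e.g. array(1, 127, 124) becomes "1,127,124"
		$keys[] = implode(",", $colour);
	}

	// every occurrence of a colour after its first counts as one collision
	return count($keys) - count(array_unique($keys));
}

$sample = array(array(0, 153, 20), array(1, 127, 124), array(0, 153, 20));
echo countCollisions($sample); // 1</pre>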
<table border="0">
<thead>
<tr>
<th scope="col">String length</th>
<th scope="col">Average time (s)</th>
<th scope="col">Standard deviation (s)</th>
<th scope="col">Variance (s<sup>2</sup>)</th>
<th scope="col">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>0.00010434</td>
<td>1.68583E-05</td>
<td>2.84203E-10</td>
<td>94</td>
</tr>
<tr>
<td>21</td>
<td>0.000108323</td>
<td>1.69948E-05</td>
<td>2.88823E-10</td>
<td>96</td>
</tr>
<tr>
<td>22</td>
<td>0.000112514</td>
<td>1.34494E-05</td>
<td>1.80885E-10</td>
<td>107</td>
</tr>
<tr>
<td>23</td>
<td>0.000111835</td>
<td>1.68996E-05</td>
<td>2.85597E-10</td>
<td>133</td>
</tr>
<tr>
<td>24</td>
<td>0.000114466</td>
<td>1.76485E-05</td>
<td>3.11469E-10</td>
<td>118</td>
</tr>
<tr>
<td>25</td>
<td>0.000120416</td>
<td>1.49733E-05</td>
<td>2.24199E-10</td>
<td>137</td>
</tr>
</tbody>
</table>
<p>Not only is it marginally slower but the number of collisions came to 147, which is 10.6% of the sample set: worse than the two colour generation and odd considering the range of colours that this function can produce. The standard deviations of the colours don’t explain it either: 73.9285, 72.4408 and 75.2587 for the red, green and blue components are close to the roughly 73.9 that an even spread across 0 to 255 would give, so each channel on its own covers the spectrum well. The collisions more likely come from different addresses reducing to the same intermediate number before it is split into components.</p>
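<p>As a rough yardstick against which to read these figures, the sketch below works out what an idealised generator would give: one that picks colours uniformly and independently from the full 24-bit range, which is an assumption and not how either function works. It uses the birthday approximation for the expected number of colliding pairs and the standard deviation of a single evenly spread channel; both function names are hypothetical.</p>
<pre class="brush: php">&lt;?php
// Expected number of colliding pairs when $n values are drawn uniformly
// at random from $space possibilities (birthday approximation).
function expectedCollidingPairs($n, $space)
{
	return ($n * ($n - 1)) / (2 * $space);
}

// Standard deviation of a discrete uniform distribution over 0..$max.
function uniformStdDev($max)
{
	$count = $max + 1;
	return sqrt(($count * $count - 1) / 12);
}

printf("%.4f\n", expectedCollidingPairs(1382, 256 * 256 * 256)); // 0.0569
printf("%.2f\n", uniformStdDev(255)); // 73.90</pre>
<p>In other words, a well mixed generator would be expected to produce essentially no collisions at all on a sample of this size.</p>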
<p>While there are equally detailed statistics for the seed generation, the numbers show much of what one would expect: time increases with string length, and the general time, like that of the colour generation, is in the region of ~0.0001 seconds for average length addresses. What I was most interested in though was the number of identical seeds: 22. This is not entirely unexpected given that the method for generating them was cooked up in a few hours and has no stringent mathematical basis; at 1.6% of the sample set it isn’t game breaking.</p>
<p>All of this may seem academic for such a small system but the results are invaluable if I ever want to advance the script any: the statistics show that further optimisation of the Life algorithm could yield quicker run times, while the seed and colour generation need work to be able to generate more unique values. The Life algorithm is well travelled by other programmers, which means a lot of the complex optimisation work has already been done (<a href="http://www.ddj.com/hpc-high-performance-computing/184406478">Hashlife</a> etc.), whereas the two generators definitely need more work and a more structured approach to their construction.</p>
<p><script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
                 SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Release</h2>
<p>For the moment I’m finished tinkering with the algorithm itself so it makes sense to release the code in case anyone else wishes to use this as a base. The license for this release is <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-Share Alike 3.0</a>, which means you can download, redistribute, play with and alter it provided you give credit where it’s due and share any derivatives under the same license. This is version 1.1, which includes the checks for stop conditions.</p>
<p><strong>Conway’s Game of Life in PHP + Seed and Colour generation — version 1.1</strong><br />
<a href="http://chaostangent.com/wp-content/uploads/2009/09/life-1.1.zip">ZIP — 7kb</a> <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/09/50-frames-of-life/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sunday afternoon project: Conway’s Game of Life in PHP</title>
		<link>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/</link>
		<comments>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/#comments</comments>
		<pubDate>Sun, 30 Aug 2009 20:44:07 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[animated gif]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[automata]]></category>
		<category><![CDATA[avatar]]></category>
		<category><![CDATA[cellular automata]]></category>
		<category><![CDATA[conways game of life]]></category>
		<category><![CDATA[game of life]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[gravatar]]></category>
		<category><![CDATA[life]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1201</guid>
		<description><![CDATA[On a quiet Sunday afternoon of a Bank holiday weekend, a project is born marrying the cellular automata of John Horton Conway's Game of Life and the automatic generation of avatars. While not entirely successful, the foundation has been laid for further improvement.]]></description>
			<content:encoded><![CDATA[<p class="thumbnails two"><a href="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-1.gif"><img class="alignnone size-full wp-image-1204" title="gameoflife-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-1.gif" alt="gameoflife-1" width="248" height="140" /></a> <a href="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-2.gif"><img class="alignnone size-full wp-image-1205" title="gameoflife-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-2.gif" alt="gameoflife-2" width="248" height="140" /></a></p>
<p>As a way of spending my bank holiday Sunday afternoon, I decided to embark on a small project; I didn’t know what the project would be when I first began browsing through <a href="http://en.wikipedia.org">Wikipedia</a> but eventually I ended up in <a href="http://cplus.about.com/lr/programming_challenges/183021/1/">About.com’s C++ challenge section</a>, <a href="http://cplus.about.com/od/programmingchallenges/a/challenge17.htm">one of which</a> concerned <a href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life">John Horton Conway’s “Game of Life”</a>: a rudimentary cellular automaton which, after its inception in 1970, had an immeasurable impact on fields as diverse as philosophy and theology. After toying with some ideas, I decided to build a script which automatically creates animations of a number of generations of the game. From that seed the project grew into the first steps towards an avatar system, much like the automatically generated <a href="http://en.gravatar.com/">Gravatars</a> that currently adorn so many WordPress based blogs.</p>
<blockquote class="pullout"><p>“I wanted something that was deterministic and identifiable”</p>
</blockquote>
<p>The first step was getting the algorithm working, and as I had already decided to make it web-based, that meant a PHP implementation. Using only the Wikipedia page as reference, I threw together a very basic script that allowed me to enter in some settings (grid dimensions, seed and generation limit) and for it to spit out the states between the seed and the generation cut off. After some wrangling with minor bugs (spelling errors, incorrect typing etc.) an unoptimised first version of the algorithm was complete:<span id="more-1201"></span></p>
<pre class="brush: php">private function tick($currentState)
{
	// copy
	$newState = $currentState;
	$width = count($currentState[0]); $height = count($currentState);

	for($h = 0; $h &lt; $height; $h++)
	{
		for($w = 0; $w &lt; $width; $w++)
		{
			$neighbours = 0;
			for($i = 0; $i &lt; 8; $i++)
			{
				$newH = $h; $newW = $w;
				switch($i)
				{
					case 0:
						$newW -= 1; $newH -= 1;
						break;
					case 1:
						$newH -= 1;
						break;
					case 2:
						$newW += 1; $newH -=1;
						break;
					case 3:
						$newW += 1;
						break;
					case 4:
						$newW += 1; $newH += 1;
						break;
					case 5:
						$newH += 1;
						break;
					case 6:
						$newW -= 1; $newH += 1;
						break;
					case 7:
						$newW -= 1;
						break;
				}

				$newW = ($newW &lt; 0) ? ($width + $newW) : $newW;
				$newW = ($newW &gt;= $width) ? ($newW - $width) : $newW;

				$newH = ($newH &lt; 0) ? ($height + $newH) : $newH;
				$newH = ($newH &gt;= $height) ? ($newH - $height) : $newH;

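				// the grid is stored row first ($state[row][column]), matching
				// how $width and $height are measured above
				$neighbours += ($currentState[$newH][$newW]);
			}

			if($currentState[$h][$w] == 1)
			{
				// live cell: dies of under- or overpopulation
				if(($neighbours &lt; 2) || ($neighbours &gt; 3))
				{
					$newState[$h][$w] = 0;
				}
			}
			else
			{
				// dead cell: comes alive with exactly three neighbours
				if($neighbours == 3)
				{
					$newState[$h][$w] = 1;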
				}
			}
		}
	}

	return $newState;
}</pre>
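<p>The wrap-around handling in tick() can also be expressed with the modulo operator. The following is a minimal sketch rather than a drop-in replacement: it assumes the grid is indexed as $state[row][column], and countNeighbours() and step() are hypothetical names that do not appear in the script.</p>
<pre class="brush: php">&lt;?php
// Count the live neighbours of a cell on a toroidal (wrapping) grid.
function countNeighbours(array $state, $h, $w)
{
	$height = count($state); $width = count($state[0]);
	$neighbours = 0;

	for($dh = -1; $dh &lt;= 1; $dh++)
	{
		for($dw = -1; $dw &lt;= 1; $dw++)
		{
			if($dh == 0 &amp;&amp; $dw == 0)
			{
				continue; // skip the cell itself
			}

			// adding the dimension first keeps the left operand non-negative
			$row = ($h + $dh + $height) % $height;
			$column = ($w + $dw + $width) % $width;
			$neighbours += $state[$row][$column];
		}
	}

	return $neighbours;
}

// One generation using the standard rules (survive on 2 or 3, born on 3).
function step(array $state)
{
	$next = $state;
	foreach($state AS $h =&gt; $row)
	{
		foreach($row AS $w =&gt; $cell)
		{
			$n = countNeighbours($state, $h, $w);
			$next[$h][$w] = ($n == 3 || ($cell == 1 &amp;&amp; $n == 2)) ? 1 : 0;
		}
	}

	return $next;
}

// a blinker returns to its starting state after two generations
$blinker = array(
	array(0, 0, 0, 0, 0),
	array(0, 0, 1, 0, 0),
	array(0, 0, 1, 0, 0),
	array(0, 0, 1, 0, 0),
	array(0, 0, 0, 0, 0),
);
var_dump(step(step($blinker)) === $blinker); // bool(true)</pre>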
<p>Part of the development of this involved settling on a suitable debug output. Initially this was just a pre-formatted output of each state with a 0 representing an inactive cell and a 1 representing an active one; this is fine for quickly checking the state but matching up against more detailed debug output (such as neighbours for a specific cell) proved tricky, especially when you take into account line-heights and other annoyances. I settled on a quick HTML table which had a caption for the current generation of the algorithm and also added co-ordinate references for easy look ups.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/08/debugoutput.png"><img class="alignnone size-medium wp-image-1224" title="debugoutput" src="http://chaostangent.com/wp-content/uploads/2009/08/debugoutput-540x227.png" alt="debugoutput" width="540" height="227" /></a></p>
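<p>The exact markup of that debug table isn’t reproduced here, but a sketch of the idea might look something like this. The stateToTable() name and the choice of elements are assumptions; the grid is taken to be indexed as $state[row][column].</p>
<pre class="brush: php">&lt;?php
// Render one state as an HTML table with a caption and co-ordinate headers.
function stateToTable(array $state, $generation)
{
	$width = count($state[0]);

	$html = "&lt;table&gt;\n&lt;caption&gt;Generation {$generation}&lt;/caption&gt;\n";

	// header row: an empty corner cell followed by the column numbers
	$html .= "&lt;tr&gt;&lt;th&gt;&lt;/th&gt;";
	for($w = 0; $w &lt; $width; $w++)
	{
		$html .= "&lt;th&gt;{$w}&lt;/th&gt;";
	}
	$html .= "&lt;/tr&gt;\n";

	foreach($state AS $h =&gt; $row)
	{
		// each row starts with its own number for easy look ups
		$html .= "&lt;tr&gt;&lt;th&gt;{$h}&lt;/th&gt;";
		foreach($row AS $cell)
		{
			$html .= "&lt;td&gt;" . (($cell == 1) ? "1" : "0") . "&lt;/td&gt;";
		}
		$html .= "&lt;/tr&gt;\n";
	}

	return $html . "&lt;/table&gt;\n";
}

echo stateToTable(array(array(0, 1), array(1, 0)), 3);</pre>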
<p>This is fine for checking a limited number of states or small grids, but when those variables climb higher, browser rendering time and page payload size become an issue. As mentioned previously, this is the most basic implementation of the algorithm and has not been subject to any optimisation: each cell requires eight lookups, and with no heuristic pruning the work grows with the number of cells, so doubling the side of the grid roughly quadruples the time taken. There are obvious improvements that could be made, as well as storage alterations (<a href="http://en.wikipedia.org/wiki/Quadtree">quad trees</a> et al.) that could help; however, until I could measure the algorithm’s performance, this was a decent first attempt. The next step was to output something more useful than HTML tables: enter <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a>. I put together a very quick function to create a static GIF file:</p>
<pre class="brush: php">private function outputStateToGif(array $state, $generation, $outputDir, $foreground, $background)
{
	$width = count($state[0]); $height = count($state);

	$img = imagecreate($width * 2, $height * 2);
	imagefill($img, 0, 0, imagecolorallocate($img, $background[0], $background[1], $background[2]));

	$onColour = imagecolorallocate($img, $foreground[0], $foreground[1], $foreground[2]);

	for($h = 0; $h &lt; $height; $h++)
	{
		for($w = 0; $w &lt; $width; $w++)
		{
			// rows first, matching how $width and $height are measured above
			if($state[$h][$w] == 1)
			{
				imagefilledrectangle($img, $w*2, $h*2, $w*2+1, $h*2+1, $onColour);
			}
		}
	}

	$outputDir = rtrim($outputDir, "/");
	$filename = sprintf("%04d", $generation);

	imagegif($img, "{$outputDir}/{$filename}.gif");
	imagedestroy($img);
}</pre>
<p>With all of this in place it meant that I could now create animated GIF files of the Game of Life without any hassle — combining the GIF files together at the end is a simple case of firing up <a href="http://www.imagemagick.org/script/index.php">ImageMagick</a> with the command:</p>
<pre>convert -delay 10 -loop 0 output/*.gif output/output.gif</pre>
<p>This will merge all of the GIF files together and loop at the end, with each frame <a href="http://www.imagemagick.org/script/command-line-options.php#delay">lasting a 10th of a second</a> (the default is 100 ticks per second). There is the possibility of optimising the animated GIF further with transparency so that each frame takes up the minimum amount of space; however, with 200 frame animations being scarcely over 100 kilobytes, there doesn’t seem much point, especially when taking into account the complexities involved.</p>
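<p>If the merge were to be driven from PHP rather than typed by hand, the command could be assembled along these lines. This is only a sketch: buildConvertCommand() is a hypothetical helper, the output directory is an example, and only the -delay and -loop options from the command above are used.</p>
<pre class="brush: php">&lt;?php
// Build the ImageMagick command for a set of frames. The delay is given in
// seconds and converted to ticks (100 per second by default).
function buildConvertCommand(array $frames, $output, $secondsPerFrame)
{
	sort($frames); // zero padded names such as 0007.gif sort into order
	$ticks = (int) round($secondsPerFrame * 100);

	$command = "convert -delay {$ticks} -loop 0";
	foreach($frames AS $frame)
	{
		$command .= " " . escapeshellarg($frame);
	}

	return $command . " " . escapeshellarg($output);
}

// frames named as outputStateToGif() writes them; glob() may return false
$frames = glob("output/[0-9][0-9][0-9][0-9].gif") ?: array();
echo buildConvertCommand($frames, "output/output.gif", 0.75);</pre>
<p>Matching only the four digit frame names also keeps an output.gif left over from a previous run out of the input list, which output/*.gif would pick up.</p>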
<p>With the algorithm in place and output more or less sewn up, it was time to concentrate on the avatar aspect of the project. Ideally I wanted something that was deterministic and identifiable; both of these are currently fulfilled by the automatically generated Gravatar icons, which are based off <a href="http://scott.sherrillmix.com/blog/blogger/wp_identicon/">Scott Sherrill-Mix’s WP_Identicon plugin</a>. I would be examining how to turn an e-mail address into a seed for the algorithm and how to generate colours to make the avatars unique. This is easier said than done: there are innumerable ways to achieve these goals, but producing <em>good</em> results is tricky.</p>
<p>To generate the seed I converted each character of the e-mail address to a number and then performed some operations on that. Using the built-in <a href="http://uk3.php.net/manual/en/function.ord.php">PHP function ord()</a> was out of the question as it only reads a single byte, which is fine for ASCII (until PHP get their finger out and natively support Unicode) but this would potentially be dealing with multi-byte Unicode characters. I briefly considered rolling my own function but stumbled upon an <a href="http://hsivonen.iki.fi/php-utf8/">implementation by Henri Sivonen</a>, based off code from the <a href="http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp">Mozilla project</a>, which would have been scrutinised far more than something I would have cooked up on a sleepy Sunday afternoon. The utf8ToUnicode() function takes a string as a parameter and returns an array of Unicode code point integers: perfect for what I was going to use them for. I spent a great deal of time trying out different operations on those values and eventually settled on one that worked for seeds with dimensions of less than 6:</p>
<pre class="brush: php">$pieces = utf8ToUnicode($email);

$seed = array();
$previous = 124;
foreach($pieces AS $piece)
{
	// multiply by the previous value, shift right and keep the lowest bit
	$k = (($piece * $previous) &gt;&gt; PHP_INT_SIZE) &amp; 0x1;
	// once the seed is full, fold the oldest entry into the new one
	$seed[] = (count($seed) &lt; ($width * $height)) ? $k : (array_shift($seed) &amp; $k);
	$previous = $piece;
}</pre>
<p>After multiplying the value by the previous one (a static initial value is used; I went for 124 as a general ASCII midpoint value), the result is shifted to the right and masked to obtain a single binary value. Note that the shift distance, PHP_INT_SIZE, is 4 or 8 depending on the platform and build of PHP, so the same address can produce different seeds on different machines. If the string is longer than the seed (e.g. greater than 25 for a 5x5 grid) then it shifts the first entry from the generated seed and bitwise ANDs it with the value.</p>
<p>This fulfils the deterministic nature of the seed and means the entire string is used rather than stopping when the seed length is reached. If the string is shorter than the seed, it is padded with inactive cells (0’s). This method produces decent results: while the seeds weren’t exactly wildly different from each other (i.e. the chance of seed collisions is fairly high), it successfully went from a string to a seed with acceptable results. I had initially considered cribbing some one-way hash functions from algorithms such as <a href="http://csrc.nist.gov/groups/ST/toolkit/index.html">SHA</a> or <a href="http://labs.calyptix.com/haval.php">HAVAL</a> and flicked through <a href="http://www.schneier.com/book-applied.html">Applied Cryptography</a> to that end; however, I eventually decided that this would be overkill for such a simple implementation.</p>
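<p>To make the padding and the step from a flat seed to a grid concrete, here is a sketch that wraps the loop in a function. Several things in it are assumptions for the sake of a self-contained example: emailToSeed() and seedToGrid() are hypothetical names, ord() on each byte stands in for utf8ToUnicode() (so it only suits ASCII addresses), and the shift is fixed at 4 bits instead of PHP_INT_SIZE so that the result is the same everywhere.</p>
<pre class="brush: php">&lt;?php
// Turn an address into a flat list of $width * $height cells (0 or 1).
function emailToSeed($email, $width, $height)
{
	$seed = array();
	$previous = 124;

	// ord() per byte: a stand-in for utf8ToUnicode(), ASCII input only
	foreach(array_map('ord', str_split($email)) AS $piece)
	{
		// fixed shift of 4 bits (an assumption, see above)
		$k = (($piece * $previous) &gt;&gt; 4) &amp; 0x1;
		$seed[] = (count($seed) &lt; ($width * $height)) ? $k : (array_shift($seed) &amp; $k);
		$previous = $piece;
	}

	// pad short strings with inactive cells
	return array_pad($seed, $width * $height, 0);
}

// Split the flat seed into rows so it can be indexed as $grid[row][column].
function seedToGrid(array $seed, $width)
{
	return array_chunk($seed, $width);
}

$grid = seedToGrid(emailToSeed("someone@example.com", 5, 5), 5);
echo count($grid) . "x" . count($grid[0]); // 5x5</pre>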
<p>The next step was generating colours. I wanted the domain part of an e-mail address to represent the background colour while the local part would represent the foreground colour; this way it would be obvious who was using <a href="http://mail.google.com">Google Mail</a> or <a href="http://www.hotmail.com">Live Mail</a> etc. but visitors would still be able to remain unique. This generation is definitely an area for improvement as the results will show:</p>
<pre class="brush: php">public static function colours($string)
{
	list($local, $fqdn) = explode("@", $string);
	$multiplier = array_sum(utf8ToUnicode($string));

	$localProduct = array_sum(utf8ToUnicode($local)) * $multiplier;
	$fqdnProduct = array_sum(utf8ToUnicode($fqdn)) * $multiplier;

	$fg = array(($localProduct &gt;&gt; 16) &amp; 0xFF, ($localProduct &gt;&gt; 8) &amp; 0xFF, $localProduct &amp; 0xFF);
	$bg = array(($fqdnProduct &gt;&gt; 16) &amp; 0xFF, ($fqdnProduct &gt;&gt; 8) &amp; 0xFF, $fqdnProduct &amp; 0xFF);

	return array($fg, $bg);
}</pre>
<p>In short this sums the Unicode values of each part of the address and multiplies each sum by the sum of the entire string. The reason for the multiplication is that the values within a common e-mail address are usually fairly low (ironically within the first 128 ASCII characters), so the sums alone definitely favoured the green / blue end of the spectrum; multiplication introduces a red component in most cases. I toyed with other methods of producing a usable number, such as using the product of the values rather than a summation; unfortunately this created a number that was too large to be useful, frequently breaking the maximum integer value limit.</p>
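<p>A worked example on a deliberately tiny address shows how little reaches the red byte. As before this is a sketch: intToRgb() and byteSum() are hypothetical helpers, and byteSum() uses ord() in place of utf8ToUnicode(), which is only equivalent for ASCII input.</p>
<pre class="brush: php">&lt;?php
// Split an integer into red, green and blue bytes, as colours() does.
function intToRgb($number)
{
	return array(($number &gt;&gt; 16) &amp; 0xFF, ($number &gt;&gt; 8) &amp; 0xFF, $number &amp; 0xFF);
}

// Sum of the byte values; a stand-in for array_sum(utf8ToUnicode($string)).
function byteSum($string)
{
	return array_sum(array_map('ord', str_split($string)));
}

$total = byteSum("a@b.c");       // 97 + 64 + 98 + 46 + 99 = 404
$local = byteSum("a") * $total;  // 97 * 404 = 39188
$fqdn = byteSum("b.c") * $total; // 243 * 404 = 98172

echo implode(",", intToRgb($local)) . "\n"; // 0,153,20
echo implode(",", intToRgb($fqdn)) . "\n";  // 1,127,124

// a product of code points quickly outgrows an integer: twelve characters
// at around 100 each is already about 10^24, so PHP falls back to a float
var_dump(is_float(pow(100, 12))); // bool(true)</pre>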
<p>With both the seed and the colours now able to be generated from an e-mail address, it was time to trial it out on a large number of possible addresses.</p>
<p class="thumbnails eight"><img class="alignnone size-full wp-image-1208" title="gameoflife-sample-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-1.gif" alt="gameoflife-sample-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1209" title="gameoflife-sample-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-2.gif" alt="gameoflife-sample-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1210" title="gameoflife-sample-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-3.gif" alt="gameoflife-sample-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1211" title="gameoflife-sample-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-4.gif" alt="gameoflife-sample-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1212" title="gameoflife-sample-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-5.gif" alt="gameoflife-sample-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1213" title="gameoflife-sample-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-6.gif" alt="gameoflife-sample-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1214" title="gameoflife-sample-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-7.gif" alt="gameoflife-sample-7" width="64" height="64" /> <img class="alignnone size-full wp-image-1215" title="gameoflife-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-8.gif" alt="gameoflife-sample-8" width="64" height="64" /> <img class="alignnone size-full wp-image-1216" title="gameoflife-sample-9" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-9.gif" alt="gameoflife-sample-9" width="64" height="64" /> <img class="alignnone size-full wp-image-1217" title="gameoflife-sample-10" 
src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-10.gif" alt="gameoflife-sample-10" width="64" height="64" /> <img class="alignnone size-full wp-image-1218" title="gameoflife-sample-11" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-11.gif" alt="gameoflife-sample-11" width="64" height="64" /> <img class="alignnone size-full wp-image-1219" title="gameoflife-sample-12" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-12.gif" alt="gameoflife-sample-12" width="64" height="64" /> <img class="alignnone size-full wp-image-1220" title="gameoflife-sample-13" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-13.gif" alt="gameoflife-sample-13" width="64" height="64" /> <img class="alignnone size-full wp-image-1221" title="gameoflife-sample-14" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-14.gif" alt="gameoflife-sample-14" width="64" height="64" /></p>
<p>This is a random sampling from over 1,300 addresses which I used to get benchmarks and timing data (a topic for another time); each is a 32x32 grid (double sized to 64x64 for the images) and is limited to 50 generations. The colours are obviously all shifted towards the blue/green area of the spectrum due to the way they are generated: there are no bright reds or mixtures such as yellows or oranges, which makes them very bland, especially when combined with foreground colours that can easily lack contrast. The next problem is the mixed bag of animations given a 5x5 seed. Obviously I wasn’t expecting a raft of infinite growth seeds but, having looked at them in aggregate, the majority tend to dissolve after only a few generations or become static, both of which make for boring avatars.</p>
<h2>Ways forward</h2>
<p>I’m certainly not finished with this idea yet as I think it definitely has legs, although the proof will be in the details. The first aspect to address is the visuals: I had expected more growth within 50 generations, but a lot of the grids could be reduced down to 16x16, which would allow for visibly larger and subsequently more interesting cell animations. As the environment is toroidal in nature this may turn out to be beneficial to growth. Colours are also high priority; choosing a background colour based upon the domain name of the e-mail address was perhaps the wrong way to go, and settling on a single background colour and a different foreground colour will likely help immeasurably with aesthetics.</p>
<p>I’ve still to collate the timing data I gathered from the run on the e-mail database; however, my initial examination indicates that the Game of Life algorithm itself likely doesn’t need overhauling to improve speed. Proposed optimisations such as <a href="http://tomas.rokicki.com/hlife/">Hashlife</a> improve speed at the cost of memory, and with such small grids the effect would likely be negligible. The most time consuming aspect of the script is the generation of the GIF files and subsequent I/O, which is done regardless of whether the state has stabilised or entirely died out, so pruning and state checking would likely be immensely beneficial.</p>
<p><script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
                SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Conclusion</h2>
<p>For a Sunday afternoon the results are not exactly groundbreaking but are interesting and act as a good base for further enhancement; the possibilities are immense and it’s always gratifying to see a project come together in a short space of time. That said, I think I’ve seen enough animated GIFs of tiny blinking blocks for one day. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building the carousel</title>
		<link>http://chaostangent.com/2009/08/building-the-carousel/</link>
		<comments>http://chaostangent.com/2009/08/building-the-carousel/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 21:30:57 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[carousel]]></category>
		<category><![CDATA[component]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[feed]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[last.fm]]></category>
		<category><![CDATA[lastfm]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[lovefilm]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[prototype]]></category>
		<category><![CDATA[rest]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[scriptaculous]]></category>
		<category><![CDATA[xmlrpc]]></category>
		<category><![CDATA[youtube]]></category>
		<category><![CDATA[zend]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=603</guid>
		<description><![CDATA[
The newest addition to chaostangent.com is the carousel nestling comfortably at the foot of every page. Sporting a variety of “social media” feeds as well as other morsels, it showcases a number of interesting technologies and techniques including: a fully looping carousel (JavaScript and CSS), integration with numerous external APIs (PHP, Zend Framework), screen-scraping and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/wp-content/uploads/2009/08/img-carousel.jpg"><img class="alignnone size-full wp-image-607" title="chaostangent.com footer carousel" src="http://chaostangent.com/wp-content/uploads/2009/08/img-carousel.jpg" alt="chaostangent.com footer carousel" width="540" height="151" /></a></p>
<p>The newest addition to chaostangent.com is the carousel <a href="#carousel">nestling comfortably at the foot of every page</a>. Sporting a variety of “social media” feeds as well as other morsels, it showcases a number of interesting technologies and techniques including: a fully looping carousel (JavaScript and CSS), integration with numerous external APIs (<a href="http://www.php.net">PHP</a>, <a href="http://framework.zend.com">Zend Framework</a>), screen-scraping and local caching of results to name but a few. It successfully fulfils the primary goal I had for it: cramming as much functionality into as contained a space as reasonably possible.</p>
<blockquote class="pullout"><p>“I can just boot up Zend_Service_Delicious and be done with it right? If only things were that simple.”</p>
</blockquote>
<h2>JavaScript</h2>
<p><a href="http://www.smileycat.com/design_elements/carousels/">Carousel interfaces</a> are the design du jour at the moment, sported by sites such as <a href="http://www.apple.com/uk/mac/">Apple</a>, <a href="http://www.bbc.co.uk/iplayer">BBC iPlayer</a> and <a href="http://www.gametrailers.com">Gametrailers</a>; they manage selective display of information while still providing a high degree of interactivity. In short: they’re swish and solve the problem of too much to feature in too little space. The carousel library I am using is a simplified, stripped-down version of one I developed for a large work project; for this reason I’m unable to release it under any kind of license. The original has a number of features that I wouldn’t be using, including automated construction of a “jump to” control and being able to navigate over a number of entries at once. My library is the only one I know of which successfully loops, providing an “infinite” carousel of sorts; <a href="http://www.prototype-ui.com/">other publicly available libraries</a> cease at either end of the carousel, which in some situations is more intuitive, but the challenge of making one <em>not</em> do this was posed to me and I couldn’t very well pass it up.<br />
<span id="more-603"></span><br />
The library works as such:</p>
<ul>
<li>If moving right on the carousel (usually seen as “advancing” or moving forwards in countries with left-to-right written language) the first element within the carousel has its margin tweened from zero to negative its width e.g. if an element is 100 pixels wide the margin would tween from 0 to –100 pixels. This tween uses the <a href="http://script.aculo.us">Scriptaculous effects library</a> and “pulls” the item out of sight.</li>
<li>Once this is complete the now hidden element is removed from the start of the carousel items and appended to the end.</li>
<li>With this done the now appended element’s margin is set back to zero.</li>
</ul>
<ul>
<li>If moving left on the carousel (usually seen as “receding” or moving backwards) the last element of the carousel is modified with a negative margin equal to its width.</li>
<li>That element is then inserted at the beginning of the carousel items.</li>
<li>Once done, the negative margin is then tweened to zero, effectively “pushing” the carousel item into view.</li>
</ul>
<p>Both effects can be done in a few lines of JavaScript and the act of moving an element from one end of the carousel to another can be done in a single line, e.g. for the “move right” action:</p>
<pre class="brush: javascript">this.elem.insert(this.elem.childElements().first().remove().setStyle({"marginLeft": 0}));</pre>
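<p>Stripped of the DOM and the tweening, the resequencing in both directions boils down to rotating a list. The following is only an illustrative sketch: a plain array stands in for the list items, and the item names and function names are made up for the example.</p>
<pre class="brush: javascript">// Array stand-in for the carousel's list items (hypothetical labels).
var items = ["delicious", "last.fm", "youtube", "lovefilm"];

// "Move right": once the first item has been pulled out of view,
// take it off the front and append it to the end.
function advance(list) {
	list.push(list.shift());
	return list;
}

// "Move left": take the last item off the end and put it at the front,
// ready to be pushed back into view.
function recede(list) {
	list.unshift(list.pop());
	return list;
}</pre>
<p>Calling advance() and then recede() leaves the list in its original order, which is what makes the carousel appear endless in either direction.</p>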
<p>The majority of the class (thank you <a href="http://www.prototypejs.org/api/class">Prototype</a>) concerns itself with ensuring that there are no glitches during the transition: it locks itself to prevent repeated clicks, which could cause the carousel items to resequence themselves and the visual smoothness to be lost. The transition itself is non-standard: it uses the “easing equations” originally pioneered by <a href="http://www.robertpenner.com/">Robert Penner for Flash</a>, which provide smooth, lifelike movement for animations. The equation used for this is the “EaseFromTo” equation adapted by <a href="http://kendsnyder.com/sandbox/easing/">Ken Snyder for Scriptaculous</a>.</p>
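<p>The locking idea can be sketched in a few lines. This is not the class itself, just a minimal illustration under assumptions: makeLockedMover and startTransition are hypothetical names, and startTransition stands in for whatever starts the tween and calls back when the animation has finished.</p>
<pre class="brush: javascript">// Hypothetical wrapper: ignores requests while a transition is running.
function makeLockedMover(startTransition) {
	var locked = false;
	return function move(direction) {
		if (locked) {
			return false; // a transition is already running, ignore the click
		}
		locked = true;
		startTransition(direction, function done() {
			locked = false; // animation finished, accept clicks again
		});
		return true;
	};
}</pre>
<p>Because the lock is only released from the completion callback, any number of rapid clicks results in a single move.</p>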
<h2>Markup and styling</h2>
<p>The carousel would not operate as it does without some fairly dense and brutal markup and styling. It comprises an outside container which acts as a window onto the contents of the unordered list below it: each item within the list is treated as a single carousel item. The unordered list must be in one line otherwise there is a jarring “drop in” effect for items; for this reason the container has its overflow set to hidden, the unordered list has its white-space set to “nowrap” and the list items are set to display as inline-blocks. That’s the crux of the styling. Obviously this won’t work in IE6 and 7, which don’t support inline-block displays, but thankfully they abuse the “inline” property enough for them to play nicely:</p>
<pre class="brush: css">#external.scripted { overflow: hidden; }
#external.scripted &gt; ul { white-space: nowrap; overflow: hidden; }
#external &gt; ul &gt; li { display: inline-block; width: 277px; }

/* In style-ie6.css */
#external ul li { display: inline; }

/* In style-ie7.css */
#external &gt; ul &gt; li { display: inline; }</pre>
<p>The benefit of this markup and styling is that there are numerous possibilities for accessibility improvements when JavaScript is disabled. If the information is unimportant (e.g. not the primary navigation on the page, which is a bad idea anyway) then the default visible carousel items can be left as-is. For a quick and dirty accessible solution, removing the hidden overflow and white-space declarations from the carousel means all of the items will follow the normal page flow; this however can cause a conspicuous “snap-in” effect when the script is loaded. The most effective method is to change the overflow-x property to “auto” to allow for horizontal scrolling, a native and lo-fi way of providing access to the full carousel. This has its own foibles, including the necessity of setting a width on the default, no-script carousel, but that is offset by the loading of the scripted controls being more subtle than with the other methods. I opted not to implement this as the contents are superfluous enough to be ignored, especially so close to the foot of the page.</p>
<p>Probably the most taxing part of the markup was working out the width and margins for each carousel item so that it aligned with the columns / grid I had set up for the site; this involved a bit of number crunching and a selection of scribbled diagrams.</p>
<h2>PHP</h2>
<p>Before chaostangent.com was a blog it was a splash page that served much the same purpose as the carousel: it aggregated a lot of social media into a small space (before it did that it was a simple splash page, and before that it was a blog again but I digress). For this reason a lot of the work involving PHP had already been done, or so I initially thought. There are subtle tribulations associated with the back end code that didn’t become apparent until I began implementing it.</p>
<p>The Zend Framework supplies a lot of <a href="http://framework.zend.com/manual/en/zend.service.html">Zend_Service_*</a> classes which — in theory — take a lot of the effort out of interacting with external APIs. I can for instance just boot up <a href="http://framework.zend.com/manual/en/zend.service.delicious.html">Zend_Service_Delicious</a> and be done with it right? If only things were that simple. I needed to cache the results of the queries — using <a href="http://framework.zend.com/manual/en/zend.cache.html">Zend_Cache</a>, natch — so that the queries wouldn’t be done on a page request and bog down the server and the target API — however passing some Zend_Service_* objects into Zend_Cache proved problematic as they weren’t set up for serialization and thus not correctly stored. For this reason I use the <a href="http://framework.zend.com/manual/en/zend.feed.html">Zend_Feed</a> component to read in my <a href="http://feeds.delicious.com/v2/rss/chaostangent?count=15">delicious RSS feed</a> which provides all that I need and was able to be successfully serialized and cached; Zend_Feed was likewise used for the two <a href="http://www.irisassociates.com">other</a> <a href="http://japanographia.com">blogs</a> I post to. The YouTube listing of my favourite videos is <a href="http://gdata.youtube.com/feeds/base/users/ChaosTangent/favorites?client=ytapi-youtube-user&amp;v=2">available as an RSS feed</a>, however the feed doesn’t contain interesting video information such as rating, length and so forth, so using the <a href="http://framework.zend.com/manual/en/zend.gdata.html">Zend_Gdata</a> component seemed like a sure fit. Unfortunately querying something relatively simple like a favourite video feed for a user causes <strong>a lot</strong> of data to be generated, so much so that even with only four results the cache file for YouTube comes out at over half a megabyte and obviously spikes the memory usage when loading this in. 
There is also the bizarre deficiency that while the RSS feed contained the date and time of when I had made a video a favourite, the Zend_Gdata classes did not. So I could either compromise on my desires for the YouTube element of the carousel, roll my own storage for the feed or just suck it up. I opted for the last of these and vowed to optimise it for memory usage at a later time.</p>
<h3>Last.fm</h3>
<p><a href="http://ws.audioscrobbler.com/1.0/user/chaostangent/recenttracks.rss">My last.fm feed</a> was to be one of the highlights of the carousel so I put the most effort into it. Lamentably, the Zend_Service component was for the ageing <a href="http://framework.zend.com/manual/en/zend.service.audioscrobbler.html">AudioScrobbler service</a> which, while functionally compatible with the <a href="http://www.last.fm/api">updated last.fm API</a>, omitted several necessary features. This included the guarantee that results for my most recent tracks would include images and links. I contemplated writing a whole new Zend_Service component but put that on the back burner and dove into using the API directly. To cut a long story short I shied away from using <a href="http://framework.zend.com/manual/en/zend.rest.html">Zend_Rest</a> as PHP’s function naming <a href="http://uk.php.net/manual/en/functions.user-defined.php">didn’t allow</a> for periods within function names, which more or less forced me to use the XML-RPC component: <a href="http://framework.zend.com/manual/en/zend.xmlrpc.html">Zend_XmlRpc</a>. Working around its refusal to return objects (it instead defaults to quote-escaped XML strings, which I then converted to <a href="http://uk.php.net/manual/en/book.simplexml.php">SimpleXML</a> elements), I conditionally stacked up a <a href="http://www.last.fm/api/show?service=278">number</a> <a href="http://www.last.fm/api/show?service=407">of</a> <a href="http://www.last.fm/api/show?service=290">queries</a> to ensure that the information I needed was, within reasonable parameters, always available. A glitch did occur when using SimpleXML and the <a href="http://uk.php.net/manual/en/function.simplexml-element-xpath.php">xpath function</a>, whereby it appeared to cache the xpath result if it was used on a separate child node from the same parent. For example, take the main XML-RPC call to get my recent tracks:</p>
<pre class="brush: php">$result = $xmlrpc-&gt;call("user.getRecentTracks", array($p));
$sx = new SimpleXMLElement(stripslashes($result));
foreach($sx-&gt;recenttracks-&gt;track AS $track)</pre>
<p>Now if I iterate over each track and do</p>
<pre class="brush: php">$image = (string)reset($track-&gt;xpath("//image[@size='small']"));</pre>
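<p>$image should contain the individual image for each track. Not so: $image will always hold the <em>first</em> image that xpath() returned, as though the results were silently cached by SimpleXML / PHP. The likelier culprit is the leading // in the expression, which XPath evaluates from the root of the document rather than from $track, so a relative expression such as .//image[@size='small'] should avoid the problem too. The workaround I used was to splinter the SimpleXMLElement from its parent with an otherwise redundant</p>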
<pre class="brush: php">$track = new SimpleXMLElement($track-&gt;asXML());</pre>
<p>A simple fix but a pain to debug.</p>
<h3>LOVEFiLM</h3>
<p>The last element I wanted to include on my carousel was my <a href="http://www.lovefilm.com/">LOVEFiLM</a> (similar to <a href="http://www.netflix.com/">Netflix</a> in the US) recent DVD rental list. LOVEFiLM is <a href="http://lovefilmaffiliates.blogspot.com/">apparently trialling</a> a <a href="http://twitter.com/LOVEFiLMAPI">fully fledged API</a> developed in conjunction with an external company; however I didn’t need a full API, only a list of my most recently rented and rated titles. <a href="http://www.lovefilm.com/account/previously_rated.html">The page</a> for this was uniform enough, so I set about trying to find a way to pull in that page and work on it from there. The most likely candidate was a cookie left by the website called “lovefilm_session”, which has a lifetime of two weeks and a value that looks like a standard MD5 hash. PHP <a href="http://uk.php.net/manual/en/book.session.php">doesn’t place any IP restrictions on sessions</a> so I was hoping that the <a href="http://www.lovefilm.com/corporate/jobs_info.html?editorial_id=4963">Perl</a> implementation the site was using was similar, and that simply sending the cookie along with a request for the page would be enough. This worked a treat in local testing and thankfully carried across to production without incident. The next task was getting the desired information out. Being an in-production site meant that simply loading the HTML (despite being under the XHTML 1.0 transitional DTD) into an XML interpreter threw up numerous errors and was unworkable, which meant regular expressions were my only recourse.</p>
<p>Using a suitably generic regex to get the title, rating and the URL for the DVD in question required some fine tuning — it still needs to match on certain patterns which makes it inherently fragile. Hopefully by the time LOVEFiLM update their site their API will have been released for consumption.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><br />
<script src="/wp-includes/js/syntaxhighlighter/shBrushJScript.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushCss.js" type="text/javascript"></script> <script type="text/javascript">// <![CDATA[
             SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Conclusion</h2>
<p>Overall the carousel took a full weekend of planning, design, build and testing as well as some last minute tweaks such as digging into the <a href="http://codex.wordpress.org/Function_Reference/get_the_time">WordPress date functions</a> for <a href="http://codex.wordpress.org/Function_Reference/date_i18n">internationalising dates and times</a> to complete but the result is better than I could have hoped for. There are always more social media sites out there so the possibilities for additions and enhancements are great — I can only see the carousel getting better as time goes on. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/building-the-carousel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tidbits from gallery.chaostangent.com</title>
		<link>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/</link>
		<comments>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 14:38:14 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[firebug]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[gallery.chaostangent.com]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[imagemagick]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[resizing]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[swfupload]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=594</guid>
		<description><![CDATA[Exploring some parts of the code behind gallery.chaostangent.com including database functions for dealing with modified preorder tree traversal models, image resizing in PHP and JavaScript numeracy functions.]]></description>
			<content:encoded><![CDATA[<p>These are some of the neater parts of <a href="http://gallery.chaostangent.com">gallery.chaostangent.com</a> that don’t warrant a full exploration on their own but serve the goal of making the application more streamlined. I’ve crafted these examples to be focused so they don’t contain superfluous details like error checking, timestamp columns and the like.</p>
<h2>Database</h2>
<p>The gallery schema is as follows:</p>
<pre class="brush: sql">CREATE TABLE IF NOT EXISTS `galleries` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `left` int(10) unsigned NOT NULL default '0',
  `right` int(10) unsigned NOT NULL default '0',
  `parent` int(10) unsigned NOT NULL default '0',
  `title` tinytext NOT NULL,
  `directory` tinytext NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `parent` (`parent`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ;</pre>
<p>This covers both the <a href="http://www.sitepoint.com/article/hierarchical-data-database/2/">Modified Preorder Tree Traversal</a> (‘left’ and ‘right’ columns) model as well as the more standard hierarchical model (‘parent’ column). I’m still undecided as to whether indexing the ‘left’ and ‘right’ columns provides any benefits. Most of the queries on the gallery table involve getting the direct children of a particular node; the breadcrumb trail at the top of the page however is built using the ‘left’ and ‘right’ columns:</p>
<pre class="brush: sql">SELECT * FROM `galleries` WHERE (`left` &lt;= ?) AND (`right` &gt;= ?) ORDER BY `left`</pre>
<p>A multi-column index in MySQL <a href="http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html">works from the left column onwards</a>, so for the above query an index on ‘left’ and ‘right’ would be a benefit. However, when inserting and deleting nodes the updates are done singly, i.e. one for ‘left’ and one for ‘right’, so having an index that leads with one column and not the other may turn out to be detrimental in terms of update times. I could always do two indexes:</p>
<pre class="brush: sql">ALTER TABLE `galleries` ADD INDEX ( `left` , `right` ) ;
ALTER TABLE `galleries` ADD INDEX ( `right` , `left` ) ;</pre>
<p>This runs the risk though of having a table that’s more index than data. I haven’t done a full benchmark of the different queries for each scenario but I would imagine only for large trees would indexing provide any tangible benefit.<br />
<span id="more-594"></span><br />
The two database functions which do a lot of the heavy lifting for the gallery table are insertion and deletion:</p>
<pre class="brush: sql">CREATE FUNCTION `addGallery`(_parent INT, _title TEXT, _directory TEXT) RETURNS int(11)
BEGIN
  SELECT `left`, `right` INTO @pleft, @pright FROM `galleries` WHERE id = _parent LIMIT 1;
  UPDATE `galleries` SET `right` = `right` + 2 WHERE `right` &gt; (@pright - 1);
  UPDATE `galleries` SET `left` = `left` + 2 WHERE `left` &gt; (@pright - 1);
  INSERT INTO `galleries` (`left`, `right`, parent, title, directory) VALUES (@pright, (@pright + 1), _parent, _title, _directory);
  RETURN LAST_INSERT_ID();
END

CREATE FUNCTION `deleteGallery`(_id INT) RETURNS int(11)
BEGIN
  SELECT `left`, `right` INTO @left, @right FROM `galleries` WHERE id = _id;
  DELETE FROM `images` WHERE gallery_id = _id;
  DELETE FROM `galleries` WHERE id = _id;
  SELECT ROW_COUNT() INTO @ret;
  UPDATE `galleries` SET `right` = (`right` - 2) WHERE `right` &gt; @right;
  UPDATE `galleries` SET `left` = (`left` - 2) WHERE `left` &gt; @left;
  RETURN @ret;
END</pre>
<p>I’ve yet to find a good way of labelling an SQL function’s parameters as they usually have an identical name to columns I’m using within queries. The add function can insert a node anywhere, whereas the delete function can only consistently delete leaf nodes. One of the major drawbacks of the MPTT model is that moving existing nodes about the tree or deleting subtrees is tricky as it involves either retaining a lot of data or re-keying the entire table after the operation, neither of which is ideal. There’s nothing complex going on in the functions; once you get your head around how MPTT works these should become self-explanatory. The added benefit of wrapping these operations up is that they’re treated as transactional, which saves an extra two queries (“START TRANSACTION” and “COMMIT / ROLLBACK”) compared with doing this in code.</p>
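<p>If the renumbering is hard to picture, the following in-memory JavaScript sketch mimics what the two functions do to the ‘left’ and ‘right’ values. It is an illustration only: the array of rows stands in for the table, the titles are made up, and like the SQL version the delete only handles leaf nodes.</p>
<pre class="brush: javascript">// Each row keeps the same left/right/parent columns as the schema.
var rows = [{ id: 1, left: 1, right: 2, parent: 0, title: "Root" }];

function findRow(list, id) {
	return list.filter(function (row) { return row.id === id; })[0];
}

// Mirrors addGallery(): open a gap of two at the parent's right edge,
// then place the new node in that gap.
function addGallery(list, parentId, title) {
	var pright = findRow(list, parentId).right;
	list.forEach(function (row) {
		if (row.right &gt;= pright) { row.right += 2; }
		if (row.left &gt;= pright) { row.left += 2; }
	});
	var nextId = list.reduce(function (max, row) { return Math.max(max, row.id); }, 0) + 1;
	list.push({ id: nextId, left: pright, right: pright + 1, parent: parentId, title: title });
	return nextId;
}

// Mirrors deleteGallery(): remove a leaf node and close the gap it leaves.
function deleteGallery(list, id) {
	var node = findRow(list, id);
	list.splice(list.indexOf(node), 1);
	list.forEach(function (row) {
		if (row.right &gt; node.right) { row.right -= 2; }
		if (row.left &gt; node.left) { row.left -= 2; }
	});
}</pre>
<p>Adding “Anime” and “Games” under the root and then “Nadesico” under “Anime” leaves the root spanning 1 to 8, “Anime” 2 to 5, “Nadesico” 3 to 4 and “Games” 6 to 7; deleting “Nadesico” shrinks everything back by two.</p>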
<h2>Images</h2>
<p>One of the many great things about the <a href="http://framework.zend.com">Zend Framework</a> is that the developers have managed to <a href="http://framework.zend.com/manual/en/zend.form.standardElements.html#zend.form.standardElements.file">streamline file uploading for forms</a> which means there’s no more explicit checking of error conditions, temporary files and whatnot — getting an image from the user into the application is now relatively painless. On a <a href="http://framework.zend.com/manual/en/zend.db.table.row.html#zend.db.table.row.extending.insert-update">pre-insert hook</a> for an Image model I do some sanity checks (directories writeable etc.) then do a simple <a href="http://uk3.php.net/manual/en/function.getimagesize.php">getimagesize()</a> and <a href="http://uk3.php.net/manual/en/function.filesize.php">filesize()</a> to grab the file’s important measurements. Once the database row has been inserted, I have a post-insert hook that generates all the different versions — the reason this is post rather than pre is that versions are named according to the ID of the image in the database, the uploaded image meanwhile retains its original filename wherever possible.</p>
<p>The image function I always seem to use is “fit to area”: you have a dimension that you’d like an image to fit within while retaining its original proportions:</p>
<pre class="brush: php">private function fitToArea($width, $height, $target)
{
	$newWidth = $newHeight = $target;
	if($width &gt; $height)
	{
		$newHeight = round(($height / $width) * $target);
	}
	elseif($height &gt; $width)
	{
		$newWidth = round(($width / $height) * $target);
	}

	return array(
		0 =&gt; $newWidth, "width" =&gt; $newWidth,
		1 =&gt; $newHeight, "height" =&gt; $newHeight
	);
}</pre>
<p>Variations of this can be made for performing “fit to width” and “fit to height” or what gallery.chaostangent.com used to do which was absolutely square thumbnails. The above could be boiled down into a couple of <a href="http://uk3.php.net/manual/en/language.operators.comparison.php">ternary operations</a> but I like to keep it expanded and easy to follow.</p>
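<p>For comparison, here is roughly what a boiled-down version might look like, sketched in JavaScript rather than PHP (the arithmetic carries straight over to PHP’s max() and round()). It is an alternative formulation for illustration, not the code the gallery uses: both sides are scaled by the longer one, so no branching is needed.</p>
<pre class="brush: javascript">// Scale width and height so the longer side equals target.
function fitToArea(width, height, target) {
	var longest = Math.max(width, height);
	return {
		width: Math.round((width / longest) * target),
		height: Math.round((height / longest) * target)
	};
}</pre>
<p>For example, fitToArea(1600, 1200, 400) gives a width of 400 and a height of 300, while a portrait image of 600 by 800 fitted to 200 comes out at 150 by 200.</p>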
<p>The physical resizing of the image is done one of two ways: <a href="http://www.imagemagick.org/">ImageMagick</a> or <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a>. The former is the preferred method but the latter is more widely supported and is cross platform due to its nature as a PHP module rather than an external executable (N.B. I’m aware there exists an <a href="http://uk3.php.net/manual/en/book.imagick.php">ImageMagick module for PHP</a> but have not used it as in theory it has all the problems of GD in terms of memory usage and time-outs so when I use “ImageMagick” here, I’m referring to the command line executable). The ImageMagick command which does the work is:</p>
<pre>convert "filename"[0] -strip -resize widthxheight -sharpen 0x1.0 -quality quality -colorspace RGB "target"</pre>
<p>There are a few non-standard parts in there:</p>
<ul>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#strip">-strip</a> gets rid of superfluous information (EXIF, comments, colour profiles etc.)</li>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#colorspace">-colorspace</a> forces the resultant image into RGB, which is supported across all browsers; JPGs can also be CMYK, which is a bit iffy with browser support</li>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#sharpen">-sharpen</a> applies an image convolution which sharpens the image; sharpening should <em>always</em> be done on any image reduction</li>
<li>[0] selects the first frame of what could be a multi-frame image (animated GIF or PNG); if this isn’t present, some versions of ImageMagick will force animated file type creation, overriding whatever you may have in the target</li>
</ul>
<p>Doing this with GD takes a lot more code, which means a lot more chances for errors to crop up. The first thing you have to do is get the image type so you can load it into GD’s proprietary format: you can get the type with getimagesize() and then load the file using one of the <a href="http://uk3.php.net/manual/en/function.imagecreatefromjpeg.php">imagecreatefromx()</a> functions. Once you have that, you have to check if the image is true colour, as 8-bit PNGs and GIFs use palettes which make resizing/resampling ugly:</p>
<pre class="brush: php">if(!imageistruecolor($res))
{
	$tc = imagecreatetruecolor($imageInfo[0], $imageInfo[1]);
	imagecopy($tc, $res, 0, 0, 0, 0, $imageInfo[0], $imageInfo[1]);
	imagedestroy($res);

	$res = $tc;
	$tc = null;
}</pre>
<p>I believe this was originally taken from a PHP.net comment so kudos to the original author. Once you’re sure you have a true colour image:</p>
<pre class="brush: php">$tRes = imagecreatetruecolor($width, $height);
imagecopyresampled($tRes, $res, 0, 0, 0, 0,
	$width, $height, $imageInfo[0], $imageInfo[1]);</pre>
<p>This copies and resamples the image to the desired size. $width and $height are the target sizes while $imageInfo contains the original image dimensions. At this point you can output to a JPG and be done with it, however I believe in the benefits of sharpening which lamentably GD does not have an in-built function for. In comes <a href="http://loriweb.pair.com/8udf-sharpen.html">image convolutions</a>:</p>
<pre class="brush: php">imageconvolution($tRes, array(
	array(-1,-1,-1), array(-1,20,-1), array(-1,-1,-1)
), 12, 0);</pre>
<p>The array notation is a little annoying but essentially this applies a matrix, divisor and offset to the image (every pixel for every channel), which accents the edges and makes the image appear crisper. The types of images you’re dealing with will determine which central number and divisor work best, so I recommend playing with the values to get the best result. Convolving an image is not a cheap operation and for large images this can be lengthy and computationally intensive; there is also the problem that the <a href="http://uk3.php.net/manual/en/function.imageconvolution.php">imageconvolution()</a> function didn’t exist prior to PHP 5.1, so if you’re using an earlier version (and no matter what people say, PHP 4 is <em>still</em> in use) then you’re out of luck unless you want to do the convolution by hand using <a href="http://uk3.php.net/manual/en/function.imagecolorat.php">imagecolorat()</a>.</p>
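<p>To make the matrix, divisor and offset a little less abstract, here is the arithmetic for a single pixel as a JavaScript sketch. It is a simplification under assumptions, not GD’s implementation: convolvePixel is a hypothetical helper working on a greyscale grid of 0 to 255 values, and it doesn’t handle pixels on the border.</p>
<pre class="brush: javascript">// The same 3x3 kernel, divisor and offset as the GD call above.
var kernel = [
	[-1, -1, -1],
	[-1, 20, -1],
	[-1, -1, -1]
];
var divisor = 12; // sum of the kernel values, so flat areas keep their brightness
var offset = 0;

// Hypothetical helper: convolve the pixel at (x, y) with its eight neighbours.
function convolvePixel(grid, x, y) {
	var sum = 0;
	kernel.forEach(function (row, ky) {
		row.forEach(function (weight, kx) {
			sum += weight * grid[y + ky - 1][x + kx - 1];
		});
	});
	// Scale, shift, then clamp to the valid channel range.
	return Math.min(255, Math.max(0, Math.round(sum / divisor + offset)));
}

var flat = [
	[100, 100, 100],
	[100, 100, 100],
	[100, 100, 100]
];
var edge = [
	[100, 100, 100],
	[100, 120, 100],
	[100, 100, 100]
];</pre>
<p>convolvePixel(flat, 1, 1) returns 100, so flat areas are left alone, whereas convolvePixel(edge, 1, 1) returns 133: a pixel that was 20 brighter than its neighbours ends up 33 brighter, which is the edge accenting at work.</p>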
<h2>JavaScript</h2>
<p>Apart from the image addition page, there is only a smattering of JavaScript throughout the site to enhance certain aspects. The possibility exists for me to do AJAX calls for galleries so that a user never has to reload the page; however, the payload for an AJAX request isn’t going to be much smaller than that of a full page request. If the design were more complex there would be an argument for it, but as it is there isn’t the justification for either loading the HTML directly or loading XML or JSON and transforming that on the page. I did end up adding a small bit of JavaScript to vertically centre the images within a gallery as CSS doesn’t do this reliably:</p>
<pre class="brush: javascript">$$("#gallery li a img").each(function(s) {
	s.setStyle({
		// quick and dirty vertical centering
		marginTop: Math.round((175 - s.getHeight()) / 2)+"px"
	});
});</pre>
<p>This uses <a href="http://www.prototypejs.org/api/utility/dollar-dollar">Prototype’s selection function</a>. Ordinarily by the point I reach this function I’ve already assigned #gallery to a variable, which means I can do a scoped selection (e.g. <a href="http://www.prototypejs.org/api/element/select">variable.select("li a img")</a>) rather than using $$(). I hard code the value just for expediency; you could just as easily find out the height of the containing li element using s.up("li").getHeight(), however for large pages of images this could be slow as you’re then doing an extra DOM call per image.</p>
<p>As I <a href="http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/">mentioned before</a> <a href="http://swfupload.org/">SWFUpload</a> requires a lot of JavaScript upfront to make it play nice — I usually create an object with all the <a href="http://demo.swfupload.org/Documentation/#events">SWFUpload function hooks</a> and then just fill them in as and when I require them. This means I can have a skeleton object which I can drag and drop into any project where I’m using SWFUpload. I find it useful to set the <a href="http://demo.swfupload.org/Documentation/#debug">debug function</a> to output to the <a href="http://getfirebug.com/console.html">Firebug console (console.log)</a> and to turn on debugging so I know what’s going on. Thankfully the library comes with several helpers which cover just about everything you could want to do with it: speed, cookie and queue integrate well and do what you would expect. One of the most helpful functions I wrote concerned converting from bytes into a more sensible denomination (kilobytes, megabytes, gigabytes) dependent on the value provided:</p>
<pre class="brush: javascript">// Convert a byte count into a human-readable string (B, KB, MB or GB).
var fileSize = function(sizeInBytes)
{
	if(sizeInBytes &gt;= 1073741824)
	{
		// Multiply before rounding and divide after to keep two decimal places
		return Math.round((sizeInBytes * 100) / 1073741824) / 100 + " GB";
	}
	else if(sizeInBytes &gt;= 1048576)
	{
		return Math.round((sizeInBytes * 100) / 1048576) / 100 + " MB";
	}
	else if(sizeInBytes &gt;= 1024)
	{
		// Kilobytes are shown as whole numbers
		return Math.round(sizeInBytes / 1024) + " KB";
	}
	return sizeInBytes + " B";
};</pre>
<p>It takes account of JavaScript’s lack of a fully featured round() function: Math.round() only rounds to whole numbers, so the function multiplies by 100 before rounding and divides by 100 afterwards to keep two decimal places. This works in numerous places such as totalling up the selected file sizes and the current speed of the upload.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><br />
<script src="/wp-includes/js/syntaxhighlighter/shBrushSql.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushJScript.js" type="text/javascript"></script><br />
<script type="text/javascript">// <![CDATA[
         SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
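<p>The multiply-then-divide trick generalises to any number of decimal places. The following is a minimal sketch only: the helper name roundTo and the file sizes are made up for illustration and are not part of SWFUpload or Prototype.</p>
<pre class="brush: javascript">// Hypothetical helper: round a value to a given number of decimal places
// by scaling up, rounding to a whole number and scaling back down.
var roundTo = function(value, places)
{
	var factor = Math.pow(10, places);
	return Math.round(value * factor) / factor;
};

// Illustrative sizes in bytes for three selected files.
var selectedSizes = [1048576, 2621440, 5347738];
var totalBytes = 0;
selectedSizes.forEach(function(size)
{
	totalBytes += size;
});

// Express the total in megabytes to two decimal places.
var label = roundTo(totalBytes / 1048576, 2) + " MB"; // "8.6 MB"</pre>
<p>Keeping the rounding in one helper means the number of decimal places can differ between, say, a file total and an upload speed without repeating the arithmetic.</p>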
<h2>Conclusion</h2>
<p>There are a raft of other parts to gallery.chaostangent.com which merit exploring but are more intrinsically tied to the context of the site rather than the above which are useful in isolation. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculating the geodesic distance between two points</title>
		<link>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/</link>
		<comments>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 22:11:24 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[airy]]></category>
		<category><![CDATA[antipodal]]></category>
		<category><![CDATA[calculation]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[distance]]></category>
		<category><![CDATA[geodesic]]></category>
		<category><![CDATA[haversine]]></category>
		<category><![CDATA[latitude]]></category>
		<category><![CDATA[longitude]]></category>
		<category><![CDATA[maths]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[suppliers]]></category>
		<category><![CDATA[vincenty]]></category>
		<category><![CDATA[wgs84]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=569</guid>
		<description><![CDATA[Tasked with setting up a supplier search, I opted to use the most complex and precise formula available for calculating the distance between two points. I then implemented it as a database function so the calculation can run inside the query.]]></description>
			<content:encoded><![CDATA[<p>I was recently tasked with recreating an existing supplier search for a client; I was provided with a database of suppliers, most of which had been geocoded, and not much else. This scenario is fairly standard when dealing with <a href="http://www.littlechef.co.uk/findalittlechef.php">mapping applications</a>: a user enters a postcode and the system will return a list of the closest suppliers to that location. The postcode part of this equation is well travelled: as the <a href="http://www.postoffice.co.uk">Post Office</a> in the UK will not relinquish the mapping from a postcode to a latitude, longitude tuple without a <a href="http://www.royalmail.com/portal/rm/jump2?mediaId=400085&amp;catId=400084">large outlay of cash</a> (and numerous non-disclosure agreements), the easiest option is to use an external service for this. I opted for <a href="http://www.postcodeanywhere.co.uk">PostcodeAnywhere</a> as I had used them before with great success. The latter part of this challenge, the return of the closest database entries, was something that I wanted to try myself as I didn’t know when I would get such an opportunity again.</p>
<blockquote class="pullout"><p>“if something is worth doing, then it’s worth overdoing”</p>
</blockquote>
<p>To say there are many different ways of calculating the distance between two points would be an understatement. One which I had used before involved northing and easting co-ordinates from a known point within the UK (usually the centroid or London). Using this meant a smattering of trigonometry would be enough to return a decent list of matches; this always struck me as crude: despite its usefulness, using an antiquated and subjective co-ordinate system seemed the wrong way to approach the problem. Latitude and longitude are globally recognised and provide a precise way of defining points on the globe; reading up on how they are calculated was step one. Step two was finding an algorithm that calculated the distance between two arbitrary points. The first one I found was the <a href="http://en.wikipedia.org/wiki/Haversine_formula">Haversine</a> <a href="http://www.movable-type.co.uk/scripts/latlong.html">formula:</a> simple, easy to follow and easy to implement. Knowing that this formula was based upon the assumption that the Earth was perfectly spherical grated slightly with me; I reasoned there must be a more accurate algorithm. I found this precision in <a href="http://en.wikipedia.org/wiki/Vincenty%27s_formulae">Vincenty’s</a> <a href="http://www.movable-type.co.uk/scripts/latlong-vincenty.html">algorithm</a>, and it was then I decided to enact a contrived but deliciously fun maxim: <em>if something is worth doing, then it’s worth overdoing</em>.<br />
<span id="more-569"></span><br />
Vincenty’s formula is accurate to within half a millimetre if given the correct variables for your location — as the earth is an ellipsoid, and a non-uniform one at that (pesky hills), your location will determine just how accurate the formula is. Most systems use the generic and perfectly suitable <a href="http://en.wikipedia.org/wiki/WGS_84"><abbr title="World Geodetic System">WGS</abbr>–84</a> variables which are usually accurate to within 2 metres — this is the system <a href="http://en.wikipedia.org/wiki/Global_Positioning_System">GPS</a> uses. As all the suppliers I would be searching for and all the postcodes would be within the UK, I could use the more precise Airy (1830) set — likely named after <a href="http://en.wikipedia.org/wiki/George_Biddell_Airy">George Biddell Airy</a> for his work on planetary densities. The maths involved in the formula is dense to say the least and I would be lying if I said I grokked it but the implementation is straightforward.</p>
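<p>For comparison, the spherical Haversine calculation fits in a handful of lines. This is a minimal sketch rather than the code used on the project: it assumes a mean Earth radius of 6,371 km, and the names distanceHaversine, toRadians and EARTH_RADIUS are illustrative.</p>
<pre class="brush: javascript">// Assumed mean Earth radius in metres (a spherical approximation).
var EARTH_RADIUS = 6371000;

var toRadians = function(degrees)
{
	return degrees * Math.PI / 180;
};

// Great-circle distance in metres between two points given in decimal degrees.
var distanceHaversine = function(lat1, lon1, lat2, lon2)
{
	var sinLat = Math.sin(toRadians(lat2 - lat1) / 2);
	var sinLon = Math.sin(toRadians(lon2 - lon1) / 2);

	// a is the square of half the chord length between the two points
	var a = sinLat * sinLat + Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * sinLon * sinLon;

	// Convert the chord into the angular distance, then scale by the radius
	return 2 * EARTH_RADIUS * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
};

// Approximate co-ordinates for London and Edinburgh, for illustration only.
var metres = distanceHaversine(51.5074, -0.1278, 55.9533, -3.1883);</pre>
<p>There is no iteration and no ellipsoid involved, which is where both the simplicity and the loss of accuracy come from.</p>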
<p>I had originally envisaged doing some sort of segmenting of the suppliers prior to working out distances in a script; as my brain mulled over the possibilities (closest postcode, caching of major city locations, co-ordinate system conversions) I realised that setting it up as a database function would solve all of these problems. Firing up <a href="http://www.phpmyadmin.net/">phpMyAdmin</a> I bashed out an attempt and after a couple of fixes (mostly syntactic foibles of <a href="http://dev.mysql.com/doc/refman/5.0/en/create-procedure.html">MySQL</a>) it was working a treat:</p>
<pre><code>DELIMITER //
DROP FUNCTION IF EXISTS distanceVincenty//
CREATE FUNCTION distanceVincenty(lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT) RETURNS INT
BEGIN

DECLARE a, b, f, L, U1, U2, sinU1, cosU1, sinU2, cosU2 DOUBLE;
DECLARE lambda, lambdaP, sinLambda, cosLambda DOUBLE;
DECLARE sinSigma, cosSigma, sigma, sinAlpha, cosSqAlpha, cos2SigmaM, C DOUBLE;
DECLARE iterLimit INT;
DECLARE uSq, A1, B1, deltaSigma, s DOUBLE;

SET a = 6377563.396, b = 6356256.909, f = (1 / 299.3249646);
SET L = RADIANS(lon2 - lon1);
SET U1 = ATAN((1 - f) * TAN(RADIANS(lat1)));
SET U2 = ATAN((1 - f) * TAN(RADIANS(lat2)));
SET sinU1 = SIN(U1), cosU1 = COS(U1);
SET sinU2 = SIN(U2), cosU2 = COS(U2);

SET lambda = L, lambdaP = 0, iterLimit = 100;
mainLoop: REPEAT
	SET sinLambda = SIN(lambda), cosLambda = COS(lambda);
	SET sinSigma = SQRT((cosU2 * sinLambda) * (cosU2 * sinLambda) + (cosU1 * sinU2 - sinU1 * cosU2 * cosLambda) * (cosU1 * sinU2 - sinU1 * cosU2 * cosLambda));
	SET cosSigma = sinU1 * sinU2 + cosU1 * cosU2 * cosLambda;
	IF sinSigma = 0 THEN RETURN 0.0; END IF;

	SET sigma = ATAN2(sinSigma, cosSigma);
	SET sinAlpha = cosU1 * cosU2 * sinLambda / sinSigma;
	SET cosSqAlpha = 1 - sinAlpha * sinAlpha;
	SET cos2SigmaM = cosSigma - 2 * sinU1 * sinU2 / cosSqAlpha;
	IF cos2SigmaM IS NULL THEN SET cos2SigmaM = 0; END IF;

	SET C = f / 16 * cosSqAlpha * (4 + f * (4 - 3 * cosSqAlpha));
	SET lambdaP = lambda;
	SET lambda = L + (1 - C) * f * sinAlpha * (sigma + C * sinSigma * (cos2SigmaM + C * cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)));

	SET iterLimit = iterLimit - 1;
UNTIL ((ABS(lambda - lambdaP) &lt;= 1E-12) OR (iterLimit &lt;= 0))
END REPEAT mainLoop;

SET uSq = cosSqAlpha * (a * a - b * b) / (b * b);
SET A1 = 1 + uSq / 16384 * (4096 + uSq * (-768 + uSq * (320 - 175 * uSq)));
SET B1 = uSq / 1024 * (256 + uSq * (-128 + uSq * (74 - 47 * uSq)));
SET deltaSigma = B1 * sinSigma * (cos2SigmaM + B1 / 4 * (cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM) - B1 / 6 * cos2SigmaM * (-3 + 4 * sinSigma * sinSigma) * (-3 + 4 * cos2SigmaM * cos2SigmaM)));
SET s = b * A1 * (sigma - deltaSigma);

RETURN ROUND(s);
END;
//</code></pre>
<p>All of the trigonometry functions are built into MySQL, even helpful ones like ATAN2 and its ilk. The function returns the distance between the points in metres, rounded to the nearest whole number (the ellipsoid axes a and b are given in metres), which can then be easily converted into your unit of choice.</p>
<p>As you can no doubt guess from the code above, this isn’t exactly computationally cheap. Potentially, for nearly antipodal points, the main loop will repeat up to iterLimit times (100 above) before continuing. As well as this, depending on the construction of your database and SQL statement, you could end up doing this calculation for every record in your table, e.g.:</p>
<pre><code>SELECT id, title, address, distanceVincenty(latitude, longitude, @lat, @lon) AS distance FROM `suppliers` ORDER BY distance ASC</code></pre>
<p>will force MySQL to calculate the distance for every row and then order accordingly. I’ve yet to do any benchmarks as it would be lunacy to put that query into production; however, queries which ordinarily took a few hundred milliseconds started taking up to two seconds to complete. Thankfully the system I was working on solved this problem itself by having each supplier categorised and rated, which meant searches rarely returned more than ten to twenty results before distance calculation; the possibility still exists of doing segmentation based on addresses prior to the calculation but that will be dependent on your specific requirements.</p>
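<p>One common form of segmentation is a rough bounding box around the search point, so that the expensive function only runs on rows that are plausibly close. The sketch below is an assumption-laden illustration, not something from the project: the name boundingBox and the figure of roughly 111,320 metres per degree of latitude are mine, and the approach breaks down near the poles and the 180° meridian (neither a concern for the UK).</p>
<pre class="brush: javascript">// Approximate length of one degree of latitude in metres (assumed constant).
var METRES_PER_DEGREE = 111320;

// Rough latitude and longitude limits for a square around a point.
var boundingBox = function(lat, lon, radiusMetres)
{
	var latDelta = radiusMetres / METRES_PER_DEGREE;

	// A degree of longitude shrinks with the cosine of the latitude
	var lonDelta = radiusMetres / (METRES_PER_DEGREE * Math.cos(lat * Math.PI / 180));

	return {
		minLat: lat - latDelta,
		maxLat: lat + latDelta,
		minLon: lon - lonDelta,
		maxLon: lon + lonDelta
	};
};

// Illustrative search point with a 25 km radius.
var box = boundingBox(53.4808, -2.2426, 25000);

// The four values could then limit the rows before distanceVincenty is called:
// WHERE latitude BETWEEN minLat AND maxLat AND longitude BETWEEN minLon AND maxLon</pre>
<p>Because the limits are plain range comparisons on the latitude and longitude columns, they can use ordinary indexes, leaving the precise calculation for the handful of rows that survive.</p>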
<p>So while I doubt any users of the new supplier search realise it, their results come from a formula accurate to within a few millimetres, potentially saving them microns of shoe leather in walking or giving them true results for the case when rival suppliers are mere millimetres apart. For anyone who isn’t interested in meridians and arc tangents, the Vincenty function is most certainly overkill and sticking with Haversine is likely the smarter move, but for those valuing absolute precision, this certainly provides a great little exercise. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rebuilding gallery.chaostangent.com</title>
		<link>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/</link>
		<comments>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/#comments</comments>
		<pubDate>Sun, 26 Jul 2009 18:09:31 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[cake]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[galleries]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[imagemagick]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[modified preorder tree traversal]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[prado]]></category>
		<category><![CDATA[symfony]]></category>
		<category><![CDATA[thumbnails]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=477</guid>
		<description><![CDATA[All about the recent major update to the application powering gallery.chaostangent.com. A brief history of the application and its purpose as well as some implementation details and an in-depth look at the updates and the reasons behind them.]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/wp-content/uploads/2009/07/img-gallerychaostangentcom.jpg"><img class="alignnone size-medium wp-image-484" title="gallery.chaostangent.com front page" src="http://chaostangent.com/wp-content/uploads/2009/07/img-gallerychaostangentcom-540x408.jpg" alt="gallery.chaostangent.com front page" width="540" height="408" /></a><br />
<a href="http://gallery.chaostangent.com">gallery.chaostangent.com</a> is an application for storing and organising images – ostensibly a very simple desire but one I found not catered for by <a href="http://gallery.menalto.com/">existing</a> <a href="http://coppermine-gallery.net/">web</a> <a href="http://wordpress.org/extend/plugins/nextgen-gallery/">applications</a> when it was first conceived in 2005. The concept was an application that was simple and easy to use while still allowing for a degree of organisation to ensure images weren’t stored in a single “pool”.</p>
<blockquote class="pullout"><p>“With a small, well-defined feature set it seemed like a good time to address some of the issues which had crept in”</p>
</blockquote>
<h2>Background</h2>
<p>When I first started developing the application, PHP 5 hadn’t been released for very long and was <a href="http://gophp5.org/node/7">receiving a mixed reception</a>. Regardless, I started developing using a custom built framework I had cobbled together from scratch – one that would eventually go on to be refined and used in some of my work projects. With the lack of other mature frameworks to compare with, it was rough round the edges and did little more than segment out code into the <a href="http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC pattern</a> and even then it wasn’t an entirely clean encapsulation; it was however useful.<br />
<span id="more-477"></span><br />
The first version had the following functionality:</p>
<ul>
<li> Uploading of images using a form</li>
<li> Hierarchical galleries in a folder like tree structure – mirrored in the file structure on the server</li>
<li> Hidden galleries for concealing galleries from the frontend</li>
<li> User management with access control lists</li>
<li> Batch image upload from a folder on the server</li>
</ul>
<p>The batch image upload was added when it became obvious that using a web form to add multiple images was a tedious and protracted affair. The batch upload allowed a user to transfer files onto the server using whatever method they desired (e.g. FTP) and then specify that folder for trawling. This made adding hundreds of images a breeze, despite being less than optimal or straightforward.</p>
<h2>Technical</h2>
<p>The application used the “<a href="http://www.sitepoint.com/article/hierarchical-data-database/2/">Modified Pre-Order Tree Traversal</a>” mechanism for storing hierarchical data. This provided its own set of problems; for instance, to get the first level descendants of a gallery using only <abbr title="Modified Pre-order Tree Traversal">MPTT</abbr> you have to hit the entire gallery sub-tree, so for the root node this was the entire tree. Using a hybrid approach solves this problem by storing each gallery’s parent as well as the left and right values; this makes descendant selection trivial.</p>
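<p>The hybrid idea can be sketched with a few in-memory rows. Everything here is illustrative: the gallery titles, the property names (parentId, lft, rgt) and the helper functions are assumptions rather than the application’s actual schema.</p>
<pre class="brush: javascript">// Illustrative rows: lft and rgt are the MPTT values, parentId is the hybrid addition.
var galleries = [
	{ id: 1, title: "Root", parentId: null, lft: 1, rgt: 10 },
	{ id: 2, title: "Anime", parentId: 1, lft: 2, rgt: 7 },
	{ id: 3, title: "Screens", parentId: 2, lft: 3, rgt: 4 },
	{ id: 4, title: "Scans", parentId: 2, lft: 5, rgt: 6 },
	{ id: 5, title: "Photos", parentId: 1, lft: 8, rgt: 9 }
];

// With the parent stored, first level descendants are a simple equality match.
var childrenOf = function(rows, id)
{
	return rows.filter(function(row)
	{
		return row.parentId === id;
	});
};

// The left and right values still answer whole sub-tree questions cheaply,
// such as how many descendants a gallery has.
var descendantCount = function(row)
{
	return (row.rgt - row.lft - 1) / 2;
};

var topLevel = childrenOf(galleries, 1);</pre>
<p>In SQL terms the first helper corresponds to filtering on the parent column alone, while the second needs nothing but the row itself.</p>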
<p>Another technical hurdle was the thumbnail creation. The most commonly used image library within PHP is <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a> which is really only suitable for smaller images. As it operates within PHP’s memory, larger images cause more memory to be used until the server imposed limit (<a href="http://uk3.php.net/manual/en/ini.core.php#ini.memory-limit">memory_limit in php.ini</a>) is reached. This caused odd states whereby an image had been uploaded but there was no way of telling whether the script would time out or hit the memory limit before processing began.</p>
<p>The solution to this dilemma was to switch to using the command line tool <a href="http://www.imagemagick.org/">ImageMagick</a>. As an external executable, PHP’s memory limit is no longer an issue; however ImageMagick comes with its own foibles. These involved lesser travelled areas of image manipulation such as <a href="http://www.imagemagick.org/script/command-line-options.php#colorspace">colourspaces</a> and multi-frame images (animated GIFs et al.).</p>
<h2>Problems</h2>
<p>For a number of years the application worked admirably; only a small selection of niggles remained:</p>
<ul>
<li>Batch image upload process</li>
<li> Odd hidden galleries logic</li>
<li> Tedious bulk image deletion</li>
<li> Square thumbnails</li>
<li> Changing the name of a gallery changed the filesystem folder name which altered the URLs for the images</li>
</ul>
<p>Since its inception, a selection of PHP frameworks had been released and matured such as <a href="http://www.pradosoft.com/">Prado</a>, <a href="http://cakephp.org/">CakePHP</a> and <a href="http://www.symfony-project.org/">Symfony</a> and the behemoth of <a href="http://rubyonrails.org/">Ruby on Rails</a> was dominating development at the time. With a small, well-defined feature set it seemed like a good time to address some of the issues which had crept in.</p>
<h2>False starts</h2>
<p>Despite improvements to the bespoke framework I used like request routing, several attempts at improving the application met with the problem that rebuilding it didn’t fundamentally improve it and the updated framework didn’t make coding any quicker or simpler, just different. These conclusions made me down tools and re-evaluate the rebuild.</p>
<h2>Version 2.0</h2>
<p>Over a year later I began work again, this time using the oft mentioned and newly released <a href="http://framework.zend.com">Zend Framework</a>. I quickly surpassed the functionality milestone I had reached with older versions – mainly due to my focus on individual functionality. I had set up a <a href="http://subversion.tigris.org/">Subversion</a> repository and disciplined myself into making smaller, more frequent commits rather than monolithic end-of-the-day updates. After some iterative improvements the application retained the core functionality of version 1.0 and most importantly, built upon it:</p>
<ul>
<li>Thumbnails are now proportional – aesthetics are easy to change, thumbnails aren’t</li>
<li> Any number of different image versions can be generated e.g. not just thumbnails; parameters are flexible and easy to extend</li>
<li> Removed hidden galleries – despite improving the logic, their usefulness was always in question</li>
<li> Users are no longer subject to an <abbr title="Access Control List">ACL</abbr> – designed for small, trustworthy communities rather than sprawling user bases</li>
<li> Galleries now separate out their display and filesystem names – each can be changed independently of the other</li>
<li> Image uploads are now done using <a href="http://swfupload.org/">SWFUpload</a> if available</li>
<li> Some actions use <abbr title="Asynchronous JavaScript and XML">AJAX</abbr> if available to enhance usability</li>
<li> Galleries can be output in <abbr title="eXtensible Markup Language">XML</abbr> and <abbr title="JavaScript Object Notation">JSON</abbr> for external processing</li>
</ul>
<p>With such an array of improvements it wasn’t long before the build was tested and put into place. Despite being incompatible with the previous version, this turned out to be a good time to clean out the cruft and start afresh.</p>
<h2>Technical</h2>
<p>The first 2.0 build mirrored the code layout of the 1.0 build, so image sizing was done within the controllers, which made them needlessly large and unwieldy. The second 2.0 build sectioned image sizing out into the models, triggered on database hooks (pre-insert, post-insert etc.). This was initially tricky as the tightly coupled functionality seemed an odd fit until it was massaged into being more generic and loosely coupled.</p>
<p>Image versions are now calculated on-the-fly rather than being stored in the database. Some image versions may or may not exist depending on the settings a user has entered and the image in question – e.g. a thumbnail is set to always be generated, while a blog sized image may only be present if the image is large enough. This granularity of control allows for a variety of usage scenarios.</p>
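<p>The logic behind proportional, conditional image versions can be sketched as below. This is a hypothetical illustration in JavaScript rather than the application’s PHP: the version names, sizes and the always flag are assumptions made up for the example.</p>
<pre class="brush: javascript">// Hypothetical version settings; names and sizes are illustrative only.
var versions = [
	{ name: "thumbnail", maxWidth: 150, maxHeight: 150, always: true },
	{ name: "blog", maxWidth: 540, maxHeight: 540, always: false }
];

// Scale dimensions to fit inside a box while keeping the aspect ratio; never upscale.
var fitWithin = function(width, height, maxWidth, maxHeight)
{
	var scale = Math.min(1, maxWidth / width, maxHeight / height);
	return {
		width: Math.round(width * scale),
		height: Math.round(height * scale),
		scale: scale
	};
};

// Work out which versions apply to an image. Optional versions are skipped
// when the image already fits inside them (a scale of exactly 1).
var versionsFor = function(width, height)
{
	var result = {};
	versions.forEach(function(version)
	{
		var size = fitWithin(width, height, version.maxWidth, version.maxHeight);
		if(version.always || size.scale !== 1)
		{
			result[version.name] = { width: size.width, height: size.height };
		}
	});
	return result;
};

var large = versionsFor(1600, 1200); // thumbnail and blog versions
var small = versionsFor(400, 300);   // thumbnail only</pre>
<p>Because the result depends only on the stored dimensions and the current settings, nothing about individual versions needs to be kept in the database.</p>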
<p>Using SWFUpload was an early decision: moving away from the server based folder upload was a high priority. Implementing SWFUpload is tricky as it requires a lot of JavaScript upfront to work well; the Flash cookie bug also cropped up, polluting an otherwise pristine authentication system.</p>
<p>The database structure was simplified even further than version 1.0. Images no longer store the filename of the thumbnail (a result of allowing multiple image versions) or the filesize of the image; dimensions are retained as these would be too costly to calculate on the fly. I originally planned to use triggers for some of the more complex database actions like insertion and deletion within the gallery tree; however, this isn’t possible as triggers can’t modify the table they’re fired on – this is a general database principle rather than the standard bloody-mindedness usually exhibited by MySQL. I ended up using database functions which reduced the error checking required in the PHP code.</p>
<h2>Future</h2>
<p>I am aiming to get a release candidate completed soon. I initially stayed away from a public release due to the usage of a custom framework that was used heavily in commercial projects, and the gnawing doubt that it wasn’t yet good enough for exposure to my peers. With these barriers now removed, it remains for me to thoroughly test and document the system and package it; aspects which I take for granted (having access to the database, knowledge of where the ImageMagick executable is etc.) will need to be addressed prior to release. There are other tertiary concerns as well such as a bug tracker, support forums and so forth.</p>
<p>I am still aiming to further refine the existing functionality – primarily the production of image versions and SWFUpload integration. The ability to format galleries in XML and JSON opens up a number of possibilities, including integration with WordPress’s media navigator or sidebar widgets. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud computing</title>
		<link>http://chaostangent.com/2008/09/cloud-computing/</link>
		<comments>http://chaostangent.com/2008/09/cloud-computing/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 18:29:00 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[reliability]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=3</guid>
		<description><![CDATA[
King Cloud by akakumo used under Creative Commons Attribution-Share Alike
The term “cloud computing” is being bandied about more and more recently, sometimes termed “x as a service”, its proponents make it out to be the embodiment of an ideology whereby one doesn’t worry about the details and simply wants to get things done. From my [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/kky/704056791/"><img class="alignnone size-medium wp-image-413" title="King Cloud by akakumo" src="http://chaostangent.com/wp-content/uploads/2008/09/704056791_8f5db72f63_o-540x405.jpg" alt="King Cloud by akakumo" width="540" height="405" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/kky/704056791/">King Cloud by akakumo</a> used under Creative Commons Attribution-Share Alike</cite></p>
<p>The term “cloud computing” is being bandied about more and more recently, sometimes termed “x as a service”, its proponents make it out to be the embodiment of an ideology whereby one doesn’t worry about the details and simply wants to get things done. From my perspective as a developer, the most interesting parts of the <a href="http://en.wikipedia.org/wiki/Cloud_computing">CC paradigm</a> revolve around infrastructure, service and storage but unlike a great many others, I’m unwilling to jump head-first into using CC implementations.</p>
<p>Growing up for me has always been about trying to get the most bandwidth realistically available to me, oftentimes verbally fighting for it, be it with my sister or the IT providers at my university. Coming from that background I have a healthy respect for how precious people make bandwidth out to be and the detrimental effects not having enough of it can cause. In this light, you can understand why I’m wary of cloud computing. Internet access is still not as ubiquitous as many people, mostly densely-packed city dwellers, make it out to be. The application end of the CC scale I’m always going to meet with scepticism: my documents are stored on my hard drive, which is eminently more tangible than an increasingly ephemeral idea of connectivity.</p>
<blockquote class="pullout"><p>“five nines uptime isn’t what these services are pushing as their tagline”</p>
</blockquote>
<p>Other uses of CC though include offering a service beneficial to developers and producers alike, and this for me is where the allure begins. Not having to worry about storage requirements or dedicated server space for a project is an enticing prospect, cutting out a swathe of niggles and possible overheads, breaking it down to what many feel is the future: it just works. Being able to simply sign up and start pulling and pushing data through a well defined API, to a service rather than a dirty filesystem has an elegance to it. Or perhaps the idea that servers are no longer tied to a physical machine, instances just minutes away from being summoned to life as quickly as they can be brought down.<br />
<span id="more-3"></span><br />
<a href="http://www.flickr.com/photos/wtlphotos/494749811/"><img class="alignnone size-medium wp-image-414" title="Cotton Ball Clouds / Nuvole a Batuffolo" src="http://chaostangent.com/wp-content/uploads/2008/09/494749811_7f338637c0_b-540x329.jpg" alt="Cotton Ball Clouds / Nuvole a Batuffolo" width="540" height="329" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/wtlphotos/494749811/">Cotton Ball Clouds / Nuvole a Batuffolo by WTL photos</a> used under Creative Commons Attribution-No Derivative Works</cite></p>
<p>One large drawback to all of this ease though is the one aspect which has held me back from dropping my name into the lovingly documented <a href="http://aws.amazon.com/">Amazon Web Services</a>: reliability. The most high profile hits recently include <a href="http://www.theregister.co.uk/2008/02/15/amazon_s3_outage_feb_2008/">Amazon’s S3 outage</a> and <a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/08/11/AR2008081101894.html">GMail’s downtime</a>, both of which may be isolated incidents, but demonstrate that five nines uptime isn’t what these services are pushing as their tagline, only ease of use. Of course neither Amazon nor Google wants downtime and the hit to their credibility will take time to heal, but it highlights that these are not services to be relied on <em>yet</em>. Most current thinking is to use these services as side-cars or backups rather than your main technology platform and to rely upon a robust application to deal with hiccups; perhaps using <a href="http://aws.amazon.com/ec2">EC2</a> as a load balancer or for surge capacity, or <a href="http://aws.amazon.com/s3/">S3</a> to store larger files with graceful fallback to a generic “These files are not currently available” message.</p>
<p>At the moment the services occupy a part of my brain that is doing things backwards: trying to construct uses for the services to justify signing up for them. I’m always looking to learn new technologies, but for the moment I have no use for them that isn’t already filled more than adequately by other services. I run a high-availability dedicated server for work which has no trouble handling several large capacity sites and a shared hosting solution with <a href="http://www.dreamhost.com">Dreamhost</a> for this site which allows me to tinker and write without any artificial boundaries. I enjoy the finer details of running these services, crafting and honing them to be as swift and efficient as possible; building robustness is part of that, but assuming the underlying foundations you’re building on are less reliable than your application and that downtime should be factored into project scopes are foreign concepts to me. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/09/cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a better screenshotter</title>
		<link>http://chaostangent.com/2008/08/building-a-better-screenshotter/</link>
		<comments>http://chaostangent.com/2008/08/building-a-better-screenshotter/#comments</comments>
		<pubDate>Sun, 31 Aug 2008 12:00:18 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[avisynth]]></category>
		<category><![CDATA[bframe]]></category>
		<category><![CDATA[hidef]]></category>
		<category><![CDATA[highdefinition]]></category>
		<category><![CDATA[iframe]]></category>
		<category><![CDATA[mpeg]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[pframe]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[screenshots]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=7</guid>
		<description><![CDATA[
My previous forays into crafting an automatic screenshot taker were, at the time, very successful. The system managed to pump out usable images in a fraction of the time it would have taken me to seek and do them manually; I even extended the script to handle multiple file-inputs which made ‘capping an entire series [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-417" title="High definition snow" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-01-540x303.jpg" alt="High definition snow" width="540" height="303" /></p>
<p>My <a href="http://chaostangent.com/2006/08/screenshotter/">previous forays</a> into crafting an automatic screenshot taker were, at the time, very successful. The system managed to pump out usable images in a fraction of the time it would have taken me to seek and do them manually; I even extended the script to handle multiple file-inputs which made ‘capping an entire series a breeze. Lamentably, this was a honeymoon period before cracks started to show, followed by gaping chasms.</p>
<p>The only workaround the first screenshotter needed was for a glitch with Windows Media files which meant the first frame sought was always blank; it swerved around this limitation by taking two shots and discarding the first. This symptom, however, was indicative of what would become a persistent problem.</p>
<p><strong>Background</strong></p>
<blockquote class="pullout"><p>“The thing to understand is that seeking in a video file is very difficult”</p>
</blockquote>
<p>The first significant problem I encountered with the setup was with the series <a href="http://chaostangent.com/2007/09/claymore/">Claymore</a>: a great many of the resulting images seemed to have a lot of “bleed through”, as if one frame were being intermingled with another, and this was above and beyond the standard cross-fade transition screenshots that were common. At the time I assumed it was because the files I used were modern H264 MKV files rather than the standard XviD ones I had been using before, or that the encoding was particularly shoddy. After downloading an updated version of <a href="http://www.mplayerhq.hu/">mplayer for Windows</a> the problem seemed to disappear; I ended up regenerating a lot of the images for the episodes which were the most severe offenders.</p>
<p>After a spate of swift updates, I didn’t blog anime any more so the screenshotter shortcut on my desktop lay dormant until I decided to unleash some madness on Strawberry Panic. While the setup worked, it was producing an unusual amount of exact duplicate images, despite being over five seconds apart. I realised there was a fundamental underlying cause for this that an mplayer update wouldn’t fix. True high-definition versions (not upscales) of certain releases were now readily available, namely the seminal Ghost in the Shell: Standalone Complex (and associated movie Solid State Society) and a selection of <a href="http://www.animenewsnetwork.com/encyclopedia/people.php?id=3487">Makoto Shinkai</a> works including <a href="http://chaostangent.com/2007/10/five-centimeters-per-second/">5 centimeters per second</a>, which I wanted to pluck some quality captures from (for desktop wallpaper or other purposes). These files did not agree with the screenshotter at all and stoically produced correct resolution but entirely black captures which was less than useful.<br />
<span id="more-7"></span><br />
<img class="alignnone size-medium wp-image-418" title="High definition solitude" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-02-540x303.jpg" alt="High definition solitude" width="540" height="303" /></p>
<p><strong>The cause</strong></p>
<p>While not entirely sure of the cause, my hunch lay with the way newer files operate. Modern video compression usually comes in <a href="http://en.wikipedia.org/wiki/B-frame">three frame types</a>: I-frames (usually termed keyframes), P-frames, which are predictive frames (sometimes thought of as “only the bits that change”), and B-frames, which are also predictive but rely on frames ahead of them as well as on previously decoded frames. H264, the most modern MPEG standard, also adds SI and SP frames and multi-frame motion prediction, the details of which are outside the scope of this post. The thing to understand is that seeking in a video file is very difficult: the decoder not only potentially needs to decode all the way back to an I-frame, but also forward to get all the data needed, which can be time-consuming and intensive.</p>
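<p>To make the cost of seeking a little more concrete, here is a rough sketch in JavaScript. It assumes a heavily simplified model that is not how a real decoder is written: frames are given in display order as a string of “I”, “P” and “B” characters, a P-frame needs every I- or P-frame back to the previous keyframe, and a B-frame additionally needs the next I- or P-frame after it. The function name <code>framesToDecode</code> and the sample sequence are made up for illustration, and H264’s multiple reference frames are ignored entirely.</p>

```javascript
// Simplified model (an assumption for illustration, not a real decoder):
// returns the indices of the frames that must be decoded to show `target`.
function framesToDecode(types, target) {
  // Walk back to the nearest keyframe at or before the target.
  let start = target;
  while (start > 0 && types[start] !== 'I') {
    start--;
  }

  // A B-frame also depends on the next reference (I or P) frame after it.
  let end = target;
  if (types[target] === 'B') {
    while (end < types.length - 1 && types[end] === 'B') {
      end++;
    }
  }

  // Other B-frames are never used as references in this model, so skip them.
  const needed = [];
  for (let i = start; i <= end; i++) {
    if (types[i] !== 'B' || i === target) {
      needed.push(i);
    }
  }
  return needed;
}

const sequence = 'IBBPBBPBBI'; // hypothetical frame pattern
console.log(framesToDecode(sequence, 5)); // frames needed to show the B-frame at index 5
```

<p>Even in this toy model, showing one B-frame in the middle of the sequence means decoding the keyframe before it, the P-frames in between and a frame that comes <em>after</em> it, which is the sort of context a two-frame seek never gets.</p>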
<p>The older screenshotter worked by repeatedly spawning mplayer with command line arguments stating it should jump to a specific point, play only 2 frames which it outputs as images and then exit. The scripting part (handled by PHP) was doing all the calculations as to where to seek to rather than expecting mplayer to do this (the reason for this will be explained below). The problem of duplicate screenshots was likely being caused by forcing mplayer to seek to an arbitrary point where it would display the first frames it could reliably show; in some highly compressed files, these could be the same across several screengrab attempts. The blank frame problem, I postulated, was probably a computational bottleneck: my computer can’t play most 1080p files in real time and can baulk at some 720p ones. My theory is that mplayer was doing the best job it could without desynching and displayed a blank frame so it could “catch up” as the file progressed, which it never got the chance to do because it was only ever asked for two frames.</p>
<p>In short, because mplayer was being continually restarted, the screenshots lacked the “context” of the rest of the file and thus were substandard or just plain missing.</p>
<p><img class="alignnone size-medium wp-image-419" title="High definition consumerism" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-03-540x303.jpg" alt="High definition consumerism" width="540" height="303" /></p>
<p><strong>Version 2</strong></p>
<p>The original screenshotter layered PHP on top of mplayer because of a number of perceived deficits in mplayer’s operation. mplayer has always had a “framestep” option (which in the most recent versions has been bundled into the video filter architecture) which seemed like exactly what was needed. Unfortunately it only restricts the <em>rendering</em> to those particular frames rather than doing what I had hoped and only playing those frames, so with this option taking screenshots takes as long as playing the file itself.</p>
<p>My next thought was to use something a bit more specialised for scripting, and <a href="http://avisynth.org/">AviSynth</a> sprang immediately to mind. As well as a cornucopia of other options, it had a function which seemed to be what I was looking for: <a href="http://avisynth.org/mediawiki/SelectEvery">SelectEvery</a>. My hope that it would provide a simple solution to this problem was dashed as it did exactly the same as the mplayer framestep option, rendering only the selected frames rather than only decoding those frames.</p>
<p>Version 2 is more of a compromise than a solution. Essentially mplayer needed to play through the file in its entirety to be able to extract good screenshots but needed to be quick enough to be usable. The result uses the previously mentioned mplayer options (framestep, JPEG output) but forces the frame rate to be high so that mplayer can tear through the file as fast as it can. Thankfully mplayer doesn’t do any frame skipping by default; you have to explicitly tell it to skip frames (aggressively or otherwise). The final command line I used for my shortcut was:</p>
<pre><code>mplayer.exe -quiet -nosound -vo jpeg:progressive:quality=85:outdir=screenshots -fps 1000 -vf framestep:i120</code></pre>
<p>I would have used -really-quiet except my version of mplayer didn’t seem to like that option. Dropping this into a shortcut on my desktop, I’m able to drag a file onto the shortcut and have everything just whir away in the background. The lower case “i” in front of the framestep interval tells mplayer to print an “I!” whenever it comes across a keyframe, which you can drop if you want the minimum of fuss. The framestep value is in frames, which means the time between shots is dependent on the frame rate of the video you’re capturing; I usually take screenshots in depth and then prune them afterwards, so adjust this to suit your style.</p>
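<p>Since the framestep value is counted in frames, the gap between captures is simply the step divided by the frame rate. A quick sketch of that arithmetic in JavaScript follows; the helper names <code>secondsBetweenShots</code> and <code>framestepForInterval</code> are my own invention for illustration and have nothing to do with mplayer itself.</p>

```javascript
// Time between captures for a given framestep value and video frame rate.
function secondsBetweenShots(framestep, fps) {
  return framestep / fps;
}

// The reverse: the framestep value that gets closest to a desired gap.
// Rounded to a whole frame and never less than one frame.
function framestepForInterval(seconds, fps) {
  return Math.max(1, Math.round(seconds * fps));
}

console.log(secondsBetweenShots(120, 24));     // gap for a 24fps file
console.log(secondsBetweenShots(120, 30));     // gap for a 30fps file
console.log(framestepForInterval(5, 23.976));  // step for roughly 5 seconds
```

<p>So the 120 used above works out at around five seconds for film-rate material and a little less for anything encoded at a higher frame rate.</p>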
<p><img class="alignnone size-medium wp-image-420" title="High definition grey day" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-04-540x303.jpg" alt="High definition grey day" width="540" height="303" /></p>
<p><strong>Conclusions and ways forward</strong></p>
<p>In terms of results, the images dotted about this article say all that they need to really. Duplicates are non-existent and there is no apparent visual distortion, corruption or errant blanks, even with high definition files.</p>
<p>As far as benchmarks are concerned, I haven’t yet done any tests to see whether this way is faster or more complete than my previous version. Setting the frame rate so high may be detrimental rather than beneficial as mplayer may be pushing harder to get through the file than it should be; the optimal FPS could be a much lower value (perhaps a multiple of the actual frame rate).</p>
<p>This method also requires a lot more maintenance on the part of the user as the screenshots are unceremoniously dumped into a single directory and are subject to being overwritten if doing a sequence of files. There may yet be a place for scripting in this version, although I’m loath to add it as the purity of keeping the method entirely within a shortcut is not to be underestimated. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/08/building-a-better-screenshotter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Expectancy: PHP 5.3</title>
		<link>http://chaostangent.com/2008/08/expectancy-php-5-3/</link>
		<comments>http://chaostangent.com/2008/08/expectancy-php-5-3/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 18:18:56 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[oop]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php4]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[php5.2]]></category>
		<category><![CDATA[php5.3]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=9</guid>
		<description><![CDATA[
The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July and a pre-release alpha now available, it’s worth exploring some of the many additions and changes that are going to be introduced.
As PHP is the language I most frequently work in and one which [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-422" title="Pretty Hard Panda" src="http://chaostangent.com/wp-content/uploads/2008/08/php.gif" alt="Pretty Hard Panda" width="120" height="67" /></p>
<p>The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July <em>and</em> a pre-release alpha now available, it’s worth exploring some of the many additions and changes that are going to be introduced.</p>
<p>As PHP is the language I most frequently work in and one which I’ve done all sorts with (from web applications, to <a href="http://192.168.1.65/blog.chaostangent.com/archives/370">file exploration</a> to <a href="http://192.168.1.65/blog.chaostangent.com/archives/14">media player scripting</a>), I like to think I’m sensitive to deficiencies and oddities in the released implementations. Version 5.3 contains a lot of elements backported from the still distant version 6; the most glaring omission is end-to-end Unicode support without mb_* fudges or iconv. Being able to use string-backed functions like array_unique() without suspicion will be a big help, but I digress.</p>
<p>The most high-profile addition is that of namespaces. Gone will be the warts that dot current frameworks (e.g. Zend_Db_Table_Rowset), which will make different frameworks and modules far easier to use and far more friendly when you want them to play nicely together.</p>
<blockquote class="pullout"><p>“PHP and MySQL have always been bedfellows despite their conflicting release licenses”</p>
</blockquote>
<p>Static functions have also been promoted to allow a lot of the meta-programming niceties that member functions have, including true overloading support, which will allow first-level abstractions such as database wrappers to not require instantiation before being called (which I discovered around the same time as <a href="http://192.168.1.65/blog.chaostangent.com/archives/40">my get_class exploration</a>). For instance, if using an ORM, doing People::getAllById() will now be easier to achieve. Alongside this, many of the magic methods have been tightened up to make them less ambiguous (__get can only be public and not static, signatures are enforced etc.).</p>
<p>Looking through some of the other <a href="http://wiki.php.net/doc/scratchpad/upgrade/53">changes detailed in the PHP Wiki</a> it seems that a selection of new functions surrounding garbage collection are now being exposed including checking whether it is enabled, and selectively enabling or disabling it. Whether this is a mistake (close by get_extension_funcs() is detailed as a new function but <a href="http://uk3.php.net/manual/en/function.get-extension-funcs.php">appears to have been in since PHP4</a>) and these are bleed-throughs from the Zend Engine is unclear, but without some surrounding memory management facilities, it would seem unwise to disable or allow disabling of garbage collection.</p>
<p>On the extension front, numerous ones have been standardised and moved into the PECL system, which goes some way to neatening things up; the change <a href="http://blog.felho.hu/what-is-new-in-php-53-part-3-mysqlnd.html">some are talking about</a> is the choice between PHP’s own MySQL native driver (mysqlnd) and the libmysql client library that comes when compiling against a MySQL release. PHP and MySQL have always been bedfellows despite their conflicting release licenses (especially so since Sun gobbled up MySQL), so this seems like a smart move for all concerned, with a separate code-base, better engine integration and statistical analysis now possible (<a href="http://www.hristov.com/andrey/projects/php_stuff/pres/mysqlnd_vikinger.pdf">PDF details</a>).</p>
<p>What all of this adds up to is a release that’s solid on paper, but the bum-rush for patches is sure to be as swift as any other PHP release. With the OO enhancements especially, it feels like these should have been included from day one, as not only will there now be a disjoint between PHP4 and PHP5 shared servers, but between PHP5.2 and PHP5.3 as well. For someone who runs their own server this is not a massive worry, especially when the list of backwards compatibility changes is so small, but for service providers (hosts, ISPs etc.) still dragging their feet over 4 &gt; 5 &gt; 5.2, this adds another step of complexity.</p>
<p>The real test will obviously be the frameworks and high-profile applications that utilise PHP. With word that the <a href="http://framework.zend.com/">Zend Framework</a> won’t be <a href="http://www.nabble.com/PHP-5.3-Namespaces-on-ZF-td18836642.html">supporting namespaces until its 2.0</a> release next year, the lead time could be immense; when you consider that phpBB, once considered the yardstick of PHP usage, <a href="http://www.phpbb.com/support/documentation/3.0/quickstart/quick_requirements.php">still supports 4.3</a> with its most recent version, the playing field for cutting-edge PHP seems less than agile. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/08/expectancy-php-5-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Javascriptery: Tabbed forms</title>
		<link>http://chaostangent.com/2008/03/javascriptery-tabbed-forms/</link>
		<comments>http://chaostangent.com/2008/03/javascriptery-tabbed-forms/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 20:44:57 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[ala]]></category>
		<category><![CDATA[forms]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[snippet]]></category>
		<category><![CDATA[tabbed]]></category>
		<category><![CDATA[tabs]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=12</guid>
		<description><![CDATA[
Forms are perhaps the bane of web development for me; you can’t get them to look good, you can’t find a foolproof way to make them act well and lets not even start of trying to get them into a pacified state, free from the dangers of user input (surprise ending: form input will never [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://gallery.chaostangent.com/galleries/random/blog/tabbedform.png" alt="" width="512" height="100" /></p>
<p>Forms are perhaps the bane of web development for me; you can’t get them to look good, you can’t find a foolproof way to make them act well and let’s not even start on trying to get them into a pacified state, free from the dangers of user input (surprise ending: form input will never be completely trustworthy). A lot of sites would appear to have aesthetically pleasing forms, however this is a careful ruse by them as they sidestep the problem of forms by having only one or two of them, and then they usually only have a few fields. The monstrosities I am required to deal with almost daily are things of grotesque beauty, veritable Rube Goldberg machines of complexity.<span id="more-12"></span></p>
<p>The long and the short of this diversion into why forms are evil (please, end my suffering quickly <a href="http://www.w3.org/MarkUp/Forms/">XForms</a>) is that to get a form looking good, you have to spend a long time fiddling with things. Enough of this banter anyway, my fiddling with JavaScript (like the dirty little bastard child of C and Perl it is) produced a way of creating a tabbed form that defaults to a standard single form if a user prefers to use <a href="http://noscript.net/">NoScript</a> or an antiquated browser of yore.</p>
<p>So the following markup:</p>
<pre><code>&lt;form id="theForm" method="post"&gt;
&lt;fieldset&gt;
&lt;legend&gt;First tab&lt;/legend&gt;
&lt;ol&gt;
&lt;li&gt;&lt;label for="formone"&gt;One&lt;/label&gt;
&lt;input id="formone" name="one" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formtwo"&gt;Two&lt;/label&gt;
&lt;input id="formtwo" name="two" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formthree"&gt;Three&lt;/label&gt;
&lt;input id="formthree" name="three" type="text" /&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/fieldset&gt;
&lt;fieldset&gt;
&lt;legend&gt;Second tab&lt;/legend&gt;
&lt;ol&gt;
&lt;li&gt;&lt;label for="formfour"&gt;Four&lt;/label&gt;
&lt;input id="formfour" name="four" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formfive"&gt;Five&lt;/label&gt;
&lt;input id="formfive" name="five" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formsix"&gt;Six&lt;/label&gt;
&lt;input id="formsix" name="six" type="text" /&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/fieldset&gt;
&lt;fieldset&gt;
&lt;legend&gt;Third tab&lt;/legend&gt;
&lt;ol&gt;
&lt;li&gt;&lt;label for="formseven"&gt;Seven&lt;/label&gt;
&lt;input id="formseven" name="seven" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formeight"&gt;Eight&lt;/label&gt;
&lt;input id="formeight" name="eight" type="text" /&gt;&lt;/li&gt;
&lt;li&gt;&lt;label for="formnine"&gt;Nine&lt;/label&gt;
&lt;input id="formnine" name="nine" type="text" /&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/fieldset&gt;
&lt;/form&gt;</code></pre>
<p>The default will be to display the first fieldset, with links in a list to display the other two. A trivial form like this certainly doesn’t require a tabbed layout, but a monstrosity that contains 27 input fields (some multiple choice) could do with a little information management when displayed to the user. The general markup is what I’ve settled on for the majority of my forms and is based almost entirely on <a href="http://www.alistapart.com/articles/prettyaccessibleforms/">Nick Rigby’s article on ALA</a> but the styling isn’t what’s important here.</p>
<p>For this project, like any I undertake with JavaScript, I’ll be using the <a href="http://www.prototypejs.org/">Prototype library</a> (version 1.6 specifically for this snippet); this could be done without it with minimum fuss but Prototype is lovely so I usually already have it included.</p>
<p>The functionality of this project is pretty minimal: building a list of the available fieldsets lies at the core of it. When the script is invoked it will hide all but the first fieldset, build an unordered list of the fieldsets (taking the names from the &lt;legend&gt; elements) and then set up event listeners for that list to change the visible state of each fieldset.</p>
<p>First things first, set up the JavaScript object and the hiding of the fieldsets:</p>
<pre><code>var tabbedForm = {
	init: function() {
		var formElem = $('theForm');
		if(formElem)
		{
			// select() hands back Prototype-extended elements, so identify() and hide() also work in IE
			formElem.select('fieldset').each(function(s, i) {
				var fieldsetId = s.identify();
				// hide all but the first
				if(i != 0)
				{
					s.hide();
				}
			});
		}
	}
};</code></pre>
<p>Nothing spectacular: it uses the oft-ignored index argument that each() passes to its callback to scry when an item isn’t the first in the list, though there are plenty of other ways of achieving this. The next job is to build the list of available fieldsets and plop that into the document at some point, so augmenting the init() function:</p>
<pre><code>init: function() {
	var formElem = $('theForm');
	if(formElem)
	{
		var listElem = document.createElement('ul');
		// select() hands back Prototype-extended elements, so identify(), hide() and down() also work in IE
		formElem.select('fieldset').each(function(s, i) {
			var fieldsetId = s.identify();
			// hide all but the first
			if(i != 0)
			{
				s.hide();
			}

			var legendElem = s.down('legend');
			if(legendElem)
			{
				var listItemElem = document.createElement('li');
				var linkElem = document.createElement('a');
				linkElem.href = fieldsetId;
				linkElem.innerHTML = legendElem.innerHTML;
				Element.addClassName(linkElem, fieldsetId);
				Event.observe(linkElem, 'click', tabbedForm.tabClicked);

				listItemElem.appendChild(linkElem);
				listElem.appendChild(listItemElem);
			}
		});

		Element.insert(formElem, {before: listElem});
	}
}</code></pre>
<p>An unordered list is created, then for each fieldset the &lt;legend&gt; element is nabbed and its contents used as the title for each list item. Probably the only questionable part is making the link element point to the ID of the fieldset; this is just how I do things so that when a link is clicked, the ID is available. Other people I know put these sorts of items within the JavaScript object itself or in a classname or somesuch, whatever works for you; I don’t have to worry about non-JavaScript users clicking the links because the entire structure is generated rather than marked up. I drop the completed unordered list above the form element which fits with the “tab” metaphor we’re aiming for.</p>
<p>The only remaining function handles what happens when a link in the generated list is clicked, which according to my event listener is called (cunningly enough) “tabClicked”:</p>
<pre><code>tabClicked: function(evt) {
	Event.stop(evt);
	var linkElem = Event.findElement(evt, 'a');
	var formElem = $('theForm');
	if(linkElem &amp;&amp; formElem)
	{
		var idToShow = linkElem.href.substr(linkElem.href.lastIndexOf('/')+1);
		// select() hands back Prototype-extended elements, so identify(), show() and hide() also work in IE
		formElem.select('fieldset').each(function(s) {
			if(s.identify() == idToShow)
			{
				s.show();
			}
			else
			{
				s.hide();
			}
		});
	}
}</code></pre>
<p>After stopping the link click event (which halts any further bubbling and, more importantly, prevents the browser from following the link) it grabs the clicked link element (I find it best not to take for granted which element has been clicked and just do a “findElement” to make sure we’re on the same page), pulls the ID from the href attribute, then iterates through the form’s fieldsets to find the one it refers to.</p>
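<p>The slightly odd-looking string slicing exists because a browser resolves the relative href assigned in init() against the current page, so reading it back gives a full URL rather than the bare ID. Pulled out on its own, the extraction looks something like the sketch below; <code>idFromHref</code> is a hypothetical helper name and the example URL and ID are made up.</p>

```javascript
// Everything after the last slash of the resolved href is the fieldset ID.
// If there is no slash at all, lastIndexOf() returns -1 and the whole
// string is returned unchanged.
function idFromHref(href) {
  return href.slice(href.lastIndexOf('/') + 1);
}

// Hypothetical resolved URL for a link whose href was set to "tab_two".
console.log(idFromHref('http://example.com/forms/tab_two'));
```

<p>This only holds while the ID itself contains no slashes and the page URL carries nothing awkward after the last one, which is fine for generated IDs but worth bearing in mind if you assign your own.</p>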
<p>At this point the scripting is completed and a <a href="http://192.168.1.65/blog.chaostangent.com/stuff/tabbedform/">barebones proof of concept</a> can be seen. Obviously with no style it’s not going to look like tabs, but with a little <a href="http://www.alistapart.com/articles/slidingdoors/">sliding-door tomfoolery</a>, you’ll be tabbed up in no time. At this point you’ll likely want to expand on the functions above by dropping in some choice CSS classes, setting the active tab to “on” for appropriate styling and maybe even adding some other classes to let your stylesheet know things have been modified by a script (I find simply adding a “scripted” class to the container element works wonders).</p>
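<p>As a rough idea of how the “on” class might be juggled, here is a small sketch. It is not part of the script above: <code>setActiveTab</code> is a hypothetical helper, and it leans on the fact that init() gives every generated link its fieldset’s ID as a class name. It only touches the className property, so it works on real link elements and on plain objects alike.</p>

```javascript
// Mark the link belonging to activeId as "on" and clear it from the rest.
function setActiveTab(links, activeId) {
  links.forEach(function (link) {
    // Split the existing class list and drop any stale "on" marker.
    var classes = link.className.split(/\s+/).filter(function (name) {
      return name !== '' && name !== 'on';
    });
    // The link whose class list contains the fieldset ID is the active tab.
    if (classes.indexOf(activeId) !== -1) {
      classes.push('on');
    }
    link.className = classes.join(' ');
  });
}

// Plain objects standing in for the generated link elements.
var links = [{ className: 'tab_one on' }, { className: 'tab_two' }];
setActiveTab(links, 'tab_two');
console.log(links[0].className, '|', links[1].className);
```

<p>With Prototype already on the page, Element.addClassName and Element.removeClassName would do the same job with less fuss; calling something like this from tabClicked with the ID it has just extracted would keep the styling in step with the visible fieldset.</p>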
<p>The beauty of this is it’s accessible (the form still works 100% without scripting) and it prevents a user from seeing just what a mammoth form they may be completing (blood of your first born? yes please). <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/03/javascriptery-tabbed-forms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deconstruction part 2</title>
		<link>http://chaostangent.com/2008/01/deconstruction-part-2/</link>
		<comments>http://chaostangent.com/2008/01/deconstruction-part-2/#comments</comments>
		<pubDate>Thu, 10 Jan 2008 19:08:37 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[battle moon wars]]></category>
		<category><![CDATA[bits]]></category>
		<category><![CDATA[bytes]]></category>
		<category><![CDATA[c]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[decompile]]></category>
		<category><![CDATA[deconstruction]]></category>
		<category><![CDATA[hex]]></category>
		<category><![CDATA[lz77]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xvi]]></category>
		<category><![CDATA[yanepak]]></category>
		<category><![CDATA[yanesdk]]></category>
		<category><![CDATA[yaneurao]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=17</guid>
		<description><![CDATA[Attacking those “random” files a couple of days ago provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec’ing out previously unexplored file formats. It turned out that the files had already been mapped and successfully decompressed and [...]]]></description>
			<content:encoded><![CDATA[<p>Attacking those “random” files a <a href="http://chaostangent.com/2008/01/deconstruction/">couple of days ago</a> provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec’ing out previously unexplored file formats. <a href="http://chaostangent.com/2008/01/deconstruction/#comments">It turned out</a> that the files had already been mapped and successfully decompressed and the only thing left to do was build an unpacker which was in the pipeline. It seemed my work wasn’t exactly fruitless but other, probably smarter people had everything under control. I wasn’t about to let that stop me though.</p>
<p><em>Note (2008-01-11): The full (official?) SDK for this file format <a href="http://yaneurao.hp.infoseek.co.jp/yaneSDK2nd/">has been located</a> which includes both a packer and an unpacker as well as other tools I’m sure are useful for working on the file format. The full name of the file format is “Yaneurao” with the SDK going by the nomenclature of “yaneSDK” which is the stem for the file format signature of “yanepkDx”. There is already a <a href="http://yanesdkdotnet.sourceforge.jp/">.NET version of the SDK</a> so if you’re interested in my deconstruction process then read on, otherwise I would recommend using the official/fully-featured SDKs.</em></p>
<blockquote class="pullout"><p>“Then, in that moment of lucid elation, I realised exactly what was going wrong.”</p>
</blockquote>
<p>The compression format was identified as <abbr title="Lempel-Ziv-Storer-Szymanski">LZSS</abbr> and reading through <a href="http://sekai.insani.org/archives/24">several</a> <a href="http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzss.html">sites</a> revealed that some of the data I had initially spotted but attributed to SHIFT JIS (or at one point a Unicode Byte Order Marker, perfect for a non-Unicode file) were the tell-tale signatures of LZSS; the gradual degradation into junk data was also typical of the algorithm as the further into the file the stream progresses, the more back references are present.</p>
<p><img class="alignnone size-full wp-image-425" title="LZSS" src="http://chaostangent.com/wp-content/uploads/2008/01/06.png" alt="yanePkDX" width="382" height="20" /><br />
While I hadn’t heard of LZSS, it came as no surprise that it was a modified version of <a href="http://en.wikipedia.org/wiki/LZ77">LZ77</a> which I had come across before though never toyed with. Having to <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/ziv_lempel_1977_universal_algorithm.pdf">dig through a dense PDF</a> was not my idea of fun and my university days had proven that reading academic proofs rarely led to workable implementations for me so I <a href="http://www.google.co.uk/search?q=lzss+php">searched for a ready-made PHP version</a> which (for reasons which will soon become glaringly apparent) didn’t prove fruitful. After coming up against dead-ends with other languages I settled on the <a href="http://www.koders.com/c/fidC554142F5E42CA3433CD4C8B9043D09C8A092DF8.aspx">de facto C version</a> which most other versions I found seemed to be based on.<br />
<span id="more-17"></span><br />
Ignoring my <a href="/stuff/deconstruction/deconstructor.zip">original deconstruction script</a> for the moment, I worked on the assumption that each individual file contained within the large .dat files was individually compressed, given that each file had a readable opening section of bytes which (according to the LZSS spec) wouldn’t contain any back references. As with other implementations of algorithms I didn’t fully understand, I copied the C code more or less exactly, altering formatting to my tastes and altering code to take into account any PHP idioms that I could foresee. Checking things over, I pumped in one of the compressed files and, unsurprisingly, the output file was more or less blank. After rechecking the code and running it again, the output file was once again filled with spaces and some sporadic junk bytes that didn’t look familiar.</p>
<p>The script wasn’t even outputting the uncompressed data at the beginning of the file and the output was larger than the input but still not the size flagged in the original .dat files. After scratching my head for a while I set about spitting out some debug data to pinpoint what had gone wrong and where. The algorithm is broken down into roughly three main sections, in two main control structures. Putting in some basic output formatting to check each section was executing proved that each section was being run in a way that I could only assume was correct:</p>
<p><img class="alignnone size-full wp-image-426" title="The output attacks" src="http://chaostangent.com/wp-content/uploads/2008/01/07.png" alt="The output attacks" width="520" height="144" /></p>
<p>This assumption of course turned out to be false but I wouldn’t realise this until later in the day. The LZSS algorithm uses a number of constants to define things such as the size of the sliding buffer window, maximal reference length and minimal reference length (a change from the LZ77 algorithm to prevent the encoding being longer than the original) so I tweaked the values first with sensible then ridiculous values only to have the script spit out similarly broken output. The C algorithm also had several places where it used hex values to do bitwise operations; converting these to decimal (obviously) proved ineffective and I was ready by now to admit that I was stumped. I had been working on it for a while now so I took a break for lunch, during which I decided to ditch the C algorithm and start from scratch so that I actually understood what was going on.</p>
<p>This proved even more torturous so I switched back to my original script and started spitting out some fairly detailed output including: the section of the algorithm, the current byte location in the stream, the hex value of the most pertinent read byte and the binary value of that byte.</p>
<p><img class="alignnone size-full wp-image-427" title="It just keeps coming!" src="http://chaostangent.com/wp-content/uploads/2008/01/08.png" alt="It just keeps coming!" width="520" height="144" /></p>
<p>This more or less nailed down that the entire implementation was broken: the values it was generating from the very beginning were incorrect, which of course meant all the back references and so forth were incorrect too. Using the binary output and a bit of paper I worked out what the values were <em>supposed</em> to be and started following the values through the algorithm. This part was absolutely essential to working out what was wrong with the implementation of the algorithm as it elucidated what each part did:</p>
<ol>
<li>The first section (which I had termed “FLAGS”) worked out whether a byte was a control byte and set a flags variable.</li>
<li>The second section (which I had termed “AND1”) assumed it was reading a raw byte and simply wrote it to the output stream (and the buffer).</li>
<li>The third section (which I had termed “CONTROL”) read the two control bytes which formed a back reference and then read the appropriate data from the buffer and subsequently the output.</li>
</ol>
<p>From my output it was apparent the meat of the algorithm, reading the raw data, wasn’t being done. Then, in that moment of lucid elation, I realised exactly what was going wrong.</p>
<p>PHP was grabbing a byte from the input file as a string and, being a loosely-typed language, it didn’t complain when that string was fed into bitwise operations where the underlying type was incredibly important. I’m more than willing to admit that this state of affairs was my own damn fault for prototyping this in a language that wasn’t built for algorithms and bit-level operations; had I done this in a strongly-typed language, everything would have been dandy. Of course, had I simply dumped the implementation I found into a C file and compiled away, I wouldn’t really understand what was going on, so my fumbling didn’t go to waste.</p>
<p>Long story short, forcing the read bytes into an integer using the ord() function (and intval() just to make sure) solved the issue and the file I was working on transformed before my eyes.</p>
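<p>The pitfall is easy to reproduce in isolation. In the sketch below (variable names are made up for the example) a byte read from a file arrives as a one-character string; AND-ing two strings works on the character codes and hands back another string, whereas <code>ord()</code> gives the integer the algorithm actually expects. Mixing such a string with an integer operand is worse still: depending on the PHP version the string is quietly treated as 0 or rejected outright.</p>
<pre><code>&lt;?php
$byte = "\xF3"; // what fread($handle, 1) hands back: a one-character string

// String operands: PHP ANDs the character codes and returns a string.
$asString = $byte &amp; "\x0F";
var_dump(is_string($asString)); // bool(true)

// Integer operands: ord() converts the byte to its numeric value first.
$asInt = ord($byte) &amp; 0x0F;
var_dump($asInt); // int(3)</code></pre>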
<p><img class="alignnone size-full wp-image-428" title="Taming the algorithm" src="http://chaostangent.com/wp-content/uploads/2008/01/09.png" alt="09" width="520" height="144" /></p>
<p><img class="alignnone size-full wp-image-429" title="Almost got it..." src="http://chaostangent.com/wp-content/uploads/2008/01/10.png" alt="Almost got it..." width="520" height="150" /></p>
<p>Almost.</p>
<p>Turns out what “sage” had said in my comments on the original version of my unpacker was slightly wrong: the sliding window wasn’t 256 bytes (0x100) but the standard LZSS implementation window size of 4096 bytes (0x1000), which means that nothing really needed to be changed from the standard C implementation of the algorithm. As a proof of concept:</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/chara_init_third.xml.lzss">Sample LZSS compressed file</a>, <a href="http://chaostangent.com/wp-content/uploads/2008/01/chara_init_third.xml">Sample uncompressed file</a></p>
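<p>Putting the pieces together, the three sections described earlier (FLAGS, AND1 and CONTROL) can be sketched roughly as follows. This is a minimal sketch and not the code in the deconstructor: it assumes the parameters of the commonly circulated C reference decoder (a 4096-byte window pre-filled with spaces, back references of 3 to 18 bytes, a 12-bit offset and 4-bit length packed into two bytes), and the names <code>lzssDecode</code>, <code>WINDOW_SIZE</code>, <code>MAX_MATCH</code> and <code>THRESHOLD</code> are made up for the example.</p>
<pre><code>&lt;?php
// Assumed parameters, taken from the commonly circulated C reference decoder.
const WINDOW_SIZE = 4096; // size of the sliding window in bytes
const MAX_MATCH   = 18;   // longest back reference
const THRESHOLD   = 2;    // matches of THRESHOLD bytes or fewer are stored raw

function lzssDecode(string $input): string
{
    // The window starts out filled with spaces.
    $window   = array_fill(0, WINDOW_SIZE, 0x20);
    $writePos = WINDOW_SIZE - MAX_MATCH;
    $output   = '';
    $flags    = 0;
    $pos      = 0;
    $length   = strlen($input);

    while ($pos &lt; $length) {
        // FLAGS: every eighth item, load a byte whose bits describe the next eight items.
        $flags &gt;&gt;= 1;
        if (($flags &amp; 0x100) === 0) {
            $flags = ord($input[$pos++]) | 0xFF00; // the high byte counts off the eight bits
            if ($pos &gt;= $length) {
                break;
            }
        }

        if ($flags &amp; 1) {
            // AND1: a raw byte goes straight to the output and into the window.
            $byte = ord($input[$pos++]);
            $output .= chr($byte);
            $window[$writePos] = $byte;
            $writePos = ($writePos + 1) &amp; (WINDOW_SIZE - 1);
        } else {
            // CONTROL: two bytes form a back reference into the window.
            if ($pos + 1 &gt;= $length) {
                break; // truncated back reference
            }
            $low    = ord($input[$pos++]);
            $high   = ord($input[$pos++]);
            $offset = $low | (($high &amp; 0xF0) &lt;&lt; 4);
            $count  = ($high &amp; 0x0F) + THRESHOLD + 1;

            for ($k = 0; $k &lt; $count; $k++) {
                $byte = $window[($offset + $k) &amp; (WINDOW_SIZE - 1)];
                $output .= chr($byte);
                $window[$writePos] = $byte;
                $writePos = ($writePos + 1) &amp; (WINDOW_SIZE - 1);
            }
        }
    }

    return $output;
}</code></pre>
<p>Under those assumptions <code>lzssDecode("\x07abc\xEE\xF0")</code> gives back “abcabc”: three raw bytes followed by a three-byte back reference to window position 0xFEE, which is where the first raw byte was written.</p>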
<p>So I now present version 1.1 of the deconstructor script which is released under the same <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic license</a>.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/deconstructor1.1.zip">deconstructor1.1.zip (1.4KB)</a></p>
<p>The usage is exactly the same:<br />
<code>php deconstructor.php data1.dat output\</code></p>
<p>The only difference will be the output spat out by the script which will tell you when a file has been decompressed and whether it succeeded or failed (done by checking the canonical size in the .dat file versus the output size).</p>
<p><strong>To-do</strong><br />
At the moment the script outputs a file to a temporary name and then operates on that file. This isn’t optimal but I was having trouble getting my implementation to work in-stream, probably due to fatigue. I may or may not fix that for the PHP version as the next step is to drop the entire deconstructor into a C or C++ file and do a native compile so you don’t have to mess around with PHP and I feel like I’ve developed something in a big-boys’ language. If I get the time and the inclination I may do that over the weekend.</p>
<p>As well as the unpacker, I get the feeling that the <a href="http://blog.seiha.org/">friend</a> who this is a favour for will require a repacker which will obviously mean doing the LZSS algorithm in reverse and also bundling everything into a .dat file. Should be an intriguing challenge to see if I’ve learned anything from this little endeavour. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/01/deconstruction-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deconstruction</title>
		<link>http://chaostangent.com/2008/01/deconstruction/</link>
		<comments>http://chaostangent.com/2008/01/deconstruction/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 19:41:31 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[battle moon wars]]></category>
		<category><![CDATA[bits]]></category>
		<category><![CDATA[bytes]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[decompile]]></category>
		<category><![CDATA[deconstruction]]></category>
		<category><![CDATA[hex]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[werk]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xvi]]></category>
		<category><![CDATA[yanepak]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=15</guid>
		<description><![CDATA[Out of curiosity and a favour to someone, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.
Sample File 1, Sample File 2, Sample File 3
All screenshots taken from [...]]]></description>
			<content:encoded><![CDATA[<p>Out of curiosity and a <a href="http://blog.seiha.org/">favour to someone</a>, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.</p>
<p><a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data1.dat">Sample File 1</a>, <a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data3.dat">Sample File 2</a>, <a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data5.dat">Sample File 3</a></p>
<p><em>All screenshots taken from data1.dat, sample file 1 and the window is resized for the most appropriate screenshot rather than general workability.</em></p>
<blockquote class="pullout"><p>“so garbled that it sent a few hundred bell tones to my computer speaker”</p>
</blockquote>
<p>First thing I did was to crank open the lovely XVI32 hex editor and have a look at the sample files provided; their .dat extension more or less indicated they were a proprietary format and were unlikely to relinquish their secrets easily. What was known was that the files contained a header portion, a bundle of XML files in a contiguous stream and a lot of junk data. The XML files could be seen and their encoding was stated as SHIFT JIS; after cursing its existence, I attributed the junk data to that encoding, which seemed like a good place to start.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/01.png"><img class="alignnone size-full wp-image-434" title="yanepkDx" src="http://chaostangent.com/wp-content/uploads/2008/01/01.png" alt="01" width="320" height="20" /></a><br />
The first eight bytes seemed to be a file signature, but <a href="http://www.google.com/search?q=yanepk">Google</a> <a href="http://www.google.com/search?q=yanepkdx">searches</a> for all or parts of the signature were fruitless which meant it was time to pick things apart.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/02.png"><img class="alignnone size-full wp-image-435" title="Crazy bytes" src="http://chaostangent.com/wp-content/uploads/2008/01/02.png" alt="02" width="320" height="20" /></a><br />
The next four bytes were different for each file and at first I thought they were part of the block format that made up the header part of the file, but the section repetition for the header block didn’t match up. After converting the value to a variety of different number formats (I’m no hex wizard and I originally thought it was only a two byte short rather than a four byte integer or long), I assumed it was an unsigned long (32 bits) in little-endian order.<br />
<span id="more-15"></span><br />
<a href="http://chaostangent.com/wp-content/uploads/2008/01/03.png"><img class="alignnone size-full wp-image-436" title="Hex assault" src="http://chaostangent.com/wp-content/uploads/2008/01/03.png" alt="Hex assault" width="443" height="438" /></a><br />
The next section pattern repeated a number of times until the file obviously started with the embedded XML files. After a bit of byte counting and “duh” moments, the general format of the section is:</p>
<p><code>256 bytes - file path and name<br />
4 bytes - unsigned long<br />
4 bytes - unsigned long<br />
4 bytes - unsigned long</code></p>
<p>At a total of 268 bytes for each block, this layout repeats for precisely the number of times specified by the very first unsigned long (after the file signature). So the entire header block consists of:</p>
<p><code>8 bytes - signature "yanepkDx"<br />
4 bytes - number of header entries<br />
(number of header entries * 268 bytes) - header entries</code></p>
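<p>That layout maps neatly onto PHP’s <code>unpack()</code>. The following is a minimal sketch rather than the deconstructor’s actual code: the function name <code>readHeader</code> and the array keys are made up for the example, the three numbers are deliberately left unnamed, and it assumes the 256-byte path field is padded with NUL bytes.</p>
<pre><code>&lt;?php
function readHeader(string $data): array
{
    if (strlen($data) &lt; 12 || substr($data, 0, 8) !== 'yanepkDx') {
        throw new RuntimeException('Missing yanepkDx signature');
    }

    // "V" = unsigned 32-bit integer, little endian
    $count   = unpack('V', substr($data, 8, 4))[1];
    $entries = [];

    for ($i = 0; $i &lt; $count; $i++) {
        $block = substr($data, 12 + $i * 268, 268);
        if (strlen($block) &lt; 268) {
            throw new RuntimeException("Header entry {$i} is truncated");
        }
        // a256 keeps the raw 256 bytes of the path field
        $fields = unpack('a256path/Vfirst/Vsecond/Vthird', $block);
        // assumption: the path ends at the first NUL byte
        $fields['path'] = explode("\0", $fields['path'], 2)[0];
        $entries[] = $fields;
    }

    return $entries;
}</code></pre>
<p>Each element of the returned array holds one 268-byte block as <code>path</code>, <code>first</code>, <code>second</code> and <code>third</code>, which makes it easy to dump every entry and compare the numbers side by side.</p>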
<p>This was all well and good but didn’t really illuminate exactly what the three numbers were. After pulling out all the entries, a few things became clear:</p>
<ul>
<li>The first number in each block increases for each successive block</li>
<li>The second number was always larger than or equal to the third number</li>
<li>The first number plus the third number always equalled the first number of the block immediately after the current one</li>
</ul>
<p>So without resorting to rocket science: the first number is the absolute byte offset of the file’s data within the .dat file, the second number was a bit of a mystery and the third number is the length in bytes of that data. After pushing this info through a script it became obvious this was the de facto format of the file; no complex tree structures or other nasties were awaiting. The XML files were pulled out without problem and within a few minutes their original file structure was recreated.</p>
<p>All done right? Wrong. My initial thought that the XML files were SHIFT JIS encoded was indeed correct, however it didn’t explain the junk that proliferated in <strong>some</strong> of the files.</p>
<p><a rel="text/xml" href="/stuff/deconstruction/arive_boss_plane.xml">Sample un-junked file</a>, <a rel="text/xml" href="/stuff/deconstruction/chara_growth.xml">Sample junked file</a><br />
<a href="http://chaostangent.com/wp-content/uploads/2008/01/04.png"><img class="alignnone size-full wp-image-437" title="Junked file" src="http://chaostangent.com/wp-content/uploads/2008/01/04.png" alt="Junked file" width="520" height="150" /></a><br />
Trying to shift the format into different encodings using known functions only seemed to jumble the junk around rather than get rid of it. It now became apparent that the data was more than likely compressed or otherwise encoded which illuminated what the mysterious second number was in each of the header blocks. The third number represented the packed size of the data, the second represented the unpacked size; this was obvious as the smaller, un-junked files had the same values for each, usually less than 150 bytes.</p>
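<p>With the meaning of the three numbers pinned down, pulling a single file out of the archive and deciding whether it still needs decompressing takes only a few lines. This is a sketch of the idea; <code>extractEntry</code>, <code>isPacked</code> and the array keys are hypothetical names, not those used in the deconstructor.</p>
<pre><code>&lt;?php
// $entry holds the three header numbers for one file:
// 'offset' (first), 'unpackedSize' (second) and 'packedSize' (third).
function extractEntry(string $archive, array $entry): string
{
    $bytes = substr($archive, $entry['offset'], $entry['packedSize']);
    if (strlen($bytes) !== $entry['packedSize']) {
        throw new RuntimeException('Entry runs past the end of the archive');
    }
    return $bytes;
}

// Equal sizes mean the data was stored as-is; a larger unpacked size means it was packed.
function isPacked(array $entry): bool
{
    return $entry['unpackedSize'] &gt; $entry['packedSize'];
}</code></pre>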
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/05.png"><img class="alignnone size-full wp-image-438" title="Junk output" src="http://chaostangent.com/wp-content/uploads/2008/01/05.png" alt="Junk output" width="520" height="144" /></a><br />
Running both the individual files and the larger .dat file through various decompressors proved less than useful, as most of the time the file became so garbled that it sent a few hundred bell tones to my computer speaker, making it sound like it was having a seizure. I tried various versions and functions of the gzip/zlib library, bzip2, LHA (of which I knew the Japanese were particularly fond) and of course good old-fashioned zip. It stood to reason that the compression wasn’t going to be processor intensive (very few game compression schemes are), which more or less ruled out prediction-based algorithms (PPM et al.) as well as the ACE and 7z formats. The files also seemed to lack any form of dictionary entries, as for each file the XML declaration was always intact, which meant that the compression seemed to start an arbitrary length into the file (which would explain why the smaller files were untouched).</p>
<p>This is unfortunately as far as I got after a morning’s work, a decent amount of which was spent attempting to track down information. The game the files come from is <a href="http://blog.seiha.org/?p=92">Battle Moon Wars Act 3</a> and it seems to use TYPE MOON characters; other TYPE MOON games have been successfully translated, which may be one avenue to investigate. The <a href="http://en.wikipedia.org/wiki/Battle_Moon_Wars">developers of the game are “Werk”</a> and if any of their other games (either in the series or otherwise) have been pulled apart, that may give some indication as to where to go next. There does seem to be information in someone’s brain as not only was an <a href="/stuff/deconstruction/data1unpacked.dat">“unpacked” version of data1.dat unearthed</a>, but <a href="http://forums.visualnews.net/showthread.php?t=11925">forum</a> <a href="http://nrvnqsr.proboards20.com/index.cgi?action=display&amp;board=doujin&amp;thread=1124787854&amp;page=3">posts</a> indicate that work had already begun (if not already been aborted) on the technical side of things.</p>
<p>For today at least I’m done with attempting to reverse-engineer arbitrary files and perhaps after sleeping on it some bright idea will be revealed to me that daylight failed to illuminate. For now there is the command line PHP script I quickly prototyped to deconstruct the .dat files (released under the <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic License</a>) and the promise of further work in the future:</p>
<p><a href="/stuff/deconstruction/deconstructor.zip">deconstructor.zip (1KB)</a></p>
<p>Things should be self explanatory from the file; get a command line PHP interpreter set up and run “deconstructor.php” with the name of the file to tear apart and optionally an output folder e.g.<br />
<code>php deconstructor.php data1.dat output\</code></p>
<p>This is an open call for anyone who wants to help with the effort to scry the encoding/compression of the XML files whether you already know or want to take a stab at it, you are more than welcome. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/01/deconstruction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting to Flash video — (almost) free and not so easy</title>
		<link>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/</link>
		<comments>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/#comments</comments>
		<pubDate>Thu, 19 Jul 2007 09:55:19 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[bitrate]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[flv]]></category>
		<category><![CDATA[free]]></category>
		<category><![CDATA[h263]]></category>
		<category><![CDATA[mpeg]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[vfw]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[video for windows]]></category>
		<category><![CDATA[virtualdub]]></category>
		<category><![CDATA[vp6]]></category>
		<category><![CDATA[vp62]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=19</guid>
		<description><![CDATA[How do you convert an arbitrary video file into a playable Flash video using freely available programs and methods? After close to an afternoon of searching, testing and head-scratching, I finally have a whole answer that can be applied ad-hoc to almost any video you can get your hands on.
This “guide” (more anecdotal than how-to) [...]]]></description>
			<content:encoded><![CDATA[<p>How do you convert an arbitrary video file into a playable Flash video using freely available programs and methods? After close to an afternoon of searching, testing and head-scratching, I finally have a whole answer that can be applied ad-hoc to almost any video you can get your hands on.</p>
<p>This “guide” (more anecdotal than how-to) assumes knowledge of video encoding basics; I’m not going to cover the difference between container and video formats or how to use <a href="http://virtualdub.org/">VirtualDub</a>, as there are plenty of other tutorials and guides that cover those topics.<br />
<span id="more-19"></span><br />
The Flash Video container format (FLV) supports two major video formats: the first is codenamed “Sorenson Spark” and is a variant on the H.263 standard; the second is “On2 TrueMotion VP6”. The former is well supported in many encoding and decoding tools and libraries, the latter isn’t. It will come as no surprise then that the latter allows for far greater compression at similar visual quality when compared apples-to-apples to the H.263 format. To give you a quantifiable measure of this: I managed to get more than 2 times greater compression and better visual quality when using the VP6 codec. In short, this is the codec you want to use to get the most out of your movies.</p>
<p><strong>Quick and dirty</strong><br />
If you just want to get a compatible FLV file quickly and painlessly and aren’t worried about size or quality overmuch, then grab yourself a copy of <a href="http://ffmpeg.mplayerhq.hu/">ffmpeg</a> (a recent compiled Windows binary can be <a href="http://ffdshow.faireal.net/mirror/ffmpeg/">found here</a>) and put it in a place where your command-line of choice can find it. Then punch in:</p>
<p><code>ffmpeg -i "yourvideofilegoeshere.avi" outfile.flv</code></p>
<p>Voila, in no time flat you’ll have an all-singing all-dancing .flv file ready for whatever you have in store for it. If you’re feeling particularly awesome, you can even control the output size of the video:</p>
<p><code>ffmpeg -i "totallyawesomekittenvideo.avi" -s 320x240 outfile.flv</code></p>
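<p>If the conversion needs to be scripted, the same command can be assembled from PHP. This is a minimal sketch, assuming ffmpeg can be found on the path; <code>buildFlvCommand</code> is a made-up helper and it only uses the <code>-i</code> and <code>-s</code> options shown above.</p>
<pre><code>&lt;?php
function buildFlvCommand(string $source, string $target, ?string $size = null): string
{
    // escapeshellarg() keeps spaces and quotes in file names from breaking the command
    $parts = ['ffmpeg', '-i', escapeshellarg($source)];

    if ($size !== null) {
        if (!preg_match('/^\d+x\d+$/', $size)) {
            throw new InvalidArgumentException('Size must look like 320x240');
        }
        $parts[] = '-s';
        $parts[] = $size;
    }

    $parts[] = escapeshellarg($target);
    return implode(' ', $parts);
}

// To actually run it:
// shell_exec(buildFlvCommand('totallyawesomekittenvideo.avi', 'outfile.flv', '320x240'));</code></pre>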
<p>ffmpeg converts using the H.263 video flavour and MP3 audio format, which is <em>probably</em> fine for most people. The problem with this process is that the output is less than stellar and suffers from a tremendous amount of artefacting. I wanted to exercise a little more control over the visual quality. Some searching revealed byzantine quantizer settings which made the command line look like a calculator had exploded:</p>
<p><code>ffmpeg -i "hahathatguytotallysucks.avi" -qcomp 0.6 -qmax 15 -qdiff 4 -i_qfactor 0.71428572 -b_qfactor 0.76923078 -maxrate 972800 -s 320x240 -b 819200 -refs 1 -subq 1 -y outfile.flv</code></p>
<p>Tweaking these options gives variable results, but nothing close to the kind of quality / file-size ratio I wanted. The VP6 codec seemed worth trying out. Unfortunately using the VP6 codec is rife with hurdles; the primary one is that it is entirely proprietary, <a href="http://www.on2.com/">On2</a> own licenses and patents and probably crocodiles with bazookas to protect the codec; some companies have obtained licenses to use it in their products (On2 of course having their own implementation) which means the easiest and most pain-free route is to buy one of those products and bask in the fully-licensed glory.</p>
<p>What isn’t widely publicised is that On2 released a version of the VP6 codec for “Personal use” but no longer provide it for download on their website. A cursory search on Google (let’s say “<a href="http://www.google.com/search?q=vp6+vfw+codec">vp6 vfw codec</a>”) returns some good matches. After downloading and installing, I had the ability to encode to VP6 as long as it was for “Personal use” according to the license agreement. This little endeavour was for my own curiosity rather than monetary gain, which I’m sure falls under that stipulation.</p>
<p>Codec in hand, in theory it should be as simple as encoding to VP6 using something like VirtualDub and then muxing everything together into an FLV file. If only things were that simple. As far as I could see, there exists no standalone set of FLV muxing tools (like the seminal MKVtoolnix suite). However, ffmpeg can output to an FLV file and provides the ability to do a straight copy (i.e. no transcoding) of the source video, that could work…</p>
<p>No.</p>
<p>For the VP6 codec to be recognised within an FLV file, the container needs to have special bits set which indicate to the player that it’s going to receive VP6 video content rather than H.263/Spark; ffmpeg doesn’t write these bits as it doesn’t “officially” deal with VP6. After much searching, I stumbled upon a way to modify ffmpeg to write these bits but the patch hasn’t been merged into the main ffmpeg branch yet. You can download the patch and a pre-compiled Windows binary from <a href="http://sh0dan.blogspot.com/2006/09/command-line-flash-8-flv-encoding.html">a blog</a> which also offers an alternate method of achieving what I’m describing.</p>
<p><strong>The crux</strong><br />
If you haven’t been keeping up, here’s the short version of it. We want to make an FLV using VP6. Vanilla ffmpeg doesn’t do this so we need a modified ffmpeg to do it. Use the above blog page or modify your ffmpeg source to get ffmpeg to do what we want.</p>
<p>Open up your source video in VirtualDub and apply any filters you want (contrast, brightness, resize etc.) but make sure the “Flip vertically” filter is somewhere in that mix. Go to compression and select the VP6 codec; if you’re using VP 6.2 you can do two-pass encoding, which potentially gives better results than a one-pass encode. Make sure “Use source audio” is selected and save your video down. You’ll have an .avi file which has VP6 video and the original source audio.</p>
<p>Now use your modified version of ffmpeg and use the following (substituting filenames where applicable):</p>
<p><code>ffmpeg -y -i "ooooohprettyprettyflowers.avi" -vcodec copy outfile.flv</code></p>
<p>If you want to control the audio compression a little better add the options for that:</p>
<p><code>ffmpeg -y -i "ohwowexplosions.avi" -vcodec copy -ab 128k -ac 2 -ar 44100 outfile.flv</code></p>
<p>You will now have “outfile.flv” which is your final Flash video file, ready for uploading. Of course, the proof of the pudding is in the tasting:</p>
<p id="h263"><a href="http://www.macromedia.com/go/getflashplayer">Get Flash</a> to see this player.</p>
<p>h263.flv — 1,466KB<br />
<script type="text/javascript">// <![CDATA[
	var so = new SWFObject("http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/mediaplayer.swf","h263","512","384","7");
	so.addParam("allowfullscreen","true");
	so.addVariable("file","http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/h263.flv");
	so.write("h263");
// ]]&gt;</script></p>
<p id="vp62"><a href="http://www.macromedia.com/go/getflashplayer">Get Flash</a> to see this player.</p>
<p>vp62.flv — 676KB<br />
<script type="text/javascript">// <![CDATA[
	var so = new SWFObject("http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/mediaplayer.swf","vp62","512","384","8");
	so.addParam("allowfullscreen","true");
	so.addVariable("file","http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/vp62.flv");
	so.write("vp62");
// ]]&gt;</script></p>
<p><strong>Conclusion</strong><br />
This process is obviously not suited to the automatic encoding that a lot of sites seem to crave nowadays; it is far better suited to the cash-strapped auteur who wants the most out of their videos and bandwidth and doesn’t have a large number of videos to encode. VP6 support in ffmpeg/libavcodec is coming along, and the most recent builds of ffmpeg come with decoding support for VP6, but whether patents and licensing will prevent encoding support remains to be seen. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Screenshotter</title>
		<link>http://chaostangent.com/2006/08/screenshotter/</link>
		<comments>http://chaostangent.com/2006/08/screenshotter/#comments</comments>
		<pubDate>Tue, 29 Aug 2006 21:34:22 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[automatic]]></category>
		<category><![CDATA[avi]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[jpeg]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[png]]></category>
		<category><![CDATA[screenshots]]></category>
		<category><![CDATA[screenshotter]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[wmv]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=440</guid>
		<description><![CDATA[An exploration of the different ways to automatically take a selection of screenshots from a video file. Concentrates on open-source and home-made solutions concluding with a solid first-step hybrid using mplayer and PHP.]]></description>
			<content:encoded><![CDATA[<p>An “automatic” screenshot taker is something that I’ve always wanted, but the <a href="http://www.frame-shots.com/">commercial offerings</a> leave much to be desired and the only other option seems to be the “manual” approach. I am of course talking about screenshots from video files rather than screenshots of your desktop, that sort of thing is <a href="http://www.techsmith.com/">well covered</a>.</p>
<p>One of the problems with making your own is that the options are fairly limited on just how you go about opening video files and pulling out the candy frame goodness. For Windows users, the option is to use DirectShow which I can only describe as <a href="http://en.wikipedia.org/wiki/Crystal_Maze">The Crystal Maze</a> for its Byzantine ways of operating are beyond mortal ken. The other option is to use a pre-built library such as <a href="http://ffmpeg.mplayerhq.hu/">ffmpeg</a> or similar. This was out as well: not only was it a whole new way of working for me (Windows development files were few and far between), it was also a whole new set of programming challenges, which made the learning curve more of a learning cliff.</p>
<blockquote class="pullout"><p>“during testing I had a number of problems with this”</p></blockquote>
<p>So I turned forlornly to existing media-players in the slim hope that one of them would have the abilities required for scripting a makeshift screenshotter. <a href="http://sourceforge.net/projects/guliverkli/">Media Player Classic</a> has limited command line support, <a href="http://www.videolan.org/vlc/">VLC</a> is more geared towards client/server setup and I couldn’t even figure out whether that route would lead to any semblance of success, <a href="http://www.bsplayer.org/">BSPlayer</a>… The list goes on as to the number of players which don’t supply a full body of command line options.</p>
<p>The silver lining, the angel of hope, was <a href="http://www.mplayerhq.hu/">MPlayer</a>. If you’re prepared to wade through a bit of fudge to get there, MPlayer provides everything you need to script a screenshotter:</p>
<ul>
<li>jump to any part of the file from the command line</li>
<li>output into different (static) formats such as PNG and JPEG</li>
<li>can output file information (length, dimensions etc.)</li>
</ul>
<p>With these three functions MPlayer is almost all you need. <strong>Almost.</strong><br />
<span id="more-440"></span><br />
First of all you need to get the MPlayer release for your architecture; for the majority of screenshot monkeys, that’s Windows. Puncturing the MPlayer-Windows mantle takes a bit of pushing but essentially you can usually get away with just <a href="http://www1.mplayerhq.hu/MPlayer/releases/win32/">downloading the latest build</a>. This gives you support for a whole heap of formats (XviD, DivX, x264 and so on); however, some encoders prefer to eschew open-source and go with Windows Media Video (usually of the “9” flavour). This is not available by default (as MPlayer uses ffmpeg/libavcodec and not DirectShow) so you need to grab an ethereally named <a href="http://www.mplayerhq.hu/MPlayer/releases/codecs/">codec package</a> and dump its contents into your MPlayer install directory. With a bit of luck and perhaps a bit of <a href="http://www.mplayerhq.hu/DOCS/HTML/en/index.html">document searching</a> you’ll have yourself a fully working command-line media player.</p>
<p>Now for the easy bit, the scripting. I chose to use PHP simply because I use it on a day-to-day basis, any kind of scripting would work though. With some document digging you can find the <a href="http://www.mplayerhq.hu/DOCS/man/en/mplayer.1.html">list of command line options</a>. For our purposes we’re only going to need to run MPlayer in two “modes”: the first is pulling the pertinent information from the video file we’re going to grab screenshots from, the second is actually pulling the screenshots out of the file.</p>
<h2>Identification</h2>
<pre><code>-vo null -nosound -frames 0 -identify</code></pre>
<p>For the impatient this sets the video output to null, disables sound, doesn’t output any frames and prints out identifying features of the video file.</p>
<h2>Screenshotting</h2>
<pre><code>-really-quiet -vo jpeg:progressive:quality=90 -nosound -ss {seek} -frames 2</code></pre>
<p>This turns off most informational output, sets the video output to JPEG with nice options, disables sound, seeks to the specific point in the video then outputs 2 frames (the reason for which I will explain shortly).</p>
<p>First task is to get information about the file and find out the total length of the video/stream. Running the identification command-line arguments with MPlayer and capturing the output is a simple case of running <a href="http://uk.php.net/manual/en/function.shell-exec.php">shell_exec</a> or using backticks, whichever you prefer.</p>
<pre><code>$infoOutput = shell_exec("{$mplayerCommand} {$argsInfo} \"{$_SERVER['argv'][1]}\"");

//$matcher = '/ID_VIDEO_WIDTH=(\d+)$|ID_VIDEO_HEIGHT=(\d+)$|ID_LENGTH=(\d+\.\d+)$/m';
$matcher = '/ID_LENGTH=(\d+\.\d+)$/m';
$matches = array();

preg_match_all($matcher, $infoOutput, $matches);

$fileLength = (!empty($matches[1][0])) ? floatval($matches[1][0]) : 1140;</code></pre>
<p>The filename is pulled from the “argv” array passed to command line scripts. The commented out regex is there in case you want the pixel dimensions of the video file; as it is, we don’t need this information, so the regex that is used is simpler and more compact. The file length is reported in seconds with a decimal fraction, which is then cast (as much as anything can be cast in PHP) as a float, or assumed to be 24 minutes if nothing was matched.</p>
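<p>If you do want more than the length, one option is to collect every “ID_” line into an array rather than writing a regex per field. This is only a sketch: the function name is made up for illustration, and the sample output is hand-written rather than captured from a real MPlayer run.</p>
<pre><code>&lt;?php
// Hypothetical helper, not part of the original script: turn the
// "ID_NAME=value" lines printed by -identify into an associative array.
function parseIdentify($output)
{
	$info = array();
	$matches = array();

	preg_match_all('/^ID_([A-Z0-9_]+)=(.*)$/m', $output, $matches, PREG_SET_ORDER);

	foreach ($matches as $match)
	{
		// trim() also drops a stray carriage return from Windows line endings
		$info[$match[1]] = trim($match[2]);
	}

	return $info;
}

// Illustrative sample only
$sample = "ID_VIDEO_WIDTH=640\r\nID_VIDEO_HEIGHT=480\r\nID_LENGTH=1440.00\r\n";
$info = parseIdentify($sample);
$fileLength = isset($info['LENGTH']) ? floatval($info['LENGTH']) : 1440;</code></pre>
<p>Lines that don’t start with “ID_” are simply ignored, so the rest of MPlayer’s chatter can stay in the captured output.</p>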
<p>The next step is to work out the interval at which you’re going to take screenshots. This is usually defined either by a frequency (take a screenshot every X seconds) or by the number of screenshots to take (say, 100). Either way, you end up with an increment in seconds between shots and a total count.</p>
<p>There is an MPlayer command line option to skip a certain number of seconds after each frame (-sstep &lt;sec&gt;); however, during testing I had a number of problems with it, which is why I use this less elegant but more foolproof method:</p>
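<p>As a rough sketch of that arithmetic, assuming the length is already known in seconds: the function name, its parameters and the returned array are all made up for illustration, and there is no validation (a count or frequency of zero will fall over).</p>
<pre><code>&lt;?php
// Hypothetical helper: the name, parameters and return shape are assumptions
// made for illustration and do not come from the original script.
function screenshotPlan($fileLength, $count = null, $frequency = null)
{
	if ($count !== null)
	{
		// Fixed number of shots: spread them evenly over the running time
		$increment = $fileLength / $count;
	}
	else
	{
		// Fixed frequency: one shot every $frequency seconds
		$increment = $frequency;
		$count = (int) floor($fileLength / $frequency);
	}

	return array('count' => $count, 'increment' => $increment);
}

// 100 shots from a 24 minute (1440 second) file: one every 14.4 seconds
$plan = screenshotPlan(1440.0, 100);</code></pre>
<p>Passing a frequency instead, as in screenshotPlan(1440.0, null, 60), gives 24 shots a minute apart.</p>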
<pre><code>for($i = 0; $i &lt; $screenshotCount; $i++)
{
	$offset = $i * $increment;
	$tempArgs = str_replace('{seek}', $offset, $argsPlay);
	shell_e xec("{$mplayerCommand} {$tempArgs} \"{$_SERVER['argv'][1]}\"");
}</code></pre>
<p>From the example arguments provided above, this replaces the “{seek}” token with our offset in seconds (worked out previously) and then executes the MPlayer command with our screenshot arguments. This will dump 2 files into our working directory (use chdir to set your working directory): “00000001.jpg” and “00000002.jpg”.</p>
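<p>If your filenames are likely to contain quotes or other awkward characters, it may be worth building the command string a little more defensively. The sketch below is an assumption on my part rather than what the script does: the function name is hypothetical, it uses PHP’s escapeshellarg to quote the filename for the host shell, and it formats the offset with number_format so that a comma-decimal locale can’t sneak into the -ss value.</p>
<pre><code>&lt;?php
// Hypothetical helper, not from the original script: build the screenshot
// command line for one offset without executing it.
function buildScreenshotCommand($mplayerCommand, $argsPlay, $offset, $file)
{
	// Always use a dot as the decimal separator, whatever the locale
	$seek = number_format($offset, 2, '.', '');
	$args = str_replace('{seek}', $seek, $argsPlay);

	return $mplayerCommand . ' ' . $args . ' ' . escapeshellarg($file);
}

$argsPlay = '-really-quiet -vo jpeg:progressive:quality=90 -nosound -ss {seek} -frames 2';
$command = buildScreenshotCommand('mplayer', $argsPlay, 14.4, 'episode 01.avi');</code></pre>
<p>The resulting string can then be handed to whichever execution function you prefer.</p>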
<p>Now, the reason for using 2 frames instead of just one is that for certain video types (WMV mostly), the first shot that is taken is <strong>always</strong> blank. The dimensions are correct, but the screenshot is just black. This is fixed by taking two screenshots as the second shot always has the content there. This is a bit of a fudge but it gets around this bizarre little occurrence.</p>
<p>That’s the meat of the screenshotter. A lot has been omitted, such as ensuring the passed file exists, renaming the output files (this is necessary, otherwise the files are overwritten each cycle) and unlinking the first (possibly blank) screenshot. From here you can probably work things out yourself; however, I have constructed a relatively petite script which I’m going to release under the <a href="http://creativecommons.org/licenses/by/2.5/">Creative Commons Attribution License</a>.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2006/08/screenshotter.txt">screenshotter.php.txt</a></p>
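<p>For the renaming and unlinking steps mentioned above, a minimal sketch might look like the following. It is not lifted from the script: the function name and the “shot-0007.jpg” naming scheme are assumptions made for illustration.</p>
<pre><code>&lt;?php
// Hypothetical helper, not from the original script: keep the second frame of
// a two-frame MPlayer run under a sequential name and discard the first.
function keepSecondFrame($workDir, $index)
{
	$first = $workDir . '/00000001.jpg';
	$second = $workDir . '/00000002.jpg';

	if (!file_exists($second))
	{
		// MPlayer produced nothing usable for this offset
		return false;
	}

	// Zero-padded so the files sort in capture order, e.g. shot-0007.jpg
	$target = sprintf('%s/shot-%04d.jpg', $workDir, $index);
	$renamed = rename($second, $target);

	if (file_exists($first))
	{
		// Possibly blank, so it is always thrown away
		unlink($first);
	}

	return $renamed ? $target : false;
}</code></pre>
<p>Calling this once per pass of the loop, with the loop counter as the index, leaves one sequentially numbered file per offset.</p>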
<p>The main variables you’ll want to edit are $mplayerCommand, $screenshotDir and $ssCount or $ssFreq. The script creates a subdirectory within $screenshotDir of the name of the file then pumps all of the screenshots into that directory, numbering them sequentially. I have a shortcut to the script on my desktop which I can then just drag video files onto (which does the neat thing of simply appending the absolute filename onto the end of the command line which simplifies usage immensely).</p>
<p>While you can set a frequency to take screenshots at, I would strongly recommend using an absolute count: a 24 minute video file with 1 screenshot a second gives you 1440 screenshots, which can take quite a while to finish and, depending on your video, not all of the shots will necessarily be very good.</p>
<p>Ways forward for this include perhaps doing batch image adjustment (levels, sharpening) as well as automatic thumbnailing (something my <a href="http://gallery.chaostangent.com">gallery</a> already does and hence omitted from this script). The built-in GD would be more than adequate for something like this; ImageMagick is perhaps swifter and more powerful but would add further overhead to the otherwise neat package.</p>
<p>Fundamentally I’ve found that taking a number of screenshots then cherry-picking the best is really the only way this script is useful. It’s good for giving an overall view of a video file rather than specific scenes within a file; for that, the “take screenshot” shortcut key is still king.</p>
<p><strong>Addendum:</strong> The eagle-eyed amongst you will notice that the “shell_exec” command in the code above has a space between the “e” and “x” in “exec”. As far as I can tell, the plugin I’m using to keep the code formatting breaks WordPress when I leave the command in full in-between &lt;code&gt; tags. Bad <a href="http://www.coffee2code.com/wp-plugins/#preservecodeformatting">Preserve code formatting</a>. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2006/08/screenshotter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
