<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>chaostangent &#187; Geekery</title>
	<atom:link href="http://chaostangent.com/category/geekery/feed/" rel="self" type="application/rss+xml" />
	<link>http://chaostangent.com</link>
	<description>More squirrels than sense</description>
	<lastBuildDate>Tue, 17 Aug 2010 21:26:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<atom:link rel='hub' href='http://chaostangent.com/?pushpress=hub'/>
		<item>
		<title>50 frames of Life</title>
		<link>http://chaostangent.com/2009/09/50-frames-of-life/</link>
		<comments>http://chaostangent.com/2009/09/50-frames-of-life/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 07:16:09 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[avatars]]></category>
		<category><![CDATA[conways game of life]]></category>
		<category><![CDATA[game of life]]></category>
		<category><![CDATA[gravatars]]></category>
		<category><![CDATA[john horton conway]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1230</guid>
		<description><![CDATA[My Sunday afternoon project wasn’t something that I could just let lie and it didn’t take long for work to start on it again. Using the list of improvements I had identified, I began with the aesthetics and then moved on to other, more number intensive areas of research.
Before even touching the code I subsumed [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/">My Sunday afternoon project</a> wasn’t something that I could just let lie and it didn’t take long for work to start on it again. Using the list of improvements I had identified, I began with the aesthetics and then moved on to other, more number intensive areas of research.</p>
<p>Before even touching the code I subsumed everything into a Git repository; I’m a long time <a href="http://subversion.tigris.org/">Subversion</a> user but relatively new to <a href="http://git-scm.com/">Git</a> so I still regularly refer back to the “<a href="http://git-scm.com/course/svn.html">Git — SVN Crash Course</a>” which is pleasantly concise. With this done, I attacked the GIF output method first:</p>
<p class="thumbnails four"><img class="alignnone size-full wp-image-1231" title="gameoflife2-sample-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-1.gif" alt="gameoflife2-sample-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1232" title="gameoflife2-sample-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-2.gif" alt="gameoflife2-sample-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1233" title="gameoflife2-sample-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-3.gif" alt="gameoflife2-sample-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1234" title="gameoflife2-sample-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-4.gif" alt="gameoflife2-sample-4" width="64" height="64" /></p>
<blockquote class="pullout"><p>“cooked up in a few hours and wasn’t subject to any stringent mathematical basis”</p>
</blockquote>
<p>First was visibly increasing the size of the cells. I had originally used a multiplier of four for previous iterations but that made them very indistinct, and with only fifty generations it meant a large portion of the space wasn’t used. The result was an increase in cell size to seven with a one pixel border. This came from a happy accident while crafting the previous post and produced its introductory images; however, the calculations for the edge cells were incorrect, which is why those animations don’t appear to “loop” at the edges as they should. This implementation fixes that, and with a vastly smaller environment (only 8x8 with a 5x5 seed) each generation of cells and its progression is easier to see. Next was addressing the colour issue: generating both a background and foreground colour met with mixed results so, taking a leaf from <a href="http://scott.sherrillmix.com/blog/blogger/wp_identicon/">WP_Identicon’s book</a>, I kept the background colour constant and generated the foreground colour only:<span id="more-1230"></span></p>
<p class="thumbnails four"><img class="alignnone size-full wp-image-1235" title="gameoflife2-sample-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-5.gif" alt="gameoflife2-sample-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1236" title="gameoflife2-sample-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-6.gif" alt="gameoflife2-sample-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1237" title="gameoflife2-sample-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-7.gif" alt="gameoflife2-sample-7" width="64" height="64" /> <img class="alignnone size-full wp-image-1238" title="gameoflife2-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-8.gif" alt="gameoflife2-sample-8" width="64" height="64" /></p>
<p>The generation was done by using different parts of the e-mail address again: the full address for the red component, the local part for the green component and the domain part for the blue component. This method produced the first four images and again had a definite tendency towards the green end of the spectrum. I then altered it slightly to compute the red component from the local part and green from the full address, which resulted in a pleasing orange colour. Next was to try to tone down the rapid movement. While a tenth of a second per frame is fine for debugging and tracking progression, it’s distracting when viewed for long periods of time, and the possibility of multiple avatars appearing at the same time meant slowing down the animation:</p>
<p class="thumbnails three"><img class="alignnone size-full wp-image-1238" title="gameoflife2-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-8.gif" alt="gameoflife2-sample-8" width="64" height="64" /> <img class="alignnone size-full wp-image-1239" title="gameoflife2-sample-9" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-9.gif" alt="gameoflife2-sample-9" width="64" height="64" /> <img class="alignnone size-full wp-image-1240" title="gameoflife2-sample-10" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-sample-10.gif" alt="gameoflife2-sample-10" width="64" height="64" /></p>
<p>The first is the original 1/10 of a second per frame, the second is 1/2 and the third 3/4. Obviously when viewed next to quicker varieties the effect is somewhat lost, but at 3/4 of a second per frame a full fifty generations will last a little under 40 seconds and still remain under 15 KB, which is more than acceptable for an avatar. Performing a full run on the original 1,300+ e-mail addresses produced a set of results which coincidentally matched my previous desire to be able to identify an e-mail address’s domain at a glance. For example, Hotmail tended towards greens and oranges:</p>
<p class="thumbnails seven"><img class="alignnone size-full wp-image-1241" title="gameoflife2-hotmail-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-1.gif" alt="gameoflife2-hotmail-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1242" title="gameoflife2-hotmail-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-2.gif" alt="gameoflife2-hotmail-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1243" title="gameoflife2-hotmail-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-3.gif" alt="gameoflife2-hotmail-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1244" title="gameoflife2-hotmail-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-4.gif" alt="gameoflife2-hotmail-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1245" title="gameoflife2-hotmail-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-5.gif" alt="gameoflife2-hotmail-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1246" title="gameoflife2-hotmail-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-6.gif" alt="gameoflife2-hotmail-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1247" title="gameoflife2-hotmail-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-hotmail-7.gif" alt="gameoflife2-hotmail-7" width="64" height="64" /></p>
<p>while BT Internet more towards blues and pale yellows:</p>
<p class="thumbnails seven"><img class="alignnone size-full wp-image-1248" title="gameoflife2-btinternet-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-1.gif" alt="gameoflife2-btinternet-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1249" title="gameoflife2-btinternet-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-2.gif" alt="gameoflife2-btinternet-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1250" title="gameoflife2-btinternet-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-3.gif" alt="gameoflife2-btinternet-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1251" title="gameoflife2-btinternet-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-4.gif" alt="gameoflife2-btinternet-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1252" title="gameoflife2-btinternet-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-5.gif" alt="gameoflife2-btinternet-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1253" title="gameoflife2-btinternet-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-6.gif" alt="gameoflife2-btinternet-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1254" title="gameoflife2-btinternet-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife2-btinternet-7.gif" alt="gameoflife2-btinternet-7" width="64" height="64" /></p>
<p>These are regardless of the local part of the address, and while they’re not the “one colour per domain” that I had hoped for, in aggregate their identity is clear.</p>
<p>In a sense, that was the project completed: I had created something that was unique, quirky and most of all aesthetically pleasing. I could spend hours tweaking and adjusting parameters to make for better avatars, perhaps slowing down the animation further or altering the colour generation methods, but for what it is, the result is better than I expected. This wasn’t the end of the work though. As you may notice, none of the above animations have any protracted dead states, whereby the animation is blank or static for more than a frame. The reason is that while I addressed the aesthetics first, I also implemented two heuristics to stop the algorithm in those cases:</p>
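<pre class="brush: php">// true when no cell in the grid is alive
private function isInactive(array $state)
{
	foreach($state AS $row)
	{
		foreach($row AS $cell)
		{
			if($cell == 1)
			{
				// a single live cell is enough to keep going
				return false;
			}
		}
	}

	return true;
}

// true when two states are identical, i.e. the pattern has become static
private function isEqual(array $state1, array $state2)
{
	foreach($state1 AS $h =&gt; $row)
	{
		// strict comparison of whole rows: same cells in the same order
		if($row !== $state2[$h])
		{
			return false;
		}
	}

	return true;
}</pre>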
<p>Both are simple and are called after a new state has been generated and output but before it has been switched for the current state; that way the final state will still be shown but no further operations will take place. Between them they cover the most common cases but ignore more specialised ones such as oscillating patterns (period two or above), which will not be caught. The inactivity function could be dropped altogether in favour of the equality function, but this would require at least two generations of dead cells, which I deemed unacceptable. Both of these functions are optimisations meant to short-circuit generation and prevent needless calculation. However, I wasn’t entirely certain that the cost of iterating through the grid was worth it, so after storing away the timings for the original version of the algorithm I ran a number of other benchmarks to find out, and to get some general statistics on the process.</p>
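<p>As a rough sketch of how oscillating patterns could be caught as well, every state can be fingerprinted and remembered, stopping as soon as a fingerprint repeats. This is not part of the released code: <code>hasRepeated()</code> and the <code>$seen</code> array are hypothetical names, and using <code>md5(serialize())</code> as the fingerprint is an assumption that accepts a vanishingly small chance of a false match.</p>
<pre class="brush: php">// Hypothetical helper (not in the released script): returns true when
// $state has been seen before, which covers static patterns and
// oscillators of any period
function hasRepeated(array $state, array &amp;$seen)
{
	// identical nested arrays always serialise to the same string
	$key = md5(serialize($state));

	if(isset($seen[$key]))
	{
		return true;
	}

	$seen[$key] = true;
	return false;
}

// the two phases of a blinker, a period two oscillator
$blinkerA = array(array(0, 1, 0), array(0, 1, 0), array(0, 1, 0));
$blinkerB = array(array(0, 0, 0), array(1, 1, 1), array(0, 0, 0));

$seen = array();
var_dump(hasRepeated($blinkerA, $seen)); // bool(false)
var_dump(hasRepeated($blinkerB, $seen)); // bool(false)
var_dump(hasRepeated($blinkerA, $seen)); // bool(true)</pre>
<p>The trade-off is memory: one short string is kept per generation, which is negligible at fifty generations.</p>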
<h2>Statistics</h2>
<p>The first run used the first version of the algorithm (no stop checks) on a 32x32 grid over 50 generations on 1382 sample e-mail addresses. The average time for only the game to run (i.e. not counting the seed or colour generation) including GIF generation was 1.2227 seconds; with a standard deviation of 0.0623 and variance of 0.0039, the numbers are fairly solid. On an identical run but including stop checks the average time per run was 0.6297 seconds, while standard deviation and variance were 0.5097 and 0.2598 respectively. The shorter average time comes with a much wider distribution of values: the quickest run finished in 0.0352 seconds while the slowest came out at 1.8953. I wanted the values to be as “real world” as possible, which is why I didn’t bother checking for the time taken to process all values or take more detailed timings from within the Life algorithm itself; these numbers represent values likely to be experienced if this was ever put into use.</p>
<p>This proved that pruning was worthwhile overall, so I continued on with some comparative testing. Using the same sample set and generation count, an 8x8 grid had an average time per run of 0.3078 seconds while a 64x64 grid averaged 1.4124 seconds, which is still only slightly worse than a 32x32 grid without any stop state checking. It’s difficult to tell with only three data points exactly how the run time grows with grid size, but the correlation, if it wasn’t obvious before, is backed up by the statistics.</p>
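<p>For reference, the averages, variances and standard deviations quoted here can be reproduced with a few lines of PHP. This is only a sketch: <code>summarise()</code> is a hypothetical name, the timings in the example are made up, and it assumes the population form of the variance (dividing by the count) rather than the sample form.</p>
<pre class="brush: php">// Hypothetical helper: mean, population variance and standard deviation
function summarise(array $timings)
{
	$count = count($timings);
	$mean = array_sum($timings) / $count;

	// sum of squared distances from the mean
	$squares = 0.0;
	foreach($timings AS $timing)
	{
		$squares += ($timing - $mean) * ($timing - $mean);
	}

	$variance = $squares / $count;

	return array(
		'mean' =&gt; $mean,
		'variance' =&gt; $variance,
		'deviation' =&gt; sqrt($variance)
	);
}

// made-up timings, purely for illustration
$stats = summarise(array(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0));
printf("%.4f %.4f %.4f", $stats['mean'], $stats['variance'], $stats['deviation']);
// 5.0000 4.0000 2.0000</pre>
<p>The standard deviation is always the square root of the variance, which is why 0.0623 pairs with 0.0039 and 0.5097 with 0.2598 in the runs above.</p>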
<p>With these checked, I moved on to the secondary aspects of the algorithm, namely the colour and seed generation. I was interested in both how long they took to run and the quality of the results they produced, specifically whether there were any collisions in the colours and seeds. For the original two colour generation from an e-mail address, the time taken was tied to the length of the string being operated on; even so, the average time taken was ~0.0001 seconds for strings up to 39 characters long, which meant this wasn’t a very computationally expensive operation. The most common string lengths were between 20 and 25 characters:</p>
<table border="0">
<thead>
<tr>
<th scope="col">String length</th>
<th scope="col">Average time (s)</th>
<th scope="col">Standard deviation (s)</th>
<th scope="col">Variance (s²)</th>
<th scope="col">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>9.30851E-05</td>
<td>1.44625E-05</td>
<td>2.09163E-10</td>
<td>94</td>
</tr>
<tr>
<td>21</td>
<td>9.37708E-05</td>
<td>1.56694E-05</td>
<td>2.45531E-10</td>
<td>96</td>
</tr>
<tr>
<td>22</td>
<td>0.000101972</td>
<td>1.91865E-05</td>
<td>3.68121E-10</td>
<td>107</td>
</tr>
<tr>
<td>23</td>
<td>0.000102602</td>
<td>1.78043E-05</td>
<td>3.16992E-10</td>
<td>133</td>
</tr>
<tr>
<td>24</td>
<td>0.00010361</td>
<td>1.86302E-05</td>
<td>3.47085E-10</td>
<td>118</td>
</tr>
<tr>
<td>25</td>
<td>0.000108533</td>
<td>2.64901E-05</td>
<td>7.01723E-10</td>
<td>137</td>
</tr>
</tbody>
</table>
<p>Not exactly thrilling reading, but it demonstrates that the generation was running much as I expected. More worrying were the results when I checked for collisions in each colour (background and foreground): 43 foreground collisions, 83 background collisions and 43 full colour collisions, even though all of the e-mail addresses were unique. I was particularly concerned with the background colour, but even with such a small sample set (in comparison to the number of possible e-mail addresses) the full collision rate is 3.1%, which is less than ideal. The one colour generation fared little better:</p>
<table border="0">
<thead>
<tr>
<th scope="col">String length</th>
<th scope="col">Average time (s)</th>
<th scope="col">Standard deviation (s)</th>
<th scope="col">Variance (s²)</th>
<th scope="col">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>0.00010434</td>
<td>1.68583E-05</td>
<td>2.84203E-10</td>
<td>94</td>
</tr>
<tr>
<td>21</td>
<td>0.000108323</td>
<td>1.69948E-05</td>
<td>2.88823E-10</td>
<td>96</td>
</tr>
<tr>
<td>22</td>
<td>0.000112514</td>
<td>1.34494E-05</td>
<td>1.80885E-10</td>
<td>107</td>
</tr>
<tr>
<td>23</td>
<td>0.000111835</td>
<td>1.68996E-05</td>
<td>2.85597E-10</td>
<td>133</td>
</tr>
<tr>
<td>24</td>
<td>0.000114466</td>
<td>1.76485E-05</td>
<td>3.11469E-10</td>
<td>118</td>
</tr>
<tr>
<td>25</td>
<td>0.000120416</td>
<td>1.49733E-05</td>
<td>2.24199E-10</td>
<td>137</td>
</tr>
</tbody>
</table>
<p>Not only is it marginally slower but the number of collisions came to 147, which is 10.6% of the sample set: worse than the two colour generation and odd considering the range of colours that this function can produce. The standard deviations of the colours are 73.9285, 72.4408 and 75.2587 for the red, green and blue components, which is close to the ~73.9 that an even spread over 0 to 255 would give. Each component on its own therefore uses most of the colour spectrum, so the collisions more likely come from the three components moving together (all are derived from overlapping parts of the same address) than from any one component clustering around its mean.</p>
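<p>One simple way to arrive at a collision count like this is to turn each colour into a string and compare the total against the number of distinct values. The sketch below assumes that a “collision” means any colour repeating an earlier one; <code>countCollisions()</code> is a hypothetical name and the sample colours are invented.</p>
<pre class="brush: php">// Hypothetical helper: counts colours that duplicate an earlier colour
function countCollisions(array $colours)
{
	$keys = array();
	foreach($colours AS $colour)
	{
		// array(255, 128, 0) becomes "255,128,0"
		$keys[] = implode(",", $colour);
	}

	return count($keys) - count(array_unique($keys));
}

// invented sample: the first colour appears three times
$sample = array(
	array(255, 128, 0),
	array(12, 34, 56),
	array(255, 128, 0),
	array(255, 128, 0)
);

echo countCollisions($sample); // 2</pre>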
<p>While there are equally detailed statistics for the seed generation, the numbers show much of what one would expect: time increases with string length and, as with the colour generation, is in the region of ~0.0001 seconds for average length addresses. What I was most interested in though was the number of identical seeds: 22. This is not entirely unexpected given that the method for generating them was cooked up in a few hours and wasn’t subject to any stringent mathematical basis; 1.6% of the sample set isn’t game breaking.</p>
<p>All of this may seem academic for such a small system but the results are invaluable if I ever want to take the script further: the statistics show that further optimisation of the Life algorithm could yield quicker run times, while the seed and colour generation need work to be able to generate more unique values. The Life algorithm is well travelled by other programmers, which means a lot of the complex optimisation work has already been done (<a href="http://www.ddj.com/hpc-high-performance-computing/184406478">Hash Life</a> etc.), whereas the two generators definitely need more work and a more structured approach to their construction.</p>
<p><script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
                 SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Release</h2>
<p>For the moment I’m finished tinkering with the algorithm itself so it makes sense to release the code in case anyone else wishes to use this as a base. The license for this release is <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-Share Alike 3.0</a> which means you can download, redistribute, play with and alter the code, provided you give credit where it’s due and release anything you build on it under the same license. This is version 1.1 which includes the checks for stop conditions.</p>
<p><strong>Conway’s Game of Life in PHP + Seed and Colour generation — version 1.1</strong><br />
<a href="http://chaostangent.com/wp-content/uploads/2009/09/life-1.1.zip">ZIP — 7kb</a> <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/09/50-frames-of-life/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sunday afternoon project: Conway’s Game of Life in PHP</title>
		<link>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/</link>
		<comments>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/#comments</comments>
		<pubDate>Sun, 30 Aug 2009 20:44:07 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[animated gif]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[automata]]></category>
		<category><![CDATA[avatar]]></category>
		<category><![CDATA[cellular automata]]></category>
		<category><![CDATA[conways game of life]]></category>
		<category><![CDATA[game of life]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[gravatar]]></category>
		<category><![CDATA[life]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=1201</guid>
		<description><![CDATA[On a quiet Sunday afternoon of a Bank holiday weekend, a project is born marrying the cellular automata of John Horton Conway's Game of Life and the automatic generation of avatars. While not entirely successful, the foundation has been laid for further improvement.]]></description>
			<content:encoded><![CDATA[<p class="thumbnails two"><a href="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-1.gif"><img class="alignnone size-full wp-image-1204" title="gameoflife-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-1.gif" alt="gameoflife-1" width="248" height="140" /></a> <a href="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-2.gif"><img class="alignnone size-full wp-image-1205" title="gameoflife-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-2.gif" alt="gameoflife-2" width="248" height="140" /></a></p>
<p>As a way of spending my bank holiday Sunday afternoon, I decided to embark on a small project. I didn’t know what the project would be when I first began browsing through <a href="http://en.wikipedia.org">Wikipedia</a> but eventually I ended up in <a href="http://cplus.about.com/lr/programming_challenges/183021/1/">About.com’s C++ challenge section</a>, <a href="http://cplus.about.com/od/programmingchallenges/a/challenge17.htm">one of whose challenges</a> concerned <a href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life">John Horton Conway’s “Game of Life”</a>: a rudimentary cellular automaton which, since its inception in 1970, has had an immeasurable impact on fields as diverse as philosophy and theology. After toying with some ideas, I decided to build a script which automatically creates animations of a number of generations of the game. From that seed the project grew into the first steps towards an avatar system, much like the automatically generated <a href="http://en.gravatar.com/">Gravatars</a> that currently adorn so many WordPress-based blogs.</p>
<blockquote class="pullout"><p>“I wanted something that was deterministic and identifiable”</p>
</blockquote>
<p>The first step was getting the algorithm working, and as I had already decided to make it web-based, that meant a PHP implementation. Using only the Wikipedia page as reference, I threw together a very basic script that allowed me to enter in some settings (grid dimensions, seed and generation limit) and for it to spit out the states between the seed and the generation cut off. After some wrangling with minor bugs (spelling errors, incorrect typing etc.) an unoptimised first version of the algorithm was complete:<span id="more-1201"></span></p>
<pre class="brush: php">private function tick($currentState)
{
	// copy
	$newState = $currentState;
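	// cells are read as $currentState[$w][$h] below, so the outer array
	// holds the columns; this only matters for non-square grids
	$width = count($currentState); $height = count($currentState[0]);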

	for($h = 0; $h &lt; $height; $h++)
	{
		for($w = 0; $w &lt; $width; $w++)
		{
			$neighbours = 0;
			for($i = 0; $i &lt; 8; $i++)
			{
				$newH = $h; $newW = $w;
				switch($i)
				{
					case 0:
						$newW -= 1; $newH -= 1;
						break;
					case 1:
						$newH -= 1;
						break;
					case 2:
						$newW += 1; $newH -=1;
						break;
					case 3:
						$newW += 1;
						break;
					case 4:
						$newW += 1; $newH += 1;
						break;
					case 5:
						$newH += 1;
						break;
					case 6:
						$newW -= 1; $newH += 1;
						break;
					case 7:
						$newW -= 1;
						break;
				}

				$newW = ($newW &lt; 0) ? ($width + $newW) : $newW;
				$newW = ($newW &gt;= $width) ? ($newW - $width) : $newW;

				$newH = ($newH &lt; 0) ? ($height + $newH) : $newH;
				$newH = ($newH &gt;= $height) ? ($newH - $height) : $newH;

				$neighbours += ($currentState[$newW][$newH]);
			}

			if($currentState[$w][$h] == 1)
			{
				if(($neighbours &lt; 2) || ($neighbours &gt; 3))
				{
					$newState[$w][$h] = 0;
				}
			}
			else
			{
				if($neighbours == 3)
				{
					$newState[$w][$h] = 1;
				}
			}
		}
	}

	return $newState;
}</pre>
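<p>For comparison, the eight-way switch and the four wrap-around checks can be folded into an offset table and a modulo. This is a sketch of an alternative rather than the code the script uses: <code>countNeighbours()</code> is a hypothetical name, and it assumes the grid is indexed as <code>$state[$w][$h]</code> as in the function above.</p>
<pre class="brush: php">// Hypothetical alternative to the switch: count the live neighbours of
// one cell on a grid that wraps around at the edges
function countNeighbours(array $state, $w, $h)
{
	// the grid is indexed as $state[$w][$h]
	$width = count($state); $height = count($state[0]);

	$offsets = array(
		array(-1, -1), array(0, -1), array(1, -1),
		array(-1, 0), array(1, 0),
		array(-1, 1), array(0, 1), array(1, 1)
	);

	$neighbours = 0;
	foreach($offsets AS $offset)
	{
		// adding the dimension before the modulo keeps the index positive
		$newW = ($w + $offset[0] + $width) % $width;
		$newH = ($h + $offset[1] + $height) % $height;

		$neighbours += $state[$newW][$newH];
	}

	return $neighbours;
}

// a three cell blinker in the middle of an empty 5x5 grid
$state = array_fill(0, 5, array_fill(0, 5, 0));
$state[1][2] = 1; $state[2][2] = 1; $state[3][2] = 1;

echo countNeighbours($state, 2, 2), "\n"; // 2
echo countNeighbours($state, 2, 1), "\n"; // 3
echo countNeighbours($state, 0, 0), "\n"; // 0</pre>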
<p>Part of the development of this involved settling on a suitable debug output. Initially this was just a pre-formatted output of each state with a 0 representing an inactive cell and a 1 representing an active one; this is fine for quickly checking the state but matching up against more detailed debug output (such as neighbours for a specific cell) proved tricky, especially when you take into account line-heights and other annoyances. I settled on a quick HTML table which had a caption for the current generation of the algorithm and also added co-ordinate references for easy look ups.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/08/debugoutput.png"><img class="alignnone size-medium wp-image-1224" title="debugoutput" src="http://chaostangent.com/wp-content/uploads/2009/08/debugoutput-540x227.png" alt="debugoutput" width="540" height="227" /></a></p>
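<p>A debug table along these lines takes very little code. The sketch below is a reconstruction rather than the script’s actual output routine: <code>stateToTable()</code> is a hypothetical name, the markup is a guess at the simplest thing that would work, and the grid is assumed to be indexed as <code>$state[$w][$h]</code>.</p>
<pre class="brush: php">// Hypothetical debug renderer: one captioned table per generation with
// co-ordinate headers along the top and down the side
function stateToTable(array $state, $generation)
{
	// the grid is indexed as $state[$w][$h]
	$width = count($state); $height = count($state[0]);

	$html = "&lt;table&gt;&lt;caption&gt;Generation {$generation}&lt;/caption&gt;";

	// header row of column co-ordinates
	$html .= "&lt;tr&gt;&lt;th&gt;&lt;/th&gt;";
	for($w = 0; $w &lt; $width; $w++)
	{
		$html .= "&lt;th&gt;{$w}&lt;/th&gt;";
	}
	$html .= "&lt;/tr&gt;";

	for($h = 0; $h &lt; $height; $h++)
	{
		// each row starts with its own co-ordinate
		$html .= "&lt;tr&gt;&lt;th&gt;{$h}&lt;/th&gt;";
		for($w = 0; $w &lt; $width; $w++)
		{
			$html .= "&lt;td&gt;" . $state[$w][$h] . "&lt;/td&gt;";
		}
		$html .= "&lt;/tr&gt;";
	}

	return $html . "&lt;/table&gt;";
}

echo stateToTable(array(array(0, 1), array(1, 0)), 3);</pre>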
<p>This is fine for checking a limited number of states or small grids but when those variables climb higher, browser rendering time and page payload size become an issue. As mentioned previously, this is the most basic implementation of the algorithm and has not been subject to any optimisation: each cell requires eight lookups and, with no heuristic pruning, the time per generation grows in step with the number of cells (so it quadruples every time the side of a square grid doubles). There are obvious improvements that could be made as well as storage alterations (<a href="http://en.wikipedia.org/wiki/Quadtree">quad trees</a> et al.) that could benefit; however, until I could measure the algorithm’s performance, this was a decent first attempt. The next step was to output something more useful than HTML tables: enter <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a>. I put together a very quick function to create a static GIF file:</p>
<pre class="brush: php">private function outputStateToGif(array $state, $generation, $outputDir, $foreground, $background)
{
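	// cells are read as $state[$w][$h] below, so the outer array holds
	// the columns; this only matters for non-square grids
	$width = count($state); $height = count($state[0]);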

	$img = imagecreate($width * 2, $height * 2);
	imagefill($img, 0, 0, imagecolorallocate($img, $background[0], $background[1], $background[2]));

	$onColour = imagecolorallocate($img, $foreground[0], $foreground[1], $foreground[2]);

	for($h = 0; $h &lt; $height; $h++)
	{
		for($w = 0; $w &lt; $width; $w++)
		{
			if($state[$w][$h] == 1)
			{
				imagefilledrectangle($img, $w*2, $h*2, $w*2+1, $h*2+1, $onColour);
			}
		}
	}

	$outputDir = rtrim($outputDir, "/");
	$filename = sprintf("%04d", $generation);

	imagegif($img, "{$outputDir}/{$filename}.gif");
	imagedestroy($img);
}</pre>
<p>With all of this in place it meant that I could now create animated GIF files of the Game of Life without any hassle — combining the GIF files together at the end is a simple case of firing up <a href="http://www.imagemagick.org/script/index.php">ImageMagick</a> with the command:</p>
<pre>convert -delay 10 -loop 0 output/*.gif output/output.gif</pre>
<p>This will merge all of the GIF files together and loop at the end with each frame <a href="http://www.imagemagick.org/script/command-line-options.php#delay">lasting a 10th of a second</a> (the default is 100 ticks per second). There is the possibility of optimising the animated GIF further with transparency so that each frame takes up the minimum amount of space; however, with 200 frame animations being scarcely over 100 kilobytes, there doesn’t seem much point, especially when taking into account the complexities involved.</p>
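<p>Since the delay is given in ticks rather than seconds, a small helper can do the conversion when building the command from PHP. This is only a sketch: <code>convertCommand()</code> is a hypothetical name, it assumes the default of 100 ticks per second, and it does no shell escaping so the directory must be a trusted value.</p>
<pre class="brush: php">// Hypothetical helper: builds the ImageMagick command for a frame length
// given in seconds; assumes a trusted directory name (nothing is escaped)
function convertCommand($outputDir, $secondsPerFrame)
{
	// -delay is measured in ticks, 100 to the second by default
	$ticks = (int) round($secondsPerFrame * 100);
	$outputDir = rtrim($outputDir, "/");

	return "convert -delay {$ticks} -loop 0 {$outputDir}/*.gif {$outputDir}/output.gif";
}

echo convertCommand("output/", 0.1), "\n";
// convert -delay 10 -loop 0 output/*.gif output/output.gif

echo convertCommand("output", 0.75), "\n";
// convert -delay 75 -loop 0 output/*.gif output/output.gif</pre>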
<p>With the algorithm in place and output more or less sewn up, it was time to concentrate on the avatar aspect of the project. Ideally I wanted something that was deterministic and identifiable; both of these are currently fulfilled by the automatically generated Gravatar icons which are based off <a href="http://scott.sherrillmix.com/blog/blogger/wp_identicon/">Scott Sherrill-Mix’s WP_Identicon plugin</a>. I would be examining how to turn an e-mail address into a seed for the algorithm and generating colours to make them unique: easier said than done, as there are innumerable ways to achieve these goals but producing <em>good</em> results is tricky.</p>
<p>To generate the seed I converted each character of the e-mail address to a number and then performed some operations on that. Using the built-in <a href="http://uk3.php.net/manual/en/function.ord.php">PHP function ord()</a> was out of the question as this would only deal with ASCII characters (until PHP get their finger out and natively support Unicode) and this would potentially be dealing with Unicode characters; I briefly considered rolling my own function but stumbled upon an <a href="http://hsivonen.iki.fi/php-utf8/">implementation by Henri Sivonen</a> which was based off code from the <a href="http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp">Mozilla project</a> which would have been scrutinised far more than something I would have cooked up on a sleepy Sunday afternoon. The utf8ToUnicode() function takes a string as a parameter and returns an array of Unicode code point integers: perfect for what I was going to use them for. I spent a great deal of time trying out different operations on those values and eventually settled on one that worked for seeds of dimensions less than 6:</p>
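<pre class="brush: php">$pieces = utf8ToUnicode($email);

// start with an empty seed so count() and array_shift() have an array to work on
$seed = array();
$previous = 124;
foreach($pieces AS $piece)
{
	// take a single bit of the product; PHP_INT_SIZE is 4 or 8 depending
	// on the build, so 32-bit and 64-bit PHP shift by different amounts
	$k = intval(sprintf("%01b", (($piece * $previous) &gt;&gt; PHP_INT_SIZE) &amp; 0x1));
	// once the seed is full, fold the oldest cell into the new value
	$seed[] = (count($seed) &lt; ($width * $height)) ? $k : (array_shift($seed) &amp; $k);
	$previous = $piece;
}</pre>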
<p>After multiplying the value by the previous one (a static initial value is used; I went for 124 as a general ASCII midpoint value), the result is shifted to the right and masked to obtain a single binary value. If the string is longer than the seed (i.e. more than 25 characters for a 5x5 grid) then it shifts the first entry from the generated seed and bitwise ANDs it with the value.</p>
<p>This fulfils the deterministic nature of the seed and means the entire string is used rather than stopping when the seed length is reached. If the string is shorter than the seed, it is padded with inactive cells (0’s). This method produces decent results: while the seeds aren’t exactly wildly different from each other (i.e. the chance of seed collisions is fairly high), it successfully goes from a string to a seed with acceptable results. I had initially considered cribbing some one-way hash functions from algorithms such as <a href="http://csrc.nist.gov/groups/ST/toolkit/index.html">SHA</a> or <a href="http://labs.calyptix.com/haval.php">HAVAL</a> and flicked through <a href="http://www.schneier.com/book-applied.html">Applied Cryptography</a> to that end; however, I eventually decided that this would be overkill for such a simple implementation.</p>
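<p>Pulled together, the whole process (convert, fold, pad) looks something like the sketch below. It is an illustration under stated assumptions rather than the script’s own function: <code>seedFromEmail()</code> is a hypothetical name, <code>unpack()</code> stands in for utf8ToUnicode() and so only suits ASCII addresses, and the shift is fixed at 8 (the value of PHP_INT_SIZE on a 64-bit build) so the result doesn’t vary between platforms.</p>
<pre class="brush: php">// Hypothetical wrapper around the seed loop; ASCII-only stand-in for
// utf8ToUnicode() and a fixed shift instead of PHP_INT_SIZE
function seedFromEmail($email, $width, $height, $shift = 8)
{
	// for ASCII input the byte values are the same as the code points
	$pieces = array_values(unpack("C*", $email));

	$seed = array();
	$previous = 124;
	foreach($pieces AS $piece)
	{
		$k = (($piece * $previous) &gt;&gt; $shift) &amp; 0x1;
		$seed[] = (count($seed) &lt; ($width * $height)) ? $k : (array_shift($seed) &amp; $k);
		$previous = $piece;
	}

	// pad short addresses with inactive cells
	return array_pad($seed, $width * $height, 0);
}

// a 19 character address gives 19 generated cells and 6 cells of padding
$seed = seedFromEmail("someone@example.com", 5, 5);
echo implode("", $seed), "\n";
echo count($seed), "\n"; // 25</pre>
<p>The same address always gives the same 25 cells, which is the deterministic property the avatars rely on.</p>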
<p>The next step was generating colours. I wanted the domain part of an e-mail address to represent the background colour while the local part would represent the foreground colour; this way it would be obvious who was using <a href="http://mail.google.com">Google Mail</a> or <a href="http://www.hotmail.com">Live Mail</a> etc. but visitors would still be able to remain unique. This generation is definitely an area for improvement as the results will show:</p>
<pre class="brush: php">public static function colours($string)
{
	list($local, $fqdn) = explode("@", $string);
	$multiplier = array_sum(utf8ToUnicode($string));

	$localProduct = array_sum(utf8ToUnicode($local)) * $multiplier;
	$fqdnProduct = array_sum(utf8ToUnicode($fqdn)) * $multiplier;

	$fg = array(($localProduct &gt;&gt; 16) &amp; 0xFF, ($localProduct &gt;&gt; 8) &amp; 0xFF, $localProduct &amp; 0xFF);
	$bg = array(($fqdnProduct &gt;&gt; 16) &amp; 0xFF, ($fqdnProduct &gt;&gt; 8) &amp; 0xFF, $fqdnProduct &amp; 0xFF);

	return array($fg, $bg);
}</pre>
<p>In short this sums the Unicode values of each part of the address and multiplies it by the sum of the entire string. The reason for the multiplication is that the values within a common e-mail address are usually fairly low (ironically within the first 128 ASCII characters), so a plain sum rarely reaches the upper byte used for red and the resultant colours definitely favoured the green / blue end of the spectrum; multiplication introduces a red component in most cases. I toyed with other methods of producing a usable number, such as using the product of the values rather than a summation; unfortunately this created a number that was too large to be useful, frequently breaking the maximum integer value limit.</p>
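<p>To see why a plain sum lacks red, here is a small worked sketch. The address is made up, and asciiCodePoints() and toRgb() are hypothetical helpers: asciiCodePoints() is an ASCII-only stand-in for utf8ToUnicode() so that the snippet runs on its own, and toRgb() repeats the shifting and masking from colours() above.</p>
<pre class="brush: php">// Hypothetical stand-in for utf8ToUnicode(): only valid for plain ASCII input
function asciiCodePoints($string)
{
	return array_map('ord', str_split($string));
}

// Split a number into red, green and blue bytes, as colours() does
function toRgb($number)
{
	return array(($number &gt;&gt; 16) &amp; 0xFF, ($number &gt;&gt; 8) &amp; 0xFF, $number &amp; 0xFF);
}

$email = "someone@example.com";	// made-up address
list($local, $fqdn) = explode("@", $email);

$sum = array_sum(asciiCodePoints($local));
$multiplier = array_sum(asciiCodePoints($email));

// The sum is far below 65536, so nothing reaches bits 16 to 23: red is 0
$fromSum = toRgb($sum);

// The product is large enough to spill into the red byte
$fromProduct = toRgb($sum * $multiplier);</pre>
<p>Even after multiplying, the red byte stays small for short addresses, which fits with the blue / green bias described above.</p>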
<p>With both the seed and the colours now able to be generated from an e-mail address, it was time to trial it out on a large number of possible addresses.</p>
<p class="thumbnails eight"><img class="alignnone size-full wp-image-1208" title="gameoflife-sample-1" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-1.gif" alt="gameoflife-sample-1" width="64" height="64" /> <img class="alignnone size-full wp-image-1209" title="gameoflife-sample-2" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-2.gif" alt="gameoflife-sample-2" width="64" height="64" /> <img class="alignnone size-full wp-image-1210" title="gameoflife-sample-3" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-3.gif" alt="gameoflife-sample-3" width="64" height="64" /> <img class="alignnone size-full wp-image-1211" title="gameoflife-sample-4" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-4.gif" alt="gameoflife-sample-4" width="64" height="64" /> <img class="alignnone size-full wp-image-1212" title="gameoflife-sample-5" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-5.gif" alt="gameoflife-sample-5" width="64" height="64" /> <img class="alignnone size-full wp-image-1213" title="gameoflife-sample-6" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-6.gif" alt="gameoflife-sample-6" width="64" height="64" /> <img class="alignnone size-full wp-image-1214" title="gameoflife-sample-7" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-7.gif" alt="gameoflife-sample-7" width="64" height="64" /> <img class="alignnone size-full wp-image-1215" title="gameoflife-sample-8" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-8.gif" alt="gameoflife-sample-8" width="64" height="64" /> <img class="alignnone size-full wp-image-1216" title="gameoflife-sample-9" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-9.gif" alt="gameoflife-sample-9" width="64" height="64" /> <img class="alignnone size-full wp-image-1217" title="gameoflife-sample-10" 
src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-10.gif" alt="gameoflife-sample-10" width="64" height="64" /> <img class="alignnone size-full wp-image-1218" title="gameoflife-sample-11" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-11.gif" alt="gameoflife-sample-11" width="64" height="64" /> <img class="alignnone size-full wp-image-1219" title="gameoflife-sample-12" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-12.gif" alt="gameoflife-sample-12" width="64" height="64" /> <img class="alignnone size-full wp-image-1220" title="gameoflife-sample-13" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-13.gif" alt="gameoflife-sample-13" width="64" height="64" /> <img class="alignnone size-full wp-image-1221" title="gameoflife-sample-14" src="http://chaostangent.com/wp-content/uploads/2009/08/gameoflife-sample-14.gif" alt="gameoflife-sample-14" width="64" height="64" /></p>
<p>This is a random sampling from over 1,300 addresses which I used to get benchmarks and timing data (a topic for another time); each is a 32x32 grid (double sized to 64x64 for the images) and is limited to 50 generations. The colours are obviously all shifted towards the blue/green area of the spectrum due to the way they are generated: there are no bright reds or mixtures such as yellows or oranges, which makes them very bland, especially when combined with foreground colours that can easily lack contrast. The next problem is the mixed bag of animations given a 5x5 seed. Obviously I wasn’t expecting a raft of infinite growth seeds but, having looked at them in aggregate, the majority tend to dissolve after only a few generations or become static, both of which make for boring avatars.</p>
<h2>Ways forward</h2>
<p>I’m certainly not finished with this idea yet as I think it definitely has legs, although the devil will be in the details. The first aspect to address is the visuals: I had expected more growth within 50 generations, and a lot of the grids could be reduced down to 16x16, which would allow for visibly larger and consequently more interesting cell animations; as the environment is toroidal in nature this may turn out to be beneficial to growth. Colours are also high priority: choosing a background colour based upon the domain name of the e-mail address was perhaps the wrong way to go, and settling on a single background colour with a varying foreground colour will likely help immeasurably with aesthetics.</p>
<p>I’ve still to collate the timing data I gathered from the run on the e-mail database; however, my initial examination indicates that the Game of Life algorithm itself likely doesn’t need overhauling to improve speed. Proposed optimisations such as <a href="http://tomas.rokicki.com/hlife/">Hashlife</a> improve speed at the cost of memory, and with such small grids the effect would likely be negligible. The most time consuming aspect of the script is the generation of the GIF files and subsequent I/O, which is done regardless of whether the state has settled or entirely died out; pruning and state checking would likely be immensely beneficial.</p>
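<p>As a rough sketch of the state checking mentioned above, assuming each generation is held as an array of rows of 0s and 1s (hasSettled() is a hypothetical helper, not part of the original script):</p>
<pre class="brush: php">// Hypothetical helper: true when there is no point rendering any further frames,
// i.e. every cell is dead or the grid is identical to the previous generation
function hasSettled(array $previous, array $current)
{
	$alive = array_sum(array_map('array_sum', $current));

	return $alive == 0 || $previous === $current;
}

$before = array(array(0, 1, 1), array(0, 1, 1), array(0, 0, 0));	// a 2x2 block, which never changes
$after = $before;

$stop = hasSettled($before, $after);	// true: the frame loop could break here</pre>
<p>This only catches dead and static grids; a repeating pattern such as a blinker would need a comparison against more than one earlier generation.</p>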
<p><script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><script type="text/javascript">// <![CDATA[
                SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Conclusion</h2>
<p>For a Sunday afternoon the results are not exactly groundbreaking but are interesting and act as a good base for further enhancement; the possibilities are immense and it’s always gratifying to see a project come together in a short space of time. That said, I think I’ve seen enough animated GIFs of tiny blinking blocks for one day. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/sunday-afternoon-project-conways-game-of-life-in-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evangelion 2.0</title>
		<link>http://chaostangent.com/2009/08/evangelion-2-0/</link>
		<comments>http://chaostangent.com/2009/08/evangelion-2-0/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 10:33:51 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[1.0]]></category>
		<category><![CDATA[asuka]]></category>
		<category><![CDATA[evangelion]]></category>
		<category><![CDATA[evangelion 1.0]]></category>
		<category><![CDATA[evangelion 2.0]]></category>
		<category><![CDATA[film]]></category>
		<category><![CDATA[franchise]]></category>
		<category><![CDATA[gainax]]></category>
		<category><![CDATA[gendou]]></category>
		<category><![CDATA[mari]]></category>
		<category><![CDATA[mecha]]></category>
		<category><![CDATA[misato]]></category>
		<category><![CDATA[movie]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[shinji]]></category>
		<category><![CDATA[you are not alone]]></category>
		<category><![CDATA[you can not advance]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=616</guid>
		<description><![CDATA[An emphatic and gushing review about the second in the series of new films for the Evangelion franchise: Evangelion 2.0: You Can (Not) Advance. Written shortly after watching the film, the review stays clear of spoilers and wild speculation.]]></description>
			<content:encoded><![CDATA[<p id="evangelion20"><a href="http://www.macromedia.com/go/getflashplayer">Get Flash</a> to see this player.</p>
<p><script src="http://ajax.googleapis.com/ajax/libs/swfobject/2.2/swfobject.js" type="text/javascript"></script> <script type="text/javascript">// <![CDATA[
         swfobject.embedSWF("/flash/player.swf", "evangelion20", "540", "324", "9.0.0", null, { file: "/wp-content/uploads/2009/08/Evangelion%202.0%20Trailer%20%28From%20Evangelion%201.11%29.mp4", image: "/wp-content/uploads/2009/08/Evangelion%202.0%20Trailer%20%28From%20Evangelion%201.11%29.jpg", backcolor: "EEEEE6", frontcolor: "333333", lightcolor: "D54000", screencolor: "FBFBF9" }, { allowfullscreen: "true", seamlesstabbing: "true" });
// ]]&gt;</script></p>
<p>Evangelion occupies a very special place in my heart: I watched the series on VHS in 1997 when I was fourteen and, without hyperbole, it was a life changing experience. A series that was smart and brutally obtuse and flitted from Jungian psychology to religious dogma was revelatory for me at the time, and it questioned a lot of things I had not yet fully formed my own questions about; so as well as being gob-smackingly awesome, it changed who I was and consequently who I am. For those reasons I am utterly fanatical about the franchise and concept. Like a delusional lover I put up with a lot of the nonsense that GAINAX throws: I own the original series on twelve VHS tapes which I upgraded to DVD when they were first released, then upgraded <em>them</em> to the Platinum DVDs; I did manage to stay away from the deluge of tat that has been continuously released but it’s fair to say my own lot of Evangelion merchandise is not insignificant. In what is a truly savvy move, just as patience for the constant re-releases was beginning to wane, news came of the Rebuild project.</p>
<blockquote class="pullout"><p>“If 1.0 was meant to prime the fans for what was to come, 2.0 detonates with magnum force.”</p>
</blockquote>
<p>Not just a fresh coat of paint but a fresh take: that was the promise at least. It was hard to get excited, especially when there was also word of a live action Hollywood adaptation bolstered by some mightily uninspiring concept art, the only promise being that WETA are on the case. Regardless, the news came out over a year before the first Rebuild release so there was plenty of time to build up my scepticism. The first movie, “Evangelion 1.0: You Are (Not) Alone”, released in 2007, deserves a post of its own, and the number of pages I’ve devoted to notes and thoughts and theories is bordering on absurd. The second movie, “Evangelion 2.0: You Can (Not) Advance”, released just this summer, is easy to write about because I have not yet had a chance to analyse and unpick it as I am wont to do. For want of a better description, this is a review rather than an unravelling.<br />
<span id="more-616"></span><strong>Now spoiler free!<br />
</strong></p>
<p>What is most striking about the movie is how quickly it progresses: the slow methodical pace of the first is gone, with two angels dispatched in the first fifteen minutes, newcomer Mari introduced and series staple Asuka entering in a typically absurdist fashion. The first forty minutes peg the movie as a set up, a careful manoeuvring of protagonists for the inevitable firestorm of the next three movies. This is thrown dramatically aside as the final hour starts up and doesn’t stop until the post-credits preview. Ordinarily when anime series are transformed into films they suffer because the typical three-act movie format is lost in a staccato fire of crisis and resolution common to episodic content. The first movie managed the transition excellently by making the fifth angel (fourth in the movie chronology, natch) a far more dramatic set piece, helped along by some stunning scoring by Shiro Sagisu. Evangelion 2.0 ignores the tempo set by its predecessor and, instead of a slow build and release, mimics modern movies such as The Dark Knight with an extended roller-coaster of a finale. That this movie cuts through vast swathes of the series’ story in such a stunning way is testament to its quality.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2009/08/1247352345470.jpg"><img class="alignnone size-medium wp-image-621" title="Plug suit Asuka" src="http://chaostangent.com/wp-content/uploads/2009/08/1247352345470-540x405.jpg" alt="Plug suit Asuka" width="540" height="405" /></a><br />
<cite class="caption">Image originally from <a href="http://blog-soth.blogspot.com/2009/06/evangelion-20-summary.html">Soth’s blog post: Evangelion 2.0 summary</a></cite></p>
<p>In part though, it is still constricted by the original timeline: while the minutiae of certain events are changed, the ethos and chronology remain. Every scene has been stripped back to its barest essentials, which keeps the wicked pace up but of course loses some of the precipitous depth the series was noted for. For instance: the rivalry between Rei and Asuka is accentuated and condensed, culminating in an iconic scene within an elevator; the series let this silent tension drag out for minutes while 2.0 has no such luxury. This isn’t a desire for more budget-saving static shots, but more a desire for the characters to be able to breathe. Evangelion has a sizeable cast with a lot of history and the danger with both the pace and the movie format is that they’re not allowed to expand and instead just become catalysts. Shinji has already grown into his role, as has Misato and to a lesser extent Rei, but the deluge of Asuka, Mari, Kaji and Kaworu threatens to drown existing characters. The converse of this is how much texture is now given over to the world, especially Tokyo-3 which, now given the budgetary freedom, feels like a real city bustling with people and vehicles; elements which were shown without context before, such as the NERV pyramid, are made more tangible and given edges rather than remaining painterly backdrops.</p>
<p>Visually one cannot find fault: the cryptic opening with Mari is swift and brutal, set pieces like the climactic Geofront battle are sweeping and epic, while incidental details like Misato’s hand movements are given love and attention. It all adds up to a spectacular looking piece of media and makes the inevitable high-definition release all the more enticing. Of course, GAINAX being GAINAX means a propensity for jiggling breasts and casual nudity: even during otherwise deathly serious moments, time is given over to the curves of the female cast. Curves have certainly not graced Asuka though, who initially resembles a bobble-head figure with what first looks like a slightly oversized head but is in fact a severely fleshless body: stick thin arms and legs and a torso that seems too petite to ensconce vital organs yet just enough to keep her lamentably sizeable chest attached. The score, while still universally excellent, has some questionable usage at points. The use of classical music continues from the series with some evocative re-recordings as well as brand new movements to accompany equally stirring scenes; unfortunate then that the decision was made to use a floaty, child-like song for two of the most traumatic scenes. The juxtaposition is as jarring as was intended, but when “Angel of Doom” rises in the climax of 1.0, it is disappointing to be treated twice to something that is by comparison sub-standard, especially when the barn-storming “The Final Decision We All Must Take” promised so much on the soundtrack release.</p>
<p>If 1.0 was meant to prime the fans for what was to come, 2.0 detonates with magnum force. Given my history with the series, I cannot be objective about the movie, and I cannot envisage how people who have only seen these movies view Evangelion; I carry over twelve years of baggage with me and from that view, “Evangelion 2.0: You Can (Not) Advance” is an absolute stonker of a film. It walks that line between pandering and reimagining with aplomb and is gripping beyond its run time; there are few words I can find to express my abject joy when watching this. The myriad possibilities of every deviation from the canon, every glimpse of worlds not seen, are enough to occupy me for weeks. This is my new obsession. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/evangelion-2-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building the carousel</title>
		<link>http://chaostangent.com/2009/08/building-the-carousel/</link>
		<comments>http://chaostangent.com/2009/08/building-the-carousel/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 21:30:57 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[carousel]]></category>
		<category><![CDATA[component]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[feed]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[last.fm]]></category>
		<category><![CDATA[lastfm]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[lovefilm]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[prototype]]></category>
		<category><![CDATA[rest]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[scriptaculous]]></category>
		<category><![CDATA[xmlrpc]]></category>
		<category><![CDATA[youtube]]></category>
		<category><![CDATA[zend]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=603</guid>
		<description><![CDATA[
The newest addition to chaostangent.com is the carousel nestling comfortably at the foot of every page. Sporting a variety of “social media” feeds as well as other morsels, it showcases a number of interesting technologies and techniques including: a fully looping carousel (JavaScript and CSS), integration with numerous external APIs (PHP, Zend Framework), screen-scraping and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/wp-content/uploads/2009/08/img-carousel.jpg"><img class="alignnone size-full wp-image-607" title="chaostangent.com footer carousel" src="http://chaostangent.com/wp-content/uploads/2009/08/img-carousel.jpg" alt="chaostangent.com footer carousel" width="540" height="151" /></a></p>
<p>The newest addition to chaostangent.com is the carousel <a href="#carousel">nestling comfortably at the foot of every page</a>. Sporting a variety of “social media” feeds as well as other morsels, it showcases a number of interesting technologies and techniques including: a fully looping carousel (JavaScript and CSS), integration with numerous external APIs (<a href="http://www.php.net">PHP</a>, <a href="http://framework.zend.com">Zend Framework</a>), screen-scraping and local caching of results, to name but a few. It successfully fulfils the primary goal I had for it: cramming as much functionality into as contained a space as reasonably possible.</p>
<blockquote class="pullout"><p>“I can just boot up Zend_Service_Delicious and be done with it right? If only things were that simple.”</p>
</blockquote>
<h2>JavaScript</h2>
<p>The <a href="http://www.smileycat.com/design_elements/carousels/">carousel interface</a> is design du jour at the moment, sported by sites such as <a href="http://www.apple.com/uk/mac/">Apple</a>, <a href="http://www.bbc.co.uk/iplayer">BBC iPlayer</a> and <a href="http://www.gametrailers.com">Gametrailers</a>: carousels manage selective display of information while still providing a high degree of interactivity. In short: they’re swish and solve the problem of too much to feature in too little space. The carousel library I am using is a simplified, stripped-down version of one I developed for a large work project; for this reason I’m unable to release it under any kind of license. The original has a number of features that I wouldn’t be using, including automated construction of a “jump to” control and being able to navigate over a number of entries at once. My library is the only one I know of which successfully loops, providing an “infinite” carousel of sorts; <a href="http://www.prototype-ui.com/">other publicly available libraries</a> cease at either end of the carousel, which in some situations is more intuitive, but the challenge of making one <em>not</em> do this was posed to me and I couldn’t very well pass it up.<br />
<span id="more-603"></span><br />
The library works as such:</p>
<ul>
<li>If moving right on the carousel (usually seen as “advancing” or moving forwards in countries with left-to-right written language) the first element within the carousel has its margin tweened from zero to negative its width e.g. if an element is 100 pixels wide the margin would tween from 0 to –100 pixels. This tween uses the <a href="http://script.aculo.us">Scriptaculous effects library</a> and “pulls” the item out of sight.</li>
<li>Once this is complete the now hidden element is removed from the start of the carousel items and appended to the end.</li>
<li>With this done the now appended element’s margin is set back to zero.</li>
</ul>
<ul>
<li>If moving left on the carousel (usually seen as “receding” or moving backwards) the last element of the carousel is modified with a negative margin equal to its width.</li>
<li>That element is then moved to the beginning of the carousel items.</li>
<li>Once done, the negative margin is then tweened to zero, effectively “pushing” the carousel item into view.</li>
</ul>
<p>Both effects can be done in a few lines of JavaScript and the act of moving an element from one end of the carousel to another can be done in a single line, e.g. for the “move right” action:</p>
<pre class="brush: javascript">this.elem.insert(this.elem.childElements().first().remove().setStyle({"marginLeft": 0}));</pre>
<p>The majority of the class (thank you <a href="http://www.prototypejs.org/api/class">Prototype</a>) concerns itself with ensuring that there are no glitches during the transition: it locks itself to prevent repeated clicks which could cause the carousel items to resequence themselves and the visual smoothness to be lost. The transition itself is non-standard: it uses one of the “Easing equations” originally pioneered by <a href="http://www.robertpenner.com/">Robert Penner for Flash</a>, which provide smooth, lifelike movement for animations. The equation used here is the “EaseFromTo” equation adapted by <a href="http://kendsnyder.com/sandbox/easing/">Ken Snyder for Scriptaculous</a>.</p>
<h2>Markup and styling</h2>
<p>The carousel would not operate as it does without some fairly dense and brutal markup and styling. It comprises an outside container which acts as a window onto the contents of the unordered list within it: each item within the list is treated as a single carousel item. The unordered list must stay on one line, otherwise there is a jarring “drop in” effect for items; for this reason the container has its overflow set to hidden, the unordered list has its white-space set to “nowrap” and the list items are set to display as inline-blocks. That’s the crux of the styling. Obviously, with IE6 and 7 not supporting inline-block displays, this won’t work there, but thankfully they abuse the “inline” property enough for them to play nicely:</p>
<pre class="brush: css">#external.scripted { overflow: hidden; }
#external.scripted &gt; ul { white-space: nowrap; overflow: hidden; }
#external &gt; ul &gt; li { display: inline-block; width: 277px; }

/* In style-ie6.css */
#external ul li { display: inline; }

/* In style-ie7.css */
#external &gt; ul &gt; li { display: inline; }</pre>
<p>The benefit of this markup and styling is that there are numerous possibilities for accessibility improvements when JavaScript is disabled. If the information is unimportant (e.g. not the primary navigation on the page, which is a bad idea anyway) then the default visible carousel items can be left as-is. For a quick and dirty accessible solution, removing the hidden overflow and white-space declarations from the carousel means all of the items will follow the normal page flow; this however can cause a conspicuous “snap-in” effect when the script is loaded. The most effective method is to change the overflow-x property to “auto” to allow for horizontal scrolling, a native and lo-fi way of providing access to the full carousel. This has its own foibles, including the necessity of setting a width on the default, no-script carousel, but that is offset by the loading of the scripted controls being more subtle than with the other methods. I opted not to implement this as the contents are superfluous enough to be ignored, especially so close to the foot of the page.</p>
<p>Probably the most taxing part of the markup was working out the width and margins for each carousel item so that it aligned with the columns / grid I had set up for the site; this involved a bit of number crunching and a selection of scribbled diagrams.</p>
<h2>PHP</h2>
<p>Before chaostangent.com was a blog it was a splash page that served much the same purpose as the carousel: it aggregated a lot of social media into a small space (before it did that it was a simple splash page, and before that it was a blog again, but I digress). For this reason a lot of the work involving PHP had already been done, or so I initially thought. There are subtle tribulations associated with the back end code that didn’t become apparent until I began implementing it.</p>
<p>The Zend Framework supplies a lot of <a href="http://framework.zend.com/manual/en/zend.service.html">Zend_Service_*</a> classes which — in theory — take a lot of the effort out of interacting with external APIs. I can for instance just boot up <a href="http://framework.zend.com/manual/en/zend.service.delicious.html">Zend_Service_Delicious</a> and be done with it right? If only things were that simple. I needed to cache the results of the queries — using <a href="http://framework.zend.com/manual/en/zend.cache.html">Zend_Cache</a>, natch — so that the queries wouldn’t be done on a page request and bog down the server and the target API — however passing some Zend_Service_* objects into Zend_Cache proved problematic as they weren’t set up for serialization and thus not correctly stored. For this reason I use the <a href="http://framework.zend.com/manual/en/zend.feed.html">Zend_Feed</a> component to read in my <a href="http://feeds.delicious.com/v2/rss/chaostangent?count=15">delicious RSS feed</a> which provides all that I need and was able to be successfully serialized and cached; Zend_Feed was likewise used for the two <a href="http://www.irisassociates.com">other</a> <a href="http://japanographia.com">blogs</a> I post to. The YouTube listing of my favourite videos is <a href="http://gdata.youtube.com/feeds/base/users/ChaosTangent/favorites?client=ytapi-youtube-user&amp;v=2">available as an RSS feed</a>, however the feed doesn’t contain interesting video information such as rating, length and so forth, so using the <a href="http://framework.zend.com/manual/en/zend.gdata.html">Zend_Gdata</a> component seemed like a sure fit. Unfortunately querying something relatively simple like a favourite video feed for a user causes <strong>a lot</strong> of data to be generated, so much so that even with only four results the cache file for YouTube comes out at over half a megabyte and obviously spikes the memory usage when loading this in. 
There is also the bizarre deficiency that while the RSS feed contained the date and time of when I had made a video a favourite, the Zend_Gdata classes did not. So I could either compromise on my desires for the YouTube element of the carousel, roll my own storage for the feed or just suck it up. I opted for the last of these and vowed to optimise it for memory usage at a later time.</p>
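<p>The caching idea itself is simple enough to sketch without the framework. This is not Zend_Cache: cachedFetch() is a hypothetical helper, the file naming and the fifteen minute lifetime are made up, and it assumes PHP 5.3 or later for the anonymous function. Storing plain arrays rather than service objects sidesteps the serialization problem described above.</p>
<pre class="brush: php">// Hypothetical helper, not part of Zend Framework: return the cached value for
// $key if it is younger than $ttl seconds, otherwise call $fetch and store the result
function cachedFetch($key, $ttl, $fetch)
{
	$file = sys_get_temp_dir() . "/carousel-" . md5($key) . ".cache";

	if (is_file($file) &amp;&amp; (time() - filemtime($file)) &lt; $ttl) {
		return unserialize(file_get_contents($file));
	}

	$data = call_user_func($fetch);
	file_put_contents($file, serialize($data), LOCK_EX);

	return $data;
}

// Stand-in for a slow API call; a real one would read a feed and reduce it to plain arrays
$fetch = function () {
	return array(array("title" =&gt; "Example bookmark", "link" =&gt; "http://example.com/"));
};

$bookmarks = cachedFetch("delicious", 900, $fetch);</pre>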
<h3>Last.fm</h3>
<p><a href="http://ws.audioscrobbler.com/1.0/user/chaostangent/recenttracks.rss">My last.fm feed</a> was to be one of the highlights of the carousel so I put the most effort into it. Lamentably, the Zend_Service component was for the ageing <a href="http://framework.zend.com/manual/en/zend.service.audioscrobbler.html">AudioScrobbler service</a> which, while functionally compatible with the <a href="http://www.last.fm/api">updated last.fm API</a>, omitted several necessary features. This included the guarantee that results for my most recent tracks would include images and links. I contemplated writing a whole new Zend_Service component but put that on the back burner and dove into using the API directly. To cut a long story short, I shied away from using <a href="http://framework.zend.com/manual/en/zend.rest.html">Zend_Rest</a> as PHP’s function naming <a href="http://uk.php.net/manual/en/functions.user-defined.php">doesn’t allow</a> for periods within function names, which more or less forced me to use the XML-RPC component: <a href="http://framework.zend.com/manual/en/zend.xmlrpc.html">Zend_XmlRpc</a>. Working around its refusal to return objects (instead defaulting to quote-escaped XML strings which I then converted to <a href="http://uk.php.net/manual/en/book.simplexml.php">SimpleXML</a> elements), I conditionally stacked up a <a href="http://www.last.fm/api/show?service=278">number</a> <a href="http://www.last.fm/api/show?service=407">of</a> <a href="http://www.last.fm/api/show?service=290">queries</a> to ensure that the information I needed was, within reasonable parameters, always available. A glitch did occur with using SimpleXML and the <a href="http://uk.php.net/manual/en/function.simplexml-element-xpath.php">xpath function</a> whereby it appeared to cache the xpath result if it was used on a separate child node from the same parent. For example, take the main XML-RPC call to get my recent tracks:</p>
<pre class="brush: php">$result = $xmlrpc-&gt;call("user.getRecentTracks", array($p));
$sx = new SimpleXMLElement(stripslashes($result));
foreach($sx-&gt;recenttracks-&gt;track AS $track)</pre>
<p>Now if I iterate over each track and do</p>
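<pre class="brush: php">// reset() takes its argument by reference, so the result needs to be in a variable first
$images = $track-&gt;xpath("//image[@size='small']");
$image = (string)reset($images);</pre>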
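<p>$image should contain the individual image for each track. Not so: $image always holds the <em>first</em> image in the whole response. At the time it looked as if the results were being silently cached by SimpleXML / PHP; the more likely explanation is that an XPath expression beginning with // is evaluated from the root of the document rather than from the current $track. One way around this is to splinter the SimpleXMLElement from its parent, making each track its own document, with an otherwise redundant</p>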
<pre class="brush: php">$track = new SimpleXMLElement($track-&gt;asXML());</pre>
<p>A simple fix but a pain to debug.</p>
<h3>LOVEFiLM</h3>
<p>The last element I wanted to include on my carousel was my <a href="http://www.lovefilm.com/">LOVEFiLM</a> (similar to <a href="http://www.netflix.com/">Netflix</a> in the US) recent DVD rental list. LOVEFiLM is <a href="http://lovefilmaffiliates.blogspot.com/">apparently trialling</a> a <a href="http://twitter.com/LOVEFiLMAPI">fully fledged API</a> developed in conjunction with an external company; however I didn’t need a full API, only a list of my most recently rented and rated titles. <a href="http://www.lovefilm.com/account/previously_rated.html">The page</a> for this was uniform enough so I set about trying to find a way to pull in that page and work on it from there. The most likely candidate was a cookie left by the website called “lovefilm_session” which has a lifetime of two weeks and a value that looks like a standard MD5 hash. PHP <a href="http://uk.php.net/manual/en/book.session.php">doesn’t place any IP restrictions on sessions</a> so I was hoping that the <a href="http://www.lovefilm.com/corporate/jobs_info.html?editorial_id=4963">Perl</a> implementation the site was using was similar and that simply sending the cookie along with a request for the page would be enough. This worked a treat in local testing and thankfully carried across to production without incident. The next task was getting the desired information out — being an in-production site meant that simply loading the HTML (despite being under the XHTML 1.0 transitional DTD) into an XML interpreter threw up numerous errors and was unworkable, which meant regular expressions were my only recourse.</p>
<p>Using a suitably generic regex to get the title, rating and the URL for the DVD in question required some fine tuning — it still needs to match on certain patterns which makes it inherently fragile. Hopefully by the time LOVEFiLM update their site their API will have been released for consumption.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><br />
<script src="/wp-includes/js/syntaxhighlighter/shBrushJScript.js" type="text/javascript"></script><script src="/wp-includes/js/syntaxhighlighter/shBrushCss.js" type="text/javascript"></script> <script type="text/javascript">// <![CDATA[
             SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
<h2>Conclusion</h2>
<p>Overall the carousel took a full weekend of planning, design, build and testing as well as some last minute tweaks such as digging into the <a href="http://codex.wordpress.org/Function_Reference/get_the_time">WordPress date functions</a> for <a href="http://codex.wordpress.org/Function_Reference/date_i18n">internationalising dates and times</a> to complete but the result is better than I could have hoped for. There are always more social media sites out there so the possibilities for additions and enhancements are great — I can only see the carousel getting better as time goes on. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/building-the-carousel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tidbits from gallery.chaostangent.com</title>
		<link>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/</link>
		<comments>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 14:38:14 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[firebug]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[gallery.chaostangent.com]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[imagemagick]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[resizing]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[swfupload]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=594</guid>
		<description><![CDATA[Exploring some parts of the code behind gallery.chaostangent.com including database functions for dealing with modified preorder tree traversal models, image resizing in PHP and JavaScript numeracy functions.]]></description>
			<content:encoded><![CDATA[<p>These are some of the neater parts of <a href="http://gallery.chaostangent.com">gallery.chaostangent.com</a> that don’t warrant a full exploration on their own but serve the goal of making the application more streamlined. I’ve crafted these examples to be focused so they don’t contain superfluous details like error checking, timestamp columns and the like.</p>
<h2>Database</h2>
<p>The gallery schema is as follows:</p>
<pre class="brush: sql">CREATE TABLE IF NOT EXISTS `galleries` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `left` int(10) unsigned NOT NULL default '0',
  `right` int(10) unsigned NOT NULL default '0',
  `parent` int(10) unsigned NOT NULL default '0',
  `title` tinytext NOT NULL,
  `directory` tinytext NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `parent` (`parent`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ;</pre>
<p>This covers both the <a href="http://www.sitepoint.com/article/hierarchical-data-database/2/">Modified Preorder Tree Traversal</a> (‘left‘ and ‘right‘ columns) model as well as the more standard hierarchical model (‘parent‘ column). I’m still undecided as to whether indexing the ‘left‘ and ‘right‘ columns provides any benefits. Most of the queries on the gallery table involve getting the direct children of a particular node; the breadcrumb trail at the top of the page however is built using the ‘left‘ and ‘right‘ columns:</p>
<pre class="brush: sql">SELECT * FROM `galleries` WHERE (`left` &lt;= ?) AND (`right` &gt;= ?) ORDER BY `left`</pre>
<p>Doing a multi-column index in MySQL <a href="http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html">works from the left column onwards</a>, so for the above query, indexing on ‘left‘ and ‘right‘ would be a benefit. However, when inserting and deleting nodes the updates are done singly, one on ‘left‘ and one on ‘right‘, so an index that leads with one column does nothing for the update on the other and still has to be maintained on every write, which may turn out to be detrimental in terms of update times. I could always do two indexes:</p>
<pre class="brush: sql">ALTER TABLE `galleries` ADD INDEX ( `left` , `right` ) ;
ALTER TABLE `galleries` ADD INDEX ( `right` , `left` ) ;</pre>
<p>This runs the risk though of having a table that’s more index than data. I haven’t done a full benchmark of the different queries for each scenario but I would imagine only for large trees would indexing provide any tangible benefit.<br />
<span id="more-594"></span><br />
The two database functions which do a lot of the heavy lifting for the gallery table are insertion and deletion:</p>
<pre class="brush: sql">CREATE FUNCTION `addGallery`(_parent INT, _title TEXT, _directory TEXT) RETURNS int(11)
BEGIN
  SELECT `left`, `right` INTO @pleft, @pright FROM `galleries` WHERE id = _parent LIMIT 1;
  UPDATE `galleries` SET `right` = `right` + 2 WHERE `right` &gt; (@pright - 1);
  UPDATE `galleries` SET `left` = `left` + 2 WHERE `left` &gt; (@pright - 1);
  INSERT INTO `galleries` (`left`, `right`, parent, title, directory) VALUES (@pright, (@pright + 1), _parent, _title, _directory);
  RETURN LAST_INSERT_ID();
END

CREATE FUNCTION `deleteGallery`(_id INT) RETURNS int(11)
BEGIN
  SELECT `left`, `right` INTO @left, @right FROM `galleries` WHERE id = _id;
  DELETE FROM `images` WHERE gallery_id = _id;
  DELETE FROM `galleries` WHERE id = _id;
  SELECT ROW_COUNT() INTO @ret;
  UPDATE `galleries` SET `right` = (`right` - 2) WHERE `right` &gt; @right;
  UPDATE `galleries` SET `left` = (`left` - 2) WHERE `left` &gt; @left;
  RETURN @ret;
END</pre>
<p>I’ve yet to find a good way of labelling an SQL function’s parameters as they usually have an identical name to columns I’m using within queries. The add function can insert a node anywhere, whereas the delete function can only consistently delete leaf nodes. One of the major drawbacks to the MPTT model is that moving existing nodes about the tree or deleting subtrees is tricky as it involves either retaining a lot of data or re-keying the entire table after the operation, neither of which is ideal. There’s nothing complex going on in the functions; once you get your head around how MPTT works these should become self-explanatory. The added benefit of wrapping these operations up is that they’re treated as transactional, which saves an extra two queries (“START TRANSACTION” and “COMMIT / ROLLBACK”) compared with doing them in code.</p>
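<p>To make the shuffling of ‘left‘ and ‘right‘ values a little more concrete, here is a rough in-memory sketch of what the insertion function does, written in JavaScript purely for illustration. The row objects, the way IDs are handed out and the reuse of the addGallery name are all assumptions for the sake of the example rather than anything the site actually runs:</p>
<pre class="brush: javascript">// Illustrative in-memory version of the addGallery() logic (not the site's code)
var addGallery = function(rows, parentId, title)
{
	var parent = rows.filter(function(row) { return row.id === parentId; })[0];
	var pright = parent.right;

	rows.forEach(function(row) {
		// make room: shift every boundary at or after the parent's right edge by two
		if(row.right &gt;= pright) { row.right += 2; }
		if(row.left &gt;= pright) { row.left += 2; }
	});

	// the new node slots into the gap as the parent's last child
	// (hypothetical ID scheme: fine here because nothing is ever deleted)
	var node = { id: rows.length + 1, left: pright, right: pright + 1, parent: parentId, title: title };
	rows.push(node);
	return node.id;
};

var rows = [{ id: 1, left: 1, right: 2, parent: 0, title: "Root" }];
var animals = addGallery(rows, 1, "Animals");
var plants = addGallery(rows, 1, "Plants");
var cats = addGallery(rows, animals, "Cats");</pre>
<p>Starting from a lone root at (1, 2), adding Animals and Plants and then Cats beneath Animals leaves Root at (1, 8), Animals at (2, 5), Cats at (3, 4) and Plants at (6, 7): every node’s pair sits inside its parent’s pair, which is the property the MPTT queries rely on.</p>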
<h2>Images</h2>
<p>One of the many great things about the <a href="http://framework.zend.com">Zend Framework</a> is that the developers have managed to <a href="http://framework.zend.com/manual/en/zend.form.standardElements.html#zend.form.standardElements.file">streamline file uploading for forms</a> which means there’s no more explicit checking of error conditions, temporary files and whatnot — getting an image from the user into the application is now relatively painless. On a <a href="http://framework.zend.com/manual/en/zend.db.table.row.html#zend.db.table.row.extending.insert-update">pre-insert hook</a> for an Image model I do some sanity checks (directories writeable etc.) then do a simple <a href="http://uk3.php.net/manual/en/function.getimagesize.php">getimagesize()</a> and <a href="http://uk3.php.net/manual/en/function.filesize.php">filesize()</a> to grab the file’s important measurements. Once the database row has been inserted, I have a post-insert hook that generates all the different versions — the reason this is post rather than pre is that versions are named according to the ID of the image in the database, the uploaded image meanwhile retains its original filename wherever possible.</p>
<p>The image function I always seem to use is “fit to area”: you have a dimension that you’d like an image to fit within while retaining its original proportions:</p>
<pre class="brush: php">private function fitToArea($width, $height, $target)
{
	$newWidth = $newHeight = $target;
	if($width &gt; $height)
	{
		$newHeight = round(($height / $width) * $target);
	}
	elseif($height &gt; $width)
	{
		$newWidth = round(($width / $height) * $target);
	}

	return array(
		0 =&gt; $newWidth, "width" =&gt; $newWidth,
		1 =&gt; $newHeight, "height" =&gt; $newHeight
	);
}</pre>
<p>Variations of this can be made for performing “fit to width” and “fit to height” or what gallery.chaostangent.com used to do which was absolutely square thumbnails. The above could be boiled down into a couple of <a href="http://uk3.php.net/manual/en/language.operators.comparison.php">ternary operations</a> but I like to keep it expanded and easy to follow.</p>
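<p>As a rough idea of what those variations might look like, here are “fit to width” and “fit to height” sketched in JavaScript rather than PHP; the function names and the shape of the returned object are my own invention for the example:</p>
<pre class="brush: javascript">// Hypothetical variants: pin one dimension to the target and scale the other to match
var fitToWidth = function(width, height, target)
{
	return {
		width: target,
		height: Math.round((height / width) * target)
	};
};

var fitToHeight = function(width, height, target)
{
	return {
		width: Math.round((width / height) * target),
		height: target
	};
};</pre>
<p>For a 1600 by 1200 original and a target of 400, fitToWidth() gives 400 by 300 whereas fitToHeight() gives 533 by 400.</p>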
<p>The physical resizing of the image is done one of two ways: <a href="http://www.imagemagick.org/">ImageMagick</a> or <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a>. The former is the preferred method but the latter is more widely supported and is cross platform due to its nature as a PHP module rather than an external executable (N.B. I’m aware there exists an <a href="http://uk3.php.net/manual/en/book.imagick.php">ImageMagick module for PHP</a> but have not used it as in theory it has all the problems of GD in terms of memory usage and time-outs so when I use “ImageMagick” here, I’m referring to the command line executable). The ImageMagick command which does the work is:</p>
<pre>convert "filename"[0] -strip -resize widthxheight -sharpen 0x1.0 -quality quality -colorspace RGB "target"</pre>
<p>There are a few non-standard parts in there:</p>
<ul>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#strip">-strip</a> gets rid of superfluous information (EXIF, comments, colour profiles etc.)</li>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#colorspace">-colorspace</a> forces the resultant image into RGB which is supported across all browsers; JPGs can also be CMYK which is a bit iffy with browser support</li>
<li><a href="http://www.imagemagick.org/script/command-line-options.php#sharpen">-sharpen</a> applies an image convolution which sharpens the image; sharpening should <em>always</em> be done on any image reduction</li>
<li>[0] selects the first frame of what could be a multi-frame image (animated GIF or PNG); if this isn’t present, some versions of ImageMagick will force animated file type creation, overriding whatever you may have in the target</li>
</ul>
<p>Doing this with GD takes a lot more code which means a lot more chances for errors to crop up. The first thing you have to do is get the image type so you can load it into GD’s proprietary format; you can get this with getimagesize() and then use one of the <a href="http://uk3.php.net/manual/en/function.imagecreatefromjpeg.php">imagecreatefromx()</a> functions. Once you have that, you have to check if the image is true colour — 8-bit PNGs and GIFs use palettes which make resizing/resampling ugly:</p>
<pre class="brush: php">if(!imageistruecolor($res))
{
	$tc = imagecreatetruecolor($imageInfo[0], $imageInfo[1]);
	imagecopy($tc, $res, 0, 0, 0, 0, $imageInfo[0], $imageInfo[1]);
	imagedestroy($res);

	$res = $tc;
	$tc = null;
}</pre>
<p>I believe this was originally taken from a PHP.net comment so kudos to the original author. Once you’re sure you have a true colour image:</p>
<pre class="brush: php">$tRes = imagecreatetruecolor($width, $height);
imagecopyresampled($tRes, $res, 0, 0, 0, 0,
	$width, $height, $imageInfo[0], $imageInfo[1]);</pre>
<p>This copies and resamples the image to the desired size. $width and $height are the target sizes while $imageInfo contains the original image dimensions. At this point you can output to a JPG and be done with it, however I believe in the benefits of sharpening which lamentably GD does not have an in-built function for. In comes <a href="http://loriweb.pair.com/8udf-sharpen.html">image convolutions</a>:</p>
<pre class="brush: php">imageconvolution($tRes, array(
	array(-1,-1,-1), array(-1,20,-1), array(-1,-1,-1)
), 12, 0);</pre>
<p>The array notation is a little annoying but essentially this applies a matrix, divisor and offset to the image (every pixel for every channel) which accents the edges, making the image appear crisper. The types of images you’re dealing with will determine which central number and divisor you use, but I recommend playing with the values to get the best result. Convolving an image is not a cheap operation and for large images this can be lengthy and computationally intensive; there is also the problem that the <a href="http://uk3.php.net/manual/en/function.imageconvolution.php">imageconvolution()</a> function didn’t exist prior to PHP 5.1 so if you’re using an earlier version (and no matter what people say, PHP 4 is <em>still</em> in use) then you’re out of luck unless you want to do the convolution by hand using <a href="http://uk3.php.net/manual/en/function.imagecolorat.php">imagecolorat()</a>.</p>
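<p>The reason 12 works as a divisor for the matrix above is that it is the sum of the matrix entries (20 minus the eight surrounding ones), so flat areas of colour come out unchanged while edges are exaggerated. A small sketch in JavaScript, working on a single greyscale channel and mimicking the sum, divide, offset and clamp steps, shows the effect; the function and variable names are illustrative and this is not a claim about how GD implements it internally:</p>
<pre class="brush: javascript">var sharpen = [[-1, -1, -1], [-1, 20, -1], [-1, -1, -1]];
var divisor = 12;

// sum of the kernel entries; when it matches the divisor, flat areas are left alone
var kernelSum = sharpen.reduce(function(total, row) {
	return total + row.reduce(function(a, b) { return a + b; }, 0);
}, 0);

// apply a 3x3 kernel to one interior pixel of a greyscale grid (rows of 0-255 values)
var convolveAt = function(pixels, x, y, kernel, divisor, offset)
{
	var total = 0;
	kernel.forEach(function(row, j) {
		row.forEach(function(weight, i) {
			total += weight * pixels[y + j - 1][x + i - 1];
		});
	});

	// divide, add the offset, then clamp to the valid channel range
	var value = Math.round(total / divisor + offset);
	return Math.min(255, Math.max(0, value));
};

var flat = [[100, 100, 100], [100, 100, 100], [100, 100, 100]];
var edge = [[50, 50, 200], [50, 50, 200], [50, 50, 200]];</pre>
<p>The centre pixel of the flat grid stays at 100, whereas the centre pixel of the edge grid, sitting on the dark side of a boundary, drops from 50 to 13: the darker side gets darker and the boundary looks crisper.</p>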
<h2>JavaScript</h2>
<p>Apart from the image addition page, there is only a smattering of JavaScript throughout the site to enhance certain aspects. The possibility exists for me to do AJAX calls for galleries so that a user never has to reload the page; however, the payload for an AJAX request isn’t going to be much smaller than that for a full page request. If the design were more complex there would be an argument for it, but as it is there isn’t the justification for either loading the HTML directly or loading XML or JSON and transforming that on the page. I did end up adding a small bit of JavaScript to vertically centre the images within a gallery as CSS doesn’t do this reliably:</p>
<pre class="brush: javascript">$$("#gallery li a img").each(function(s) {
	s.setStyle({
		// quick and dirty vertical centering
		marginTop: Math.round((175 - s.getHeight()) / 2)+"px"
	});
});</pre>
<p>This uses <a href="http://www.prototypejs.org/api/utility/dollar-dollar">Prototype’s selection function</a>; ordinarily by the point I reach this function I’ve already assigned #gallery to a variable which means I can do a scoped selection (e.g. <a href="http://www.prototypejs.org/api/element/select">variable.select("li a img")</a> rather than using $$()). I hard code the value just for expediency; you could just as easily find out the height of the containing li element using s.up("li").getHeight(), however for large pages of images this could be slow as you’re then doing an extra DOM call per image.</p>
<p>As I <a href="http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/">mentioned before</a> <a href="http://swfupload.org/">SWFUpload</a> requires a lot of JavaScript upfront to make it play nice — I usually create an object with all the <a href="http://demo.swfupload.org/Documentation/#events">SWFUpload function hooks</a> and then just fill them in as and when I require them. This means I can have a skeleton object which I can drag and drop into any project where I’m using SWFUpload. I find it useful to set the <a href="http://demo.swfupload.org/Documentation/#debug">debug function</a> to output to the <a href="http://getfirebug.com/console.html">Firebug console (console.log)</a> and to turn on debugging so I know what’s going on. Thankfully the library comes with several helpers which cover just about everything you could want to do with it: speed, cookie and queue integrate well and do what you would expect. One of the most helpful functions I wrote concerned converting from bytes into a more sensible denomination (kilobytes, megabytes, gigabytes) dependent on the value provided:</p>
<pre class="brush: javascript">var fileSize = function(sizeInBytes)
{
	if(sizeInBytes &gt; 1073741824)
	{
		return Math.round((sizeInBytes * 100) / 1073741824) / 100 + " GB";
	}
	else if(sizeInBytes &gt; 1048576)
	{
		return Math.round((sizeInBytes * 100) / 1048576) / 100 + " MB";
	}
	else if(sizeInBytes &gt; 1024)
	{
		return Math.round(sizeInBytes / 1024) + " KB";
	}
	return sizeInBytes + " B";
};</pre>
<p>It takes account of JavaScript’s lack of a fully featured round() function and multiplies and divides accordingly. This works in numerous places such as totalling up the selected file sizes and the current speed of the upload.<br />
<script src="/wp-includes/js/syntaxhighlighter/shCore.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushPhp.js" type="text/javascript"></script><br />
<script src="/wp-includes/js/syntaxhighlighter/shBrushSql.js" type="text/javascript"></script> <script src="/wp-includes/js/syntaxhighlighter/shBrushJScript.js" type="text/javascript"></script><br />
<script type="text/javascript">// <![CDATA[
         SyntaxHighlighter.config.clipboardSwf = "/wp-includes/js/syntaxhighlighter/clipboard.swf"; SyntaxHighlighter.defaults.tabSize = 2; SyntaxHighlighter.defaults.toolbar = false; SyntaxHighlighter.all();
// ]]&gt;</script></p>
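<p>The multiply, round and divide trick in fileSize() generalises to any number of decimal places. A minimal sketch, with roundTo() being a name I’ve made up for the example:</p>
<pre class="brush: javascript">// Hypothetical helper: round a value to a given number of decimal places
var roundTo = function(value, places)
{
	var factor = Math.pow(10, places);
	return Math.round(value * factor) / factor;
};</pre>
<p>So roundTo(1536 / 1024, 2) gives 1.5 and roundTo(2.34567, 2) gives 2.35. Binary floating point means a value sitting exactly on a half can occasionally round the wrong way, which is fine for displaying file sizes but worth knowing about for anything more exacting.</p>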
<h2>Conclusion</h2>
<p>There are a raft of other parts to gallery.chaostangent.com which merit exploring but are more intrinsically tied to the context of the site rather than the above which are useful in isolation. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/08/tidbits-from-gallery-chaostangent-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculating the geodesic distance between two points</title>
		<link>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/</link>
		<comments>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 22:11:24 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[airy]]></category>
		<category><![CDATA[antipodal]]></category>
		<category><![CDATA[calculation]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[distance]]></category>
		<category><![CDATA[geodesic]]></category>
		<category><![CDATA[haversine]]></category>
		<category><![CDATA[latitude]]></category>
		<category><![CDATA[longitude]]></category>
		<category><![CDATA[maths]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[suppliers]]></category>
		<category><![CDATA[vincenty]]></category>
		<category><![CDATA[wgs84]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=569</guid>
		<description><![CDATA[Tasked with setting up a supplier search, I opted to use the most complex and precise formula available for calculating the distance between two points. I then implemented it as a database function so the calculation could be done within the query itself.]]></description>
			<content:encoded><![CDATA[<p>I was recently tasked with recreating an existing supplier search for a client; I was provided with a database of suppliers, most of which had been geocoded, and not much else. This scenario is fairly standard when dealing with <a href="http://www.littlechef.co.uk/findalittlechef.php">mapping applications</a>: a user enters a postcode and the system returns a list of the closest suppliers to that location. The postcode part of this equation is well travelled — the <a href="http://www.postoffice.co.uk">Post Office</a> in the UK will not relinquish the mapping from a postcode to a latitude, longitude tuple without a <a href="http://www.royalmail.com/portal/rm/jump2?mediaId=400085&amp;catId=400084">large outlay of cash</a> (and numerous non-disclosure agreements), so the easiest option is to use an external service for this. I opted for <a href="http://www.postcodeanywhere.co.uk">PostcodeAnywhere</a> as I had used them before with great success. The latter part of this challenge — the return of the closest database entries — was something that I wanted to try myself as I didn’t know when I would get such an opportunity again.</p>
<blockquote class="pullout"><p>“if something is worth doing, then it’s worth overdoing”</p>
</blockquote>
<p>To say there are many different ways of calculating the distance between two points would be an understatement. One which I had used before involved northing and easting co-ordinates from a known point within the UK (usually the centroid or London). Using this meant a smattering of trigonometry would be enough to return a decent list of matches; this always struck me as crude: despite its usefulness, using an antiquated and subjective co-ordinate system seemed the wrong way to approach the problem. Latitude and longitude are globally recognised and provide a precise way of defining points on the globe — reading up on how they are calculated was step one. Step two was finding an algorithm that calculated the distance between two arbitrary points. The first one I found was the <a href="http://en.wikipedia.org/wiki/Haversine_formula">Haversine</a> <a href="http://www.movable-type.co.uk/scripts/latlong.html">formula:</a> simple, easy to follow and easy to implement. Knowing that this formula was based upon the assumption that the Earth was perfectly spherical grated slightly with me — I reasoned there must be a more accurate algorithm. I found this precision in <a href="http://en.wikipedia.org/wiki/Vincenty%27s_formulae">Vincenty’s</a> <a href="http://www.movable-type.co.uk/scripts/latlong-vincenty.html">algorithm</a>, and it was then I decided to enact a contrived but deliciously fun maxim: <em>if something is worth doing, then it’s worth overdoing</em>.<br />
<span id="more-569"></span><br />
Vincenty’s formula is accurate to within half a millimetre if given the correct variables for your location — as the earth is an ellipsoid, and a non-uniform one at that (pesky hills), your location will determine just how accurate the formula is. Most systems use the generic and perfectly suitable <a href="http://en.wikipedia.org/wiki/WGS_84"><abbr title="World Geodetic System">WGS</abbr>–84</a> variables which are usually accurate to within 2 metres — this is the system <a href="http://en.wikipedia.org/wiki/Global_Positioning_System">GPS</a> uses. As all the suppliers I would be searching for and all the postcodes would be within the UK, I could use the more precise Airy (1830) set — likely named after <a href="http://en.wikipedia.org/wiki/George_Biddell_Airy">George Biddell Airy</a> for his work on planetary densities. The maths involved in the formula is dense to say the least and I would be lying if I said I grokked it but the implementation is straightforward.</p>
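<p>For comparison, the Haversine formula fits in a handful of lines. This is a sketch in JavaScript assuming a spherical Earth with a mean radius of 6,371 km; the function name and the choice of radius are mine for the example rather than anything from the supplier search:</p>
<pre><code>// Great-circle distance in metres between two latitude / longitude pairs (in degrees)
var haversine = function(lat1, lon1, lat2, lon2)
{
	var toRad = Math.PI / 180;
	var R = 6371000; // assumed mean Earth radius in metres (perfect sphere)
	var dLat = (lat2 - lat1) * toRad;
	var dLon = (lon2 - lon1) * toRad;

	// square of half the chord length between the two points
	var a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
		+ Math.cos(lat1 * toRad) * Math.cos(lat2 * toRad)
		* Math.sin(dLon / 2) * Math.sin(dLon / 2);

	// angular distance in radians, scaled by the radius
	return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
};</code></pre>
<p>A quarter of the way round the equator, haversine(0, 0, 0, 90), comes out at a little over 10,000 km, as you would expect from a sphere of that radius. There is no iteration and no ellipsoid, which is exactly why it is both cheap and slightly wrong.</p>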
<p>I had originally envisaged doing some sort of segmenting of the suppliers prior to working out distances in a script; as my brain mulled over the possibilities (closest postcode, caching of major city locations, co-ordinate system conversions) I realised that setting it up as a database function would solve all of these problems. Firing up <a href="http://www.phpmyadmin.net/">phpMyAdmin</a> I bashed out an attempt and after a couple of fixes (mostly syntactic foibles of <a href="http://dev.mysql.com/doc/refman/5.0/en/create-procedure.html">MySQL</a>) it was working a treat:</p>
<pre><code>DELIMITER //
DROP FUNCTION IF EXISTS distanceVincenty//
CREATE FUNCTION distanceVincenty(lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT) RETURNS INT
BEGIN

DECLARE a, b, f, L, U1, U2, sinU1, cosU1, sinU2, cosU2 DOUBLE;
DECLARE lambda, lambdaP, sinLambda, cosLambda DOUBLE;
DECLARE sinSigma, cosSigma, sigma, sinAlpha, cosSqAlpha, cos2SigmaM, C DOUBLE;
DECLARE iterLimit INT;
DECLARE uSq, A1, B1, deltaSigma, s DOUBLE;

SET a = 6377563.396, b = 6356256.909, f = (1 / 299.3249646);
SET L = RADIANS(lon2 - lon1);
SET U1 = ATAN((1 - f) * TAN(RADIANS(lat1)));
SET U2 = ATAN((1 - f) * TAN(RADIANS(lat2)));
SET sinU1 = SIN(U1), cosU1 = COS(U1);
SET sinU2 = SIN(U2), cosU2 = COS(U2);

SET lambda = L, lambdaP = 0, iterLimit = 100;
mainLoop: REPEAT
	SET sinLambda = SIN(lambda), cosLambda = COS(lambda);
	SET sinSigma = SQRT((cosU2 * sinLambda) * (cosU2 * sinLambda) + (cosU1 * sinU2 - sinU1 * cosU2 * cosLambda) * (cosU1 * sinU2 - sinU1 * cosU2 * cosLambda));
	SET cosSigma = sinU1 * sinU2 + cosU1 * cosU2 * cosLambda;
	IF sinSigma = 0 THEN RETURN 0.0; END IF;

	SET sigma = ATAN2(sinSigma, cosSigma);
	SET sinAlpha = cosU1 * cosU2 * sinLambda / sinSigma;
	SET cosSqAlpha = 1 - sinAlpha * sinAlpha;
	SET cos2SigmaM = cosSigma - 2 * sinU1 * sinU2 / cosSqAlpha;
	IF cos2SigmaM IS NULL THEN SET cos2SigmaM = 0; END IF;

	SET C = f / 16 * cosSqAlpha * (4 + f * (4 - 3 * cosSqAlpha));
	SET lambdaP = lambda;
	SET lambda = L + (1 - C) * f * sinAlpha * (sigma + C * sinSigma * (cos2SigmaM + C * cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)));

	SET iterLimit = iterLimit - 1;
UNTIL ((ABS(lambda - lambdaP) &lt;= 1E-12) OR (iterLimit &lt;= 0))
END REPEAT mainLoop;

SET uSq = cosSqAlpha * (a * a - b * b) / (b * b);
SET A1 = 1 + uSq / 16384 * (4096 + uSq * (-768 + uSq * (320 - 175 * uSq)));
SET B1 = uSq / 1024 * (256 + uSq * (-128 + uSq * (74 - 47 * uSq)));
SET deltaSigma = B1 * sinSigma * (cos2SigmaM + B1 / 4 * (cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM) - B1 / 6 * cos2SigmaM * (-3 + 4 * sinSigma * sinSigma) * (-3 + 4 * cos2SigmaM * cos2SigmaM)));
SET s = b * A1 * (sigma - deltaSigma);

RETURN ROUND(s);
END;
//</code></pre>
<p>All of the trigonometry functions are built into MySQL, even helpful ones like ATAN2 and its ilk. As the semi-axes a and b are given in metres, the function returns the distance between the points in metres, rounded to a whole number by the final ROUND() and the INT return type; this can then be easily transformed into your unit of choice.</p>
<p>As you can no doubt guess from the code above, this isn’t exactly computationally cheap. For nearly antipodal points the main loop could potentially repeat up to iterLimit times (100 above) before continuing. As well as this, depending on the construction of your database and SQL statement, you could end up doing this calculation for every record in your table, e.g.:</p>
<pre><code>SELECT id, title, address, distanceVincenty(latitude, longitude, @lat, @lon) AS distance FROM `suppliers` ORDER BY distance ASC</code></pre>
<p>will force MySQL to calculate the distance for every row and then order accordingly. I’ve yet to do any benchmarks as it would be lunacy to put that into production; however, queries which ordinarily took a few hundred milliseconds started taking up to two seconds to complete. Thankfully the system I was working on solved this problem itself by having each supplier categorised and rated which meant searches rarely returned more than ten to twenty results before distance calculation; the possibility still exists of doing segmentation based on addresses prior to the calculation but that will be dependent on your specific requirements.</p>
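<p>One cheap form of segmentation is to work out a rough bounding box around the search point and only run the expensive function on rows inside it. The sketch below, in JavaScript, assumes a degree of latitude is about 111.32 km and scales longitude by the cosine of the latitude; both the figure and the boundingBox() name are assumptions for illustration, and the approximation falls apart near the poles:</p>
<pre><code>// Approximate latitude / longitude limits for a search radius around a point
var boundingBox = function(lat, lon, radiusMetres)
{
	var metresPerDegree = 111320; // assumed approximate length of one degree of latitude
	var dLat = radiusMetres / metresPerDegree;
	// degrees of longitude shrink with the cosine of the latitude
	var dLon = radiusMetres / (metresPerDegree * Math.cos(lat * Math.PI / 180));

	return {
		minLat: lat - dLat, maxLat: lat + dLat,
		minLon: lon - dLon, maxLon: lon + dLon
	};
};</code></pre>
<p>The four values can then be fed into a WHERE clause on the latitude and longitude columns before distanceVincenty() is called on whatever is left, so the precise calculation only runs on suppliers that are plausibly close.</p>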
<p>So while I doubt any users of the new supplier search realise it, their results are accurate to within a few millimetres, potentially saving them microns of shoe leather in walking or giving them true results for the case when rival suppliers are mere millimetres apart. For anyone who isn’t interested in meridians and arc tangents then the Vincenty function is most certainly overkill and sticking with Haversine is likely the smarter move, but for those valuing absolute precision, this certainly provides a great little exercise. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/calculating-the-geodesic-distance-between-two-points/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rebuilding gallery.chaostangent.com</title>
		<link>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/</link>
		<comments>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/#comments</comments>
		<pubDate>Sun, 26 Jul 2009 18:09:31 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[cake]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[galleries]]></category>
		<category><![CDATA[gd]]></category>
		<category><![CDATA[imagemagick]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[modified preorder tree traversal]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[prado]]></category>
		<category><![CDATA[symfony]]></category>
		<category><![CDATA[thumbnails]]></category>
		<category><![CDATA[zend framework]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=477</guid>
		<description><![CDATA[All about the recent major update to the application powering gallery.chaostangent.com. A brief history of the application and its purpose as well as some implementation details and an in-depth look at the updates and the reasons behind them.]]></description>
			<content:encoded><![CDATA[<p><a href="http://chaostangent.com/wp-content/uploads/2009/07/img-gallerychaostangentcom.jpg"><img class="alignnone size-medium wp-image-484" title="gallery.chaostangent.com front page" src="http://chaostangent.com/wp-content/uploads/2009/07/img-gallerychaostangentcom-540x408.jpg" alt="gallery.chaostangent.com front page" width="540" height="408" /></a><br />
<a href="http://gallery.chaostangent.com">gallery.chaostangent.com</a> is an application for storing and organising images – ostensibly a very simple desire but one I found not catered for by <a href="http://gallery.menalto.com/">existing</a> <a href="http://coppermine-gallery.net/">web</a> <a href="http://wordpress.org/extend/plugins/nextgen-gallery/">applications</a> when it was first conceived in 2005. The concept was an application that was simple and easy to use while still allowing for a degree of organisation to ensure images weren’t stored in a single “pool”.</p>
<blockquote class="pullout"><p>“With a small, well-defined feature set it seemed like a good time to address some of the issues which had crept in”</p>
</blockquote>
<h2>Background</h2>
<p>When I first started developing the application, PHP 5 hadn’t been released for very long and was <a href="http://gophp5.org/node/7">receiving a mixed reception</a>. Regardless, I started developing using a custom built framework I had cobbled together from scratch – one that would eventually go on to be refined and used in some of my work projects. With the lack of other mature frameworks to compare with, it was rough round the edges and did little more than segment out code into the <a href="http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC pattern</a> and even then it wasn’t an entirely clean encapsulation; it was however useful.<br />
<span id="more-477"></span><br />
The first version had the following functionality:</p>
<ul>
<li> Uploading of images using a form</li>
<li> Hierarchical galleries in a folder like tree structure – mirrored in the file structure on the server</li>
<li> Hidden galleries for concealing galleries from the frontend</li>
<li> User management with access control lists</li>
<li> Batch image upload from a folder on the server</li>
</ul>
<p>The batch image upload was added when it became obvious that using a web form to add multiple images was a tedious and protracted affair. The batch upload allowed a user to transfer files onto the server using whatever method they desire (e.g. FTP) and then specify that folder for trawling. This made adding hundreds of images a breeze, despite being less than optimal or straightforward.</p>
<h2>Technical</h2>
<p>The application used the “<a href="http://www.sitepoint.com/article/hierarchical-data-database/2/">Modified Pre-Order Tree Traversal</a>” mechanism for storing hierarchical data. This provided its own set of problems; for instance: to get the first level descendants of a gallery using only <abbr title="Modified Pre-order Tree Traversal">MPTT</abbr> you have to hit the entire gallery sub-tree, so for the root node this was the entire tree. A hybrid approach solves this problem by storing each gallery’s parent as well as its left and right values; this makes selecting direct descendants trivial.</p>
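<p>As a rough illustration of the difference, here is a minimal Python sketch (the application itself is PHP and MySQL). The row layout and the names <code>lft</code>, <code>rgt</code> and <code>parent_id</code> are assumptions made for the example, not the real schema.</p>

```python
# Hypothetical gallery rows: nested-set bounds plus a stored parent id.
GALLERIES = [
    {"id": 1, "name": "root", "lft": 1, "rgt": 10, "parent_id": None},
    {"id": 2, "name": "holidays", "lft": 2, "rgt": 7, "parent_id": 1},
    {"id": 3, "name": "2008", "lft": 3, "rgt": 4, "parent_id": 2},
    {"id": 4, "name": "2009", "lft": 5, "rgt": 6, "parent_id": 2},
    {"id": 5, "name": "sketches", "lft": 8, "rgt": 9, "parent_id": 1},
]


def children_mptt_only(rows, node):
    """Direct children using only lft/rgt: the whole sub-tree is examined."""
    # Every descendant sits strictly inside the node's lft/rgt bounds.
    subtree = [r for r in rows
               if r["lft"] > node["lft"] and node["rgt"] > r["rgt"]]
    # A direct child is a descendant not enclosed by another descendant.
    return [r for r in subtree
            if not any(r["lft"] > o["lft"] and o["rgt"] > r["rgt"]
                       for o in subtree)]


def children_hybrid(rows, node):
    """Direct children using the stored parent id: a single equality test."""
    return [r for r in rows if r["parent_id"] == node["id"]]


root = GALLERIES[0]
mptt_names = [r["name"] for r in children_mptt_only(GALLERIES, root)]
hybrid_names = [r["name"] for r in children_hybrid(GALLERIES, root)]
```

<p>Both functions return the same galleries, but the MPTT-only version has to compare every descendant against every other one, whereas the hybrid version is the equivalent of a single <code>WHERE parent_id = ?</code> clause.</p>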
<p>Another technical hurdle was the thumbnail creation. The most commonly used image library within PHP is <a href="http://uk3.php.net/manual/en/ref.image.php">GD</a> which is really only suitable for smaller images. As it operates within PHP’s memory, larger images cause more memory to be used until the server imposed limit (<a href="http://uk3.php.net/manual/en/ini.core.php#ini.memory-limit">memory_limit in php.ini</a>) is reached. This causes odd states whereby an image has been uploaded but there is no way of telling, prior to beginning processing, whether the script will time out or hit the memory limit.</p>
<p>The solution to this dilemma was to switch to using the command line tool <a href="http://www.imagemagick.org/">ImageMagick</a>. As an external executable, PHP’s memory limit is no longer an issue; however ImageMagick comes with its own foibles. Working around them involved lesser travelled areas of image manipulation such as <a href="http://www.imagemagick.org/script/command-line-options.php#colorspace">colourspaces</a> and multi-frame images (animated GIFs et al.).</p>
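<p>A minimal sketch of shelling out to ImageMagick, written in Python rather than the application’s PHP. The function names, the default thumbnail size and the choice of <code>sRGB</code> are assumptions for illustration; the options the application actually passes are not shown here.</p>

```python
import subprocess


def build_thumbnail_command(src, dest, width=150, height=150,
                            convert_bin="convert"):
    """Build the argument list for an ImageMagick thumbnail conversion."""
    return [
        convert_bin,
        f"{src}[0]",            # [0] selects the first frame of a multi-frame image
        "-colorspace", "sRGB",  # normalise the colourspace (e.g. CMYK JPEGs)
        "-thumbnail", f"{width}x{height}",  # fit within the box, keeping proportions
        dest,
    ]


def make_thumbnail(src, dest, **options):
    """Run the conversion in a separate process; raises if ImageMagick fails."""
    subprocess.run(build_thumbnail_command(src, dest, **options), check=True)


command = build_thumbnail_command("upload.gif", "thumb.jpg")
```

<p>Passing the arguments as a list avoids shell quoting problems with user-supplied filenames, and because the work happens in a separate process the calling script’s own memory limit is not involved.</p>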
<h2>Problems</h2>
<p>For a number of years the application worked admirably, only a small selection of niggles remained:</p>
<ul>
<li>Batch image upload process</li>
<li> Odd hidden galleries logic</li>
<li> Tedious bulk image deletion</li>
<li> Square thumbnails</li>
<li> Changing the name of a gallery changed the filesystem folder name which altered the URLs for the images</li>
</ul>
<p>Since its inception, a selection of PHP frameworks had been released and matured such as <a href="http://www.pradosoft.com/">Prado</a>, <a href="http://cakephp.org/">CakePHP</a> and <a href="http://www.symfony-project.org/">Symfony</a> and the behemoth of <a href="http://rubyonrails.org/">Ruby on Rails</a> was dominating development at the time. With a small, well-defined feature set it seemed like a good time to address some of the issues which had crept in.</p>
<h2>False starts</h2>
<p>Despite improvements to the bespoke framework I used like request routing, several attempts at improving the application met with the problem that rebuilding it didn’t fundamentally improve it and the updated framework didn’t make coding any quicker or simpler, just different. These conclusions made me down tools and re-evaluate the rebuild.</p>
<h2>Version 2.0</h2>
<p>Over a year later I began work again, this time using the oft mentioned and newly released <a href="http://framework.zend.com">Zend Framework</a>. I quickly surpassed the functionality milestone I had reached with older versions – mainly due to my focus on individual functionality. I had set up a <a href="http://subversion.tigris.org/">Subversion</a> repository and disciplined myself into making smaller, more frequent commits rather than monolithic end-of-the-day updates. After some iterative improvements the application retained the core functionality of version 1.0 and most importantly, built upon it:</p>
<ul>
<li>Thumbnails are now proportional – aesthetics are easy to change, thumbnails aren’t</li>
<li> Any number of different image versions can be generated e.g. not just thumbnails; parameters are flexible and easy to extend</li>
<li> Removed hidden galleries – despite improving the logic, their usefulness was always in question</li>
<li> Users are no longer subject to an <abbr title="Access Control List">ACL</abbr> – designed for small, trustworthy communities rather than sprawling user bases</li>
<li> Galleries now separate out their display and filesystem names – each can be changed independently of the other</li>
<li> Image uploads are now done using <a href="http://swfupload.org/">SWFUpload</a> if available</li>
<li> Some actions use <abbr title="Asynchronous JavaScript and XML">AJAX</abbr> if available to enhance usability</li>
<li> Galleries can be output in <abbr title="eXtensible Markup Language">XML</abbr> and <abbr title="JavaScript Object Notation">JSON</abbr> for external processing</li>
</ul>
<p>With such an array of improvements it wasn’t long before the build was tested and put into place. Despite being incompatible with the previous version, this turned out to be a good time to clean out the cruft and start afresh.</p>
<h2>Technical</h2>
<p>The first 2.0 build mirrored the code layout of the 1.0 build, so image sizing was done within the controllers, which made them needlessly large and unwieldy. The second 2.0 build sectioned image sizing out into the models, triggered on database hooks (pre-insert, post-insert etc.). This was initially tricky as the tightly coupled functionality seemed an odd fit until it was massaged into being more generic and loosely coupled.</p>
<p>Image versions are now calculated on-the-fly rather than being stored in the database. Some image versions may or may not exist depending on the settings a user has entered and the image in question – e.g. a thumbnail is set to always be generated, while a blog sized image may only be present if the image is large enough. This granularity of control allows for a variety of usage scenarios.</p>
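<p>The idea can be sketched in a few lines of Python (the application is PHP). The version names, the sizes and the <code>always</code> flag below are hypothetical settings, not the application’s real configuration.</p>

```python
# Hypothetical version settings; names and sizes are illustrative only.
VERSIONS = {
    "thumbnail": {"max_width": 150, "max_height": 150, "always": True},
    "blog": {"max_width": 540, "max_height": 540, "always": False},
}


def fit_within(width, height, max_width, max_height):
    """Scale (width, height) proportionally to fit the box, never upscaling."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def available_versions(width, height, versions=VERSIONS):
    """Work out which versions exist for an image and their dimensions."""
    result = {}
    for name, spec in versions.items():
        larger = width > spec["max_width"] or height > spec["max_height"]
        # Optional versions only exist when the original exceeds their box.
        if spec["always"] or larger:
            result[name] = fit_within(
                width, height, spec["max_width"], spec["max_height"])
    return result


large = available_versions(1080, 816)
small = available_versions(100, 80)
```

<p>Only the original dimensions need to be stored; everything else is derived from them and the current settings, so changing a version’s size does not require touching the database.</p>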
<p>Using SWFUpload was an early decision: moving away from the server based folder upload was a high priority. Implementing SWFUpload is tricky as it requires a lot of JavaScript upfront to work well; the Flash cookie bug also cropped up, polluting an otherwise pristine authentication system.</p>
<p>The database structure was simplified even further than version 1.0. Images no longer store the filename of the thumbnail (a result of allowing multiple image versions) or the filesize of the image; dimensions are retained as these would be too costly to calculate on the fly. I originally planned to use triggers for some of the more complex database actions like insertion and deletion within the gallery tree; however, this isn’t possible as triggers can’t modify the table they’re fired on – this is a general database principle rather than the standard bloody-mindedness usually exhibited by MySQL. I ended up using database functions which reduced the error checking required in the PHP code.</p>
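<p>To show why tree insertion is awkward for a trigger, here is the standard nested set insertion as a Python sketch over in-memory rows. It is not the database function the application uses; the column names <code>lft</code>, <code>rgt</code> and <code>parent_id</code> are assumed for the example.</p>

```python
def insert_child(rows, parent_id, new_id, name):
    """Insert a new leaf as the last child of parent_id, modifying rows in place."""
    parent = next(r for r in rows if r["id"] == parent_id)
    boundary = parent["rgt"]
    # Open a gap of two: shift every bound at or after the parent's right edge.
    for r in rows:
        if r["lft"] >= boundary:
            r["lft"] += 2
        if r["rgt"] >= boundary:
            r["rgt"] += 2
    # The new leaf occupies the gap; its parent id is stored alongside.
    new_row = {"id": new_id, "name": name, "lft": boundary,
               "rgt": boundary + 1, "parent_id": parent_id}
    rows.append(new_row)
    return new_row


tree = [
    {"id": 1, "name": "root", "lft": 1, "rgt": 6, "parent_id": None},
    {"id": 2, "name": "a", "lft": 2, "rgt": 3, "parent_id": 1},
    {"id": 3, "name": "b", "lft": 4, "rgt": 5, "parent_id": 1},
]
insert_child(tree, parent_id=2, new_id=4, name="new")
bounds = {r["name"]: (r["lft"], r["rgt"]) for r in tree}
```

<p>A single insert rewrites the bounds of other rows in the same table, which is exactly what a trigger fired on that table is not allowed to do. In a real database the shift and the insert would also need to run inside one transaction.</p>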
<h2>Future</h2>
<p>I am aiming to get a release candidate completed soon. I initially stayed away from a public release due to the usage of a custom framework that was used heavily in commercial projects, and the gnawing doubt that it wasn’t yet good enough for exposure to my peers. With these barriers now removed, it remains for me to thoroughly test and document the system and package it; aspects which I take for granted (having access to the database, knowledge of where the ImageMagick executable is etc.) will need to be addressed prior to release. There are other tertiary concerns as well such as a bug tracker, support forums and so forth.</p>
<p>I am still aiming to further refine the existing functionality – primarily the production of image versions and SWFUpload integration. The ability to format galleries in XML and JSON opens up a number of possibilities, including integration with WordPress’s media navigator or sidebar widgets. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/rebuilding-gallery-chaostangent-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Replacing a server – part 2: the plan</title>
		<link>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-2-the-plan/</link>
		<comments>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-2-the-plan/#comments</comments>
		<pubDate>Thu, 23 Jul 2009 16:28:31 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Geekery]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[debian]]></category>
		<category><![CDATA[dedicated]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[exim]]></category>
		<category><![CDATA[fedora]]></category>
		<category><![CDATA[ftp]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[plan]]></category>
		<category><![CDATA[postfix]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[wishlist]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=469</guid>
		<description><![CDATA[The second in a multi-part series on lessons learned and plans made (not necessarily in that order) for replacing an existing dedicated server with a newer model. Covers choosing what to change in software and hardware and forming a deployment plan.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/timdorr/2248248866/"><img class="alignnone size-medium wp-image-471" title="The whole orange by Tim Dorr" src="http://chaostangent.com/wp-content/uploads/2009/07/2248248866_caa4b7b42b_b-540x368.jpg" alt="The whole orange by Tim Dorr" width="540" height="368" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/timdorr/2248248866/">The whole orange by Tim Dorr</a> used under Creative Commons Attribution-Non-Commercial-Share Alike license</cite></p>
<p>The <a href="http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-1-the-audit/">first part of this series</a> was heavy on lists and common sense and light on the details. Cacti tend to be more interesting than audits, despite their importance, and the amount of work being put into such a security hazard can seem ill spent when all you want to do is get down and start fiddling. This is what the plan is all about. It marries riotous list-making with tinkering joy.</p>
<blockquote class="pullout"><p>“what may seem expensive now may be cheap in comparison to possible hair-pulling later”</p>
</blockquote>
<h2>Toys!</h2>
<p>The first thing I did when the prospect of a new server arose, before looking at prices or stats, was make a wishlist of everything that I wanted. Despite working without incident for so long, there are places and processes where certain aspects could be smoother – this is the case with any computer and having the time to figure out improvements is a rare joy, as opposed to something breaking and a near-as-dammit replacement being swiftly procured.</p>
<p>The wishlist was split into areas which are a pain to currently work with and areas where it would be good to try something different. The latter is obviously the more contentious – why change if it works – however I always like to try something new for every project, how else can I learn?<br />
<span id="more-469"></span><br />
The most pressing area for me to address was the way Apache handles domains. I had originally stuck with each virtual host defined in one file, however for sites with many aliased domains this proved unwieldy so I switched to using <a href="http://httpd.apache.org/docs/2.2/vhosts/mass.html">Mass Virtual Hosting</a> that Apache provides. I set up a “domains” folder and populated it with soft links to the appropriate web directories. This worked and meant Apache restarts were reduced; however the unintended consequence of this was that <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html">mod_rewrite</a> no longer worked unless a <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritebase">RewriteBase</a> statement was included. A small but vital change that causes some hassle when moving from a development to a production environment.</p>
<p>Also high on my list was to change the way permissions and PHP are run. The existing server uses a familiar but difficult to work with setup, whereby all servable files are owned by a single UID and GID (“ftp” for both in my case) and Apache is run similarly (“apache”). If PHP wishes to write to a directory it needs to be world writable. While not exactly a security nightmare, it still feels slightly uncivilised. There are numerous ways around this and thankfully <a href="http://blog.stuartherbert.com/php/category/the-web-platform/">Stuart Herbert summarises them all</a> with his own conclusions and (slightly superfluous) benchmarks. <a href="http://mpm-itk.sesse.net/">mpm-itk</a> certainly seems like the way forward given its active development and production server usage.</p>
<p>This leads to shell accounts: in my existing set up I used the bare minimum of these, which led to other problems down the line involving e-mail and FTP. There seems to be increasing pressure to abandon FTP as a protocol given its <a href="http://pintday.org/whitepapers/ftp-review.shtml">inherent weaknesses in security</a> and optimisation; with shell accounts for all sites I can abandon plain FTP and stick to <a href="http://en.wikipedia.org/wiki/SSH_file_transfer_protocol">SFTP</a> (<strong>not</strong> FTP over SSL) and feel like I’m being progressive.</p>
<p>E-mail is always a pain to set up and manage, but <a href="http://www.postfix.org/">postfix</a> made it almost bearable when coupled with the <a href="http://oreilly.com/catalog/9780596002121/">superb O’Reilly book</a> on it. I’ve heard nothing but good things about <a href="http://www.exim.org/">Exim</a> and with a <a href="http://oreilly.com/catalog/9780596000981/">similar book</a> to go along with it, I’m certain the end result will be utterly indiscernible from a postfix installation.</p>
<p>My existing server came pre-installed with <a href="http://fedoraproject.org/">Fedora</a>, unfortunately it is notoriously hard to update to major versions in-place which meant rolling my own versions of key services such as Apache (2.2.6), PHP (5.2.6) and MySQL (5.0.45). I would still be doing that – nothing beats an up-to-date, custom compilation – but the new server came with Ubuntu which meant going back to the dreamy apt-get. Having been “brought up” on the Debian way of thinking, Fedora seems to organise itself a little peculiarly.</p>
<p>Hard drive set up would be largely the same – two hard drives running in RAID-1 (mirroring) mode. The logic of this is questionable as it only provides for the scenario when one drive fails, which is unlikely if both hard drives are from the same manufacturing batch. Word to the wise: when buying multiple same-company hard drives, order from different places or check the serial numbers aren’t too close otherwise all your drives will fail around the same time. Thankfully the new server would be using 15,000 <abbr title="Revolutions per minute">RPM</abbr> <abbr title="Serial Attached SCSI">SAS</abbr> drives rather than standard 7,200 RPM <abbr title="Serial ATA">SATA</abbr>–I so at least if they fail there is more of a chance of being wildly catastrophic.</p>
<p>With a plan being only as good as its implementation how does one prepare beyond a simple list?</p>
<h2>Virtualisation</h2>
<p>Virtual machines and virtualisation have come a long way, with computing power ever increasing and the capabilities of multi-core processors, running a virtual machine has become a day-to-day occurrence for some. Fundamentally what this also allows you to do is try out some of your more fringe ideas in a safe environment before you run off and try implementing them on a computer several hundred miles from you.</p>
<p>Grabbing a free virtual machine like <a href="http://www.virtualbox.org/">Sun’s VirtualBox</a> means that for open source solutions like Fedora and <a href="http://www.ubuntu.com/">Ubuntu</a>, the cost to tinker and refine is measured in time rather than money. For <a href="http://www.microsoft.com/windowsserver2008/en/us/default.aspx">other platforms</a> there is likely a cost to pay for the OS and related software, however what may seem expensive now may be cheap in comparison to possible hair-pulling later and there is always the possibility of license transferral.</p>
<h2>Deployment</h2>
<p>There comes a time when all the planning in the world doesn’t compare to cutting your teeth on the target system. This is where the deployment plan comes in: that space of time when the urge to dive straight in is greatest and possibly most costly. Lamentably deployment plans are a uniquely personal affair and are tailored around your existing set up and choices made earlier. Whether you’re implementing something intricately complex or just throwing up the usual suspects, even just knowing which order to take them in can help smooth out what is going to be a busy and fraught process.</p>
<p>My own experience is as in-depth as earlier steps: controlling which services would go up first, what metrics I use to consider a service “ready”, which order the sites were going to be transferred, estimated dates when these would happen and so on. My service queue was: security (firewall, SSHblack etc.), MySQL (can be run without dependencies and can optionally drive other services), e-mail (get the worst out of the way early), Apache, PHP, tuning, benchmarking. My site queue was staged and involved a small, wholly controlled (DNS, database, no external services etc.) site first, then larger sites in batches over a weekend, then the largest sites last once everything had been run in for a few weeks.</p>
<p>Regardless of your plan, you are likely going to be running two servers in parallel; and given the nature of DNS, you are likely going to be in a position where databases and logs do not match up. Again how you deal with these is up to you, if you can write them off as “just part of the process” then all the better; if you already have a database delta script handy, superb – these are all considerations based on your circumstances. Perhaps this is a non-issue given your centralised NFS storage and redundant <a href="http://www.mysql.com/products/database/cluster/">MySQL cluster</a>.</p>
<h2>Conclusion</h2>
<p>As much as I advocate meticulous planning, there are some people who will still dive straight in and come out without any trench stories to share. For me, planning helps to reassure myself that I have everything sorted, if not in my head then on paper, and also those around me that I may be mad, but at least there’s a method. The core message is: <strong>do what makes you feel comfortable and reassures everyone involved</strong>. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-2-the-plan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Replacing a server – part 1: the audit</title>
		<link>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-1-the-audit/</link>
		<comments>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-1-the-audit/#comments</comments>
		<pubDate>Sun, 19 Jul 2009 13:57:52 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Geekery]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[dedicated server]]></category>
		<category><![CDATA[dns]]></category>
		<category><![CDATA[domains]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[fedora]]></category>
		<category><![CDATA[firewall]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[switch]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[upgrade]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=461</guid>
		<description><![CDATA[The first in a multi-part series on lessons learned and plans made (not necessarily in that order) for replacing an existing dedicated server with a newer model. Covers auditing the existing server and reasons for doing so.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/jaxmac/193001857/"><img class="alignnone size-medium wp-image-462" title="Windows Servers - Data Center" src="http://chaostangent.com/wp-content/uploads/2009/07/193001857_fe1716b0ef_o-540x405.jpg" alt="Windows Servers - Data Center" width="540" height="405" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/jaxmac/193001857/">Windows Servers — Data Center</a> by jaxmac used under Creative Commons Attribution-Non-Commercial-No Derivative Works license</cite></p>
<p>For a small digital agency, running an off-site server is as important as it is unglamorous. You don’t get any of the desirable super-tech of running a cluster but all of the headaches of running a constantly used, high-availability external computer. My workplace’s existing dedicated server (which I championed, configured and maintain) is used to provide web hosting to a variety of clients – both large and small – and for the past three and a half years has provided a flawless service. Upgrading is not to be taken lightly and the reasons for doing so must always result in a better service to clients – whether that’s decreased work load for you or improved site responsiveness. For me it boiled down to entropy: three and a half years is a long time for hardware to run and it will eventually fail and make my day/week/month hell on toast.</p>
<p>First step on this crazy adventure: audit.</p>
<h2>Cleaning house</h2>
<blockquote class="pullout"><p>“the more you know and the better prepared you are, the easier it’s going to be”</p>
</blockquote>
<p>Audit is a filthy word round most parts and conjures up images of bespectacled pencil-pushers or greasy tax collectors. Despite this, documenting what you have is the first step to getting something better. When I began this process however I found that, like a house, over time a server accumulates clutter: old domains, long since defunct sites, errant processes; automation only goes so far before a cleaner has to step in.</p>
<p>Spending a day archiving and removing cruft is tantamount to dusting the shelves and throwing away old books and furniture before moving house – it reduces the effort required later in the process. My removals included:</p>
<ul>
<li>Domain name end-points – for ones which had either expired or the persons / companies had moved on</li>
<li>E-mail accounts – accounts for expired domains are useless, just as accounts for long since lapsed campaigns are</li>
<li>Test folders – a separate test environment means accumulation of in-progress sites was inevitable. I found a year without modification is a good metric for when to cull</li>
<li>Errant services – automated / scheduled processes such as log-parsers; awstats was set to run on Apache’s log files – no longer necessary when every site we host uses Google Analytics</li>
<li>Old databases – very few of these but the odd one sometimes slips through</li>
</ul>
<p>After archiving, it was time for the document itself.<br />
<span id="more-461"></span></p>
<h2>In-depth</h2>
<p>Putting as much information into the audit as possible is immeasurably important for a server move: anything and everything to do with the hardware, software, services, clients, accounts, databases, configuration files and contacts, to name but a few. Doing this creates a single tome to refer to rather than having to scrabble around for details if, in the worst case, things go awry (as they oft do).</p>
<p>A non-exhaustive list of areas I documented included:</p>
<ul>
<li>Hardware – what is the server running on, processor speed, RAM size (latency), hard drive set up (RAID, size, throughput, bad sectors)</li>
<li>Operating system – flavour, version, kernel version, partitions (including flags), upgrade system, custom repositories</li>
<li>Core services – what the server does and doesn’t offer</li>
<li>Key applications – what is running all the time, version numbers, patch levels and dependencies</li>
<li>Scheduled tasks – what runs when and its purpose, could be as simple as logwatch, or perhaps as specialised as a point-in-time backup script</li>
<li>Domain end-points – even if you don’t run your own DNS server, your web server needs to know what points where</li>
<li>Domain names – nameservers, which company manages them, who to speak with to get domains repointed / altered</li>
<li>Clients – who has a stake with what on the server, includes contact numbers for content providers, technical liaison, who manages the domain, who pays the bills. List associated domain names, e-mail addresses etc. alongside</li>
<li>Databases – not just MySQL/Postgres, include details of any that the mail server may build, any embedded ones (SQLite), password caches</li>
<li>E-mail addresses – service type (POP3, IMAP, SSL), quotas, usernames, passwords, retention policies</li>
<li>Config files – any modified or customised configuration should be included in full (probably in an appendix) with the location of the file and related service</li>
<li>Firewall rules – a simple iptables output may be all that’s necessary</li>
<li>Open ports – anything that (x)inetd may respond to, non-standard port definitions, port knocking</li>
<li>Shell accounts – usernames, home directories (both location and contents)</li>
</ul>
<h2>Security</h2>
<p>With the document created, there now exists an all-access pass to your server, one that will be coveted by anyone with nefarious purposes in mind. How you secure this document is entirely up to you: whether you print out one copy and encrypt the digital version, set up a trusted third party to hold on to it or go for public-private key encryption and hand it out to trusted members of your company.</p>
<p>Also be sure to note that the document is not just beneficial for you, but also for anyone who may have to pick up where you left off. If things go awry mid-move, say a sudden onset of kidnapping befalls you, colleagues and contractors will benefit enormously from the document. Who knows, they may even chip in for your ransom. Locking the document away where only you can access it may not be the best move.</p>
<p>In theory the utility of the document is time limited as you’re aiming to get your new server ready soon which would make the existing one obsolete; the key message really is: <strong>don’t underestimate how sensitive a server audit document actually is</strong>.</p>
<h2>Conclusion</h2>
<p>Creating a server audit document serves two primary purposes:</p>
<ol>
<li>A document that wholly encapsulates your server providing a single reference point for any query you or anyone may have</li>
<li>It gives you a better understanding of the server as a whole</li>
</ol>
<p>This contributes towards the maxim that I find so important: <strong>the more you know and the better prepared you are, the easier it’s going to be</strong>. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2009/07/replacing-a-server-%e2%80%93-part-1-the-audit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud computing</title>
		<link>http://chaostangent.com/2008/09/cloud-computing/</link>
		<comments>http://chaostangent.com/2008/09/cloud-computing/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 18:29:00 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[reliability]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=3</guid>
		<description><![CDATA[
King Cloud by akakumo used under Creative Commons Attribution-Share Alike
The term “cloud computing” is being bandied about more and more recently, sometimes termed “x as a service”, its proponents make it out to be the embodiment of an ideology whereby one doesn’t worry about the details and simply wants to get things done. From my [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/kky/704056791/"><img class="alignnone size-medium wp-image-413" title="King Cloud by akakumo" src="http://chaostangent.com/wp-content/uploads/2008/09/704056791_8f5db72f63_o-540x405.jpg" alt="King Cloud by akakumo" width="540" height="405" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/kky/704056791/">King Cloud by akakumo</a> used under Creative Commons Attribution-Share Alike</cite></p>
<p>The term “cloud computing”, sometimes termed “x as a service”, is being bandied about more and more recently. Its proponents make it out to be the embodiment of an ideology whereby one doesn’t worry about the details and simply wants to get things done. From my perspective as a developer, the most interesting parts of the <a href="http://en.wikipedia.org/wiki/Cloud_computing">CC paradigm</a> revolve around infrastructure, service and storage but, unlike a great many others, I’m unwilling to jump head-first into using CC implementations.</p>
<p>Growing up for me has always been about trying to get the most bandwidth realistically available to me, oftentimes verbally fighting for it, be it with my sister or the IT providers at my university. Coming from that background I have a healthy respect for how precious people make bandwidth out to be and the detrimental effects not having enough of it can cause. In this light, you can understand why I’m wary of cloud computing. Internet access is still not as ubiquitous as many people, mostly densely-packed city dwellers, make it out to be. The application end of the CC scale I’m always going to meet with scepticism: my documents are stored on my hard drive, which is eminently more tangible than an increasingly ephemeral idea of connectivity.</p>
<blockquote class="pullout"><p>“five nines uptime isn’t what these services are pushing as their tagline”</p>
</blockquote>
<p>Other uses of CC though include offering a service beneficial to developers and producers alike, and this for me is where the allure begins. Not having to worry about storage requirements or dedicated server space for a project is an enticing prospect, cutting out a swathe of niggles and possible overheads, breaking it down to what many feel is the future: it just works. Being able to simply sign up and start pulling and pushing data through a well defined API, to a service rather than a dirty filesystem has an elegance to it. Or perhaps the idea that servers are no longer tied to a physical machine, instances just minutes away from being summoned to life as quickly as they can be brought down.<br />
<span id="more-3"></span><br />
<a href="http://www.flickr.com/photos/wtlphotos/494749811/"></a><a href="http://www.flickr.com/photos/wtlphotos/494749811/"><img class="alignnone size-medium wp-image-414" title="Cotton Ball Clouds / Nuvole a Batuffolo" src="http://chaostangent.com/wp-content/uploads/2008/09/494749811_7f338637c0_b-540x329.jpg" alt="Cotton Ball Clouds / Nuvole a Batuffolo" width="540" height="329" /></a><br />
<cite class="caption"><a href="http://www.flickr.com/photos/wtlphotos/494749811/">Cotton Ball Clouds / Nuvole a Batuffolo by WTL photos</a> used under Creative Commons Attribution-No Derivative Works</cite></p>
<p>One large drawback to all of this ease though is the one aspect which has held me back from dropping my name into the lovingly documented <a href="http://aws.amazon.com/">Amazon Web Services</a>: reliability. The most high profile hits recently included <a href="http://www.theregister.co.uk/2008/02/15/amazon_s3_outage_feb_2008/">Amazon’s S3 outage</a> and <a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/08/11/AR2008081101894.html">GMail’s downtime</a>, both of which may be isolated incidents, but demonstrate that five nines uptime isn’t what these services are pushing as their tagline, only ease of use. Of course neither Amazon nor Google wants downtime and the hit to their credibility will take time to heal, but it highlights that these are not services to be relied on <em>yet</em>. Most current thinking is to use these services as side-cars or backups rather than your main technology platform and to rely upon a robust application to deal with hiccups; perhaps using <a href="http://aws.amazon.com/ec2">EC2</a> as a load balancer or for surge capacity, or <a href="http://aws.amazon.com/s3/">S3</a> to store larger files with a graceful fallback to a generic “These files are not currently available” message.</p>
<p>At the moment the services occupy a part of my brain that is doing things backwards: trying to construct uses for the services to justify signing up for them. I’m always looking to learn new technologies, but for the moment I have no use for them that isn’t already filled more than adequately by other services. I run a high-availability dedicated server for work which has no trouble handling several large capacity sites and a shared hosting solution with <a href="http://www.dreamhost.com">Dreamhost</a> for this site which allows me to tinker and write without any artificial boundaries. For me, I enjoy the finer details of running these services, crafting and honing them to be as swift and efficient as possible; building robustness is part of that, but assuming the underlying foundations you’re building on are less reliable than your application and that downtime should be factored into project scopes are foreign concepts to me. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/09/cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a better screenshotter</title>
		<link>http://chaostangent.com/2008/08/building-a-better-screenshotter/</link>
		<comments>http://chaostangent.com/2008/08/building-a-better-screenshotter/#comments</comments>
		<pubDate>Sun, 31 Aug 2008 12:00:18 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[avisynth]]></category>
		<category><![CDATA[bframe]]></category>
		<category><![CDATA[hidef]]></category>
		<category><![CDATA[highdefinition]]></category>
		<category><![CDATA[iframe]]></category>
		<category><![CDATA[mpeg]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[pframe]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[screenshots]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=7</guid>
		<description><![CDATA[
My previous forays into crafting an automatic screenshot taker were, at the time, very successful. The system managed to pump out usable images in a fraction of the time it would have taken me to seek and do them manually; I even extended the script to handle multiple file-inputs which made ‘capping an entire series [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-417" title="High definition snow" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-01-540x303.jpg" alt="High definition snow" width="540" height="303" /></p>
<p>My <a href="http://chaostangent.com/2006/08/screenshotter/">previous forays</a> into crafting an automatic screenshot taker were, at the time, very successful. The system managed to pump out usable images in a fraction of the time it would have taken me to seek and do them manually; I even extended the script to handle multiple file-inputs which made ‘capping an entire series a breeze. Lamentably, this was a honeymoon period before cracks started to show, followed by gaping chasms.</p>
<p>The only workaround the first screenshotter used was for a glitch with Windows Media files which meant the first frame sought was always blank; it swerved around this limitation by taking two shots and discarding the first. This symptom, however, was indicative of what would become a persistent problem.</p>
<p><strong>Background</strong></p>
<blockquote class="pullout"><p>“The thing to understand is that seeking in a video file is very difficult”</p>
</blockquote>
<p>The first significant problem I encountered with the setup was with the series <a href="http://chaostangent.com/2007/09/claymore/">Claymore</a>: a great many of the resulting images seemed to have a lot of “bleed through”, as if one frame were being intermingled with another. This was above and beyond the standard cross-fade transition screenshots that were common. At the time I assumed it was because the files I used were modern H264 MKV files rather than the standard XviD ones I had been using before, or that the encoding was particularly shoddy. After downloading an updated version of <a href="http://www.mplayerhq.hu/">mplayer for Windows</a> the problem seemed to disappear; I ended up regenerating a lot of the images for the episodes which were the most severe offenders.</p>
<p>After a spate of swift updates, I didn’t blog anime any more so the screenshotter shortcut on my desktop lay dormant until I decided to unleash some madness on Strawberry Panic. While the setup worked, it was producing an unusual amount of exact duplicate images, despite being over five seconds apart. I realised there was a fundamental underlying cause for this that an mplayer update wouldn’t fix. True high-definition versions (not upscales) of certain releases were now readily available, namely the seminal Ghost in the Shell: Standalone Complex (and associated movie Solid State Society) and a selection of <a href="http://www.animenewsnetwork.com/encyclopedia/people.php?id=3487">Makoto Shinkai</a> works including <a href="http://chaostangent.com/2007/10/five-centimeters-per-second/">5 centimeters per second</a>, which I wanted to pluck some quality captures from (for desktop wallpaper or other purposes). These files did not agree with the screenshotter at all and stoically produced correct resolution but entirely black captures which was less than useful.<br />
<span id="more-7"></span><br />
<img class="alignnone size-medium wp-image-418" title="High definition solitude" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-02-540x303.jpg" alt="High definition solitude" width="540" height="303" /></p>
<p><strong>The cause</strong></p>
<p>While not entirely sure of the cause, my hunch lay with the way newer files operate. Modern video compression usually comes in <a href="http://en.wikipedia.org/wiki/B-frame">three frame types</a>: I-Frames (usually termed keyframes), P-Frames, which are predictive frames (sometimes thought of as “only the bits that change”), and B-Frames, which are also predictive frames but rely on forward frames as well as previously decoded ones. H264, the most modern MPEG standard, also adds others (SI and SP frames, and multi-frame motion prediction), the details of which are outside the scope of this post. The thing to understand is that seeking in a video file is very difficult, as not only does the decoder potentially need to decode all the way back to an I-frame, but also forward to get all the data needed, which can be time consuming and intensive.</p>
<p>The older screenshotter worked by repeatedly spawning mplayer with command line arguments stating it should jump to a specific point, play only 2 frames which it outputs as images and then exit. The scripting part (handled by PHP) was doing all the calculations as to where to seek to rather than expecting mplayer to do this (the reason for this will be explained below). The problem of duplicate screenshots was likely being caused by forcing mplayer to seek to an arbitrary point where it would display the first frames it could reliably show, and in some highly compressed files, this could be the same across one or more screengrab attempts. The blank frame problem, I postulated, was probably a computational bottleneck: my computer can’t play most 1080p files in real time and can baulk at some 720p ones. My theory is that mplayer was doing the best job it could without desynching and displayed a blank frame so it could “catch up” as the file progressed, which it couldn’t do because it was only being asked for two frames.</p>
<p>In short, because mplayer was being continually restarted, the screenshots lacked the “context” of the rest of the file and thus were substandard or just plain missing.</p>
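<p>As a rough illustration of that older approach (a reconstruction under assumptions, not the original script), the PHP side amounted to working out a list of seek positions and building one mplayer command per position. The function name, the interval and the file name below are hypothetical; <code>-ss</code>, <code>-frames</code>, <code>-nosound</code> and <code>-vo jpeg</code> are standard mplayer options.</p>
<pre><code>&lt;?php
// Hypothetical sketch of the version 1 approach: one mplayer run per capture.
// $duration and $interval are in seconds. Nothing is executed here; the
// commands are only built so they can be inspected.
function buildSeekCommands($file, $duration, $interval)
{
    $commands = array();
    for ($pos = $interval; $pos &lt; $duration; $pos += $interval) {
        // Seek to $pos, output two frames (the first may be blank), then exit.
        $commands[] = sprintf(
            'mplayer -nosound -ss %d -frames 2 -vo jpeg %s',
            $pos,
            escapeshellarg($file)
        );
    }
    return $commands;
}

foreach (buildSeekCommands('episode01.mkv', 30, 10) as $command) {
    echo $command, "\n";
}</code></pre>
<p>Every one of those runs starts cold and has to find its own way to a decodable frame, which is exactly where the lack of “context” comes from.</p>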
<p><img class="alignnone size-medium wp-image-419" title="High definition consumerism" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-03-540x303.jpg" alt="High definition consumerism" width="540" height="303" /></p>
<p><strong>Version 2</strong></p>
<p>The original screenshotter layered PHP on top of mplayer because of a number of perceived deficits in mplayer’s operation. mplayer has always had a “framestep” option (which in the most recent versions has been bundled into the video filter architecture) which seemed like exactly what was needed. Unfortunately it only forced the <em>rendering</em> of those particular frames rather than doing what I had hoped and only playing those frames; using this option means taking screenshots would take as long as the file itself.</p>
<p>My next thought was to use something a bit more specialised for scripting; <a href="http://avisynth.org/">AviSynth</a> sprang immediately to mind. As well as a cornucopia of other options, it had a function which seemed to be what I was looking for: <a href="http://avisynth.org/mediawiki/SelectEvery">SelectEvery</a>. My hope that it would provide a simple solution to this problem was dashed as it did exactly the same as the mplayer framestep option, rendering only the selected frames rather than compressing.</p>
<p>Version 2 is more of a compromise than a solution. Essentially mplayer needed to play through the file in its entirety to be able to extract good screenshots but needed to be quick enough to be usable. The result uses the previously mentioned mplayer options (framestep, JPEG output) but forces the frame rate to be high so that mplayer can tear through the file as fast as it can. Thankfully mplayer doesn’t do any frame skipping by default; you have to stipulate that it skip frames (aggressively or otherwise). The final command line I used for my shortcut was:</p>
<pre><code>mplayer.exe -quiet -nosound -vo jpeg:progressive:quality=85:outdir=screenshots -fps 1000 -vf framestep:i120</code></pre>
<p>I would have used -really-quiet except my version of mplayer didn’t seem to like that option. Dropping this into a shortcut on my desktop, I’m able to drag a file onto the shortcut and have everything just whir away in the background. The lower case “i” in front of the framestep interval tells mplayer to spit out an “I!” every time it captures a frame, which you can disable if you want the minimum of fuss. The framestep value is in frames, which means the time between captures is dependent on the frame rate of the video you’re capturing; I usually do screenshots in depth and then prune them afterwards, but you can adjust this to suit your style.</p>
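<p>To put some numbers on that, here is a small sketch of the arithmetic. The function names are made up for illustration and the frame rates are just common values, not ones taken from my files.</p>
<pre><code>&lt;?php
// Seconds of video between captures for a given framestep value.
function secondsBetweenCaptures($framestep, $fps)
{
    return $framestep / $fps;
}

// Framestep needed to capture roughly every $seconds seconds.
function framestepForInterval($seconds, $fps)
{
    return (int) round($seconds * $fps);
}

printf("%.2f\n", secondsBetweenCaptures(120, 23.976)); // roughly 5 seconds
printf("%.2f\n", secondsBetweenCaptures(120, 29.97));  // roughly 4 seconds
echo framestepForInterval(5, 25), "\n";                // 125</code></pre>
<p>So the same framestep of 120 gives a capture about every five seconds on a 23.976fps file but about every four on a 29.97fps one.</p>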
<p><img class="alignnone size-medium wp-image-420" title="High definition grey day" src="http://chaostangent.com/wp-content/uploads/2008/08/5cmpersecond-04-540x303.jpg" alt="High definition grey day" width="540" height="303" /></p>
<p><strong>Conclusions and ways forward</strong></p>
<p>In terms of results, the images dotted about this article say all that they need to really. Duplicates are non-existent and there is no apparent visual distortion, corruption or errant blanks, even with high definition files.</p>
<p>As far as benchmarks are concerned, I haven’t yet done any tests to see whether this way is faster or more complete than my previous version. Setting the frame rate so high may be detrimental rather than beneficial as mplayer may be pushing harder to get through the file than it should be; FPS could be optimal at a much lower value (perhaps a multiple of the actual framerate).</p>
<p>This method also requires a lot more maintenance on the part of the user as the screenshots are unceremoniously dumped into a single directory and are subject to being overwritten if doing a sequence of files. There may yet be a place for scripting in this version, although I’m loath to add it as the purity of keeping the method entirely within a shortcut is not to be underestimated. <span class="signOff">¶</span></p>
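<p>If scripting did creep back in, a minimal sketch might do nothing more than give each input file its own output directory. This is an assumption about how it could look rather than something I have built: the function name and the directory naming are hypothetical, and the mplayer options are simply the ones from the shortcut above.</p>
<pre><code>&lt;?php
// Hypothetical wrapper: build the version 2 command with a per-file output
// directory so a batch run does not overwrite earlier captures.
function buildCaptureCommand($file, $framestep = 120)
{
    // Keep the directory name to safe characters, since it sits inside the
    // colon-separated -vo suboptions.
    $name = preg_replace('/[^A-Za-z0-9_-]/', '_', pathinfo($file, PATHINFO_FILENAME));

    return sprintf(
        'mplayer -quiet -nosound -vo jpeg:progressive:quality=85:outdir=%s -fps 1000 -vf framestep:i%d %s',
        'screenshots-' . $name,
        $framestep,
        escapeshellarg($file)
    );
}

echo buildCaptureCommand('episode 01.mkv'), "\n";</code></pre>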
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/08/building-a-better-screenshotter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Expectancy: PHP 5.3</title>
		<link>http://chaostangent.com/2008/08/expectancy-php-5-3/</link>
		<comments>http://chaostangent.com/2008/08/expectancy-php-5-3/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 18:18:56 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[namespaces]]></category>
		<category><![CDATA[oop]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php4]]></category>
		<category><![CDATA[php5]]></category>
		<category><![CDATA[php5.2]]></category>
		<category><![CDATA[php5.3]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=9</guid>
		<description><![CDATA[
The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July and a pre-release alpha now available, it’s worth exploring some of the many additions and changes that are going to be introduced.
As PHP is the language I most frequently work in and one which [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-422" title="Pretty Hard Panda" src="http://chaostangent.com/wp-content/uploads/2008/08/php.gif" alt="Pretty Hard Panda" width="120" height="67" /></p>
<p>The release of PHP 5.3 is due sometime soon and with a feature freeze in place since the 24th of July <em>and</em> a pre-release alpha now available, it’s worth exploring some of the many additions and changes that are going to be introduced.</p>
<p>As PHP is the language I most frequently work in and one which I’ve done all sorts with (from web applications, to <a href="http://192.168.1.65/blog.chaostangent.com/archives/370">file exploration</a> to <a href="http://192.168.1.65/blog.chaostangent.com/archives/14">media player scripting</a>), I like to think I’m sensitive to deficiencies and oddities in the released implementations. Version 5.3 contains a lot of elements backported from the still distant version 6, the most glaring omission being end-to-end Unicode support without mb_* fudges or iconv; being able to use string-backed functions like array_unique() without suspicion will be a big help, but I digress.</p>
<p>The most high-profile addition is that of namespaces. Gone will be the warts that dot current frameworks (e.g. Zend_Db_Table_Rowset), which will make different frameworks and modules far easier to use and far more friendly when you want them to play nicely together.</p>
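<p>As a sketch of what that buys you, here are two made-up libraries that both define a class called Table. The names are hypothetical, and the syntax shown is the backslash-separated form that namespaces took in the final 5.3 release, which may differ from what the pre-release builds accept.</p>
<pre><code>&lt;?php
// Two hypothetical libraries both define a class called Table; namespaces
// keep them apart without Zend_Db_Table style prefixes.
namespace Acme\Db {
    class Table
    {
        public function describe()
        {
            return 'database table';
        }
    }
}

namespace Acme\Html {
    class Table
    {
        public function describe()
        {
            return 'HTML table';
        }
    }
}

namespace {
    // Import both under local aliases so they can be used side by side.
    use Acme\Db\Table as DbTable;
    use Acme\Html\Table as HtmlTable;

    $db = new DbTable();
    $html = new HtmlTable();
    echo $db-&gt;describe(), "\n";   // database table
    echo $html-&gt;describe(), "\n"; // HTML table
}</code></pre>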
<blockquote class="pullout"><p>“PHP and MySQL have always been bedfellows despite their conflicting release licenses”</p>
</blockquote>
<p>Static functions have also been promoted to allow a lot of the meta-programming niceties that member functions have, including true overloading support, which will allow first level abstractions such as database wrappers to be called without instantiation (which I discovered around the same time as <a href="http://192.168.1.65/blog.chaostangent.com/archives/40">my get_class exploration</a>). For instance, if using an ORM, doing People::getAllById() will now be easier to achieve. Alongside this, many of the magic methods have been tightened up to make them less ambiguous (__get can only be public and not static, signatures enforced etc.).</p>
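<p>A minimal sketch of the People::getAllById() idea, assuming the static overloading hook is the <code>__callStatic()</code> magic method that 5.3 introduces. The class, its in-memory rows and the getAllBy naming convention are all made up for illustration; a real ORM would be querying a database.</p>
<pre><code>&lt;?php
// Hypothetical ORM-style class: static calls to methods that do not exist
// are routed through __callStatic(), so no instance is needed.
class People
{
    private static $rows = array(
        array('id' =&gt; 1, 'name' =&gt; 'Ada'),
        array('id' =&gt; 2, 'name' =&gt; 'Grace'),
        array('id' =&gt; 1, 'name' =&gt; 'Alan'),
    );

    public static function __callStatic($method, $arguments)
    {
        if (strpos($method, 'getAllBy') !== 0) {
            throw new BadMethodCallException("Unknown method $method");
        }

        // Turn "getAllById" into the column name "id".
        $column = strtolower(substr($method, strlen('getAllBy')));

        $matches = array();
        foreach (self::$rows as $row) {
            if (isset($row[$column]) &amp;&amp; $row[$column] == $arguments[0]) {
                $matches[] = $row;
            }
        }
        return $matches;
    }
}

echo count(People::getAllById(1)), "\n"; // 2</code></pre>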
<p>Looking through some of the other <a href="http://wiki.php.net/doc/scratchpad/upgrade/53">changes detailed in the PHP Wiki</a> it seems that a selection of new functions surrounding garbage collection are now being exposed including checking whether it is enabled, and selectively enabling or disabling it. Whether this is a mistake (close by get_extension_funcs() is detailed as a new function but <a href="http://uk3.php.net/manual/en/function.get-extension-funcs.php">appears to have been in since PHP4</a>) and these are bleed-throughs from the Zend Engine is unclear, but without some surrounding memory management facilities, it would seem unwise to disable or allow disabling of garbage collection.</p>
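<p>For reference, this is roughly how those functions read in use, going by the names they carry in PHP 5.3 (<code>gc_enabled()</code>, <code>gc_enable()</code>, <code>gc_disable()</code> and <code>gc_collect_cycles()</code>). Treat it as a sketch of the API rather than a recommendation to switch collection off.</p>
<pre><code>&lt;?php
// The garbage collection controls as they appear in PHP 5.3.
var_dump(gc_enabled());       // whether the cycle collector is currently on

gc_disable();                 // switch the collector off
var_dump(gc_enabled());       // bool(false)

// Create a reference cycle that plain reference counting cannot free.
$a = new stdClass();
$b = new stdClass();
$a-&gt;other = $b;
$b-&gt;other = $a;
unset($a, $b);

gc_enable();                  // switch the collector back on
$freed = gc_collect_cycles(); // force a collection run, returns a count
var_dump(gc_enabled());       // bool(true)</code></pre>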
<p>On the extension front numerous ones have been standardised and moved into the PECL system which goes some way to neatening things up; the change <a href="http://blog.felho.hu/what-is-new-in-php-53-part-3-mysqlnd.html">some are talking about</a> is the choice between PHP’s own MySQL native driver (mysqlnd) and the libmysql client library that comes when compiling against a MySQL release. PHP and MySQL have always been bedfellows despite their conflicting release licenses (especially so since Sun gobbled up MySQL) so this seems like a smart move for all concerned, with a separate code-base, better engine integration and statistical analysis now possible (<a href="http://www.hristov.com/andrey/projects/php_stuff/pres/mysqlnd_vikinger.pdf">PDF details</a>).</p>
<p>What all of this adds up to is a release that’s solid on paper, but the bum-rush for patches is sure to be as swift as any other PHP release. Especially with the OO enhancements though, it feels like these should have been included from day one, as not only will there now be a disjoint between PHP4 and PHP5 shared servers, but PHP5.2 and PHP5.3 as well. For someone who runs their own server this is not a massive worry, especially when the list of backwards compatibility changes is so small, but for service providers (hosts, ISPs etc.) still dragging their feet over 4 &gt; 5 &gt; 5.2, this adds another step of complexity.</p>
<p>The real test will obviously be the frameworks and high profile applications that utilise PHP. With word that the <a href="http://framework.zend.com/">Zend Framework</a> won’t be <a href="http://www.nabble.com/PHP-5.3-Namespaces-on-ZF-td18836642.html">supporting namespaces until its 2.0</a> release next year, the lead time could be immense; when you consider that phpBB, once considered the yardstick of PHP usage, <a href="http://www.phpbb.com/support/documentation/3.0/quickstart/quick_requirements.php">still supports 4.3</a> with its most recent version, the playing field for cutting edge PHP seems less than agile. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/08/expectancy-php-5-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deconstruction part 2</title>
		<link>http://chaostangent.com/2008/01/deconstruction-part-2/</link>
		<comments>http://chaostangent.com/2008/01/deconstruction-part-2/#comments</comments>
		<pubDate>Thu, 10 Jan 2008 19:08:37 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[battle moon wars]]></category>
		<category><![CDATA[bits]]></category>
		<category><![CDATA[bytes]]></category>
		<category><![CDATA[c]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[decompile]]></category>
		<category><![CDATA[deconstruction]]></category>
		<category><![CDATA[hex]]></category>
		<category><![CDATA[lz77]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xvi]]></category>
		<category><![CDATA[yanepak]]></category>
		<category><![CDATA[yanesdk]]></category>
		<category><![CDATA[yaneurao]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=17</guid>
		<description><![CDATA[Attacking those “random” files a couple of days ago provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec’ing out previously unexplored file formats. It turned out that the files had already been mapped and successfully decompressed and [...]]]></description>
			<content:encoded><![CDATA[<p>Attacking those “random” files a <a href="http://chaostangent.com/2008/01/deconstruction/">couple of days ago</a> provided enough of a challenge to keep me interested for a few hours, especially as it seemed like I was treading new ground in terms of spec’ing out previously unexplored file formats. <a href="http://chaostangent.com/2008/01/deconstruction/#comments">It turned out</a> that the files had already been mapped and successfully decompressed and the only thing left to do was build an unpacker which was in the pipeline. It seemed my work wasn’t exactly fruitless but other, probably smarter people had everything under control. I wasn’t about to let that stop me though.</p>
<p><em>Note (2008-01-11): The full (official?) SDK for this file format <a href="http://yaneurao.hp.infoseek.co.jp/yaneSDK2nd/">has been located</a> which includes both a packer and an unpacker as well as other tools I’m sure are useful for working on the file format. The full name of the file format is “Yaneurao” with the SDK going by the nomenclature of “yaneSDK” which is the stem for the file format signature of “yanepkDx”. There is already a <a href="http://yanesdkdotnet.sourceforge.jp/">.NET version of the SDK</a> so if you’re interested in my deconstruction process then read on, otherwise I would recommend using the official/fully-featured SDKs.</em></p>
<blockquote class="pullout"><p>“Then, in that moment of lucid elation, I realised exactly what was going wrong.”</p>
</blockquote>
<p>The compression format was identified as <abbr title="Lempel-Ziv-Storer-Szymanski">LZSS</abbr> and reading through <a href="http://sekai.insani.org/archives/24">several</a> <a href="http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzss.html">sites</a> revealed that some of the data I had initially spotted but attributed to SHIFT JIS (or at one point a Unicode Byte Order Marker, perfect for a non-Unicode file) were the tell-tale signatures of LZSS; the gradual degradation into junk data was also typical of the algorithm as the further into the file the stream progresses, the more back references are present.</p>
<p><img class="alignnone size-full wp-image-425" title="LZSS" src="http://chaostangent.com/wp-content/uploads/2008/01/06.png" alt="yanePkDX" width="382" height="20" /><br />
While I hadn’t heard of LZSS, it came as no surprise that it was a modified version of <a href="http://en.wikipedia.org/wiki/LZ77">LZ77</a>, which I had come across before though never toyed with. Having to <a href="http://www.cs.duke.edu/courses/spring03/cps296.5/papers/ziv_lempel_1977_universal_algorithm.pdf">dig through a dense PDF</a> was not my idea of fun, and my university days had proven that reading academic proofs rarely led to workable implementations for me, so I <a href="http://www.google.co.uk/search?q=lzss+php">searched for a ready-made PHP version</a> which (for reasons which will soon become glaringly apparent) didn’t prove fruitful. After coming up against dead-ends with other languages I settled on the <a href="http://www.koders.com/c/fidC554142F5E42CA3433CD4C8B9043D09C8A092DF8.aspx">de facto C version</a>, which most other versions I found seemed to be based on.<br />
<span id="more-17"></span><br />
Ignoring my <a href="/stuff/deconstruction/deconstructor.zip">original deconstruction script</a> for the moment, I worked on the assumption that each file contained within the large .dat files was individually compressed, given that each file had a readable opening section of bytes and (according to the LZSS spec) didn’t have any back references. As with other implementations of algorithms I didn’t fully understand, I copied the C code more or less exactly, altering formatting to my tastes and altering code to take into account any PHP idioms that I could foresee. Checking things over, I pumped in one of the compressed files and, unsurprisingly, the output file was more or less blank. After rechecking the code and running it again, the output file was once again filled with spaces and some sporadic junk bytes that didn’t look familiar.</p>
<p>The script wasn’t even outputting the uncompressed data at the beginning of the file and the output was larger than the input but still not the size flagged in the original .dat files. After scratching my head for a while I set about spitting out some debug data to pinpoint what had gone wrong and where. The algorithm is broken down into roughly three main sections, in two main control structures. Putting in some basic output formatting to check each section was executing proved that each section was being run in a way that I could only assume was correct:</p>
<p><img class="alignnone size-full wp-image-426" title="The output attacks" src="http://chaostangent.com/wp-content/uploads/2008/01/07.png" alt="The output attacks" width="520" height="144" /></p>
<p>This assumption of course turned out to be false but I wouldn’t realise this until later in the day. The LZSS algorithm uses a number of constants to define things such as the size of the sliding buffer window, maximal reference length and minimal reference length (a change from the LZ77 algorithm to prevent the encoding being longer than the original) so I tweaked the values first with sensible then ridiculous values only to have the script spit out similarly broken output. The C algorithm also had several places where it used hex values to do bitwise operations, converting these to decimal (obviously) proved ineffective and I was ready by now to admit that I was stumped. I had been working on it for a while now so I took a break for lunch, during which I decided to ditch the C algorithm and start from scratch so that I actually understood what was going on.</p>
<p>This proved even more torturous so I switched back to my original script and started spitting out some fairly detailed output including: the section of the algorithm, the current byte location in the stream, the hex value of the most pertinent read byte and the binary value of that byte.</p>
<p><img class="alignnone size-full wp-image-427" title="It just keeps coming!" src="http://chaostangent.com/wp-content/uploads/2008/01/08.png" alt="It just keeps coming!" width="520" height="144" /></p>
<p>This more or less nailed down that the entire implementation was broken, the values it was generating from the very beginning were incorrect which of course meant all the back references and so forth were incorrect. Using the binary output and a bit of paper I worked out what the values were <em>supposed</em> to be and started following the values through the algorithm. This part was absolutely essential to working out what was wrong with the implementation of the algorithm as it elucidated what each part did:</p>
<ol>
<li>The first section (which I had termed “FLAGS”) worked out whether a byte was a control byte and set a flags variable</li>
<li>The second section (which I had termed “AND1”) assumed it was reading a raw byte and simply wrote it to the output stream (and the buffer).</li>
<li>The third section (which I had termed “CONTROL”) read the two control bytes which formed a back reference and then read the appropriate data from the buffer and subsequently the output.</li>
</ol>
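<p>Those three sections can be sketched in PHP as follows. This is a minimal sketch of a standard LZSS decoder rather than the deconstructor script itself: the function name is made up, and the constants (4096 byte window, a maximum match of 18 bytes, a threshold of 2, a window pre-filled with spaces) are assumed from the commonly copied C version.</p>
<pre><code>&lt;?php
// Minimal LZSS decoder sketch. Bytes are read as integers throughout.
function lzssDecode($input)
{
    $N = 4096;       // size of the sliding window
    $F = 18;         // longest possible match
    $THRESHOLD = 2;  // matches encode lengths of THRESHOLD + 1 and up

    $window = array_fill(0, $N, 0x20); // window starts out full of spaces
    $r = $N - $F;                      // current write position in the window
    $flags = 0;
    $output = '';
    $pos = 0;
    $length = strlen($input);

    while (true) {
        // FLAGS: once the marker bit has shifted out, read a new flag byte.
        $flags &gt;&gt;= 1;
        if (($flags &amp; 0x100) == 0) {
            if ($pos &gt;= $length) {
                break;
            }
            $flags = ord($input[$pos++]) | 0xFF00;
        }

        if ($flags &amp; 1) {
            // AND1: a raw byte, copied to the output and the window.
            if ($pos &gt;= $length) {
                break;
            }
            $c = ord($input[$pos++]);
            $output .= chr($c);
            $window[$r] = $c;
            $r = ($r + 1) &amp; ($N - 1);
        } else {
            // CONTROL: two bytes giving a 12 bit offset and a 4 bit length.
            if ($pos + 1 &gt;= $length) {
                break;
            }
            $i = ord($input[$pos++]);
            $j = ord($input[$pos++]);
            $i |= ($j &amp; 0xF0) &lt;&lt; 4;
            $j = ($j &amp; 0x0F) + $THRESHOLD;
            for ($k = 0; $k &lt;= $j; $k++) {
                $c = $window[($i + $k) &amp; ($N - 1)];
                $output .= chr($c);
                $window[$r] = $c;
                $r = ($r + 1) &amp; ($N - 1);
            }
        }
    }

    return $output;
}

// A flag byte of 0xFF marks the next eight bytes as raw data.
echo lzssDecode("\xFFABCDEFGH"), "\n"; // ABCDEFGH</code></pre>
<p>The pre-filled window also explains the symptom from earlier: a decoder that keeps resolving bogus back references will happily churn out spaces.</p>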
<p>From my output it was apparent the meat of the algorithm, reading the raw data, wasn’t being done. Then, in that moment of lucid elation, I realised exactly what was going wrong.</p>
<p>PHP was grabbing a byte from the input file as a string, and being a loosely-typed language meant that when it came to doing bitwise operations, the underlying type was incredibly important. I’m more than willing to admit that this state of affairs was my own damn fault for prototyping this in a language that wasn’t built for algorithms and bit level operations and had I done this in a strongly-typed language, everything would have been dandy. Of course, had I simply dumped the implementation I found into a C file and compiled away, I wouldn’t really understand what was going on, so my blundering didn’t go to waste.</p>
<p>Long story short, forcing the read bytes into an integer using the ord() function (and intval() just to make sure) solved the issue and the file I was working on transformed before my eyes.</p>
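<p>A small sketch of the difference, using a hard-coded byte in place of one read from the file. Applying bitwise operators to the raw string does not give the C-style result (the exact behaviour varies between PHP versions), so the conversion has to come first.</p>
<pre><code>&lt;?php
// A byte read from a file arrives as a one character string, not a number.
$byte = "\xA5";                    // stands in for fread($handle, 1)
var_dump(is_string($byte));        // bool(true)

// ord() gives the integer value that the bit tests actually need.
$value = ord($byte);               // 165
printf("%08b\n", $value);          // 10100101

var_dump(($value &amp; 0x01) === 1);   // lowest bit is set: bool(true)
var_dump(($value &amp; 0xF0) &gt;&gt; 4);    // high nibble: int(10)</code></pre>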
<p><img class="alignnone size-full wp-image-428" title="Taming the algorithm" src="http://chaostangent.com/wp-content/uploads/2008/01/09.png" alt="09" width="520" height="144" /></p>
<p><img class="alignnone size-full wp-image-429" title="Almost got it..." src="http://chaostangent.com/wp-content/uploads/2008/01/10.png" alt="Almost got it..." width="520" height="150" /></p>
<p>Almost.</p>
<p>Turns out what “sage” had said in my comments on the original version of my unpacker was slightly wrong: the sliding window wasn’t 256 bytes (0x100) but the standard LZSS implementation window size of 4096 bytes (0x1000), which means that nothing really needed to be changed from the standard C implementation of the algorithm. As a proof of concept:</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/chara_init_third.xml.lzss">Sample LZSS compressed file</a>, <a href="http://chaostangent.com/wp-content/uploads/2008/01/chara_init_third.xml">Sample uncompressed file</a></p>
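<p>For the curious, the three sections described earlier boil down to something like the sketch below. It is not the code from the deconstructor script: <code>lzssDecode()</code> is a hypothetical name, and the 18 byte maximum match length, the threshold of 2, the space-filled buffer and the starting write position are all assumed from the standard C implementation rather than confirmed against the game’s files. Note the ord() call on every byte read, which is the fix described above.</p>
<pre><code>&lt;?php
// Sketch only: parameters assumed from the standard C LZSS implementation.
function lzssDecode($packed)
{
	$N = 4096;          // sliding window size
	$F = 18;            // longest back reference (assumed)
	$THRESHOLD = 2;     // back references copy at least THRESHOLD + 1 bytes (assumed)
	$window = array_fill(0, $N, 0x20); // window starts out full of spaces (assumed)
	$r = $N - $F;       // first write position in the window (assumed)
	$out = '';
	$pos = 0;
	$len = strlen($packed);
	$flags = 0;

	while ($pos &lt; $len) {
		// FLAGS: every eight items are preceded by one byte of flag bits
		$flags = $flags &gt;&gt; 1;
		if (($flags &amp; 0x100) == 0) {
			$flags = ord($packed[$pos++]) | 0xFF00;
			if ($pos &gt;= $len) break;
		}
		if ($flags &amp; 1) {
			// AND1: a raw byte goes straight to the output and the window
			$c = ord($packed[$pos++]);
			$out .= chr($c);
			$window[$r] = $c;
			$r = ($r + 1) &amp; ($N - 1);
		} else {
			// CONTROL: two bytes hold a 12 bit window position and a 4 bit length
			if ($pos + 1 &gt;= $len) break;
			$i = ord($packed[$pos++]);
			$j = ord($packed[$pos++]);
			$i = $i | (($j &amp; 0xF0) &lt;&lt; 4);
			$j = ($j &amp; 0x0F) + $THRESHOLD;
			for ($k = 0; $k &lt;= $j; $k++) {
				$c = $window[($i + $k) &amp; ($N - 1)];
				$out .= chr($c);
				$window[$r] = $c;
				$r = ($r + 1) &amp; ($N - 1);
			}
		}
	}
	return $out;
}</code></pre>
<p>Feeding it the six bytes 0x07 'a' 'b' 'c' 0xEE 0xF0 (three raw bytes, then a back reference to them) should give “abcabc”; comparing strlen() of the result against the unpacked size from the header is the same success check the script performs.</p>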
<p>So I now present version 1.1 of the deconstructor script which is released under the same <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic license</a>.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/deconstructor1.1.zip">deconstructor1.1.zip (1.4KB)</a></p>
<p>The usage is exactly the same:<br />
<code>php deconstructor.php data1.dat output\</code></p>
<p>The only difference will be the output spat out by the script which will tell you when a file has been decompressed and whether it succeeded or failed (done by checking the canonical size in the .dat file versus the output size).</p>
<p><strong>To-do</strong><br />
At the moment the script outputs a file to a temporary name and then operates on that file. This isn’t optimal but I was having trouble getting my implementation to work in-stream, probably due to fatigue. I may or may not fix that for the PHP version as the next step is to drop the entire deconstructor into a C or C++ file and do a native compile so you don’t have to mess around with PHP and I feel like I’ve developed something in a big-boys’ language. If I get the time and the inclination I may do that over the weekend.</p>
<p>As well as the unpacker, I get the feeling that the <a href="http://blog.seiha.org/">friend</a> who this is a favour for will require a repacker which will obviously mean doing the LZSS algorithm in reverse and also bundling everything into a .dat file. Should be an intriguing challenge to see if I’ve learned anything from this little endeavour. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/01/deconstruction-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deconstruction</title>
		<link>http://chaostangent.com/2008/01/deconstruction/</link>
		<comments>http://chaostangent.com/2008/01/deconstruction/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 19:41:31 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[battle moon wars]]></category>
		<category><![CDATA[bits]]></category>
		<category><![CDATA[bytes]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[decompile]]></category>
		<category><![CDATA[deconstruction]]></category>
		<category><![CDATA[hex]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[werk]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xvi]]></category>
		<category><![CDATA[yanepak]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=15</guid>
		<description><![CDATA[Out of curiosity and a favour to someone, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.
Sample File 1, Sample File 2, Sample File 3
All screenshots taken from [...]]]></description>
			<content:encoded><![CDATA[<p>Out of curiosity and a <a href="http://blog.seiha.org/">favour to someone</a>, I decided to take a look at some random .dat files that were ripe for the translating; what ensued was a morning of head scratching, hex scrying and using some of the lesser used PHP functions.</p>
<p><a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data1.dat">Sample File 1</a>, <a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data3.dat">Sample File 2</a>, <a href="http://192.168.1.65/blog.chaostangent.com/stuff/deconstruction/data5.dat">Sample File 3</a></p>
<p><em>All screenshots taken from data1.dat, sample file 1 and the window is resized for the most appropriate screenshot rather than general workability.</em></p>
<blockquote class="pullout"><p>“so garbled that it sent a few hundred bell tones to my computer speaker”</p>
</blockquote>
<p>First thing I did was to crank open the lovely XVI32 hex editor and have a look at the sample files provided; their .dat extension more or less indicated they were a proprietary format and were unlikely to relinquish their secrets easily. What was known was that the files contained a header portion, a bundle of XML files in a contiguous stream and a lot of junk data. The XML files could be seen and their encoding was stated as SHIFT JIS and, after cursing its existence, I attributed the junk data to that encoding, which seemed like a good place to start.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/01.png"><img class="alignnone size-full wp-image-434" title="yanepkDx" src="http://chaostangent.com/wp-content/uploads/2008/01/01.png" alt="01" width="320" height="20" /></a><br />
The first eight bytes seemed to be a file signature, but <a href="http://www.google.com/search?q=yanepk">Google</a> <a href="http://www.google.com/search?q=yanepkdx">searches</a> for all or parts of the signature were fruitless which meant it was time to pick things apart.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/02.png"><img class="alignnone size-full wp-image-435" title="Crazy bytes" src="http://chaostangent.com/wp-content/uploads/2008/01/02.png" alt="02" width="320" height="20" /></a><br />
The next four bytes were different for each file and at first I thought they were part of the block format that made up the header part of the file, but the section repetition for the header block didn’t match up. After converting them to a variety of different number formats (I’m no hex wizard and I originally thought it was only a two byte short rather than a four byte integer or long), I assumed it was an unsigned long (32 bits) in little-endian order.<br />
<span id="more-15"></span><br />
<a href="http://chaostangent.com/wp-content/uploads/2008/01/03.png"><img class="alignnone size-full wp-image-436" title="Hex assault" src="http://chaostangent.com/wp-content/uploads/2008/01/03.png" alt="Hex assault" width="443" height="438" /></a><br />
The next section pattern repeated a number of times until the file obviously started with the embedded XML files. After a bit of byte counting and “duh” moments, the general format of the section is:</p>
<p><code>256 bytes - file path and name<br />
4 bytes - unsigned long<br />
4 bytes - unsigned long<br />
4 bytes - unsigned long</code></p>
<p>At a total of 268 bytes for each block, this layout repeats for precisely the number of times specified by the very first unsigned long (after the file signature). So the entire header block consists of:</p>
<p><code>8 bytes - signature "yanepkDx"<br />
4 bytes - number of header entries<br />
(number of header entries * 268 bytes) - header entries</code></p>
<p>This was all well and good but didn’t really illuminate exactly what the three numbers were. After pulling out all the entries, a few things became clear:</p>
<ul>
<li>The first number in each block increases for each successive block</li>
<li>The second number was always larger than or equal to the third number</li>
<li>The first number plus the third number always equalled the first number of the block immediately after the current one</li>
</ul>
<p>So without resorting to rocket science: the first number is the absolute byte offset of the file’s data within the .dat, the second number was a bit of a mystery, and the third number is the length in bytes of that data. After pushing this info through a script it became obvious this was the de facto format of the file, no complex tree structures or other nasties were awaiting; the XML files were pulled out without problem and within a few minutes their original file structure was recreated.</p>
<p>All done right? Wrong. My initial thought that the XML files were SHIFT JIS encoded was indeed correct, however it didn’t explain the junk that proliferated in <strong>some</strong> of the files.</p>
<p><a rel="text/xml" href="/stuff/deconstruction/arive_boss_plane.xml">Sample un-junked file</a>, <a rel="text/xml" href="/stuff/deconstruction/chara_growth.xml">Sample junked file</a><br />
<a href="http://chaostangent.com/wp-content/uploads/2008/01/04.png"><img class="alignnone size-full wp-image-437" title="Junked file" src="http://chaostangent.com/wp-content/uploads/2008/01/04.png" alt="Junked file" width="520" height="150" /></a><br />
Trying to shift the format into different encodings using known functions only seemed to jumble the junk around rather than get rid of it. It now became apparent that the data was more than likely compressed or otherwise encoded which illuminated what the mysterious second number was in each of the header blocks. The third number represented the packed size of the data, the second represented the unpacked size; this was obvious as the smaller, un-junked files had the same values for each, usually less than 150 bytes.</p>
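<p>Put together, reading the header needs little more than unpack(). The following is only a sketch under the layout worked out above, not the deconstructor script itself: <code>readDatHeader()</code> and the array keys are names I’ve made up for illustration, and cutting the path at the first NUL byte assumes the 256 byte field is NUL padded.</p>
<pre><code>&lt;?php
// Sketch only: function name and array keys are hypothetical.
function readDatHeader($data)
{
	// 8 byte signature followed by a little-endian unsigned long entry count
	$head = unpack('a8signature/Vcount', substr($data, 0, 12));
	if ($head === false || $head['signature'] !== 'yanepkDx') {
		return false;
	}
	$entries = array();
	for ($n = 0; $n != $head['count']; $n++) {
		// each entry: 256 byte path plus three little-endian unsigned longs
		$e = unpack('a256path/Voffset/Vunpacked/Vpacked', substr($data, 12 + $n * 268, 268));
		if ($e === false) {
			return false; // header is shorter than the entry count claims
		}
		$nul = strpos($e['path'], "\0");
		if ($nul !== false) {
			$e['path'] = substr($e['path'], 0, $nul); // assumes NUL padding
		}
		$entries[] = $e;
	}
	return $entries;
}</code></pre>
<p>Each file’s bytes are then substr($data, $e['offset'], $e['packed']), and an entry whose two sizes match appears to have been stored as-is.</p>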
<p><a href="http://chaostangent.com/wp-content/uploads/2008/01/05.png"><img class="alignnone size-full wp-image-438" title="Junk output" src="http://chaostangent.com/wp-content/uploads/2008/01/05.png" alt="Junk output" width="520" height="144" /></a><br />
Running both the individual files and the larger .dat file through various decompressors proved less than useful as most of the time the file became so garbled that it sent a few hundred bell tones to my computer speaker making it sound like it was having a seizure. I tried various versions and functions of the gzip/zlib library, bzip2, LHA (of which I knew the Japanese were particularly fond) and of course good old fashioned zip. It stood to reason that the compression wasn’t going to be processor intensive (very few game compression schemes are) which more or less ruled out prediction-based algorithms (PPM et al) as well as ACE and 7z formats. The files also seemed to lack any form of dictionary entries as for each file the XML declaration was always intact which meant that the compression seemed to start an arbitrary length into the file (which would explain why the smaller files were untouched).</p>
<p>This is unfortunately as far as I got after a morning’s work, a decent amount of which was spent attempting to track down information. The game the files come from is <a href="http://blog.seiha.org/?p=92">Battle Moon Wars Act 3</a> and it seems that it uses TYPE MOON characters; other TYPE MOON games have been successfully translated, which may be one avenue to investigate. The <a href="http://en.wikipedia.org/wiki/Battle_Moon_Wars">developers of the game are “Werk”</a> and if any of their other games (either in the series or otherwise) had been pulled apart, it may give some indication as to where to go forward. There does seem to be information in someone’s brain as not only was an <a href="/stuff/deconstruction/data1unpacked.dat">“unpacked” version of data1.dat unearthed</a>, but <a href="http://forums.visualnews.net/showthread.php?t=11925">forum</a> <a href="http://nrvnqsr.proboards20.com/index.cgi?action=display&amp;board=doujin&amp;thread=1124787854&amp;page=3">posts</a> indicate that work had already begun (if not already aborted) on the technical side of things.</p>
<p>For today at least I’m done with attempting to reverse-engineer arbitrary files and perhaps after sleeping on it some bright idea will be revealed to me that daylight failed to illuminate. For now there is the command line PHP script I quickly prototyped to deconstruct the .dat files (released under the <a href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution 2.0 Generic License</a>) and the promise of further work in the future:</p>
<p><a href="/stuff/deconstruction/deconstructor.zip">deconstructor.zip (1KB)</a></p>
<p>Things should be self explanatory from the file; get a command line PHP interpreter set up and run “deconstructor.php” with the name of the file to tear apart and optionally an output folder e.g.<br />
<code>php deconstructor.php data1.dat output\</code></p>
<p>This is an open call for anyone who wants to help with the effort to scry the encoding/compression of the XML files whether you already know or want to take a stab at it, you are more than welcome. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2008/01/deconstruction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting to Flash video — (almost) free and not so easy</title>
		<link>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/</link>
		<comments>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/#comments</comments>
		<pubDate>Thu, 19 Jul 2007 09:55:19 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[bitrate]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[flv]]></category>
		<category><![CDATA[free]]></category>
		<category><![CDATA[h263]]></category>
		<category><![CDATA[mpeg]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[vfw]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[video for windows]]></category>
		<category><![CDATA[virtualdub]]></category>
		<category><![CDATA[vp6]]></category>
		<category><![CDATA[vp62]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=19</guid>
		<description><![CDATA[How do you convert an arbitrary video file into a playable Flash video using freely available programs and methods? After close to an afternoon of searching, testing and head-scratching, I finally have a whole answer that can be applied ad-hoc to almost any video you can get your hands on.
This “guide” (more anecdotal than how-to) [...]]]></description>
			<content:encoded><![CDATA[<p>How do you convert an arbitrary video file into a playable Flash video using freely available programs and methods? After close to an afternoon of searching, testing and head-scratching, I finally have a whole answer that can be applied ad-hoc to almost any video you can get your hands on.</p>
<p>This “guide” (more anecdotal than how-to) assumes knowledge of video encoding basics; I’m not going to cover the difference between container and video formats or how to use <a href="http://virtualdub.org/">VirtualDub</a>, as there are plenty of other tutorials and guides that cover those topics.<br />
<span id="more-19"></span><br />
The Flash Video container format (FLV) supports two major video formats: the first is codenamed “Sorenson Spark” and is a variant on the H.263 standard; the second is “On2 TrueMotion VP6”. The former is well supported in many encoding and decoding tools and libraries, the latter isn’t. It will come as no surprise then that the latter allows for far greater compression at similar visual quality when compared apples-to-apples to the H.263 format. To give you a quantifiable measure of this: I managed to get more than 2 times greater compression and better visual quality when using the VP6 codec. In short, this is the codec you want to use to get the most out of your movies.</p>
<p><strong>Quick and dirty</strong><br />
If you just want to get a compatible FLV file quickly and painlessly and aren’t worried about size or quality overmuch, then grab yourself a copy of <a href="http://ffmpeg.mplayerhq.hu/">ffmpeg</a> (a recent compiled Windows binary can be <a href="http://ffdshow.faireal.net/mirror/ffmpeg/">found here</a>) and put it in a place where your command-line of choice can find it. Then punch in:</p>
<p><code>ffmpeg -i "yourvideofilegoeshere.avi" outfile.flv</code></p>
<p>Voila, in no time flat you’ll have an all-singing all-dancing .flv file ready for whatever you have in store for it. If you’re feeling particularly awesome, you can even control the output size of the video:</p>
<p><code>ffmpeg -i "totallyawesomekittenvideo.avi" -s 320x240 outfile.flv</code></p>
<p>ffmpeg converts using the H.263 video flavour and MP3 audio format, which is <em>probably</em> fine for most people. The problem with this process is the output is less than stellar and suffers from a tremendous amount of artefacting. I wanted to exercise a little more control over the visual quality. Some searching revealed byzantine quantizer settings which made the command line look like a calculator had exploded:</p>
<p><code>ffmpeg -i "hahathatguytotallysucks.avi" -qcomp 0.6 -qmax 15 -qdiff 4 -i_qfactor 0.71428572 -b_qfactor 0.76923078 -maxrate 972800 -s 320x240 -b 819200 -refs 1 -subq 1 -y outfile.flv</code></p>
<p>Tweaking these options gives variable results, but nothing close to the kind of quality / file-size ratio I wanted. The VP6 codec seemed worth trying out. Unfortunately using the VP6 codec is rife with hurdles; the primary one is that it is entirely proprietary, <a href="http://www.on2.com/">On2</a> own licenses and patents and probably crocodiles with bazookas to protect the codec; some companies have obtained licenses to use it in their products (On2 of course having their own implementation) which means the easiest and most pain-free route is to buy one of those products and bask in the fully-licensed glory.</p>
<p>What isn’t widely publicised is that On2 released a version of the VP6 codec for “Personal use” but no longer provide it for download on their website. A cursory search on Google (let’s say “<a href="http://www.google.com/search?q=vp6+vfw+codec">vp6 vfw codec</a>”) returns some good matches. After downloading and installing, I now had the ability to encode to VP6 as long as it’s for “Personal use” according to the license agreement. This little endeavour was for my own curiosity rather than monetary gain which I’m sure falls under that stipulation.</p>
<p>Codec in hand, in theory it should be as simple as encoding to VP6 using something like VirtualDub and then muxing everything together into an FLV file. If only things were that simple. As far as I could see, there exists no standalone set of FLV muxing tools (like the seminal MKVtoolnix suite). However, ffmpeg can output to an FLV file and provides the ability to do a straight copy (i.e. no transcoding) of the source video, that could work…</p>
<p>No.</p>
<p>For the VP6 codec to be recognised within an FLV file, the container needs to have special bits set which indicate to the player that it’s going to receive VP6 video content rather than H.263/Spark; ffmpeg doesn’t write these bits as it doesn’t “officially” deal with VP6. After much searching, I stumbled upon a way to modify ffmpeg to write these bits but the patch hasn’t been merged into the main ffmpeg branch yet. You can download the patch and a pre-compiled Windows binary from <a href="http://sh0dan.blogspot.com/2006/09/command-line-flash-8-flv-encoding.html">a blog</a> which also offers an alternate method of achieving what I’m describing.</p>
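<p>To make those “special bits” a little more concrete: as far as I understand the published FLV layout, the data of every video tag begins with a byte whose low four bits are a codec ID, 2 for the H.263/Spark flavour and 4 for VP6. The sketch below walks the tags of a file and reports the codec ID of the first video tag it finds. <code>flvVideoCodecId()</code> is a hypothetical helper of my own and has nothing to do with ffmpeg or the patch; it is only a way of checking what a muxer actually wrote.</p>
<pre><code>&lt;?php
// Sketch only: hypothetical helper based on my reading of the FLV tag layout.
function flvVideoCodecId($flv)
{
	if (substr($flv, 0, 3) !== 'FLV' || strlen($flv) &lt; 13) {
		return null;
	}
	// bytes 5-8 hold the header length (big-endian); a 4 byte "previous tag size" follows the header
	$header = unpack('Nsize', substr($flv, 5, 4));
	$pos = $header['size'] + 4;
	$len = strlen($flv);
	// need the 11 byte tag header plus at least one byte of tag data
	while ($pos + 11 &lt; $len) {
		$type = ord($flv[$pos]) &amp; 0x1F; // 8 = audio, 9 = video, 18 = script data
		$size = (ord($flv[$pos + 1]) &lt;&lt; 16) | (ord($flv[$pos + 2]) &lt;&lt; 8) | ord($flv[$pos + 3]);
		if ($type == 9 &amp;&amp; $size != 0) {
			return ord($flv[$pos + 11]) &amp; 0x0F; // low nibble = codec ID
		}
		$pos += 11 + $size + 4; // skip this tag and the size field after it
	}
	return null;
}</code></pre>
<p>Run against file_get_contents('outfile.flv'), a result of 2 would mean the player is being told to expect H.263/Spark and a result of 4 would mean VP6.</p>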
<p><strong>The crux</strong><br />
If you haven’t been keeping up, here’s the short version of it. We want to make an FLV using VP6. Vanilla ffmpeg doesn’t do this so we need a modified ffmpeg to do it. Use the above blog page or modify your ffmpeg source to get ffmpeg to do what we want.</p>
<p>Open up your source video in VirtualDub and apply any filters you want (contrast, brightness, resize etc.) but make sure the “Flip vertically” filter is somewhere in that mix. Go to compression and select the VP6 codec; if you’re using VP 6.2 you can do two-pass encoding, which potentially gives better results than a one-pass encode. Make sure “Use source audio” is selected and save your video down. You’ll have an .avi file which has VP6 video and the original source audio.</p>
<p>Now use your modified version of ffmpeg and use the following (substituting filenames where applicable):</p>
<p><code>ffmpeg -y -i "ooooohprettyprettyflowers.avi" -vcodec copy outfile.flv</code></p>
<p>If you want to control the audio compression a little better add the options for that:</p>
<p><code>ffmpeg -b 128 -ac 2 -ar 44100 -y -i "ohwowexplosions.avi" -vcodec copy outfile.flv</code></p>
<p>You will now have “outfile.flv” which is your final Flash video file, ready for uploading. Of course, the proof of the pudding is in the tasting:</p>
<p id="h263"><a href="http://www.macromedia.com/go/getflashplayer">Get Flash</a> to see this player.</p>
<p>h263.flv — 1,466KB<br />
<script type="text/javascript">// <![CDATA[
	var so = new SWFObject("http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/mediaplayer.swf","h263","512","384","7");
	so.addParam("allowfullscreen","true");
	so.addVariable("file","http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/h263.flv");
	so.write("h263");
// ]]&gt;</script></p>
<p id="vp62"><a href="http://www.macromedia.com/go/getflashplayer">Get Flash</a> to see this player.</p>
<p>vp62.flv — 676KB<br />
<script type="text/javascript">// <![CDATA[
	var so = new SWFObject("http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/mediaplayer.swf","vp62","512","384","8");
	so.addParam("allowfullscreen","true");
	so.addVariable("file","http://192.168.1.65/blog.chaostangent.com/stuff/flashvideo/vp62.flv");
	so.write("vp62");
// ]]&gt;</script></p>
<p><strong>Conclusion</strong><br />
This process is obviously not suited to the automatic encoding that a lot of sites seem to crave nowadays; it is far better suited to the cash-strapped auteur who wants the most out of their videos and bandwidth and doesn’t have a large number of videos to encode. VP6 support in ffmpeg/libavcodec is coming along, and the most recent builds of ffmpeg come with decoding support for VP6, but whether patents/licenses prevent encoding support is still to be seen. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2007/07/converting-to-flash-video-almost-free-and-not-so-easy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Screenshotter</title>
		<link>http://chaostangent.com/2006/08/screenshotter/</link>
		<comments>http://chaostangent.com/2006/08/screenshotter/#comments</comments>
		<pubDate>Tue, 29 Aug 2006 21:34:22 +0000</pubDate>
		<dc:creator>chaostangent</dc:creator>
				<category><![CDATA[Anime]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Geekery]]></category>
		<category><![CDATA[automatic]]></category>
		<category><![CDATA[avi]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[jpeg]]></category>
		<category><![CDATA[mplayer]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[png]]></category>
		<category><![CDATA[screenshots]]></category>
		<category><![CDATA[screenshotter]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[wmv]]></category>

		<guid isPermaLink="false">http://chaostangent.com/?p=440</guid>
		<description><![CDATA[An exploration of the different ways to automatically take a selection of screenshots from a video file. Concentrates on open-source and home-made solutions concluding with a solid first-step hybrid using mplayer and PHP.]]></description>
			<content:encoded><![CDATA[<p>An “automatic” screenshot taker is something that I’ve always wanted, but the <a href="http://www.frame-shots.com/">commercial offerings</a> leave much to be desired and the only other option seems to be the “manual” approach. I am of course talking about screenshots from video files rather than screenshots of your desktop, that sort of thing is <a href="http://www.techsmith.com/">well covered</a>.</p>
<p>One of the problems with making your own is that the options are fairly limited on just how you go about opening video files and pulling out the candy frame goodness. For Windows users, the option is to use DirectShow which I can only describe as <a href="http://en.wikipedia.org/wiki/Crystal_Maze">The Crystal Maze</a> for its Byzantine ways of operating are beyond mortal ken. The other option is to use a pre-built library such as <a href="http://ffmpeg.mplayerhq.hu/">ffmpeg</a> or similar. This was out as well: not only was it a whole new way of working for me (Windows development files were few and far between), it was a whole new set of programming challenges which made the learning curve more of a learning cliff.</p>
<blockquote class="pullout"><p>“during testing I had a number of problems with this”</p></blockquote>
<p>So I turned forlornly to existing media-players in the slim hope that one of them would have the abilities required for scripting a makeshift screenshotter. <a href="http://sourceforge.net/projects/guliverkli/">Media Player Classic</a> has limited command line support, <a href="http://www.videolan.org/vlc/">VLC</a> is more geared towards client/server setup and I couldn’t even figure out whether that route would lead to any semblance of success, <a href="http://www.bsplayer.org/">BSPlayer</a>… The list goes on as to the number of players which don’t supply a full body of command line options.</p>
<p>The silver lining, the angel of hope was <a href="http://www.mplayerhq.hu/">MPlayer</a>. If you’re prepared to wade through a bit of fudge to get there, MPlayer provides everything you need to script a screenshotter:</p>
<ul>
<li>jump to any part of the file from the command line</li>
<li>output into different (static) formats such as PNG and JPEG</li>
<li>can output file information (length, dimensions etc.)</li>
</ul>
<p>With these three functions MPlayer is almost all you need. <strong>Almost.</strong><br />
<span id="more-440"></span><br />
First of all you need to get the MPlayer release for your architecture; for the majority of screenshot monkeys, that’s Windows. Puncturing the MPlayer-Windows mantle takes a bit of pushing but essentially you can usually get away with just <a href="http://www1.mplayerhq.hu/MPlayer/releases/win32/">downloading the latest build</a>. This gives you support for a whole heap of formats (XviD, DivX, x264 and so on), however some encoders prefer to eschew open-source and go with Windows Media Video (usually of the “9” flavour). This is not available by default (as MPlayer uses ffmpeg/libavcodec and not DirectShow) so you need to grab an ethereally named <a href="http://www.mplayerhq.hu/MPlayer/releases/codecs/">codec package</a> and dump its contents into your MPlayer install directory. With a bit of luck and perhaps a bit of <a href="http://www.mplayerhq.hu/DOCS/HTML/en/index.html">document searching</a> you’ll have yourself a fully working command-line media player.</p>
<p>Now for the easy bit, the scripting. I chose to use PHP simply because I use it on a day-to-day basis, any kind of scripting would work though. With some document digging you can find the <a href="http://www.mplayerhq.hu/DOCS/man/en/mplayer.1.html">list of command line options</a>. For our purposes we’re only going to need to run MPlayer in two “modes”: the first is pulling the pertinent information from the video file we’re going to grab screenshots from, the second is actually pulling the screenshots out of the file.</p>
<h2>Identification</h2>
<pre><code>-vo null -nosound -frames 0 -identify</code></pre>
<p>For the impatient this sets the video output to null, disables sound, doesn’t output any frames and prints out identifying features of the video file.</p>
<h2>Screenshotting</h2>
<pre><code>-really-quiet -vo jpeg:progressive:quality=90 -nosound -ss {seek} -frames 2</code></pre>
<p>This turns off most informational output, sets the video output to JPEG with nice options, disables sound, seeks to the specific point in the video then outputs 2 frames (the reason for which I will explain shortly).</p>
<p>First task is to get information about the file and find out the total length of the video/stream. Running the identification command-line arguments with MPlayer and capturing the output is a simple case of running <a href="http://uk.php.net/manual/en/function.shell-exec.php">shell_exec</a> or using backticks, whichever you prefer.</p>
<pre><code>$infoOutput = shell_e xec("{$mplayerCommand} {$argsInfo} \"{$_SERVER['argv'][1]}\"");

//$matcher = '/ID_VIDEO_WIDTH=(\d+)$|ID_VIDEO_HEIGHT=(\d+)$|ID_LENGTH=(\d+\.\d+)$/m';
$matcher = '/ID_LENGTH=(\d+\.\d+)$/m';
$matches = array();

preg_match_all($matcher, $infoOutput, $matches);

$fileLength = (!empty($matches[1][0])) ? floatval($matches[1][0]) : 1440;</code></pre>
<p>The filename is pulled from the “argv” array passed to command line scripts. The commented out regex is for if you wanted to get the pixel dimensions of the video file, as it is we don’t need this information so the regex that is used is simpler and more compact. The file-length is output in seconds.milliseconds format which is then cast (as much as anything can be cast in PHP) as a float or assumed to be 24 minutes if nothing was matched.</p>
<p>The next step is to work out the interval at which you’re going to take screenshots. This is usually defined either by a frequency (take a screenshot every X seconds) or by the total number of screenshots to take (say, 100).</p>
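<p>As a rough sketch of that calculation (the function name is made up for illustration, $ssCount and $ssFreq are named after the script’s settings, and rounding down in frequency mode is my assumption):</p>
<pre><code>&lt;?php
// Sketch only: returns array(number of screenshots, seconds between them).
function screenshotPlan($fileLength, $ssCount, $ssFreq)
{
	if ($ssCount != 0) {
		// fixed number of shots spread evenly over the file
		return array($ssCount, $fileLength / $ssCount);
	}
	// one shot every $ssFreq seconds
	return array((int) floor($fileLength / $ssFreq), $ssFreq);
}</code></pre>
<p>A 1440 second file with $ssCount set to 100 works out at one screenshot every 14.4 seconds.</p>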
<p>Now there is an MPlayer command line option to skip a certain number of seconds after each frame (-sstep &lt;sec&gt;) however during testing I had a number of problems with this which is why I use this less elegant but more foolproof method:</p>
<pre><code>for($i = 0; $i &lt; $screenshotCount; $i++)
{
	$offset = $i * $increment;
	$tempArgs = str_replace('{seek}', $offset, $argsPlay);
	shell_e xec("{$mplayerCommand} {$tempArgs} \"{$_SERVER['argv'][1]}\"");
}</code></pre>
<p>From the example arguments provided above, this replaces the “{seek}” token with the offset in seconds for the current shot and then executes the MPlayer command with our screenshot arguments. This will dump 2 files into our working directory (use chdir to set your working directory): “00000001.jpg” and “00000002.jpg”.</p>
<p>Now, the reason for using 2 frames instead of just one is that for certain video types (WMV mostly), the first shot that is taken is <strong>always</strong> blank. The dimensions are correct, but the screenshot is just black. This is fixed by taking two screenshots as the second shot always has the content there. This is a bit of a fudge but it gets around this bizarre little occurrence.</p>
<p>That’s the meat of the screenshotter; there’s a lot which has been omitted such as ensuring the passed file exists, renaming the output files (this is necessary otherwise the files are overwritten each cycle) and unlinking the first (possibly blank) screenshot. From here you can probably work out things yourself however I have constructed a relatively petite script which I’m going to release under the <a href="http://creativecommons.org/licenses/by/2.5/">Creative Commons Attribution License</a>.</p>
<p><a href="http://chaostangent.com/wp-content/uploads/2006/08/screenshotter.txt">screenshotter.php.txt</a></p>
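<p>The renaming and unlinking step mentioned above could look something like this. It is a sketch rather than an excerpt from the script: <code>keepSecondFrame()</code> and the “shot_0001.jpg” naming pattern are hypothetical, and it would be called once per loop iteration, straight after MPlayer returns.</p>
<pre><code>&lt;?php
// Sketch only: function name and output naming pattern are hypothetical.
function keepSecondFrame($dir, $index)
{
	$first = $dir . DIRECTORY_SEPARATOR . '00000001.jpg';
	$second = $dir . DIRECTORY_SEPARATOR . '00000002.jpg';
	$target = $dir . DIRECTORY_SEPARATOR . sprintf('shot_%04d.jpg', $index);

	if (!file_exists($second)) {
		return false; // MPlayer didn't produce the frame we wanted
	}
	if (file_exists($first)) {
		unlink($first); // discard the possibly blank first frame
	}
	// move the good frame out of the way before the next cycle overwrites it
	return rename($second, $target) ? $target : false;
}</code></pre>
<p>It returns the new filename on success, so the caller can count how many shots actually made it to disk.</p>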
<p>The main variables you’ll want to edit are $mplayerCommand, $screenshotDir and $ssCount or $ssFreq. The script creates a subdirectory within $screenshotDir of the name of the file then pumps all of the screenshots into that directory, numbering them sequentially. I have a shortcut to the script on my desktop which I can then just drag video files onto (which does the neat thing of simply appending the absolute filename onto the end of the command line which simplifies usage immensely).</p>
<p>While you can set a frequency to take screenshots at, I would strongly recommend using an absolute count as a 24 minute video file with 1 screenshot a second gives you 1440 screenshots which can take quite a while to finish and, depending on your video, not all of the shots will necessarily be very good.</p>
<p>Ways forward for this include perhaps doing batch image adjustment (levels, sharpening) as well as automatic thumbnailing (something my <a href="http://gallery.chaostangent.com">gallery</a> already does and hence omitted from this script). The built-in GD would be more than adequate for something like this although ImageMagick is perhaps swifter and more powerful but would add further overhead to the otherwise neat package.</p>
<p>Fundamentally I’ve found that taking a number of screenshots then cherry-picking the best is really the only way this script is useful. It’s good for giving an overall view of a video file rather than specific scenes within a file; for that, the “take screenshot” shortcut key is still king.</p>
<p><strong>Addendum:</strong> The eagle-eyed amongst you will notice that the “shell_exec” command in the code above has a space between the “e” and “x” in “exec”, as far as I can tell the plugin I’m using to keep the code formatting breaks WordPress when I leave the command in full in-between &lt;code&gt; tags. Bad <a href="http://www.coffee2code.com/wp-plugins/#preservecodeformatting">Preserve code formatting</a>. <span class="signOff">¶</span></p>
]]></content:encoded>
			<wfw:commentRss>http://chaostangent.com/2006/08/screenshotter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
