What does half a decade of email look like?

8:20pm, 2nd February 2004

I have here nearly a hundred megabytes of email from the last 5 years, so I thought I’d have a look at it. But how do you look at something that big? Data visualisation to the rescue! If I plotted each byte as a pixel, I’d have a 125-screen-wide bitmap, so instead I plotted the average value of each 16×16 block of bytes as a single pixel. And when the Python Imaging Library finished, I had a revelation: a picture! A picture of this:

[Image: 5 years' email]

The image is mostly grey because each pixel is an average of 256 values; since null and high-ASCII bytes are rare, they get lost in a sea of averageness. Art imitates life, eh? The last 20% or so is spam from the last 2 months, which I keep as fuel for Bayesian magic.
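
For the curious, here is a minimal sketch of the block-averaging idea, not the original script. It assumes the bytes are laid out as a greyscale bitmap 1024 bytes wide (the width is my guess; any multiple of 16 works) and that any ragged tail is simply dropped. The names `block_average` and `save_greyscale` are made up for illustration, and the saving step assumes Pillow, the maintained fork of the Python Imaging Library.

```python
def block_average(data: bytes, width: int = 1024, block: int = 16):
    """Treat `data` as a greyscale bitmap `width` bytes wide and shrink it
    by averaging each block x block tile into a single pixel.

    The default width is an assumption, not something the post specifies.
    Returns ((out_width, out_height), pixel_bytes).
    """
    if width % block:
        raise ValueError("width must be a multiple of block")
    rows = len(data) // width        # full rows only
    out_w = width // block
    out_h = rows // block            # full tile rows only; the tail is dropped
    out = bytearray(out_w * out_h)
    for ty in range(out_h):
        for tx in range(out_w):
            total = 0
            # Sum the tile one row-slice at a time.
            for y in range(ty * block, (ty + 1) * block):
                start = y * width + tx * block
                total += sum(data[start:start + block])
            # Integer mean of block * block byte values (256 for 16x16).
            out[ty * out_w + tx] = total // (block * block)
    return (out_w, out_h), bytes(out)


def save_greyscale(size, pixels, path):
    """Write the averaged pixels as an 8-bit greyscale image (needs Pillow)."""
    from PIL import Image  # imported lazily so the averaging runs without it
    Image.frombytes("L", size, pixels).save(path)
```

Used on a mailbox it might look like `size, pixels = block_average(open("mail.mbox", "rb").read())` followed by `save_greyscale(size, pixels, "mail.png")`, where both file names are placeholders. The pure-Python loops are slow on a hundred megabytes, but they make the greyness easy to see: one stray 255 among 255 bytes of ordinary text barely moves the mean.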

Of course, please excuse the crudity of this image; I didn’t have time to colour it.