December 10, 2012 at 8:55 pm (interesting, odds, science)

There is a lot of talk of “big data” – but I quite like the idea that big data means “more data than you have the computing power to process”.  And that isn’t new.  I particularly like this talk by John Graham-Cumming about big data – describing a big data problem they encountered … in the 1950s.  The blurb for the conference describes it thus:

It’s 1951 and you’ve got the world’s first business computer and you’ve just been handed a Big Data problem. Go! With 2K of memory it was powerful enough to run the then massive Lyons business.  But it wasn’t long, in 1955, before Big Data came calling in the form of a request from British Rail to calculate the shortest distance between every one of their 5,000 railway stations.

So why mention it at all?  Well there is an interesting discussion going on at the moment that we might soon be running out of metric units to describe big data.  Andrew McAfee’s blog describes the problem:

Yotta-, signifying 10^24, is the only metric prefix left on the list. Only 20+ years ago, we didn’t anticipate needing anything beyond yotta. It seems safe to say that before the current decade is out we’ll need to convene a 20th conference to come up with some more prefixes for extraordinarily large quantities not to describe intergalactic distances or the amount of energy released by nuclear reactions, but to capture the amount of digital data in the world.

Yotta?  See Wikipedia for the full list:

  • kilo = 1,000
  • mega = 1,000,000
  • giga = 1,000,000,000
  • tera = 1,000,000,000,000
  • peta = 1,000,000,000,000,000
  • exa = 1,000,000,000,000,000,000
  • zetta = 1,000,000,000,000,000,000,000
  • yotta = 1,000,000,000,000,000,000,000,000

Yes, that is 1 followed by 24 zeros.  But even that might not be enough.
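The prefix table above is easy to put to work.  Here is a toy sketch (the function name and structure are my own, just for illustration) that picks the largest prefix that fits a given number:

```python
# Toy sketch: express a large number using the SI prefixes listed above.
PREFIXES = [
    ("yotta", 10**24), ("zetta", 10**21), ("exa", 10**18),
    ("peta", 10**15), ("tera", 10**12), ("giga", 10**9),
    ("mega", 10**6), ("kilo", 10**3),
]

def si_prefix(n):
    """Return (scaled value, prefix name) for the largest prefix not exceeding n."""
    for name, factor in PREFIXES:
        if n >= factor:
            return n / factor, name
    return float(n), ""  # smaller than a thousand: no prefix needed

value, name = si_prefix(5 * 10**21)
print(f"{value:g} {name}")  # 5 zetta
```

Anything past 10^24 falls off the end of that table – which is exactly the problem.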

So what is being considered?  Well, some have suggested hella for 1 followed by 27 zeros, but I think that is missing a great opportunity.  I think it should be helluva.  Then we can have distances that are a helluvameter, really heavy things that are a helluvagram, and if you are into really big data then obviously you need storage that has a helluvabyte in it.

But, seeing as Google already recognises hella, we might have missed that chance.  Then again, Google already knows about googols too.




Dark, Unexposed Corners of the Internet

October 3, 2012 at 9:50 pm (computers, interesting, internet)

I’m almost finished reading through ‘The Geek Atlas’, which will probably be the subject of a post of its own at some point, but for various reasons I was led to the author’s website and blog.  There were two recent posts, very interesting to my geeky side, that I found fascinating to follow up.

The first is a recording of a recent keynote speech given by the author on the issue of ‘big data’.  This is a bit of an IT buzzword this year for some reason (a bit like ‘cloud’ last year), but the keynote is all about the fact that you can pick basically any point in time, and big data will always mean ‘more data than I can handle with the machinery I currently have at my disposal’.  It describes the issues faced by some engineers tasked with calculating the distances between stations in the British Rail network – they had 9 months to come up with an answer – and this was in 1955.  It is a fascinating talk – I recommend it.
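The talk doesn’t say how the engineers actually tackled it, but the task – the shortest distance between every pair of stations – is the classic all-pairs shortest-path problem, which today you might solve with the Floyd–Warshall algorithm.  A minimal sketch on a tiny invented network (station names and distances are made up for illustration):

```python
# Floyd–Warshall all-pairs shortest paths on a tiny made-up network.
INF = float("inf")
stations = ["A", "B", "C", "D"]
# dist[i][j] = direct distance between stations i and j (INF if no direct line)
dist = [
    [0,   5,   INF, 10],
    [5,   0,   3,   INF],
    [INF, 3,   0,   1],
    [10,  INF, 1,   0],
]
n = len(stations)
for k in range(n):          # allow routes that pass through station k
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

print(dist[0][3])  # shortest A -> D: via B and C, 5 + 3 + 1 = 9
```

The triple loop is O(n³), so 5,000 stations means roughly 1.25 × 10¹¹ inner steps – a few minutes on a modern laptop, and a genuinely big-data-sized job for a machine with 2K of memory in 1955.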

The other one that caught my eye relates to the recent announcement that the body that oversees allocation of Internet addresses for Europe is down to its last few (few in this case being approx 16m) and we are rapidly running out.  He noticed that there are various bits of UK government that appear to be sitting on major chunks of unused address space.
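For scale, both the ‘last few’ and the ‘major chunks’ here are /8 blocks of about 16.7 million addresses each.  A quick check with Python’s standard ipaddress module, using 25.0.0.0/8 (the block the xkcd map mentioned below labels as UK MoD):

```python
# A "major chunk" here means a /8 block: 32 - 8 = 24 free bits,
# so 2**24 addresses, which is where the ~16m figure comes from.
import ipaddress

block = ipaddress.ip_network("25.0.0.0/8")
print(block.num_addresses)  # 16777216, i.e. 2**24 ~ 16.7 million
```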

Now, working in IT, I know what a major pain and effort it will be to free up any of these already allocated addresses, so I wasn’t really expecting the government to suddenly experience a £500m-£1.5bn windfall from this.  I also know that the first major call to be a ‘good Internet citizen’ and return unused addresses was actually made in 1996 (in the shape of RFC 1917):

“This document is an appeal to the Internet community to return unused address space, i.e. any block of consecutive IP prefixes, to the Internet Assigned Numbers Authority (IANA) or any of the delegated registries, for reapportionment.”

So, over 15 years later, any easy returns would probably have happened by now.  However, what has been interesting in this recent case is seeing geeky, interested members of the public using Freedom of Information requests as a means to prod said government departments to find out what these addresses are used for.

First, the UK MOD – the response:

“I can confirm that the IPv4 address block about which you enquire is assigned to and owned by the MOD; however, I should point out that within this block, none of the addresses or address ranges are in use on the public internet for departmental IT, communications or other functions.  To date, we estimate that around 60% of the IPv4 address block has been allocated for internal use. As I am sure you will appreciate, the volume and complexity of the Information Systems used by the Armed Forces supporting military operations and for training continues to develop and grow.  We are aware that the allocation of IPv4 addresses are becoming exhausted, and the issue has been recognised within the Department as a potential future IS risk.”

Then the UK DWP – the response:

“DWP have no plans to release any of the address space for use on the public Internet. The cost and complexity of re-addressing the existing government estate is too high to make this a viable proposition. DWP are aware that the worldwide IPv4 address space is almost exhausted, but knows that in the short to medium term there are mechanisms available to ISPs that will allow continued expansion of the Internet, and believes that in the long term a transition to IPv6 will resolve address exhaustion. Note that even if DWP were able to release their address space, this would only delay IPv4 address exhaustion by a number of months.”

So no – too expensive to release them, and as stated above, it only prolongs the agony very slightly anyway.  However, I do wonder how many other corners of the Internet are actually ‘dark’ like this and will never actually be connected.

Maybe we will do better with IPv6 allocations – even a home user will get an allocation that is larger than the current Internet, but the authors of RFC 3177 argue that this is fully justified (especially as there is room for around 35 trillion of these allocations):

“… based on experience with IPv4 and several other address spaces, and on extremely ambitious scaling goals for the Internet amounting to an 80 bit address space *per person*.  Even so, being acutely aware of the history of under-estimating demand, the IETF has reserved more than 85% of the address space (i.e., the bulk of the space not under the 001 Global Unicast Address prefix).  Therefore, if the analysis does one day turn out to be wrong, our successors will still have the option of imposing much more restrictive allocation policies on the remaining 85%.”

So there is quite a large margin for error, even compared to the decision back in the 1970s to allow for 4 bn addresses for the current Internet at a time when there were only a handful of computers to be connected.
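The numbers in that quote are easy to check.  Under the RFC 3177 recommendation each site – even a home – got a /48, and the 001 Global Unicast prefix (1/8 of the 128-bit space) holds 2^45 of them:

```python
# Back-of-the-envelope check of the figures in the RFC 3177 quote.
ipv4_total = 2**32                # the "4 bn" addresses of the current Internet
site_allocation = 2**(128 - 48)   # addresses in one /48 site allocation

# Each single site allocation dwarfs the entire IPv4 Internet:
print(site_allocation // ipv4_total)  # 2**48 whole-IPv4-Internets per site

# /48 allocations available under the 001 Global Unicast prefix
# (48 - 3 prefix bits = 45 free bits):
print(2**45)  # 35184372088832 - the "around 35 trillion" in the text
```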

As I say, an interesting interplay about a topical, if geeky, infrastructure issue.

BTW – you can see both blocks in the top left-hand quarter of the xkcd Internet map (labelled 25 UK MoD and 51 UK Social Security).

