Tuesday, 30 November 2010

Some thoughts and analysis of the Wikileaks "Cablegate" situation

The current Wikileaks "Cablegate" case raises a variety of interesting IT security topics, especially around access control, and the effect that Web 2.0 can have on confidential and secret information.

From a data security perspective, I feel that this is a relatively new situation that is catching the US government off-guard. Certainly it is a situation brought to a new level by the internet (and the public's desire to read about political gossip).

I will stay away from the politics, but here are some of my thoughts about the data, and the data security, in brief.

Very worrying for the US Government

This is a very worrying situation for the US Government. Data loss of this magnitude, with this level of media interest, is unprecedented. I feel that this situation is likely to cause diplomatic problems for some time to come.

I am sure that US data security will be tightened as a result (so some good will come of it), perhaps with new legislation. Meanwhile the information contained in the cables is getting front-page coverage in national newspapers worldwide.

The Wikileaks Cablegate data online

Wikileaks have certainly spent a lot of effort making the data available, with multiple mirrored websites.

The data is getting indexed (and cached) by Google, making searching very powerful if you know how to use Google operators.

For Google searching, it would be possible to use expressions like the following for example

...but then if you know your Google operators, you could probably combine that with something more focused.
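As a rough illustration of combining operators, here is a minimal Python sketch that builds a query string restricted to one site. The `site:` operator and quoted exact phrases are standard Google syntax, and the hostname is the Cablegate one named later in this post. The search terms and the helper names `build_query` and `search_url` are purely hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode


def build_query(site, *terms, exact_phrase=None):
    """Combine a site: restriction with optional focused terms."""
    parts = [f"site:{site}"]
    if exact_phrase:
        # Double quotes ask the search engine for an exact phrase match.
        parts.append(f'"{exact_phrase}"')
    parts.extend(terms)
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a URL-encoded Google search link."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical search terms, for illustration only.
query = build_query("cablegate.wikileaks.org", "embassy",
                    exact_phrase="access control")
print(query)
print(search_url(query))
```

The same helper works against any mirror hostname, which matters if the primary site goes away.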

As I said, the data is also being cached by Google (and other search engines) so if the sites were to be taken down, then it may still be possible to access the information by clicking on the "cache" links provided by Google.

It is interesting that data never really goes away on the internet. Once it is out there, it is out there, and there would be no way of removing all of the instances of the data, no matter how hard someone tries.

Rate of release, and data-cleansing

The release of new information is rather slow. I suspect this is due to the data-cleansing that Wikileaks are currently doing (they seem to be removing names of individuals from the data before publication, except for names of senior politicians and leaders, though I am not sure what the logic behind that is).

With just over 0.1% of the data currently hosted, and a small dribble of new data each day, I feel that it is going to take a massive amount of effort for Wikileaks to publish the data. They are talking about sanitizing a quarter of a million documents, an enormous task!

I would estimate it will take several years to release all the data at this rate. It's a huge project, and will take considerable time and resources to complete. Meanwhile, news keeps coming with each release, and some of the press have their own copy of the data (though I think people will get bored of it within 6 months, if Wikileaks lasts that long).
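To put some rough numbers on that estimate, here is a small Python sketch. The total of 250,000 documents and the 0.1% already released come from the figures above. The daily release rates are my own assumptions, since I do not have a measured rate, so treat the output as a back-of-envelope range rather than a prediction.

```python
TOTAL_CABLES = 250_000       # "a quarter of a million documents"
FRACTION_RELEASED = 0.001    # "just over 0.1%" hosted so far


def years_to_release(total, already_released, cables_per_day):
    """Years needed to publish the remainder at a constant daily rate."""
    remaining = total - already_released
    return remaining / cables_per_day / 365


released = round(TOTAL_CABLES * FRACTION_RELEASED)  # about 250 cables

# Assumed daily release rates (hypothetical, not measured).
for rate in (50, 100, 250):
    years = years_to_release(TOTAL_CABLES, released, rate)
    print(f"{rate:>3} cables/day -> {years:.1f} years")
```

Even at the most optimistic of these assumed rates, the remaining 249,750 cables would take well over two years to publish.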

Denial of service

I saw a tweet earlier in the week (right after the launch of the Cablegate data) in which Wikileaks said that their systems were under a distributed denial of service (DDoS) attack.

This may be the US Government, or third parties acting on its behalf, or completely independently of it. I am sure this situation will change rapidly over time, such is the nature of DDoS and defenses against it.

The cablegate.wikileaks.org platform

Interestingly, looking at the DNS records, I can see that Wikileaks are currently hosting part of the Cablegate site on Amazon EC2 (certainly one of the mirrors I checked the DNS for is located at the Seattle Amazon EC2 location). Other servers are currently located in France.

If the US does decide to pull the plug on Wikileaks, then I would think that would be relatively straightforward to do for the current platform. Whether this could be classed as "political censorship of the web" is open for debate - but as I said, I will stay away from the politics.
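For anyone wanting to repeat this kind of check, here is a minimal Python sketch of one way to do it: resolve the hostname, do a reverse lookup on the address, and see whether the reverse name looks like an EC2 public DNS name. This is not necessarily how I did the lookup above. The suffix list is an assumption based on the usual EC2 naming pattern, and the helper names and the example hostname and address are hypothetical.

```python
import socket

# Assumed suffixes of EC2 public DNS names (not an exhaustive list).
EC2_SUFFIXES = (".compute.amazonaws.com", ".compute-1.amazonaws.com")


def looks_like_ec2(ptr_name):
    """True if a reverse-DNS name matches the assumed EC2 naming pattern."""
    return ptr_name.rstrip(".").lower().endswith(EC2_SUFFIXES)


def check_host(hostname, resolve=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Resolve a hostname, reverse-resolve its IP, and flag EC2 hosting."""
    ip = resolve(hostname)
    try:
        ptr_name = reverse(ip)[0]   # (name, aliases, addresses) -> name
    except OSError:
        ptr_name = ""               # no reverse record available
    return ip, ptr_name, looks_like_ec2(ptr_name)


# Offline demonstration with stand-in resolvers and made-up values.
# For a live lookup, call check_host("some.mirror.hostname") instead.
demo = check_host(
    "mirror.example",
    resolve=lambda host: "203.0.113.25",
    reverse=lambda ip: ("ec2-203-0-113-25.compute-1.amazonaws.com", [], [ip]),
)
print(demo)
```

A reverse name only hints at the hosting provider. It says nothing reliable about the physical location of the server.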

Anyway, I feel it would be easier to take the current platform off-line by political pressure, negotiations with vendors, and legal means rather than DDoS.

However, it would likely just pop up somewhere else - such is the nature of the internet. Once the data is out there it will not be deletable.

Also, the small amount of data released so far is easily available via BitTorrent, among other file-sharing methods.

Cable data

It seems that the cables are each quite small pieces of data, like telegrams. The cables are very compact and concise, apparently being typically around 1,000 words (around 6K each).

If this is representative of the rest of the data, then this would put 250,000 documents at a size of around 1.5GB. However, due to the nature of the data, I would say that it would compress at a good ratio (with Winzip for example), perhaps down to 500MB, so it would easily have fit onto a memory stick, SD card, recordable CD or DVD.

In other words, with today's technology a quarter of a million cables are surprisingly portable. Perhaps this is how the data was stolen (it has been suggested in the past that Bradley Manning used a CD-R, and that this is where the data came from).
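The arithmetic behind those figures can be checked with a few lines of Python. The cable count and the 6K per cable come from above. The 3:1 compression ratio is simply what a drop from 1.5GB to 500MB implies, and the media capacities are nominal figures I have assumed (700MB for a CD-R, 4.7GB for a single-layer DVD).

```python
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB

CABLES = 250_000
BYTES_PER_CABLE = 6 * KB     # "around 6K each"
COMPRESSION_RATIO = 3        # assumed: 1.5GB down to roughly 500MB

raw_size = CABLES * BYTES_PER_CABLE
compressed_size = raw_size // COMPRESSION_RATIO

# Nominal capacities (assumed typical values).
MEDIA = {"CD-R": 700 * MB, "DVD-R": 4_700 * MB}

print(f"raw:        {raw_size / GB:.1f} GB")
print(f"compressed: {compressed_size / MB:.0f} MB")
for name, capacity in MEDIA.items():
    print(f"{name}: raw fits={raw_size <= capacity}, "
          f"compressed fits={compressed_size <= capacity}")
```

On these assumptions the uncompressed set is too big for a single CD-R but fits on a DVD, while the compressed set fits on either.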

Need to know

In my opinion, mitigations against this kind of leak should have been better. From a data governance perspective, it looks like there has been a huge failing in the application of "least privilege", i.e. the "need to know".

I have seen reports that suggest that up to 1 million US military and law enforcement personnel may potentially have had the level of access required to view some of this data. If true, that is an incredible situation, and it is no surprise that something this damaging has happened.

I doubt that figure is accurate, but in any case, access control and data loss prevention seem to have been lacking - I am sure this is currently under urgent review!
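To make the "need to know" idea concrete, here is a minimal Python sketch of an access check that requires both a sufficient clearance level and an explicit need to know for the document's topic. It is only an illustration of the principle, not a description of how any real government system works. All of the names, levels and topics are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical classification levels, lowest to highest.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}


@dataclass(frozen=True)
class User:
    name: str
    clearance: str
    need_to_know: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Document:
    title: str
    classification: str
    topic: str


def can_read(user, doc):
    """Clearance alone is not enough: the topic must also be granted."""
    cleared = LEVELS[user.clearance] >= LEVELS[doc.classification]
    return cleared and doc.topic in user.need_to_know


analyst = User("analyst", "secret", frozenset({"region-a"}))
cable_a = Document("Cable 1", "secret", "region-a")
cable_b = Document("Cable 2", "confidential", "region-b")

print(can_read(analyst, cable_a))  # cleared and has need to know
print(can_read(analyst, cable_b))  # cleared, but no need to know
```

With a check like this, a high clearance level on its own would not open up a quarter of a million documents to a single user.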


