Wednesday, January 2, 2013

'Big Data' may be so big it helps protect our privacy

The Register suggests that the sheer volume of data being collected about us may actually protect our privacy, because we simply can't analyze it all.

We can't seem to get enough of Big Data. In its Digital Universe in 2020 report, IDC forecasts Big Data-related IT spending to rise 40 per cent each year between 2012 and 2020, as the digital universe, now at 2.8 zettabytes (ZB), or 2.8 trillion GB, explodes to 40 ZB.

That's very, very Big Data. It's a pity, therefore, that we currently analyse a mere 0.5 per cent of it all.

Not that all of these data are useful. IDC expects that by 2020, just 33 per cent of the world's data will be useful if analysed. But the gap between the 0.5 per cent actually analysed today and the 33 per cent that could be useful if analysed is unlikely to narrow dramatically. We like to think of ourselves as hyper-analytical, what with our quantified selves and "measure everything" approaches to business.

But, as I've argued before, we're actually quite inept at analysing data, be the data big or small.

Not only are we bad at regulating our intake of information, to paraphrase Nick Carr, but we're also really bad at separating signal from noise. In hindsight, we think we see clearly, but even then we tend to miss the point.

There's more at the link.

So we're only analyzing about half of one per cent of all data collected?  That makes me feel a lot more comfortable about privacy . . . even though I wish the data weren't being collected at all!



AJ said...

A truism I rather like: "Data isn't information, information isn't knowledge, knowledge isn't wisdom." Easy to see the disconnect between data & wisdom. Also easy to overlook the steps along the way.

Borepatch said...

The bit about how hard it is to separate signal from noise is quite insightful. The successes that have been had are almost always very, very simple cases. What people dream about in their pie-in-the-sky plans is massive correlation to flag "grey" areas.

What that means in the Real World is a massive false positive rate. So many things get flagged for follow up that the system gets turned off.

trailbee said...

Being given a year, 2020, makes me feel better. I have been waiting for the world's authoritarians to shut down our access to cyber communication. In addition, having so much info that it can't all be analyzed is sweet! Many thanks.

Anonymous said...

So, then, the ultimate defense of our privacy may lie not in hiding our data, but in generating even more trivia about ourselves. I can see a program that generates near-gibberish about a person...