Dr. Augustine Fou's Online Scrapbook: Big Data, Big Problems: The Trouble With Storage Overload [Memory Forever]

Source: http://gizmodo.com/5495601/big-data-big-problems-the-trouble-with-storage-overload

We collect an astonishing amount of digital information. But as the Economist recently pointed out in a special report, we've long since surpassed our ability to store and process it all. Big data is here, and it's causing big problems.

Walmart's transaction databases are a whopping 2.5 petabytes. There are more than 40 billion photos hosted by Facebook alone. When there's this much data floating around, it becomes nearly impossible to sort and analyze. And it's only expanding faster: the amount of digital information increases tenfold every five years.
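To put "tenfold every five years" in yearly terms, here's a quick back-of-the-envelope sketch. It assumes the growth is smooth and compounds at a constant rate, which the article doesn't actually claim, and the function names are made up for illustration.

```python
import math

# Figures quoted in the article: "tenfold every five years".
GROWTH_FACTOR = 10.0
PERIOD_YEARS = 5.0


def annual_growth_factor(growth_factor: float = GROWTH_FACTOR,
                         period_years: float = PERIOD_YEARS) -> float:
    """Yearly multiplier that compounds to growth_factor over period_years.

    Assumption: growth is constant and compounding (not stated in the source).
    """
    return growth_factor ** (1.0 / period_years)


def doubling_time_years(growth_factor: float = GROWTH_FACTOR,
                        period_years: float = PERIOD_YEARS) -> float:
    """Years needed for the data volume to double at that same constant rate."""
    return period_years * math.log(2.0) / math.log(growth_factor)


if __name__ == "__main__":
    print(f"yearly multiplier: {annual_growth_factor():.3f}")
    print(f"doubling time: {doubling_time_years():.2f} years")
```

Under that assumption, the pile grows by roughly 58 percent a year and doubles about every year and a half.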

We're also running out of space. The Economist reports that the amount of information created will more than double the available storage by 2011.

And the data we can store becomes more and more difficult to sort for future generations of researchers and businesses.

This may not seem like such a huge deal, but take a more recent, practical example. To produce the definitive word on the Lehman Brothers bankruptcy, court-appointed examiner Anton R. Valukas had to sift through 350 billion pages of electronic documents. That's three quadrillion bytes of data. So how'd he look through all that information?

Simple. He didn't. Instead, loose search parameters were used to cut the number of emails and documents roughly in half, then teams of lawyers pared down what was left to a "manageable" 34 million pages. Valukas's final report was an expansive 2,200 pages long, but there's no way he was able to process all of the relevant documents, or that he was able to tell the whole story.
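For a sense of scale, the Lehman figures can be checked with some quick arithmetic. This is only a rough sketch: it reads "three quadrillion" as 3 x 10^15 bytes and treats the page counts as exact, and the constant and function names are invented for illustration.

```python
# Figures quoted in the article; "three quadrillion" is read as 3 * 10**15 (assumption).
TOTAL_PAGES = 350 * 10**9      # pages of electronic documents
TOTAL_BYTES = 3 * 10**15       # bytes of data
REVIEWED_PAGES = 34 * 10**6    # the "manageable" set left after filtering


def bytes_per_page() -> float:
    """Average size of one page implied by the two totals."""
    return TOTAL_BYTES / TOTAL_PAGES


def reviewed_fraction() -> float:
    """Share of the original pages that survived to the reviewed set."""
    return REVIEWED_PAGES / TOTAL_PAGES


if __name__ == "__main__":
    print(f"average page size: {bytes_per_page():,.0f} bytes")
    print(f"reviewed share: {reviewed_fraction():.4%}")
    print(f"roughly 1 page in {1 / reviewed_fraction():,.0f}")
```

By those numbers the average page comes to about 8.6 kilobytes, and the "manageable" pile is roughly one page out of every ten thousand in the original set.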

If there's hope to be found, it's in metadata. Much like library cards kept you from having to read every book, Google arranges your search queries and Flickr your photos. Even the tags on Gizmodo make it more manageable to find relevant content. But while metadata gives things searchable labels, the fact that it's often crowd-sourced means that those labels are at best inconsistent and at worst incomprehensible.

We've also made some advances visualizing big data, a relatively new field simply because it's only recently become a necessity. Whether graphing stock market data or turning large chunks of text into word clouds, it's imperative that we find ways to look at data that our brains can process more easily than they can long strings of raw information:

The brain finds it easier to process information if it is presented as an image rather than as words or numbers. The right hemisphere recognises shapes and colours. The left side of the brain processes information in an analytical and sequential way and is more active when people read text or look at a spreadsheet. Looking through a numerical table takes a lot of mental effort, but information presented visually can be grasped in a few seconds. The brain identifies patterns, proportions and relationships to make instant subliminal comparisons.

Processing information through images becomes ever more important if we hope to keep up with it.

We have a more thorough record of our lives and the world around us now than we ever have before. We can map the human genome in a week, for goodness' sake. All of which is wonderful! We should absolutely be leaving behind as much of a record of our existence as possible. But we should also figure out how to manage it, and present it, before big data balloons totally out of our control. [Economist]

Wednesday, March 17, 2010
