The Myth and Mystery of Big Data

“With enough data, you can discover patterns and facts using simple counting that you can’t discover in small data using sophisticated statistical and machine learning approaches.”

I used to assume that big data, data mining, and statistics were inseparable. But the reality is far from complex: companies are making a killing simply by transforming data into value.

Big data is not hard. Statistics are not required. Neither are complex algorithms. Google’s Marissa Mayer attributed the company’s intelligence to the volume of data available for cross-referencing, not to clever algorithms. Google Translate leveraged massive volumes of cross-referenced text in multiple languages rather than a finely tuned understanding of grammar. Voice translation relies on much the same technique, drawing on huge volumes of recorded, transcribed speech.

Right now our two best tools are visualization and data exploration (business discovery). Both are simple, easy to demonstrate, and easy to grasp. The big data revolution’s message to the masses is that simple correlation will outstrip them both, as long as enough data can be crunched. Much of this can also be automated, pre-calculated, and even anticipated. Imagine an analysis system that analyzes its own usage: these people tend to ask these questions at these times!
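To make “these people tend to ask these questions at these times” concrete, here is a minimal sketch of that kind of self-analysis using nothing but counting. It assumes a hypothetical query log of (user group, question, hour) records; the log contents and the function name are invented for illustration. It pre-computes the most frequent question for each group and hour so the answer could be prepared before anyone asks.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (user group, question asked, hour of day).
QUERY_LOG = [
    ("sales", "monthly revenue", 9),
    ("sales", "monthly revenue", 9),
    ("sales", "top customers", 9),
    ("finance", "open invoices", 16),
    ("finance", "open invoices", 16),
    ("sales", "top customers", 14),
]


def precompute_likely_questions(log):
    """Count questions per (group, hour) and keep the most frequent one."""
    counts = defaultdict(Counter)
    for group, question, hour in log:
        counts[(group, hour)][question] += 1
    # most_common(1) returns a one-element list holding the top (question, count) pair.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


likely = precompute_likely_questions(QUERY_LOG)
print(likely[("sales", 9)])     # monthly revenue
print(likely[("finance", 16)])  # open invoices
```

No statistics are involved: the “prediction” is just a tally, which is the point the opening quote makes about simple counting over enough data.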

Data can be correlated post hoc. Correlation does not imply causation, but simple correlation is ample evidence on which to take action. Correlation is perceived visually at a glance. It is relative and easy to compare, and it can look at 2, 3, 4 or more factors at once. Correlation is business-friendly, easily understood, and compatible with gut instinct. Kids understand it: Mom gets upset when I put peanut butter on the cat, so if I do it right now, she’ll probably be mad.
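As a rough illustration of how little machinery “simple correlation” across several factors needs, the sketch below computes the Pearson correlation coefficient for every pair of columns in a small table. The column names and daily figures are made up for the example, and Pearson’s r is one common choice of correlation measure, not necessarily the one any particular tool uses.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (zero variance would divide by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily figures; any columns of equal length would do.
factors = {
    "site_visits": [120, 150, 170, 200, 230, 260],
    "sales": [12, 15, 16, 21, 23, 27],
    "temperature": [30, 25, 28, 22, 27, 24],
}

# Correlate every pair of factors: three factors give three pairs.
matrix = {
    (a, b): pearson(factors[a], factors[b])
    for a, b in combinations(factors, 2)
}

# List the strongest relationships (positive or negative) first.
for (a, b), r in sorted(matrix.items(), key=lambda item: -abs(item[1])):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Each r falls between -1 and +1, so pairs are directly comparable, and adding a fourth or fifth column only adds more pairs to the same loop.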

The real business opportunity is that so much data is simply thrown away. Until recently there was no practical way to store it all, so we have an old habit of letting it vaporize. Every server message, every website click, every customer contact and interaction, every manufacturing activity, temperature reading, timeclock action, phone call received or placed, security video, and email sent. Every bit of it can be analyzed, and from multiple perspectives: employee, employer, customer, vendor, shipper, receiver, and on and on.

We don’t know what we’ll find. As more and more stories of big data at little(er) companies emerge, the snowball will become an avalanche.
