Looking up at the night sky, it’s relatively straightforward to make the translation from sky to paper. Stars are perceived to be the same distance away from you, making connecting the dots a simple process.
But what if you flew out beyond the stars? How would they look then? Shown at their actual distances, the stars shift position, and the easy-to-distinguish constellations become practically unrecognisable.
This is what context can do. It can completely change your perspective, and this is true whether you are looking at the stars or at your dataset. It can help you map out constellations, or help you decide what numbers represent and how best to interpret them. Without context, data is useless, and any visualisation you create with it will be useless too.
Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later on that the speaker meant the opposite of what you thought. Suddenly your essay is deemed “off point”.
Essentially, you have to know the who, what, when, where, why, and how – the metadata or the data about the data – before you can begin to understand the huge amounts of data you’ve collected. You need to know what to look for in your data, what to do with it and what tools to use.
Getting to know your data
“Big data” is a big buzzword, and while it promises the feeling of opportunity, it’s essentially the same principle companies have been using for years – finding insights into your company by leveraging the data you have stored in files. “Big data” is not a new-age solution; it merely refers to the extremely large pools of data that companies have stored today. Data is still data, but today there is so much more of it. Sure, this means there is more opportunity to gain from it, but it also means there is more white noise, making understanding it a little harder.
There is one special difference with big data, however, and that is how quickly it can be analysed. In the past, surveys and their processing would take weeks, whereas today, aggregating social media data, search data and other forms of big data offers real-time results. Big data has dramatically changed the ability to react to the market and make decisions.
An example of this is Google, which, with the right tools and analysis of big data, used its stored searches to accurately predict flu outbreaks across the US faster than the Centers for Disease Control (CDC).
With big data, we can simply look for patterns that correlate with real-world phenomena. Google didn’t check for symptoms, and it didn’t rely on highly trained medical experts. It merely searched for a pattern in vast amounts of data to produce a result as accurate as the CDC’s, in half the time. It’s all about the right tools and strategies that make data worthwhile.
So how do you start understanding data?
1. Ensure data quality
Poor data quality means higher marketing costs and the potential to seriously misunderstand the true profile of an important customer. Maintaining quality data means staying alert for stale entries, bogus addresses and duplicate records.
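The three problems named above – stale entries, bogus addresses and duplicates – can all be caught with straightforward checks. This is a minimal sketch; the record fields, the email pattern and the staleness cutoff are all assumptions chosen for illustration.

```python
import re
from datetime import date

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"email": "ann@example.com", "last_seen": date(2024, 3, 1)},
    {"email": "not-an-email",    "last_seen": date(2024, 2, 10)},  # bogus address
    {"email": "ann@example.com", "last_seen": date(2024, 3, 1)},   # duplicate
    {"email": "bob@example.com", "last_seen": date(2019, 6, 5)},   # stale entry
]

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check
STALE_BEFORE = date(2022, 1, 1)                    # assumed staleness cutoff

seen, clean = set(), []
for record in records:
    if not EMAIL.match(record["email"]):      # drop bogus addresses
        continue
    if record["last_seen"] < STALE_BEFORE:    # drop stale entries
        continue
    if record["email"] in seen:               # drop duplicate records
        continue
    seen.add(record["email"])
    clean.append(record)

print(f"{len(records)} records reduced to {len(clean)} clean record(s)")
```

Real pipelines use proper address validation and fuzzy matching, but even checks this simple remove the worst offenders before they distort a customer profile.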
2. Measure success on metrics that matter
Instead of churning through what-if scenarios until you stumble across a pattern that provides a satisfactory response to a problem, work from hypothesis to test to conclusion. This means researching and planning how your data should perform and how success can be evaluated, and requires your continued effort. What’s the point in fudging the formula halfway to the target or trying to score dozens of metrics with no understanding of how they relate to each other?
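Working from hypothesis to test to conclusion means fixing the success criterion before looking at the results. A minimal sketch, with a hypothetical conversion-rate target and invented campaign figures:

```python
# The target is planned up front, before the data is examined.
target_conversion = 0.05            # assumed success criterion

# Invented campaign figures, for illustration only.
visitors, conversions = 4000, 230

observed = conversions / visitors
met_target = observed >= target_conversion
print(f"conversion {observed:.3f} -> {'met' if met_target else 'missed'} target")
```

Because the target was committed to in advance, there is no temptation to fudge the formula halfway to a satisfactory-looking answer.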
3. Be active
Data should not be static. It should instead lead to an immediate course of action, by tying analytical results with workflow. When you continue to reevaluate, you can correct errors in initial assessments and make adjustments as facts on the ground change.
4. Know the three basic challenges
Big data results in three basic challenges: storing, processing and managing it effectively. Many people don’t realise that the vast majority of big data is either duplicated data or synthesised data. With that in mind, your first step should be to bring data down to its unique set, reducing the amount of data to be managed. This in turn creates a smaller data footprint, which results in more accurate data analysis.
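Bringing data down to its unique set can be as simple as hashing each record and keeping only the first occurrence. This is a sketch of the idea on a toy dataset; the records and fields are invented for illustration.

```python
import hashlib
import json

# Invented rows for illustration; the third is an exact duplicate.
rows = [
    {"id": 1, "city": "London"},
    {"id": 2, "city": "Leeds"},
    {"id": 1, "city": "London"},
]

seen, unique = set(), []
for row in rows:
    # Serialise with sorted keys so identical records hash identically.
    digest = hashlib.sha256(
        json.dumps(row, sort_keys=True).encode()
    ).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique.append(row)

print(f"{len(rows)} rows reduced to {len(unique)} unique rows")
```

At scale the same principle appears as content-addressed storage and deduplicating file systems: the hash becomes the identity of the record, and duplicates are never stored twice.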
Virtualisation is the “hero” when it comes to managing big data. It gives end-users flexibility, lower costs and freedom from IT vendor lock-in, and leverages the power of multiple applications using the same data footprint.
Big data isn’t necessarily better, but it is different. In order to unlock its full potential, we will have to make serious changes to how we think, manage and operate.