It is dirty work, this data business

Data Analysis

“I’m an oil man, ladies and gentlemen. I have numerous concerns spread across this state. I have many wells flowing at many thousand barrels per day. I like to think of myself as an oil man.” – Daniel Plainview, There Will Be Blood

Data is the new oil; many people acknowledge that.

However, while no one would imagine drilling for oil without the proper expertise, the same is not true of drilling for insights. The horror scenario played out in too many organisations goes like this: a manager becomes aware of the importance of data and assigns the job of analysing it to a member of staff who may or may not have the skillset, or the ability to learn the necessary skills. That staff member, with the best of intentions, may find great insights that are accurate and meaningful, or they may not.

More importantly, organisations jump into the data-dashboard-insights-visualisation field demanding the sexy, shiny final product, without wanting to spend time and money on the groundwork. It’s like demanding plastics without building the refineries and factories that turn petrochemicals into plastics.

The dirty work of data cleanup

The most unsexy but necessary work in data analysis is getting and preparing the data. It is very rare to get data from just one source. Most of the time, data comes from multiple sources and takes copious amounts of time to integrate into a single dataset. Even when data comes from a single database, it still needs to be checked, cleaned and prepared, because missing, corrupted or simply incorrect data is all too common. There is nothing glamorous about this task, even though cleaning up data so that it’s consistent and reliable is the foundation of the whole data analysis structure.

Do you have the skills needed to clean, check and integrate your data? Do you have the patience? Can you explain to management why 90% of the perspiration has been spent on data cleanup? Are you sure, before you dive into data analysis, that your input isn’t garbage? “Garbage in, garbage out” definitely holds true for data analytics. If data isn’t properly cleaned and prepared, any analysis of it can be worthless.
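To make the cleanup work a little more concrete, here is a minimal sketch in Python of the kind of checks described above: spotting duplicates, missing values, corrupted values and inconsistent labels. The sample data, the column names and the COUNTRY_ALIASES mapping are all made up for illustration; real data will need its own rules.

```python
import csv
import io

# Hypothetical export with typical problems: a duplicate row, a missing
# amount, inconsistent country labels and a non-numeric amount.
RAW = """customer_id,country,amount
1,Australia,120.50
2,australia ,80
2,australia ,80
3,AU,
4,Australia,abc
"""

# Assumed mapping of label variants to one canonical value.
COUNTRY_ALIASES = {"australia": "Australia", "au": "Australia"}


def clean_rows(text):
    """Return (clean_rows, problems) for the given CSV text."""
    clean, problems, seen = [], [], set()
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        key = tuple(value.strip() for value in row.values())
        if key in seen:
            problems.append((line_no, "duplicate row"))
            continue
        seen.add(key)

        raw_amount = row["amount"].strip()
        if not raw_amount:
            problems.append((line_no, "missing amount"))
            continue
        try:
            amount = float(raw_amount)
        except ValueError:
            problems.append((line_no, f"invalid amount {raw_amount!r}"))
            continue

        # Normalise whitespace and known label variants.
        country = row["country"].strip()
        country = COUNTRY_ALIASES.get(country.lower(), country)
        clean.append(
            {
                "customer_id": int(row["customer_id"]),
                "country": country,
                "amount": amount,
            }
        )
    return clean, problems


clean, problems = clean_rows(RAW)
print(f"kept {len(clean)} of {len(clean) + len(problems)} rows")
for line_no, reason in problems:
    print(f"line {line_no}: {reason}")
```

Note that the sketch records every rejected row along with the reason, rather than silently dropping it. That list of problems is exactly what you show management when they ask where the time went.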

The analytical pyramid

Analytics comes in four distinct types: descriptive, diagnostic, predictive and prescriptive. Each builds on the one before it. As shown in Figure 1, they form a pyramid, with each level supporting the next. Briefly:

  1. Descriptive analytics is about what happened. It analyses past performance data to identify known strengths and weaknesses.
  2. Diagnostic analytics is about why something happened. It reveals the factors that drive positive and negative performance.
  3. Predictive analytics is about what could happen, extrapolated from past performance.
  4. Prescriptive analytics is about what should happen. It guides future decisions based on past data, and relies heavily on artificial intelligence and machine learning to mine that data.
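As a rough illustration of how the four levels build on one another, the toy sketch below runs each of them over the same small dataset. Every number and name in it is invented for the example, the straight-line fit is just the simplest stand-in for a real model, and a fitted slope shows association rather than proof of cause.

```python
from statistics import mean

# Made-up monthly figures (in thousands), purely for illustration.
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [100, 108, 123, 135, 142, 160]

# 1. Descriptive: what happened?
average_sales = mean(sales)
growth = sales[-1] - sales[0]

# 2. Diagnostic: why did it happen? Fit sales = intercept + slope * ad_spend
#    by ordinary least squares to see how strongly spend tracks sales.
mean_x, mean_y = mean(ad_spend), mean(sales)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales)) / sum(
    (x - mean_x) ** 2 for x in ad_spend
)
intercept = mean_y - slope * mean_x


# 3. Predictive: what could happen? Extrapolate from the fitted line.
def predict_sales(spend):
    return intercept + slope * spend


# 4. Prescriptive: what should happen? Choose the candidate budget with the
#    best predicted sales net of spend (a hypothetical decision rule).
candidate_budgets = [20, 25, 30]
best_budget = max(candidate_budgets, key=lambda b: predict_sales(b) - b)

print(f"average sales: {average_sales:.1f}, growth: {growth}")
print(f"each extra unit of spend is associated with {slope:.2f} units of sales")
print(f"predicted sales at a spend of 30: {predict_sales(30):.1f}")
print(f"recommended budget: {best_budget}")
```

Each step reuses the output of the one before it: the prediction needs the fitted relationship, and the recommendation needs the prediction. If the underlying figures are wrong, every level above them is wrong too.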

A big mistake that companies often make is to prioritise a single tier. If one of the top tiers is prioritised before the bottom tiers are secure, the analytical process becomes unstable. If only descriptive analysis is attempted, the benefits of the more insightful higher tiers are missed.