There are a few misconceptions floating around about data and its role in business. Unfortunately, this has led to the buzzwordification of terms like "big data," "machine learning," and especially "artificial intelligence." This is particularly challenging for those of us whose job titles encompass all three. Yes, I’m looking at you, fellow data scientists.
More and more corporations are sitting on mountains of data. And while I don’t disagree with the long-term notion that data is to artificial intelligence what oil was to the automobile, the near-term impacts can be less clear. It’s very rare that simply owning hideously large first-party datasets will yield instant profits.
For the rest of us, sound practices and robust methodologies built on an ever-growing toolbox are required to make our datasets comprehensible, let alone actionable. In the early days of data science at Compass Digital Labs, my team was tasked with writing data-cleansing algorithms to do exactly this. By data cleansing, I’m not referring to traditional ETL (Extract, Transform, Load) in the context of database management, but to machine learning algorithms that groom extremely large datasets established with little governance: reducing naming redundancies, fixing typos, classifying and categorizing records, and so on. I’d be remiss if I didn’t mention that this problem isn’t isolated to my employer: it’s very difficult to set up a perfect data strategy today without knowing what the downstream uses of that data will be a dozen years from now.
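To make "reducing naming redundancies" concrete, here is a minimal sketch of one such cleansing step: collapsing misspelled or inconsistently formatted names onto a canonical list via fuzzy string matching. The canonical names, function name, and similarity cutoff are all hypothetical illustrations, not the algorithms my team actually shipped, which were considerably more involved.

```python
from difflib import get_close_matches

# Hypothetical canonical product names (illustrative only).
CANONICAL = ["espresso", "cappuccino", "latte"]

def normalize_name(raw: str, canon=CANONICAL, cutoff=0.8) -> str:
    """Map a possibly misspelled name onto its closest canonical form.

    Falls back to the lowercased, stripped input when nothing is close
    enough, so unknown items are preserved rather than silently dropped.
    """
    cleaned = raw.strip().lower()
    matches = get_close_matches(cleaned, canon, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned

# Typos and stray whitespace collapse onto one canonical spelling:
assert normalize_name("capuccino") == "cappuccino"
assert normalize_name("Espresso ") == "espresso"
# An unrecognized item passes through unchanged:
assert normalize_name("matcha") == "matcha"
```

The fallback behavior is the important design choice here: a cleansing pass that deletes whatever it cannot match destroys data, while one that passes unknowns through can be re-run as the canonical list grows.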
So we wrote a suite of machine learning algorithms to clean our data. With this, we were able to offer descriptive reporting back to the company. That not only boosted our Business Intelligence department’s dashboards, but also sparked questions from the company that we couldn’t answer just yet. The state of our data’s cleanliness is never binary: we’re constantly extracting more from it by enhancing our data-cleansing algorithms (a process known as data mining).
The real tipping point, however, was when our data was clean enough to then be fed into other machine learning models. It’s crazy, I know – using machine learning to clean data so that it can be used for more machine learning! But with this strategy, we’re able to produce forecasts and company-wide prescriptive suggestions, as well as hyper-local ones, pertaining to all aspects of retail, operations, and labor.
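As a toy illustration of what clean data unlocks, even the simplest model becomes usable once a sales history is trustworthy. The sketch below fits an ordinary least-squares trend line and extrapolates it; the numbers are invented and the article does not specify which forecasting models we actually use, so treat this purely as an assumption-laden example.

```python
def fit_trend(values):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

def forecast(values, steps):
    """Extrapolate the fitted trend `steps` periods past the data."""
    a, b = fit_trend(values)
    n = len(values)
    return [a + b * (n + i) for i in range(steps)]

# Hypothetical, perfectly linear weekly sales: the forecast
# simply continues the line.
sales = [100, 110, 120, 130, 140]
assert forecast(sales, 2) == [150.0, 160.0]
```

The point is less the model than the pipeline: garbage names and duplicate records would have corrupted `sales` before any model, simple or sophisticated, ever saw it.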
But despite all of this, good business decisions still require human expertise. Running a fine-tooth comb through exabytes of data, refining state-of-the-art algorithms, and producing prescriptive forecasts and case studies are all part of the data-driven decision-making process. Sometimes it’s as easy as asking the data (or, more accurately, the algorithms) to tell me what my next move should be as a decision maker. More often than not, though, decision makers will be presented with a number of scenarios and their respective likelihoods of success. It’s these situations that make it very clear that, despite all our advances, this is still a human-run world – at least in our context, at the time of this writing – albeit one heavily augmented with cutting-edge technology. A data-driven approach alone doesn’t ensure success, but the lack of one over a long enough period will guarantee failure.