Why Data Science Projects Fail

In the Spring 2021 issue of the MIT Sloan Management Review, in “Why So Many Data Science Projects Fail to Deliver”*, the authors** list five common obstacles that an organization must overcome to gain business value from its advanced analytics projects. These five mistakes are:
1. The Hammer in Search of a Nail
2. Unrecognized Sources of Bias
3. Right Solution, Wrong Time
4. Right Tool, Wrong User
5. The Rocky Last Mile
These mistakes can be summarized:
1. An amazing algorithm that fails to deliver business value is not a good solution.
2. Unfamiliarity with the data sources can lead to biased data and inaccurate results.
3. Poor timing between data projects and the business’s changing priorities can make the results irrelevant.
4. A lack of attention to how the results are used and conveyed to end users can negate the usefulness of the results.
5. An inability to involve the business user to help train the models over time can discourage participation.
The authors state that “The mistakes we identified invariably occurred at the interfaces between the data science function and the business at large.”
Our focus is on one particular class of data science project – that of building machine learning models.
We suggest that all five of these problems have one common cause and one common solution.
The real problem is that machine learning models are typically built by data scientists, not business users.
The solution is to allow the business user, NOT the data scientist, to build and train the models.
The business user has no special love of algorithms; they have a problem to solve (#1).
The business user is also most familiar with the data sources (#2), keeps the project and the priorities aligned (#3), knows how to use the results (#4), and knows best how to train the model (#5).
Another way to look at this problem is this:
“Not everything that can be counted counts. Not everything that counts can be counted.” – William Bruce Cameron.

A lack of familiarity with what counts can doom projects to failure. Including everything countable can produce very noisy results.

In a February 13, 2021 post, “Why Big Data Science & Data Analytics Projects Fail,” Data Science Project Management lists three alarming statistics:

  • 85% of big data projects fail (Gartner, 2017)
  • 87% of data science projects never make it to production (VentureBeat, 2019)
  • “Through 2022, only 20% of analytic insights will deliver business outcomes” (Gartner, 2019)

This is not a great batting average.

Why the low rates of success?

Once again, the root cause is the disconnect between the business user and the data scientist. Understanding the requirements properly takes time, which can leave a project irrelevant by the time it delivers. Training old-school machine learning models also takes time, along with tens of thousands of examples of sample data.

Instead of trying to help the data scientist think like a business user, why not allow the business user to do the work of a data scientist?

Only within the last eighteen months has this become possible in the areas of cognitive document processing and cognitive image processing, where business users can now extract required data from documents and images, make it machine-readable, and pass this digitized data to other systems for further processing.

The ideal process:

  1. Provides an intuitive UI that allows users to select the relevant data.
  2. Requires only a handful of sample documents.
  3. Includes machine self-training and adaptive learning.
  4. Allows human refinement of machine training.

(We’ll discuss the technology of intelligent document processing and intelligent image processing – Natural Language Processing backbone models, transfer learning, deep neural networks, and Computer Vision – in subsequent articles.)

There will always be projects requiring data scientists, who are in very short supply right now.

Cognitive document processing, or intelligent document processing (data extraction not by scanning but by reading the document), is not the kind of project that should require an army of data scientists. Allowing the business user to act as their own data scientist avoids the most common cause of the mistakes made in failed data science projects.

About the Author