Why Data Science Projects Fail
- BY JIM WELDY
A lack of familiarity with what counts can doom projects to failure. Inclusion of everything countable can produce very noisy results.
In a February 13, 2021 post, “Why Big Data Science & Data Analytics Projects Fail,” Data Science Project Management lists three alarming statistics:
- 85% of big data projects fail (Gartner, 2017)
- 87% of data science projects never make it to production (VentureBeat, 2019)
- “Through 2022, only 20% of analytic insights will deliver business outcomes” (Gartner, 2019)
This is not a great batting average.
Why the low rates of success?
The root cause, once again, is the disconnect between the business user and the data scientist. Properly understanding the requirements takes time, and by the time they are understood the project may no longer be relevant. Training old-school machine learning models also takes time, along with tens of thousands of sample examples.
Instead of trying to help the data scientist think like a business user, why not allow the business user to do the work of a data scientist?
Only within the last eighteen months has this become possible in cognitive document processing and cognitive image processing. Business users can now extract the required data from documents and images, make it machine-readable, and pass the digitized data to other systems for further processing.
The ideal process:
- Provides an intuitive UI that allows users to select the relevant data.
- Requires only a handful of sample documents.
- Includes machine self-training and adaptive learning.
- Allows human refinement of machine training.
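To make the loop above concrete, here is a deliberately tiny Python sketch of the idea: learn from a handful of user-labeled samples, extract from new documents, and fold a human correction back into the model. Everything in it is an assumption made for illustration. `ToyFieldExtractor`, its `learn` and `extract` methods, and the sample invoice strings are hypothetical, and the "model" is a simple count of which word precedes a value, standing in for the far more capable language and vision models a real product would use.

```python
from collections import Counter, defaultdict


class ToyFieldExtractor:
    """Hypothetical stand-in: learns which token precedes a field's value."""

    def __init__(self):
        # field name -> counts of the tokens seen immediately before its value
        self.cues = defaultdict(Counter)

    def learn(self, text, field, value):
        """Record one labeled example: the user marked `value` as `field`."""
        tokens = text.split()
        for i in range(1, len(tokens)):
            if tokens[i] == value:
                self.cues[field][tokens[i - 1]] += 1

    def extract(self, text, field):
        """Return the token following the strongest learned cue, or None."""
        tokens = text.split()
        best, best_score = None, 0
        for i in range(1, len(tokens)):
            score = self.cues[field][tokens[i - 1]]
            if score > best_score:
                best, best_score = tokens[i], score
        return best


extractor = ToyFieldExtractor()

# A handful of samples in which the user selected the relevant value.
extractor.learn("Invoice No: 1001 Total: 250.00", "total", "250.00")
extractor.learn("Invoice No: 1002 Total: 99.50", "total", "99.50")
print(extractor.extract("Invoice No: 1003 Total: 410.25", "total"))  # 410.25

# A new layout defeats the learned cue, so the user corrects the result...
new_layout = "Invoice No: 1004 Amount Due: 75.00"
print(extractor.extract(new_layout, "total"))  # None
extractor.learn(new_layout, "total", "75.00")

# ...and the correction carries over to the next document of that kind.
print(extractor.extract("Invoice No: 1005 Amount Due: 12.00", "total"))  # 12.00
```

The point of the sketch is the shape of the workflow, not the technique: the business user supplies the labels and the corrections, and no data scientist sits in the loop.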
(We’ll discuss the technology of intelligent document processing and intelligent image processing – Natural Language Processing backbone models, transfer learning, deep neural networks, and Computer Vision – in subsequent articles.)
There will always be projects requiring data scientists, who are in very short supply right now.
Cognitive document processing, also called intelligent document processing (data extraction not by scanning but by reading the document), should not be one of the projects that requires an army of data scientists. Allowing business users to act as their own data scientists avoids the most common cause of the mistakes made in failed data science projects.