In this first blog of a three-part series, we'll delve into the top five reasons for stalled or unsuccessful data science projects.
One of the biggest issues data scientists face is a lack of clear business objectives. Symptoms that indicate this gap include:
Whether the organization has failed to be clear about its objectives or the data scientists lack the skills to define the problem and its success criteria, the result is the same: an inability to develop a good hypothesis and, therefore, a productive model.
An effective delivery methodology is the foundation of a successful data science team. Does your team know whether a model is good enough to be used in production? Does your team have a way to operationalize the model in production? If not, those are clear signs that your data delivery methodology lacks maturity. A lack of top-down strategy and sponsorship can also leave teams relying on a complicated, decentralized data pipeline.
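One way to make "good enough for production" concrete is to write the agreed success criteria down as explicit thresholds and check each candidate model against them. The following is a minimal sketch, not a prescribed method: the `Criterion` class, the `production_ready` function, the metric names, and the threshold values are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Criterion:
    """One agreed success criterion (hypothetical structure for illustration)."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        # Compare in the direction that counts as "better" for this metric.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def production_ready(
    metrics: Dict[str, float], criteria: List[Criterion]
) -> Tuple[bool, List[str]]:
    """Return (ready, failures); a metric that was never measured counts as a failure."""
    failures = []
    for c in criteria:
        value = metrics.get(c.metric)
        if value is None:
            failures.append(f"{c.metric}: not measured")
        elif not c.passes(value):
            failures.append(f"{c.metric}: {value} does not meet {c.threshold}")
    return (not failures, failures)


# Illustrative numbers only; real thresholds should come from the business objective.
criteria = [
    Criterion("recall", 0.80),
    Criterion("p95_latency_ms", 200.0, higher_is_better=False),
]
ready, failures = production_ready(
    {"recall": 0.84, "p95_latency_ms": 250.0}, criteria
)
print(ready, failures)
```

Keeping the criteria in one place gives stakeholders and data scientists a shared, reviewable definition of "done" before modeling begins.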
If stakeholders expect a working model after a two-week sprint, the organization has not been educated on the inherently unpredictable nature of data science work. An organization that expects the same milestones and timelines it is accustomed to from a software development team will be disappointed, because the data science lifecycle is iterative and its outcomes are inconsistent. Other pain points that arise when people lack knowledge of the data science lifecycle include inconsistent use of terminology among data, software, and DevOps engineers; missed dependencies between the data science and platform teams; and stakeholders who resist making decisions based on model predictions.
Missing data at the start of a project, poorly structured data, inconsistent data, and dirty data that requires cleansing will all block your ability to deliver working models effectively. The five V’s of data (velocity, volume, value, variety, and veracity) need to be understood, or at the very least discussed, before any data science project starts.
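A quick profile of a data sample can surface some of these problems before modeling starts. The sketch below uses only the Python standard library and touches mainly on veracity: it counts missing values, fields that mix value types, and exact duplicate rows. The `profile_records` function, the field names, and the sample records are hypothetical, and a real project would likely need far more thorough checks.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Summarize basic quality issues in a list of dict records."""
    missing = Counter()
    types_seen = {field: set() for field in required_fields}

    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1  # absent, null, or empty string
            else:
                types_seen[field].add(type(value).__name__)

    # Count exact duplicates by comparing a canonical form of each row.
    row_counts = Counter(
        tuple(sorted((key, repr(value)) for key, value in record.items()))
        for record in records
    )
    duplicates = sum(count - 1 for count in row_counts.values())

    return {
        "rows": len(records),
        "missing": dict(missing),
        "mixed_types": {
            field: sorted(kinds)
            for field, kinds in types_seen.items()
            if len(kinds) > 1
        },
        "duplicate_rows": duplicates,
    }


# Made-up sample data for illustration.
records = [
    {"customer_id": 1, "signup_date": "2023-01-05", "spend": 120.5},
    {"customer_id": 2, "signup_date": "", "spend": "n/a"},
    {"customer_id": 1, "signup_date": "2023-01-05", "spend": 120.5},
    {"customer_id": 3, "signup_date": "2023-02-11", "spend": None},
]
report = profile_records(records, ["customer_id", "signup_date", "spend"])
print(report)
```

A report like this gives the team something concrete to discuss with data owners before committing to a timeline.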
A shortage of data scientists is affecting everyone in the IT industry. Even if you can fill a team with top-notch statisticians and MLOps engineers, you likely still lack some of the skill sets a successful data science project requires. Business domain knowledge, pipeline expertise, and the soft skills to uncover the business problem to be solved are all critical as well. Maturing your ability to deliver also requires your team to understand agility and to have the skills necessary to shorten feedback loops.
Investing time in evaluating where your organization may be falling short can be extremely valuable in helping you chart an informed path forward. A mapping exercise like the one below can help unlock new levels of visibility around your own data science projects. It's up to you and your team to identify where your organization has the most opportunity to grow.
In the next two parts of this series, we'll look at how to improve your data science project maturity and discuss data science teams and agility.