4 Common Machine Learning Pitfalls and How To Avoid Them

Machine learning is one of the hottest topics in technology today, and for good reason.

It has tremendous potential for automating or semi-automating some of the most tedious tasks faced by knowledge workers — and leading tech companies are already starting to realize much of that potential.

For example, machine learning can help reduce manual toil on the following tasks by 50% or more:

We are about to unlock this value as machine learning applications become more widespread. A study by Algorithmia found that 76% of companies prioritized artificial intelligence (AI) and machine learning (ML) over other IT initiatives in 2021.

Yet most machine learning initiatives fail. (Read also: The Promises and Pitfalls of Machine Learning.)

While there are countless reasons why ML pilots never get off the ground, the most pressing issues can be traced back to four key pitfalls:

  1. Lack of business alignment.
  2. Bad machine learning training practices.
  3. Data quality issues.
  4. Implementation complexity.

Let’s examine each of these and propose some solutions for data teams and organizations to avoid them.

1. Lack of Business Alignment

The original sin of machine learning lies in the way most of these projects are born.

Too often, a group of data scientists dreams up a machine learning project by thinking, “This data is interesting; wouldn’t it be cool if…”

And it’s that mindset that turns ML projects into science experiments.

It may still be possible for the model to deliver something of value in these types of projects, but if the project doesn’t address an urgent and painful need, it won’t get the time or attention it needs from business stakeholders. Or worse, it could end up a little like blockchain: a cool technology in search of a problem. (Read also: An Introduction to Blockchain Technology.)

Machine learning projects should start by looking at the most pressing business priorities and then assess what resources are needed to solve them – instead of starting with the clean data at hand and then trying to find a problem they can solve.

Good questions to ask before starting a machine learning project include:

  • Is this problem urgent? According to whom?
  • Why is machine learning the right solution for this problem?
  • How will we define success?

2. Bad Machine Learning Training

Let’s say your project has a very difficult and valuable business problem in its sights. The next step is collecting enough clean data to train the model.

Therein lies the paradox of the data scientist: to eliminate the toil for others, they must wallow in it.

According to Anaconda, data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data.

There is a good chance that, after all this work, there is still not enough suitable or representative training data. And, as with any manual task, there is a risk of human error. (Read also: Automation: The Future of Data Science and Machine Learning?)

Refining your ML model can also be challenging. It could be overfit, where it learns too much, or underfit, where it learns too little.

How can a machine learning model learn too well, you ask?

There is a famous example from a model trained to distinguish between huskies and wolves. It was very accurate in training but started to fail in production. The problem? All the pictures of wolves had snow in the background and the huskies didn’t. It was a snow detection model – not a wolf detection model.

Unfortunately, machine learning training may be the only test on which you don’t want to score 100%.
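One rough way to spot this is to compare accuracy on the training set with accuracy on held-out data the model has never seen. The sketch below is illustrative only: the function name `diagnose_fit` and the thresholds `min_acc` and `max_gap` are assumptions, not standard values. It also only helps if the held-out data resembles production (in the husky example, that would mean wolf photos without snow).

```python
def diagnose_fit(train_acc: float, val_acc: float,
                 min_acc: float = 0.7, max_gap: float = 0.1) -> str:
    """Label a model from its training and held-out accuracy.

    min_acc and max_gap are illustrative thresholds (assumptions);
    tune them for the problem at hand.
    """
    if not (0.0 <= train_acc <= 1.0 and 0.0 <= val_acc <= 1.0):
        raise ValueError("accuracies must be between 0 and 1")
    if train_acc - val_acc > max_gap:
        # Strong in training, weak on unseen data: it learned "too much".
        return "overfit"
    if train_acc < min_acc:
        # Weak even on the data it was trained on: it learned "too little".
        return "underfit"
    return "ok"


if __name__ == "__main__":
    print(diagnose_fit(0.99, 0.62))  # near-perfect in training, poor on unseen data
    print(diagnose_fit(0.55, 0.53))  # poor everywhere
    print(diagnose_fit(0.88, 0.85))  # similar scores on both
```

A large gap between the two scores is the warning sign; a perfect training score on its own says nothing about how the model will behave in production.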

3. Data Quality Issues

Whether in training or in production, it is impossible to have an effective machine learning model with bad data. Garbage in, garbage out, as they say.

The challenge is that machine learning models are data-hungry. They always want more data, as long as it is reliable.

However, bad data can be introduced into good data pipelines in almost infinite ways. Sometimes it might be a noisy anomaly where the error is quickly noticed; other times it can be a gradual case of data drift that reduces the accuracy of your model over time. Either way, it’s bad.
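Gradual drift is hard to see by eye, so teams often compare the distribution of a feature in fresh data against a reference sample. The sketch below uses the Population Stability Index (PSI) as one possible measure; it is a swapped-in illustration, not a method the examples in this article are known to use. The function name `psi`, the equal-width binning, and the small floor for empty bins are all assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a new sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]      # what the model was trained on
    shifted = [0.5 + i / 200 for i in range(100)]  # new data squeezed into the upper half
    print(f"same data:    {psi(reference, reference):.3f}")
    print(f"shifted data: {psi(reference, shifted):.3f}")
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that cutoff is an assumption to tune per feature rather than a guarantee.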

That’s because you built this model to automate or inform a painful business problem, so as accuracy decreases, so does confidence, and the consequences are severe. For example, one of my colleagues spoke to a financial company that used a machine learning model to buy bonds that met certain criteria. Bad data took the model offline, and it took weeks before it was trusted enough to go back into production. (Read also: The Future of Fintech: AI and Digital Assets in Financial Institutions.)

The data infrastructure supporting machine learning models should be continuously tested and observed, ideally in a scalable, automated way.
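As a minimal sketch of what an automated test might look like, the function below checks a batch of rows for volume, missing values, and out-of-range values before the batch reaches a model. Everything here is hypothetical: the function name `check_batch`, the `yield_pct` column, and the thresholds are made up for illustration and would need to reflect your own pipeline.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def check_batch(rows: Sequence[Row], column: str,
                max_null_rate: float = 0.05,
                min_rows: int = 1,
                valid_range: Tuple[float, float] = (0.0, 1.0)) -> List[str]:
    """Return a list of human-readable problems found in one column of a batch."""
    problems: List[str] = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
        return problems  # nothing else to measure on an undersized batch

    values = [row.get(column) for row in rows]

    # Missing values: a sudden jump in nulls often signals an upstream failure.
    null_rate = sum(v is None for v in values) / len(rows)
    if null_rate > max_null_rate:
        problems.append(f"{column}: null rate {null_rate:.0%} exceeds {max_null_rate:.0%}")

    # Range check: values outside the plausible range suggest unit or parsing errors.
    lo, hi = valid_range
    out_of_range = sum(1 for v in values if v is not None and not lo <= v <= hi)
    if out_of_range:
        problems.append(f"{column}: {out_of_range} value(s) outside [{lo}, {hi}]")

    return problems


if __name__ == "__main__":
    batch = [{"yield_pct": 3.1}, {"yield_pct": None}, {"yield_pct": 250.0}]
    for problem in check_batch(batch, "yield_pct", valid_range=(0.0, 20.0)):
        print(problem)
```

Run on every batch, a check like this can block bad data or raise an alert before predictions are affected, instead of after stakeholders notice.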

4. Implementation Complexities

It turns out that a lot of resources are needed to implement and maintain machine learning in production. Who knew?

Well, Gartner did. It projects that by 2025, AI will be the top category driving infrastructure decisions, due to the maturation of the AI market, resulting in a tenfold increase in computing requirements.

This requires a lot of support from business stakeholders, which is why business alignment is so important. For example, Atul Gupte, a former Uber data product manager, led a project to improve the data science workbench that data scientists used to make collaboration easier.

At the time, data scientists were automating the process of validating and verifying the documents required when applying to join the Uber platform. This was a great project for machine learning and deep learning, but the problem was that data scientists would routinely hit the limits of available computing power.

Gupte explored multiple solutions and identified virtual GPUs (then an emerging technology) as a possible solution. Although it came with a hefty price tag, Gupte justified the expenditure to leadership. The project would not only save the company millions, but it also supported an important competitive differentiator.

Another example is how Netflix never moved its award-winning recommendation algorithm into production, opting instead for a simpler solution that was easier to integrate. (Read also: How AI Personalizes Entertainment.)

How To Avoid These Pitfalls

Don’t let these challenges stop you from launching your machine learning initiative.

Limit these risk factors by:

  • Obtaining stakeholder buy-in early and realigning often.
  • Iterating in DevOps fashion.
  • Making sure you have the right training data in place and checking for quality before and after production.
  • Being aware of the resource limitations of running models in production.

As Tom Hanks says in “A League of Their Own,” “If it wasn’t hard, everyone would do it. The hard is what makes it great.”
