Machine Learning at Scale

Jan 27, 2022

In this article, I share my take on building machine learning models at scale.

I started learning machine learning when the pandemic had just begun and I had nothing to do at home. As a learner, it's easy to find resources on the internet and kick-start your machine learning journey.

I have now also started my corporate journey as a Data Engineer working on big data and machine learning.

So why am I talking about all this? I want to point out the difference between building machine learning models as a student and building enterprise applications. What is the major difference?

It's the SCALE. Can the models I build as a student do well with terabytes of data? The answer is a big NO.

Why isn't my machine learning model scalable?

Let us say I have built a machine learning model that recommends movies based on what the user has searched for. The model works well for six months.

Here are a few challenges we might face after those six months:

How do you update the dataset with newly released movies?
Does your model work well with all types of movies?
Will your model still work as the number of requests increases?
How do you monitor your model?
What if your friend wants to make a few changes to the model? Is there version control similar to what software engineering projects use?
If your model is not scalable, how will you deliver business value?

There are many other problems that arise when your application needs to scale.

I could handle all of this manually by revisiting every step I performed during development. Isn't that a herculean task? It is.

So what's the solution? A machine learning pipeline.

Need for ML Pipeline

Once teams move from a stage where they are occasionally updating a single model to having multiple frequently updating models in production, a pipeline approach becomes paramount. In this workflow, you don’t build and maintain a model. You develop and maintain a pipeline.
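
To make the idea of "the pipeline is what you maintain" concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the step names (ingest, validate, train, evaluate), the sample search events, and the toy popularity "model" are all hypothetical and do not come from the Valohai material. The point is that retraining on new data means re-running the same list of steps, not repeating manual work.

```python
from typing import Callable, Dict, List

# Each step takes the shared pipeline state and returns an updated copy.
Step = Callable[[Dict], Dict]


def ingest(state: Dict) -> Dict:
    # Hypothetical data source: (user_id, movie_title) search events.
    events = [("u1", "Dune"), ("u2", "Dune"), ("u1", "Encanto"), ("u3", "Dune")]
    return {**state, "events": events}


def validate(state: Dict) -> Dict:
    # Drop malformed rows (empty user or empty title).
    clean = [(u, m) for u, m in state["events"] if u and m]
    return {**state, "events": clean}


def train(state: Dict) -> Dict:
    # Stand-in "model": count how often each movie was searched.
    counts: Dict[str, int] = {}
    for _, movie in state["events"]:
        counts[movie] = counts.get(movie, 0) + 1
    return {**state, "model": counts}


def evaluate(state: Dict) -> Dict:
    # Toy metric: share of all events covered by the most searched movie.
    model = state["model"]
    total = sum(model.values())
    top_share = max(model.values()) / total if total else 0.0
    return {**state, "top_share": top_share}


def run_pipeline(steps: List[Step]) -> Dict:
    # Run every step in order, passing the state along.
    state: Dict = {}
    for step in steps:
        state = step(state)
    return state


# The ordered list of steps is the artifact you version and maintain.
PIPELINE = [ingest, validate, train, evaluate]
result = run_pipeline(PIPELINE)
print(result["model"], result["top_share"])
```

In a real project each step would typically be a separate job managed by an orchestration tool, but the structure stays the same: swapping in a new dataset or a new training step changes one entry in the list.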

Let us compare the manual cycle with the machine learning pipeline.

Manual Cycle of Machine Learning

Characteristics of a manual ML pipeline:

The model is the product
Manual or script-driven process
A disconnect between the data scientist and the engineer
Slow iteration cycle
No automated testing or performance monitoring
No version control

Machine Learning Pipelines for Handling ML projects

Characteristics of an automated ML pipeline:

The pipeline is the product
Fully automated process
Co-operation between the data scientist and the engineer
Fast iteration cycle
Automated testing and performance monitoring
Version-controlled
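
Two of these characteristics, version control and automated testing, can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not how any particular MLOps tool works: the helper names `fingerprint` and `should_promote`, the `top_k` parameter, the data version labels, and the 0.7 score floor are all hypothetical.

```python
import hashlib
import json
from typing import Dict


def fingerprint(params: Dict, data_version: str) -> str:
    """Derive a short, repeatable version ID from the training inputs."""
    # sort_keys makes the JSON text stable regardless of dict ordering.
    payload = json.dumps({"params": params, "data": data_version}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def should_promote(candidate_score: float, production_score: float,
                   min_score: float = 0.7) -> bool:
    """Automated check: promote only if the candidate clears an absolute
    floor and is at least as good as the model already in production."""
    return candidate_score >= min_score and candidate_score >= production_score


# Same parameters, different data snapshot: the version ID changes.
v1 = fingerprint({"top_k": 10}, data_version="2022-01")
v2 = fingerprint({"top_k": 10}, data_version="2022-02")
print(v1, v2)

# A candidate that beats production and the floor is promoted;
# one that beats production but misses the floor is not.
print(should_promote(0.82, 0.80))
print(should_promote(0.65, 0.60))
```

Because the version ID is derived from the inputs, two teammates who train with the same parameters on the same data snapshot get the same ID, and the promotion check runs the same way every time instead of relying on someone remembering to compare scores.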

Maintaining a machine learning pipeline helps you scale the model and deliver business value, which is ultimately the main goal of a machine learning project.

Transitioning from a manual cycle to an automated pipeline may take many iterations, depending on the scale of your machine learning efforts and your team composition. Ultimately, the purpose of a pipeline is to let you iterate faster, with the added confidence that codifying the process gives, and to scale how many models you can realistically maintain in production.

P.S. I wrote this post after reading an ebook on MLOps by Valohai, and I loved the concept of MLOps. I hope to write more articles on the topic.

If you found this article helpful, consider clapping and following me on Medium.

Let’s connect via LinkedIn or Twitter for further discussions on big data and ML.

References

1. https://valohai.com/machine-learning-pipeline/

Written by avs sridhar