Machine Learning at Scale
In this article, I will write about my take on building machine learning models at scale.
I started learning machine learning when the pandemic had just begun and I had nothing to do at home. As a learner, it's easy to find resources on the internet and kick-start your machine learning journey.
I have now also started my corporate journey as a Data Engineer working on big data and machine learning.
So why am I talking about all this? I want to point out the difference between building machine learning models as a student and building enterprise applications. What is the major difference?
It's the SCALE. Can the models I build as a student do well with terabytes of data? The answer is a big NO.
Why isn't my machine learning model scalable?
Let us say I have built a machine learning model that recommends movies based on what the user has searched for. The model works perfectly well for six months.
Now here are a few challenges we might face after six months:
- How do you update the dataset with new movies released?
- Does your model work well with all types of movies?
- Will your model still work as the number of requests increases?
- How do you monitor your model?
- What if your friend wants to make a few changes to the model? Is there any version control available, similar to what software engineering projects use?
- If your model is not scalable, how will you deliver business value?
Well, there are many other problems that arise when your application needs to be scaled.
I could handle all of this manually by revisiting every step I went through during development. Isn't that a herculean task? It is.
So what's the solution? A machine learning pipeline.
Need for ML Pipeline
Once teams move from a stage where they are occasionally updating a single model to having multiple frequently updating models in production, a pipeline approach becomes paramount. In this workflow, you don’t build and maintain a model. You develop and maintain a pipeline.
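To make "you develop and maintain a pipeline" a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Pipeline` class, the step names (`ingest`, `train`, `evaluate`), the hard-coded ratings, and the toy "model" that just predicts the mean rating. A real pipeline would load real data and train a real model, most likely with an orchestration tool instead of a hand-written class.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A step takes the shared context dict and returns the updated context.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Hypothetical minimal pipeline: an ordered list of named steps."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, fn: Step) -> "Pipeline":
        self.steps.append((name, fn))
        return self

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # Run every step in order and record which steps were executed.
        for name, fn in self.steps:
            context = fn(context)
            context.setdefault("log", []).append(name)
        return context


def ingest(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for loading fresh data, e.g. ratings for newly released movies.
    ctx["ratings"] = [4.0, 3.5, 5.0, 2.5]
    return ctx


def train(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Toy "model": always predict the mean rating.
    ctx["model"] = sum(ctx["ratings"]) / len(ctx["ratings"])
    return ctx


def evaluate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Mean absolute error of the toy model on the same ratings.
    errors = [abs(r - ctx["model"]) for r in ctx["ratings"]]
    ctx["mae"] = sum(errors) / len(errors)
    return ctx


pipeline = Pipeline().add("ingest", ingest).add("train", train).add("evaluate", evaluate)
result = pipeline.run({})
print(result["log"], result["model"], result["mae"])
```

The point of the sketch is that retraining on new data means calling `pipeline.run` again, not repeating each step by hand.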
Let us compare the manual cycle with the machine learning pipeline.
Manual Cycle of Machine Learning
Characteristics of a manual ML cycle:
- The model is the product
- Manual or script-driven process
- A disconnect between the data scientist and the engineer
- Slow iteration cycle
- No automated testing or performance monitoring
- No version control
Machine Learning Pipelines for Handling ML Projects
Characteristics of an automated ML pipeline:
- The pipeline is the product
- Fully automated process
- Co-operation between the data scientist and the engineer
- Fast iteration cycle
- Automated testing and performance monitoring
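As a rough idea of what "automated testing and performance monitoring" could mean in code, here is a small sketch. The function names `should_promote` and `detect_degradation`, the metric (any score where higher is better, such as accuracy), and all thresholds are hypothetical choices for this example, not values taken from any particular tool.

```python
from typing import Sequence


def should_promote(candidate_metric: float, production_metric: float,
                   min_metric: float = 0.80, tolerance: float = 0.01) -> bool:
    """Automated test gate (hypothetical thresholds).

    The candidate model is promoted only if it clears an absolute floor
    and is not worse than the production model by more than `tolerance`.
    """
    if candidate_metric < min_metric:
        return False
    return candidate_metric >= production_metric - tolerance


def detect_degradation(recent_scores: Sequence[float], baseline: float,
                       max_drop: float = 0.05) -> bool:
    """Monitoring check (hypothetical thresholds).

    Flags the model when the average of recent scores has fallen
    more than `max_drop` below the baseline measured at deployment.
    """
    if not recent_scores:
        return False
    average = sum(recent_scores) / len(recent_scores)
    return (baseline - average) > max_drop


# A new model that beats production passes the gate.
print(should_promote(candidate_metric=0.85, production_metric=0.84))
# A deployed model whose recent scores dropped well below baseline is flagged.
print(detect_degradation([0.70, 0.72, 0.71], baseline=0.85))
```

In an automated pipeline, checks like these would run on every retraining and on a schedule in production, so nobody has to remember to do them manually.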
Maintaining a machine learning pipeline helps you scale the model and deliver business value, which is ultimately the main goal of a machine learning project.
Transitioning from a manual cycle to an automated pipeline may take many iterations, depending on the scale of your machine learning efforts and the composition of your team. Ultimately, the purpose of a pipeline is to let you speed up the iteration cycle, with the added confidence that codifying the process gives, and to scale up the number of models you can realistically maintain in production.
P.S. I wrote this blog post after reading an ebook on MLOps by Valohai, and I loved the concept of MLOps. I hope to write many more articles on the topic.
If you found this article helpful, consider clapping and following me on Medium.