Defining Machine Learning and MLOps
In the tech world, terms like “data science” often mean different things to different people. In this blog, I’ll set out how I define Machine Learning (ML) and MLOps.
What is Machine Learning?
Machine learning is a subset of artificial intelligence in which computers learn from data to make predictions or decisions without being explicitly programmed. It helps businesses automate processes, gain insights from large datasets, and improve decision-making by identifying patterns and trends. This leads to increased efficiency, cost savings, and other competitive advantages.
Machine learning involves a two-step process. First, a machine learning model is trained: by applying a training algorithm to training data, the model learns to identify patterns and trends in that data. Second, the trained model is used to make predictions on new, previously unseen data.
In a simplified example, a machine learning model is trained on historical sales data, correlating user and product features with actual sales. The trained model can then be applied to a user visiting a webshop to predict which products are most likely to be bought. These products are recommended to the user, in the hope of increasing the probability of a sale.
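To make the two steps concrete, here is a minimal sketch in Python using scikit-learn. The file names, column names, and the choice of a random forest are illustrative assumptions, not a prescription for how such a recommender should be built.

```python
# Minimal sketch of the two-step process: train on historical data, then predict.
# File names, column names, and the model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Step 1: train on historical sales data (user/product features -> purchased yes/no).
history = pd.read_csv("historical_sales.csv")  # hypothetical file
features = ["user_age", "user_visits", "product_price", "product_category_id"]
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(history[features], history["purchased"])

# Step 2: apply the trained model to new, previously unseen data:
# score candidate products for a visiting user and recommend the most likely sales.
candidates = pd.read_csv("candidate_products_for_user.csv")  # hypothetical file
candidates["purchase_probability"] = model.predict_proba(candidates[features])[:, 1]
recommendations = candidates.sort_values("purchase_probability", ascending=False).head(5)
print(recommendations[["product_id", "purchase_probability"]])
```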
What is MLOps?
MLOps, which stands for Machine Learning Operations, is a set of practices that makes the development, deployment, and maintenance of machine learning models easier. It integrates the principles and best practices of DevOps (and Agile software development) with machine learning, creating a scalable development process that accelerates time-to-market and enhances model quality. There are two complementary sides to MLOps: a technical side and an organisational side.
On the technical side, we build an MLOps platform that gives data scientists and ML engineers full autonomy and independence. Since data is the essential input for machine learning, an MLOps platform is typically built on top of an existing Data Platform (for example, one that includes a data lake or data warehouse). In MLOps, we automate the training and deployment of ML models in what we call ML Pipelines.
Viewing the entire ML Pipeline as a software product to which we can apply DevOps, we automate the deployment of the ML Pipelines themselves using traditional CI/CD pipelines. Once deployed, an ML Pipeline takes care of training, retraining, and deploying the ML Model in the production environment. Moreover, MLOps emphasises continuous improvement through automated testing and performance monitoring of models in production, ensuring they continue to meet quality and performance standards.
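The sketch below illustrates the idea of an ML Pipeline expressed as code, which is what makes it deployable through a regular CI/CD pipeline. The step names, the placeholder implementations, and the accuracy threshold are my own assumptions for illustration; in practice the steps would be backed by real data access, training, and serving infrastructure, often via a pipeline orchestrator.

```python
# A minimal, tool-agnostic sketch of an ML Pipeline: the pipeline itself is code,
# so it can be versioned and deployed through a traditional CI/CD pipeline.
# Step names, placeholder bodies, and the accuracy threshold are illustrative assumptions.

def load_training_data():
    # Placeholder: in practice, read from the Data Platform (data lake / warehouse).
    return [([1.0, 2.0], 0), ([2.0, 3.0], 1)]

def train_model(data):
    # Placeholder: in practice, fit a real model on the training data.
    return {"trained_on": len(data)}

def evaluate_model(model, data):
    # Placeholder: in practice, compute a metric such as accuracy on held-out data.
    return 0.9

def deploy_model(model):
    # Placeholder: in practice, push to a model registry or serving endpoint.
    print("deploying", model)

def ml_pipeline(accuracy_threshold: float = 0.8) -> None:
    data = load_training_data()
    model = train_model(data)
    accuracy = evaluate_model(model, data)
    # Promote the model only when it meets the quality bar; production monitoring
    # can trigger this pipeline again to retrain on fresh data.
    if accuracy >= accuracy_threshold:
        deploy_model(model)

if __name__ == "__main__":
    ml_pipeline()
```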
The final step in the machine learning process is using the trained model to make predictions. In this step, new feature data is loaded, prepared, and fed to the model to produce predictions. In general, data transformations are the domain of Data Pipelines on the Data Platform. In an upcoming blog, I will argue that Data and ML Pipelines should be kept entirely separate, and that using a trained model to make predictions should be part of a Data Pipeline. However, making predictions is closely related to the training process in the ML Pipeline, as it involves the same data loading and preparation steps. Therefore, every MLOps solution should include both an ML Pipeline and a prediction method specific to the model being trained.
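As a rough illustration of what that prediction step might look like inside a Data Pipeline, here is a short batch-scoring sketch: it loads a model produced by the ML Pipeline and applies it to new feature data. The joblib format, file paths, and column handling are assumptions for the sake of the example.

```python
# Minimal sketch of the prediction step as it might appear in a Data Pipeline:
# load a previously trained model and apply it to new feature data in batch.
# The joblib format, file paths, and column handling are illustrative assumptions.
import joblib
import pandas as pd

def predict_batch(model_path: str, features_path: str, output_path: str) -> None:
    model = joblib.load(model_path)        # model artefact produced by the ML Pipeline
    features = pd.read_csv(features_path)  # same loading/preparation steps as in training
    features["prediction"] = model.predict(features)  # score the new, unseen data
    features.to_csv(output_path, index=False)         # hand results back to the Data Platform
```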
On the organisational side, MLOps requires effective collaboration between data scientists, engineers, business leaders, and other stakeholders. The ultimate goal of a machine learning model is to address a business challenge. Therefore, it is essential to align the objectives of the ML team with the strategic goals of the business. This involves structuring teams to promote cross-functional collaboration, with clear communication and well-defined processes so that everyone is on the same page. Additionally, building a culture of continuous learning and adaptation, as encouraged by Agile software development, helps teams handle the experimental nature of machine learning development. I will elaborate on this in a future blog, “Optimising AI Development with End-to-End Product Teams.”
About the author
Erik Jan de Vries is an award-winning ML product architect / engineer and freelance consultant. While AI tools were used in the writing process for this blog, all the ideas and arguments presented here are my own. I’m always up for a chat about AI and ML, feel free to contact me on LinkedIn.