This article covers how to install MLflow. Before diving into the process, let's begin by introducing MLOps.
By definition, MLOps is a cross-functional, collaborative, and continuous process that focuses on operationalizing data science use cases by managing statistical and machine learning models as reusable, highly available software artifacts via a repeatable deployment process.
MLOps covers aspects such as model inference, scalability, maintenance, auditing, monitoring, and governance, so that models keep delivering positive value even as underlying conditions (variables) change.
MLOps has risen to prominence because it helps organizations reduce the risk associated with Data Science, AI, and ML initiatives and maximize returns on analytics.
Running ML models and managing their lifecycle requires continuously comparing the performance of model versions and detecting model drift as and when it occurs. Manual ML tracking methods leave too many blind spots when dealing with scores of model runs. This is where MLOps streamlines end-to-end ML lifecycle management.
MLOps helps Data Scientists –
While there are multiple platforms to manage MLOps, let's focus on MLflow, a popular open-source platform that streamlines machine learning development (tracking experiments, packaging code into reproducible runs, and sharing and deploying models).
MLflow is a Machine Learning Operations platform that offers a set of lightweight APIs that can be used with any existing machine learning application or library (e.g., TensorFlow, PyTorch, XGBoost).
Before we get to the MLflow installation, let’s take a look at key advantages of using MLflow as a platform:
To install MLflow, first create and activate a virtual environment using the commands below:
conda create --name mlflow python=3.6
conda activate mlflow
After creating the virtual environment, install MLflow using either of the commands below:
conda install -c conda-forge mlflow
pip install mlflow
In this case, local storage on a personal system is used for tracking instead of a remote tracking server. Once MLflow is installed, create a Python file, say sample.py, with the code below:
import os
from random import random, randint
from mlflow import log_metric, log_param, log_artifacts

if __name__ == "__main__":
    # Log a parameter (key-value pair)
    log_param("param1", randint(0, 100))

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    # Log an artifact (output file)
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")  # placeholder content for the example file
    # Upload everything in the outputs folder as artifacts of this run
    log_artifacts("outputs")
Create a folder and save the sample.py file in it.
In the Anaconda prompt, change to the folder where sample.py is saved. The file can then be run as shown below:
python sample.py
A folder named 'mlruns' is created automatically when sample.py is run. All the information about the different runs and their artifacts is saved in this folder.
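To get a feel for what ends up in the mlruns folder, here is a minimal sketch that walks it using only the Python standard library. It assumes the layout written by MLflow's local file store: mlruns/<experiment_id>/<run_id>/ with params and metrics subfolders, where each metric file holds lines of the form "timestamp value step". That layout is an internal detail and may differ between MLflow versions. The function name read_local_runs is hypothetical.

```python
from pathlib import Path


def read_local_runs(root="mlruns"):
    """Collect params and metric values per run from an mlruns-style folder."""
    runs = {}
    # Assumed layout: <root>/<experiment_id>/<run_id>/metrics/<metric_name>
    for metrics_dir in Path(root).glob("*/*/metrics"):
        run_dir = metrics_dir.parent

        params = {}
        params_dir = run_dir / "params"
        if params_dir.is_dir():
            for p in params_dir.iterdir():
                if p.is_file():
                    params[p.name] = p.read_text().strip()

        metrics = {}
        for m in metrics_dir.iterdir():
            if m.is_file():
                # Assumed line format: "timestamp value step"; keep the value.
                metrics[m.name] = [
                    float(line.split()[1])
                    for line in m.read_text().splitlines()
                    if line.strip()
                ]

        runs[run_dir.name] = {"params": params, "metrics": metrics}
    return runs
```

For everyday use, the MLflow UI or the MLflow Python API is the safer way to read this data; the sketch is only meant to show that the folder holds plain files.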
We can then start the MLflow UI by running the command below in the Anaconda prompt:
mlflow ui
Once it is running, the UI can be opened at http://localhost:5000/
The left panel lists all experiments, and each experiment groups the different runs of the same problem.
As we can see, the MLflow UI presents the models and runs logged through MLflow Tracking, allowing us to visualize, search, and compare runs, and to download run artifacts or metadata for analysis with other tools. More on this in future articles.
Mohak Batra is an associate scientist of Data Science Practice at GainInsights and can be reached at [email protected]