
Part 3 - Building A Simplified MLOps Pipeline (CI/CT/CD)


Author(s):
Jian Jian (Ken), Senior Data & Applied Scientist, NetApp

Introduction

In this tutorial, you will learn how to leverage various AWS services to construct a simple MLOps pipeline that encompasses Continuous Integration (CI), Continuous Training (CT), and Continuous Deployment (CD). Unlike traditional DevOps pipelines, MLOps requires additional considerations to close the operational loop: the model must be retrained as new data arrives, and the updated model must be redeployed for inference. This tutorial walks through how to wire AWS services together into such an end-to-end loop, with CT at its center.

Manifest

  • Data storage: AWS FSxN. Refer to Part 1 - Integrating AWS FSx for NetApp ONTAP (FSxN) as a private S3 bucket into AWS SageMaker.

  • Data science IDE: AWS SageMaker. This tutorial is based on the Jupyter notebook presented in Part 2 - Leveraging AWS FSx for NetApp ONTAP (FSxN) as a Data Source for Model Training in SageMaker.

  • Function to trigger the MLOps pipeline: AWS Lambda function

  • Cron job trigger: AWS EventBridge

  • Deep learning framework: PyTorch

  • AWS Python SDK: boto3

  • Programming language: Python (v3.10)

Prerequisites

  • A pre-configured FSxN file system. This tutorial utilizes data stored in FSxN for the training process.

  • A SageMaker Notebook instance that is configured to share the same VPC as the FSxN file system mentioned above.

  • Before triggering the AWS Lambda function, ensure that the SageMaker Notebook instance is in the Stopped status (a quick boto3 status check is sketched after this list).

  • The ml.g4dn.xlarge instance type is required to leverage the GPU acceleration needed for deep neural network computation.
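
If you want to verify the stopped-status prerequisite from code rather than the console, the check below is a minimal boto3 sketch. It assumes the notebook instance is named fsxn-ontap, the name used throughout this tutorial.

    import boto3

    sagemaker = boto3.client('sagemaker')

    # Query the current status of the notebook instance
    status = sagemaker.describe_notebook_instance(
        NotebookInstanceName='fsxn-ontap'
    )['NotebookInstanceStatus']
    print(f'Status: {status}')  # should be 'Stopped' before the Lambda trigger fires

    # Stop the instance if it is still running
    if status == 'InService':
        sagemaker.stop_notebook_instance(NotebookInstanceName='fsxn-ontap')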

Architecture

[Image: Architecture]

This MLOps pipeline is a practical implementation: a cron job triggers a serverless function, which in turn starts an AWS service instance registered with a lifecycle callback. AWS EventBridge acts as the cron job, periodically invoking an AWS Lambda function that is responsible for retraining and redeploying the model. The Lambda function does this by spinning up the AWS SageMaker Notebook instance, whose lifecycle configuration performs the necessary tasks.

Step-by-Step Configuration

Lifecycle configurations

To configure the lifecycle callback function for the AWS SageMaker Notebook instance, you use Lifecycle configurations. This service allows you to define the actions to be performed when the notebook instance is spinning up. Specifically, a shell script can be implemented within the Lifecycle configuration to automatically shut down the notebook instance once the training and deployment processes are completed. This is a required configuration, as cost is one of the major considerations in MLOps.

It's important to note that the Lifecycle configuration needs to be set up in advance. Therefore, it is recommended to configure this aspect before proceeding with the rest of the MLOps pipeline setup.

  1. To set up a Lifecycle configuration, open the SageMaker panel and navigate to Lifecycle configurations under the Admin configurations section.

    [Image: SageMaker panel]

  2. Select the Notebook Instance tab and click the Create configuration button.

    [Image: Lifecycle configuration welcome page]

  3. Paste the following code into the entry area.

    #!/bin/bash

    set -e
    sudo -u ec2-user -i <<'EOF'
    # 1. Retrain and redeploy the model by executing the notebook headlessly
    NOTEBOOK_FILE=/home/ec2-user/SageMaker/tyre_quality_classification_local_training.ipynb
    echo "Activating conda env"
    source /home/ec2-user/anaconda3/bin/activate pytorch_p310
    nohup jupyter nbconvert "$NOTEBOOK_FILE" --ExecutePreprocessor.kernel_name=python --execute --to notebook &
    nbconvert_pid=$!
    conda deactivate

    # 2. Schedule a cron job that shuts the notebook instance down once the
    # training run has finished, to save cost. The stop logic lives in a small
    # helper script to avoid fragile quote nesting inside the crontab entry.
    PYTHON_DIR='/home/ec2-user/anaconda3/envs/JupyterSystemEnv/bin/python3.10'
    cat <<'PYEOF' > /home/ec2-user/autostop.py
    import boto3, json

    # Read this instance's name from the SageMaker notebook metadata file
    with open('/opt/ml/metadata/resource-metadata.json') as f:
        name = json.load(f)['ResourceName']
    boto3.client('sagemaker').stop_notebook_instance(NotebookInstanceName=name)
    PYEOF
    echo "Starting the autostop script in cron"
    # Log to the ec2-user home directory (ec2-user cannot write to /var/log)
    (crontab -l 2>/dev/null; echo "*/5 * * * * bash -c 'if ps -p $nbconvert_pid > /dev/null; then echo Notebook is still running.; else echo Notebook execution completed.; $PYTHON_DIR /home/ec2-user/autostop.py; fi' >> /home/ec2-user/autostop.log 2>&1") | crontab -
    EOF
  4. This script executes the Jupyter Notebook, which handles the retraining and redeployment of the model for inference. After the execution completes, the cron job stops the notebook instance within 5 minutes. To learn more about the problem statement and the code implementation, please refer to Part 2 - Leveraging AWS FSx for NetApp ONTAP (FSxN) as a Data Source for Model Training in SageMaker.

    [Image: Create lifecycle configuration]

  5. After the creation, navigate to Notebook instances, select the target instance, and click Update settings under the Actions dropdown.

    [Image: Update settings dropdown]

  6. Select the created Lifecycle configuration and click Update notebook instance. (If you prefer to script these steps, a boto3 equivalent is sketched after this list.)

    [Image: Update lifecycle configuration of the notebook]
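
The console steps above can also be performed programmatically. The following boto3 sketch is illustrative, not the tutorial's canonical procedure: it assumes the shell script from step 3 is saved locally as on-start.sh (a hypothetical file name), that the notebook instance is named fsxn-ontap, and that the instance is stopped, since update_notebook_instance requires a stopped instance.

    import base64
    import boto3

    sagemaker = boto3.client('sagemaker')

    # Lifecycle scripts are passed as base64-encoded strings
    with open('on-start.sh', 'rb') as f:
        content = base64.b64encode(f.read()).decode('utf-8')

    # OnStart scripts run every time the instance starts
    sagemaker.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName='fsxn-mlops-on-start',
        OnStart=[{'Content': content}]
    )

    # Attach the configuration to the (stopped) notebook instance
    sagemaker.update_notebook_instance(
        NotebookInstanceName='fsxn-ontap',
        LifecycleConfigName='fsxn-mlops-on-start'
    )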

AWS Lambda serverless function

As mentioned earlier, the AWS Lambda function is responsible for spinning up the AWS SageMaker Notebook instance.

  1. To create an AWS Lambda function, navigate to the respective panel, switch to the Functions tab, and click Create function.

    [Image: AWS lambda function welcome page]

  2. Fill in all required entries on the page and remember to switch the Runtime to Python 3.10.

    [Image: Create an AWS lambda function]

  3. Verify that the designated role has the required permission AmazonSageMakerFullAccess and click the Create function button. (If you need to grant this permission from code, see the sketch after this step.)

    [Image: Select execution role]
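
    As an aside, the managed policy can also be attached to the execution role with boto3. This is a minimal sketch; lambda-sagemaker-starter-role is a hypothetical role name, so substitute the role you selected above.

    import boto3

    iam = boto3.client('iam')

    # Attach the AWS-managed SageMaker policy to the Lambda execution role
    iam.attach_role_policy(
        RoleName='lambda-sagemaker-starter-role',  # hypothetical role name
        PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
    )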

  4. Select the created Lambda function. In the Code tab, copy and paste the following code into the text area. This code starts the notebook instance named fsxn-ontap.

    import boto3
    import logging

    # The default Lambda logger level is WARNING; raise it so the info message shows
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def lambda_handler(event, context):
        notebook_instance_name = 'fsxn-ontap'
        client = boto3.client('sagemaker')
        logger.info('Invoking SageMaker')
        client.start_notebook_instance(NotebookInstanceName=notebook_instance_name)
        return {
            'statusCode': 200,
            'body': f'Starting notebook instance: {notebook_instance_name}'
        }
  5. Click the Deploy button to apply this code change. (A manual test invocation is sketched below.)

    [Image: Deployment]
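
    Before wiring up the schedule, you can smoke-test the function. The sketch below assumes the function was named start-fsxn-notebook in step 2; that name is hypothetical, so substitute your own.

    import json
    import boto3

    lambda_client = boto3.client('lambda')

    # Invoke the function synchronously and print its response
    response = lambda_client.invoke(
        FunctionName='start-fsxn-notebook',  # hypothetical function name
        InvocationType='RequestResponse',
        Payload=json.dumps({})
    )
    print(json.loads(response['Payload'].read()))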

  6. To specify how to trigger this AWS Lambda function, click on the Add Trigger button.

    [Image: Add AWS function trigger]

  7. Select EventBridge from the dropdown menu, then click the radio button labeled Create a new rule. In the schedule expression field, enter rate(1 day), and click the Add button to create and apply this new cron job rule to the AWS Lambda function. (A scripted equivalent is sketched after this step.)

    [Image: Finalize trigger]
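
The rule can also be created and attached outside the console. The boto3 sketch below uses the same hypothetical function name, start-fsxn-notebook, plus an illustrative rule name, daily-model-retrain.

    import boto3

    events = boto3.client('events')
    lambda_client = boto3.client('lambda')

    # Create (or update) a rule that fires once a day
    rule_arn = events.put_rule(
        Name='daily-model-retrain',          # illustrative rule name
        ScheduleExpression='rate(1 day)',
        State='ENABLED'
    )['RuleArn']

    # Allow EventBridge to invoke the Lambda function
    lambda_client.add_permission(
        FunctionName='start-fsxn-notebook',  # hypothetical function name
        StatementId='daily-model-retrain-invoke',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule_arn
    )

    # Point the rule at the function
    function_arn = lambda_client.get_function(
        FunctionName='start-fsxn-notebook'
    )['Configuration']['FunctionArn']
    events.put_targets(
        Rule='daily-model-retrain',
        Targets=[{'Id': 'start-fsxn-notebook-target', 'Arn': function_arn}]
    )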

After completing this two-part configuration, EventBridge will invoke the AWS Lambda function once a day. The function starts the SageMaker Notebook instance, which retrains the model using the data from the FSxN repository, redeploys the updated model to the production environment, and then automatically shuts itself down to optimize cost. This ensures that the model remains up to date.

This concludes the tutorial for developing an MLOps pipeline.