Part 3 - Building A Simplified MLOps Pipeline (CI/CT/CD)
This article provides a guide to building an MLOps pipeline with AWS services, focusing on automated model retraining, deployment, and cost optimization.
---
Author(s):
Jian Jian (Ken), Senior Data & Applied Scientist, NetApp
Introduction
In this tutorial, you will learn how to use AWS services to build a simple MLOps pipeline that covers Continuous Integration (CI), Continuous Training (CT), and Continuous Deployment (CD). Unlike a traditional DevOps pipeline, an MLOps pipeline must also retrain the model to complete the operational cycle. This tutorial shows how to incorporate CT into the MLOps loop so that your models are retrained continuously and redeployed for inference.
Manifest
Functionality | Name | Comment |
---|---|---|
Data storage | AWS FSx ONTAP | - |
Data science IDE | AWS SageMaker | This tutorial is based on the Jupyter notebook presented in Part 2 - Leveraging Amazon FSx for NetApp ONTAP (FSx ONTAP) as a Data Source for Model Training in SageMaker. |
Function to trigger the MLOps pipeline | AWS Lambda function | - |
Cron job trigger | AWS EventBridge | - |
Deep learning framework | PyTorch | - |
AWS Python SDK | boto3 | - |
Programming Language | Python | v3.10 |
Prerequisites
- A pre-configured FSx ONTAP file system. This tutorial uses data stored in FSx ONTAP for the training process.
- A SageMaker Notebook instance configured to share the same VPC as the FSx ONTAP file system mentioned above.
- The SageMaker Notebook instance must be in the Stopped state before the AWS Lambda function is triggered.
- The notebook instance must use the ml.g4dn.xlarge instance type, which provides the GPU acceleration needed for deep neural network computations.
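The checks above can be automated before the pipeline is enabled. The following is a minimal sketch, assuming the notebook instance is described with boto3's `describe_notebook_instance`; the function names `find_prerequisite_problems` and `describe_notebook` are hypothetical helpers introduced here, not part of the original tutorial.

```python
from typing import Any, Dict, List

# Values taken from the prerequisites listed above.
REQUIRED_STATUS = "Stopped"
REQUIRED_INSTANCE_TYPE = "ml.g4dn.xlarge"


def find_prerequisite_problems(description: Dict[str, Any]) -> List[str]:
    """Return the problems found in a describe_notebook_instance response."""
    problems = []
    status = description.get("NotebookInstanceStatus")
    if status != REQUIRED_STATUS:
        problems.append(f"status is {status!r}, expected {REQUIRED_STATUS!r}")
    instance_type = description.get("InstanceType")
    if instance_type != REQUIRED_INSTANCE_TYPE:
        problems.append(
            f"instance type is {instance_type!r}, expected {REQUIRED_INSTANCE_TYPE!r}"
        )
    # A notebook instance attached to a VPC reports the subnet it lives in.
    if not description.get("SubnetId"):
        problems.append("no SubnetId reported: the instance is not attached to a VPC")
    return problems


def describe_notebook(name: str) -> Dict[str, Any]:
    """Fetch the instance description (hypothetical helper, needs AWS credentials)."""
    import boto3  # imported lazily so the checks above can run offline

    client = boto3.client("sagemaker")
    return client.describe_notebook_instance(NotebookInstanceName=name)
```

With credentials configured, `find_prerequisite_problems(describe_notebook("fsxn-ontap"))` should return an empty list when the instance is ready. The sketch only confirms that a subnet is attached; whether it is the same VPC as the FSx ONTAP file system still has to be compared separately.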
Architecture
This MLOps pipeline uses a scheduled rule to trigger a serverless function, which in turn starts an AWS service that has a lifecycle callback registered. AWS EventBridge acts as the cron job: it periodically invokes an AWS Lambda function, and that function starts the AWS SageMaker Notebook instance. When the instance starts, its lifecycle callback retrains and redeploys the model.
Step-by-Step Configuration
Lifecycle configurations
To configure the lifecycle callback function for the AWS SageMaker Notebook instance, use Lifecycle configurations. This feature allows you to define the actions to perform when the notebook instance starts. Specifically, a shell script in the lifecycle configuration can run the training and deployment, then automatically shut down the notebook instance once they are complete. This configuration is required because cost is one of the major considerations in MLOps.
The lifecycle configuration must exist before it can be attached to the notebook instance, so set it up first, before proceeding with the rest of the MLOps pipeline.
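The automatic shutdown described above can be sketched in Python. This is a minimal illustration, not the tutorial's own implementation: it assumes the notebook instance exposes its name in the metadata file `/opt/ml/metadata/resource-metadata.json` under the key `ResourceName` (the convention used in AWS's sample auto-stop scripts), and `stop_this_notebook` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Assumed location of the metadata file on a SageMaker notebook instance.
METADATA_PATH = Path("/opt/ml/metadata/resource-metadata.json")


def get_notebook_name(metadata_path: Path = METADATA_PATH) -> str:
    """Read the name of the current notebook instance from its metadata file."""
    with open(metadata_path) as f:
        return json.load(f)["ResourceName"]


def stop_this_notebook(client=None, metadata_path: Path = METADATA_PATH) -> str:
    """Stop the notebook instance this code runs on and return its name."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials and a region

        client = boto3.client("sagemaker")
    name = get_notebook_name(metadata_path)
    client.stop_notebook_instance(NotebookInstanceName=name)
    return name
```

Passing the client as a parameter keeps the helper testable without AWS access; on the instance itself, calling `stop_this_notebook()` with no arguments would be enough.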
- To set up a lifecycle configuration, open the SageMaker panel and navigate to Lifecycle configurations under the Admin configurations section.
- Select the Notebook Instance tab and click the Create configuration button.
- Paste the code below into the entry area.
```bash
#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
# 1. Retraining and redeploying the model
NOTEBOOK_FILE=/home/ec2-user/SageMaker/tyre_quality_classification_local_training.ipynb
echo "Activating conda env"
source /home/ec2-user/anaconda3/bin/activate pytorch_p310
# Run the notebook in the background and remember its process ID
nohup jupyter nbconvert "$NOTEBOOK_FILE" --ExecutePreprocessor.kernel_name=python --execute --to notebook &
nbconvert_pid=$!
conda deactivate

# 2. Scheduling a job to shutdown the notebook to save the cost
PYTHON_DIR='/home/ec2-user/anaconda3/envs/JupyterSystemEnv/bin/python3.10'
echo "Starting the autostop script in cron"
# Every 5 minutes, check whether nbconvert is still running; stop the instance once it has finished.
# Assumption: the instance name is read from /opt/ml/metadata/resource-metadata.json (key "ResourceName").
(crontab -l 2>/dev/null; echo "*/5 * * * * bash -c 'if ps -p $nbconvert_pid > /dev/null; then echo \"Notebook is still running.\" >> /var/log/jupyter.log; else echo \"Notebook execution completed.\" >> /var/log/jupyter.log; $PYTHON_DIR -c \"import json, boto3; name = json.load(open(\\\"/opt/ml/metadata/resource-metadata.json\\\"))[\\\"ResourceName\\\"]; boto3.client(\\\"sagemaker\\\").stop_notebook_instance(NotebookInstanceName=name)\" >> /var/log/jupyter.log; fi'") | crontab -
EOF
```
- This script executes the Jupyter Notebook, which handles the retraining and redeployment of the model for inference. A cron job checks every 5 minutes whether the execution has finished, so the notebook instance shuts down automatically within 5 minutes of completion. To learn more about the problem statement and the code implementation, please refer to Part 2 - Leveraging Amazon FSx for NetApp ONTAP (FSx ONTAP) as a Data Source for Model Training in SageMaker.
- After creating the configuration, navigate to Notebook instances, select the target instance, and click Update settings under the Actions dropdown.
- Select the created Lifecycle configuration and click Update notebook instance.
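The console steps above can also be scripted. The sketch below is one possible approach, assuming the script is registered as an on-start hook (so it runs each time the instance starts) and that the instance is stopped when it is updated; `build_on_start` and `create_and_attach` are hypothetical helper names, while the two client calls are boto3 SageMaker operations.

```python
import base64
from typing import Any, Dict, List


def build_on_start(script: str) -> List[Dict[str, str]]:
    """Base64-encode a shell script in the shape the OnStart parameter expects."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return [{"Content": encoded}]


def create_and_attach(client: Any, config_name: str, notebook_name: str, script: str) -> None:
    """Create a lifecycle configuration and attach it to a notebook instance."""
    client.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=config_name,
        OnStart=build_on_start(script),
    )
    # Attaching the configuration requires the instance to be stopped.
    client.update_notebook_instance(
        NotebookInstanceName=notebook_name,
        LifecycleConfigName=config_name,
    )
```

A call such as `create_and_attach(boto3.client("sagemaker"), "retrain-on-start", "fsxn-ontap", script_text)` would mirror the console steps; the configuration name `retrain-on-start` is only a placeholder.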
AWS Lambda serverless function
As mentioned earlier, the AWS Lambda function is responsible for spinning up the AWS SageMaker Notebook instance.
- To create an AWS Lambda function, open the AWS Lambda panel, switch to the Functions tab, and click Create Function.
- Fill in all required entries on the page and remember to switch the Runtime to Python 3.10.
- Verify that the designated role has the required AmazonSageMakerFullAccess permission, then click the Create function button.
- Select the created Lambda function. In the Code tab, copy and paste the following code into the text area. This code starts the notebook instance named fsxn-ontap.
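```python
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # the default level would drop INFO messages

# Name of the SageMaker Notebook instance to start
NOTEBOOK_INSTANCE_NAME = 'fsxn-ontap'


def lambda_handler(event, context):
    client = boto3.client('sagemaker')
    logger.info('Invoking SageMaker')
    client.start_notebook_instance(NotebookInstanceName=NOTEBOOK_INSTANCE_NAME)
    return {
        'statusCode': 200,
        'body': f'Starting notebook instance: {NOTEBOOK_INSTANCE_NAME}'
    }
```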
- Click the Deploy button to apply this code change.
- To specify how to trigger this AWS Lambda function, click the Add Trigger button.
- Select EventBridge from the dropdown menu, then click the radio button labeled Create a new rule. In the schedule expression field, enter `rate(1 day)`, and click the Add button to create and apply this new cron job rule to the AWS Lambda function.
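For reference, the same trigger can be created with boto3 instead of the console. This is a hedged sketch under stated assumptions: the helper name `schedule_lambda`, the rule name, and the statement ID format are illustrative choices, while `put_rule`, `add_permission`, and `put_targets` are the boto3 EventBridge and Lambda operations involved.

```python
from typing import Any


def schedule_lambda(
    events_client: Any,
    lambda_client: Any,
    rule_name: str,
    function_name: str,
    function_arn: str,
    schedule: str = "rate(1 day)",
) -> str:
    """Create a scheduled rule that invokes a Lambda function; return the rule ARN."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]
    # Allow EventBridge to invoke the function on behalf of this rule.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    # Register the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn
```

In practice the two clients would come from `boto3.client("events")` and `boto3.client("lambda")`. The schedule expression also accepts cron syntax if retraining should run at a fixed time of day rather than every 24 hours.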
After completing the two-step configuration, the pipeline runs daily: the AWS Lambda function starts the SageMaker Notebook instance, and the instance's lifecycle configuration retrains the model using the data from the FSx ONTAP repository, redeploys the updated model to the production environment, and then shuts the instance down to optimize cost. This ensures that the model remains up to date.
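Because the daily run only works when the instance is stopped beforehand, a more defensive handler can check the status before starting it. The following is a sketch of that idea rather than the tutorial's own code; `start_if_stopped` is a hypothetical helper, and the 409 status code for the skipped case is an arbitrary choice.

```python
from typing import Any, Dict

NOTEBOOK_INSTANCE_NAME = "fsxn-ontap"  # instance name used in this tutorial


def start_if_stopped(client: Any, name: str) -> Dict[str, Any]:
    """Start the notebook instance only when it is currently stopped."""
    description = client.describe_notebook_instance(NotebookInstanceName=name)
    status = description["NotebookInstanceStatus"]
    if status != "Stopped":
        # A previous run may still be training, or the instance is in use.
        return {
            "statusCode": 409,
            "body": f"Notebook instance {name} is {status}; skipping start",
        }
    client.start_notebook_instance(NotebookInstanceName=name)
    return {"statusCode": 200, "body": f"Starting notebook instance: {name}"}


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    return start_if_stopped(boto3.client("sagemaker"), NOTEBOOK_INSTANCE_NAME)
```

Skipping the start instead of raising keeps a long-running training job from turning the next scheduled invocation into a failed one.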
This concludes the tutorial for developing an MLOps pipeline.