TR-4834: NetApp and Iguazio for MLRun Pipeline
Rick Huang, David Arnette, NetApp
Marcelo Litovsky, Iguazio
This document covers the details of the MLRun pipeline using NetApp ONTAP AI, NetApp AI Control Plane, NetApp Cloud Volumes software, and the Iguazio Data Science Platform. We used Nuclio serverless function, Kubernetes Persistent Volumes, NetApp Cloud Volumes, NetApp Snapshot copies, Grafana dashboard, and other services on the Iguazio platform to build an end-to-end data pipeline for the simulation of network failure detection. We integrated Iguazio and NetApp technologies to enable fast model deployment, data replication, and production monitoring capabilities on premises as well as in the cloud.
The work of a data scientist should be focused on the training and tuning of machine learning (ML) and artificial intelligence (AI) models. However, according to research by Google, data scientists spend ~80% of their time figuring out how to make their models work with enterprise applications and run at scale, as shown in the following image depicting model development in the AI/ML workflow.
To manage end-to-end AI/ML projects, a wider understanding of enterprise components is needed. Although DevOps have taken over the definition, integration, and deployment these types of components, machine learning operations target a similar flow that includes AI/ML projects. To get an idea of what an end-to-end AI/ML pipeline touches in the enterprise, see the following list of required components:
-
Storage
-
Networking
-
Databases
-
File systems
-
Containers
-
Continuous integration and continuous deployment (CI/CD) pipeline
-
Development integrated development environment (IDE)
-
Security
-
Data access policies
-
Hardware
-
Cloud
-
Virtualization
-
Data science toolsets and libraries
In this paper, we demonstrate how the partnership between NetApp and Iguazio drastically simplifies the development of an end-to-end AI/ML pipeline. This simplification accelerates the time to market for all of your AI/ML applications.
Target Audience
The world of data science touches multiple disciplines in information technology and business.
-
The data scientist needs the flexibility to use their tools and libraries of choice.
-
The data engineer needs to know how the data flows and where it resides.
-
A DevOps engineer needs the tools to integrate new AI/ML applications into their CI/CD pipelines.
-
Business users want to have access to AI/ML applications. We describe how NetApp and Iguazio help each of these roles bring value to business with our platforms.
Solution Overview
This solution follows the lifecycle of an AI/ML application. We start with the work of data scientists to define the different steps needed to prep data and train and deploy models. We follow with the work needed to create a full pipeline with the ability to track artifacts, experiment with execution, and deploy to Kubeflow. To complete the full cycle, we integrate the pipeline with NetApp Cloud Volumes to enable data versioning, as seen in the following image.