Open Source MLOps with NetApp
Mike Oglesby, NetApp
Sufian Ahmad, NetApp
Rick Huang, NetApp
Mohan Acharya, NetApp
Companies and organizations of all sizes and across many industries are turning to artificial intelligence (AI) to solve real-world problems, deliver innovative products and services, and gain an edge in an increasingly competitive marketplace. Many of these organizations are adopting open-source MLOps tools to keep up with the rapid pace of innovation in the industry. These open-source tools offer advanced capabilities and cutting-edge features, but they often don't account for data availability and data security. Unfortunately, this means that highly skilled data scientists are forced to spend significant time waiting to gain access to data or waiting for rudimentary data-related operations to complete. By pairing popular open-source MLOps tools with an intelligent data infrastructure from NetApp, organizations can accelerate their data pipelines and, in turn, their AI initiatives. They can unlock value from their data while ensuring that it remains protected and secure. This solution demonstrates the pairing of NetApp data management capabilities with several popular open-source tools and frameworks to address these challenges.
The following list highlights some key capabilities that this solution enables; a code sketch illustrating these operations follows the list:
- Users can rapidly provision new high-capacity data volumes and development workspaces that are backed by high-performance, scale-out NetApp storage.
- Users can near-instantaneously clone high-capacity data volumes and development workspaces in order to enable experimentation or rapid iteration.
- Users can near-instantaneously save snapshots of high-capacity data volumes and development workspaces for backup and/or traceability/baselining.
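As a minimal sketch of how these operations might look in a Kubernetes environment, the following Python example uses the official Kubernetes client to provision a new volume on a NetApp Trident-backed storage class, save a CSI snapshot of it, and clone it from that snapshot. The storage class name (`ontap-nas`), snapshot class name (`csi-snapclass`), namespace, and resource names are assumptions specific to this sketch; substitute the values used in your environment. NetApp's DataOps Toolkit wraps these same operations in simpler, purpose-built calls.

```python
# Minimal sketch: provision, snapshot, and clone a Trident-backed volume
# using the official Kubernetes Python client (pip install kubernetes).
# Storage class, snapshot class, and resource names below are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
core_v1 = client.CoreV1Api()
custom = client.CustomObjectsApi()
NAMESPACE = "mlops"

# 1. Rapidly provision a new high-capacity volume: a PVC against a
#    Trident storage class maps to a volume on NetApp storage.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="dataset-vol"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name="ontap-nas",  # assumed Trident storage class
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Ti"}
        ),
    ),
)
core_v1.create_namespaced_persistent_volume_claim(NAMESPACE, pvc)

# 2. Near-instantaneous snapshot: a CSI VolumeSnapshot of the PVC,
#    useful for backup and for traceability/baselining.
snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "dataset-vol-snap1"},
    "spec": {
        "volumeSnapshotClassName": "csi-snapclass",  # assumed snapshot class
        "source": {"persistentVolumeClaimName": "dataset-vol"},
    },
}
custom.create_namespaced_custom_object(
    group="snapshot.storage.k8s.io", version="v1",
    namespace=NAMESPACE, plural="volumesnapshots", body=snapshot,
)

# 3. Near-instantaneous clone: a new PVC that uses the snapshot as its
#    data source, enabling experimentation without copying the data.
clone = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="dataset-vol-clone"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name="ontap-nas",
        data_source=client.V1TypedLocalObjectReference(
            api_group="snapshot.storage.k8s.io",
            kind="VolumeSnapshot",
            name="dataset-vol-snap1",
        ),
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Ti"}
        ),
    ),
)
core_v1.create_namespaced_persistent_volume_claim(NAMESPACE, clone)
```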
A typical MLOps workflow incorporates development workspaces, usually taking the form of Jupyter Notebooks; experiment tracking; automated training pipelines; data pipelines; and inference/deployment. This solution highlights several different tools and frameworks that can be used independently or in conjunction to address the different aspects of the workflow. We also demonstrate the pairing of NetApp data management capabilities with each of these tools. This solution is intended to offer building blocks from which an organization can construct a customized MLOps workflow that is specific to their use cases and requirements.
The following tools/frameworks are covered in this solution:
- JupyterHub
- MLflow
- Apache Airflow
- Kubeflow
The following list describes common patterns for deploying these tools independently or in conjunction.
- Deploy JupyterHub, MLflow, and Apache Airflow in conjunction - JupyterHub for Jupyter Notebooks, MLflow for experiment tracking, and Apache Airflow for automated training and data pipelines (a minimal example of this pattern follows the list).
- Deploy Kubeflow and Apache Airflow in conjunction - Kubeflow for Jupyter Notebooks, experiment tracking, automated training pipelines, and inference; and Apache Airflow for data pipelines.
- Deploy Kubeflow as an all-in-one MLOps platform for Jupyter Notebooks, experiment tracking, automated training and data pipelines, and inference.
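To make the first pattern concrete, here is a minimal sketch of an Airflow DAG whose training task logs parameters and metrics to MLflow. The tracking URI, experiment name, and training logic are placeholders/assumptions for illustration; in a full deployment, Airflow and the MLflow tracking server would typically run in the same Kubernetes cluster, with their backing stores and training data on NetApp volumes like the ones provisioned above.

```python
# Minimal sketch: an Airflow DAG whose training task logs to MLflow.
# The tracking URI, experiment name, and training logic are placeholders.
from datetime import datetime

import mlflow
from airflow import DAG
from airflow.operators.python import PythonOperator


def train():
    # Assumed in-cluster MLflow tracking service; adjust for your deployment.
    mlflow.set_tracking_uri("http://mlflow.mlops.svc.cluster.local:5000")
    mlflow.set_experiment("demo-experiment")
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)
        # ... real training code would go here ...
        mlflow.log_metric("accuracy", 0.95)  # placeholder value


with DAG(
    dag_id="training_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
) as dag:
    PythonOperator(task_id="train_model", python_callable=train)
```

Because the workspace and data volumes are ordinary PVCs, a DAG task could equally invoke the snapshot operation shown earlier to baseline a dataset before each training run.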