Part 1 - Integrating Amazon FSx for NetApp ONTAP (FSx ONTAP) as a private S3 bucket into AWS SageMaker
This section provides a guide to configuring FSx ONTAP as a private S3 bucket for use with AWS SageMaker.
Introduction
Using SageMaker as an example, this page provides guidance on configuring FSx ONTAP as a private S3 bucket.
For more information about FSx ONTAP, please take a look at this presentation (Video Link)
User Guide
Server creation
Create a SageMaker Notebook Instance
- Open the AWS console. In the search panel, search for SageMaker and click the service Amazon SageMaker.
- Open Notebook instances under the Notebook tab and click the orange button Create notebook instance.
- On the creation page:
  - Enter the Notebook instance name.
  - Expand the Network panel.
  - Leave the other entries at their defaults and select a VPC, Subnet, and Security group(s). (This VPC and Subnet will be used to create the FSx ONTAP file system later.)
  - Click the orange button Create notebook instance at the bottom right.
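For readers who prefer scripting over the console, the same notebook instance could be requested with boto3. The following is a minimal sketch, not part of the original guide: the helper names, the instance type, and every identifier in the example are placeholders. Note that the API requires an IAM role ARN, which the console fills in for you.

```python
from typing import Any, Dict, List


def build_notebook_request(name: str, role_arn: str, subnet_id: str,
                           security_group_ids: List[str],
                           instance_type: str = "ml.t3.medium") -> Dict[str, Any]:
    """Assemble keyword arguments for SageMaker's create_notebook_instance call.

    The default instance type is an assumption for illustration only.
    """
    if not security_group_ids:
        raise ValueError("at least one security group is needed with a subnet")
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
    }


def create_notebook_instance(request: Dict[str, Any]) -> str:
    """Send the request with boto3 and return the new instance's ARN."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker")
    response = client.create_notebook_instance(**request)
    return response["NotebookInstanceArn"]


if __name__ == "__main__":
    # Placeholder identifiers; replace them with values from your own account.
    request = build_notebook_request(
        name="fsxn-demo-notebook",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        subnet_id="subnet-0123456789abcdef0",
        security_group_ids=["sg-0123456789abcdef0"],
    )
    print(request)
```

Keeping the subnet and security group in one request dictionary makes it easy to reuse the same subnet when the FSx ONTAP file system is created.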
Create an FSx ONTAP File System
- Open the AWS console. In the search panel, search for FSx and click the service FSx.
- Click Create file system.
- Select the first card FSx ONTAP and click Next.
- On the details configuration page:
  - Select the Standard create option.
  - Enter the File system name and the SSD storage capacity.
  - Make sure to use the same VPC and subnet as the SageMaker Notebook instance.
  - Enter the Storage virtual machine name and specify a password for your SVM (storage virtual machine).
  - Leave the other entries at their defaults and click the orange button Next at the bottom right.
- Click the orange button Create file system at the bottom right of the review page.
- It may take about 20-40 minutes to spin up the FSx file system.
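Because creation can take a while, it may be convenient to poll the file system status from a script instead of refreshing the console. This is a hedged sketch: `wait_until_available` and `fsx_lifecycle` are hypothetical helper names, and the timeout and polling interval are arbitrary choices. It relies on the `Lifecycle` field returned by the FSx `describe_file_systems` API.

```python
import time
from typing import Callable


def wait_until_available(get_lifecycle: Callable[[], str],
                         timeout_s: float = 3600.0,
                         poll_s: float = 60.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the lifecycle is AVAILABLE; raise on failure or timeout."""
    waited = 0.0
    while True:
        state = get_lifecycle()
        if state == "AVAILABLE":
            return state
        if state == "FAILED":
            raise RuntimeError("file system creation failed")
        if waited >= timeout_s:
            raise TimeoutError(f"still {state} after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s


def fsx_lifecycle(file_system_id: str) -> str:
    """Read the Lifecycle field of one file system via boto3."""
    import boto3  # imported lazily; only needed when talking to AWS

    fsx = boto3.client("fsx")
    response = fsx.describe_file_systems(FileSystemIds=[file_system_id])
    return response["FileSystems"][0]["Lifecycle"]
```

A call such as `wait_until_available(lambda: fsx_lifecycle("fs-0123456789abcdef0"))` (with a placeholder ID) would block until the status shown in the console becomes Available.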
Server Configuration
ONTAP Configuration
- Open the created FSx file system. Make sure the status is Available.
- Select the Administration tab and note the Management endpoint - IP address and the ONTAP administrator username.
- Open the created SageMaker Notebook instance and click Open JupyterLab.
- On the JupyterLab page, open a new Terminal.
- Enter the ssh command ssh <admin user name>@<ONTAP server IP> to log in to the FSx ONTAP file system. (The user name and IP address are the ones noted from the Administration tab above.)
  Use the password specified when creating the storage virtual machine.
- Execute the commands in the following order.
  We use fsxn-ontap as the name of the FSx ONTAP private S3 bucket.
  Use the storage virtual machine name for the -vserver argument.
- Execute the below commands to retrieve the endpoint IP and credentials for FSx ONTAP private S3.
- Keep the endpoint IP and credentials for future use.
Client Configuration
- In the SageMaker Notebook instance, create a new Jupyter notebook.
- Use the below code as a workaround to upload files to the FSx ONTAP private S3 bucket.
  For a comprehensive code example, please refer to this notebook:
  fsxn_demo.ipynb
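As an illustration only (this is not the code from the referenced notebook), a boto3 S3 client can be pointed at the FSx ONTAP endpoint by overriding `endpoint_url` and passing the ONTAP-issued keys. The helper names, the default `http` scheme, and the region value are assumptions; use whichever protocol was enabled on the ONTAP object store server.

```python
from typing import Any, Dict


def s3_client_kwargs(endpoint_ip: str, access_key: str, secret_key: str,
                     scheme: str = "http") -> Dict[str, Any]:
    """Build boto3.client() arguments that target the private S3 endpoint.

    The default scheme is an assumption; pass "https" if HTTPS is enabled.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    return {
        "service_name": "s3",
        "endpoint_url": f"{scheme}://{endpoint_ip}",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # Used only for request signing here; the value is a placeholder.
        "region_name": "us-east-1",
    }


def upload_small_object(client_kwargs: Dict[str, Any], bucket: str,
                        key: str, body: bytes) -> None:
    """Upload one small object with a single PutObject request."""
    import boto3  # imported lazily; only needed when actually uploading

    s3 = boto3.client(**client_kwargs)
    s3.put_object(Bucket=bucket, Key=key, Body=body)
```

For example, `upload_small_object(s3_client_kwargs("<endpoint IP>", "<access key>", "<secret key>"), "fsxn-ontap", "hello.txt", b"hello")` would write a small test object to the bucket named earlier in this guide.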
This concludes the integration between FSx ONTAP and the SageMaker instance.
Useful debugging checklist
- Ensure that the SageMaker Notebook instance and FSx ONTAP file system are in the same VPC.
- Remember to run the set dev command on ONTAP to set the privilege level to dev.
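The first item on the checklist can also be verified programmatically. The sketch below is an assumption-laden example rather than an official procedure: it looks up the notebook's subnet, resolves that subnet's VPC through EC2, and compares it with the `VpcId` reported for the file system. The function names are hypothetical.

```python
from typing import Any, Dict


def same_vpc(notebook_subnet: Dict[str, Any], file_system: Dict[str, Any]) -> bool:
    """Compare the VpcId of the notebook's subnet with the file system's VpcId."""
    return notebook_subnet["VpcId"] == file_system["VpcId"]


def notebook_and_fsx_share_vpc(notebook_name: str, file_system_id: str) -> bool:
    """Fetch both resources with boto3 and check that they share a VPC."""
    import boto3  # imported lazily; only needed when talking to AWS

    sagemaker = boto3.client("sagemaker")
    ec2 = boto3.client("ec2")
    fsx = boto3.client("fsx")

    notebook = sagemaker.describe_notebook_instance(
        NotebookInstanceName=notebook_name)
    subnet_id = notebook.get("SubnetId")
    if subnet_id is None:
        # The notebook was created without a VPC configuration.
        return False

    subnet = ec2.describe_subnets(SubnetIds=[subnet_id])["Subnets"][0]
    file_system = fsx.describe_file_systems(
        FileSystemIds=[file_system_id])["FileSystems"][0]
    return same_vpc(subnet, file_system)
```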
FAQ (As of Sep 27, 2023)
Q: Why am I getting the error "An error occurred (NotImplemented) when calling the CreateMultipartUpload operation: The s3 command you requested is not implemented" when uploading files to FSx ONTAP?
A: As a private S3 bucket, FSx ONTAP supports uploading files up to 100MB. When using the S3 protocol, files larger than 100MB are divided into 100MB chunks, and the 'CreateMultipartUpload' function is called. However, the current implementation of FSx ONTAP private S3 does not support this function.
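One way to stay clear of this error is to check the file size first and send the file with a single `put_object` call, which never starts a multipart upload (higher-level helpers such as `upload_file` may switch to multipart for larger files). This is a sketch under assumptions: the helper names are hypothetical, and reading "100MB" as 100 * 1024 * 1024 bytes is a guess.

```python
import os

# Assumed reading of the 100 MB limit stated in the answer above.
MAX_SINGLE_PUT_BYTES = 100 * 1024 * 1024


def fits_single_put(size_bytes: int, limit: int = MAX_SINGLE_PUT_BYTES) -> bool:
    """Return True if an object of this size can go in one PutObject request."""
    return 0 <= size_bytes <= limit


def put_file(s3_client, path: str, bucket: str, key: str) -> None:
    """Upload a file in a single request, refusing files over the limit.

    s3_client is expected to be a boto3 S3 client configured for the
    FSx ONTAP endpoint.
    """
    size = os.path.getsize(path)
    if not fits_single_put(size):
        raise ValueError(
            f"{path} is {size} bytes, above the {MAX_SINGLE_PUT_BYTES}-byte limit")
    with open(path, "rb") as handle:
        s3_client.put_object(Bucket=bucket, Key=key, Body=handle)
```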
Q: Why am I getting the error "An error occurred (AccessDenied) when calling the PutObject operation: Access Denied" when uploading files to FSx ONTAP?
A: To access the FSx ONTAP private S3 bucket from a SageMaker Notebook instance, switch the AWS credentials to the FSx ONTAP credentials. However, granting write permission to the instance requires a workaround that involves mounting the bucket and running the 'chmod' shell command to change the permissions.
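The permission change itself can be sketched in Python once the bucket's underlying path is mounted on the notebook instance. Everything here is an assumption for illustration: the guide does not state the mount point or the exact mode to apply, so the helper below simply adds write permission to whatever path it is given, and the demo runs on a scratch directory rather than a real mount.

```python
import os
import stat
import tempfile


def grant_write(path: str) -> int:
    """Add write permission for user, group, and others; return the new mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = current | stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, new_mode)
    return new_mode


if __name__ == "__main__":
    # Demonstrate on a scratch directory instead of a real mount point.
    with tempfile.TemporaryDirectory() as scratch:
        os.chmod(scratch, 0o555)  # start read-only to make the change visible
        print(oct(grant_write(scratch)))
```

On a real instance, the path passed to `grant_write` would be the directory where the FSx ONTAP volume is mounted, which depends on how you mounted it.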
Q: How can I integrate the FSx ONTAP private S3 bucket with other SageMaker ML services?
A: Unfortunately, the SageMaker services SDK does not provide a way to specify the endpoint for the private S3 bucket. As a result, FSx ONTAP S3 is not compatible with SageMaker services such as SageMaker Data Wrangler, SageMaker Clarify, SageMaker Glue, SageMaker Athena, SageMaker AutoML, and others.