NetApp Solutions

Part 1 - Integrating AWS FSx for NetApp ONTAP (FSxN) as a private S3 bucket into AWS SageMaker

Contributors: banum-netapp, kevin-hoke, UKGANG

Author(s):
Jian Jian (Ken), Senior Data & Applied Scientist, NetApp

Introduction

Using SageMaker as an example, this page provides guidance on configuring FSxN as a private S3 bucket.

For more information about FSxN, see this presentation (Video Link).

User Guide

Server Creation

Create a SageMaker Notebook Instance

  1. Open the AWS console. In the search bar, search for SageMaker and click the Amazon SageMaker service.

    Figure: Open AWS console

  2. Under the Notebook tab, open Notebook instances and click the orange Create notebook instance button.

    Figure: AWS SageMaker Notebook Instance console

  3. On the creation page:
    1. Enter the Notebook instance name.
    2. Expand the Network panel.
    3. Leave the other entries at their defaults and select a VPC, Subnet, and Security group(s). (This VPC and subnet will be used to create the FSxN file system later.)
    4. Click the orange Create notebook instance button at the bottom right.

    Figure: Create notebook instance
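The same notebook instance can also be created with boto3 instead of the console. The following is a minimal sketch built on the SageMaker client's `create_notebook_instance` call; the helper names, the instance type, and all IDs and ARNs are hypothetical placeholders, and the IAM execution role is assumed to exist already.

```python
"""Sketch: create a SageMaker Notebook instance in a chosen VPC subnet."""
from typing import Any, Dict, List


def build_notebook_request(
    name: str,
    subnet_id: str,
    security_group_ids: List[str],
    role_arn: str,
    instance_type: str = "ml.t3.medium",  # assumption: any supported notebook type works
) -> Dict[str, Any]:
    """Return keyword arguments for SageMaker's create_notebook_instance call."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        # This subnet (and therefore its VPC) is reused for the FSxN file system later.
        "SubnetId": subnet_id,
        "SecurityGroupIds": security_group_ids,
    }


def create_notebook(client: Any, request: Dict[str, Any]) -> str:
    """Submit the request with a boto3 SageMaker client and return the new instance ARN."""
    response = client.create_notebook_instance(**request)
    return response["NotebookInstanceArn"]


# Example usage (requires boto3 and AWS credentials; all values are hypothetical):
# import boto3
# req = build_notebook_request(
#     name="fsxn-demo-notebook",
#     subnet_id="subnet-0123456789abcdef0",
#     security_group_ids=["sg-0123456789abcdef0"],
#     role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
# )
# print(create_notebook(boto3.client("sagemaker"), req))
```

Keeping the request construction separate from the API call makes it easy to inspect the VPC settings before anything is created.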

Create an FSxN File System

  1. Open the AWS console. In the search bar, search for FSx and click the FSx service.

    Figure: FSx Panel

  2. Click Create file system.

    Figure: Create file system

  3. Select the first card, FSx for NetApp ONTAP, and click Next.

    Figure: Select file system type

  4. On the details configuration page:

    1. Select the Standard create option.

      Figure: Create file system panel

    2. Enter the File system name and the SSD storage capacity.

      Figure: Specify file system details

    3. Make sure to use the same VPC and subnet as the SageMaker Notebook instance.

      Figure: Network & security configuration

    4. Enter the Storage virtual machine name and specify a password for your SVM (storage virtual machine).

      Figure: Default storage virtual machine configuration

    5. Leave the other entries at their defaults and click the orange Next button at the bottom right.

      Figure: Confirm configuration

    6. Click the orange Create file system button at the bottom right of the review page.

      Figure: Review configuration and confirm creation

  5. It may take about 20 to 40 minutes for the FSx file system to spin up.

    Figure: Inspect the FSx console
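The console steps above can also be scripted. The following is a minimal sketch using boto3's FSx client: `create_file_system` with `FileSystemType='ONTAP'`, a polling loop on `describe_file_systems` that waits for the `Lifecycle` field to reach `AVAILABLE`, and `create_storage_virtual_machine` for the SVM, which the API handles as a separate request. The helper names, storage capacity, deployment type, throughput, and polling interval are assumptions rather than values from this guide, and the sketch does not create the volume behind the /vol1 path used later.

```python
"""Sketch: create an FSx for NetApp ONTAP file system with boto3 and wait for it."""
import time
from typing import Any, Callable, Dict


def build_ontap_fs_request(subnet_id: str, security_group_id: str,
                           storage_gib: int = 1024) -> Dict[str, Any]:
    """Keyword arguments for the FSx create_file_system call (single-AZ ONTAP)."""
    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": storage_gib,        # SSD capacity in GiB (1024 GiB is the ONTAP minimum)
        "StorageType": "SSD",
        "SubnetIds": [subnet_id],              # same subnet as the SageMaker Notebook instance
        "SecurityGroupIds": [security_group_id],
        "OntapConfiguration": {
            "DeploymentType": "SINGLE_AZ_1",   # assumption: single-AZ is enough for a demo
            "ThroughputCapacity": 128,         # MBps; assumption
        },
    }


def wait_until_available(fsx: Any, file_system_id: str,
                         poll_seconds: float = 60.0, timeout_seconds: float = 3600.0,
                         sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll describe_file_systems until Lifecycle is AVAILABLE; raise on failure or timeout."""
    waited = 0.0
    while True:
        fs = fsx.describe_file_systems(FileSystemIds=[file_system_id])["FileSystems"][0]
        state = fs["Lifecycle"]
        if state == "AVAILABLE":
            return fs
        if state in ("FAILED", "MISCONFIGURED"):
            raise RuntimeError(f"File system {file_system_id} entered state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"File system {file_system_id} still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def create_svm(fsx: Any, file_system_id: str, name: str, admin_password: str) -> str:
    """Create a storage virtual machine on the file system and return its ID."""
    resp = fsx.create_storage_virtual_machine(
        FileSystemId=file_system_id, Name=name, SvmAdminPassword=admin_password
    )
    return resp["StorageVirtualMachine"]["StorageVirtualMachineId"]


# Example usage (requires boto3 and AWS credentials; IDs and password are hypothetical):
# import boto3
# fsx = boto3.client("fsx")
# fs_id = fsx.create_file_system(
#     **build_ontap_fs_request("subnet-0123456789abcdef0", "sg-0123456789abcdef0")
# )["FileSystem"]["FileSystemId"]
# wait_until_available(fsx, fs_id)   # typically 20 to 40 minutes
# print(create_svm(fsx, fs_id, "fsxn-svm-demo", "<SVM password>"))
```

The `sleep` parameter is injectable so the polling logic can be exercised without actually waiting.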

Server Configuration

ONTAP Configuration

  1. Open the newly created FSx file system and make sure its status is Available.

    Figure: Wait for the backend creation

  2. Select the Administration tab and note the Management endpoint IP address and the ONTAP administrator username.

    Figure: File system detail console

  3. Open the created SageMaker Notebook instance and click Open JupyterLab.

    Figure: AWS SageMaker Notebook instance console

  4. On the JupyterLab page, open a new Terminal.

    Figure: Jupyter Lab welcome page

  5. Enter the ssh command ssh <admin user name>@<ONTAP server IP> to log in to the FSxN ONTAP file system. (The user name and IP address are the ones noted in step 2.)
    Use the password you specified when creating the storage virtual machine.

    Figure: Jupyter Lab terminal

  6. Run the following commands in order.
    This example uses fsxn-ontap as the name of the FSxN private S3 bucket and fsxn-svm-demo as the storage virtual machine name.
    Use your own storage virtual machine name for the -vserver argument.

    vserver object-store-server create -vserver fsxn-svm-demo -object-store-server fsx_s3 -is-http-enabled true -is-https-enabled false
    
    vserver object-store-server user create -vserver fsxn-svm-demo -user s3user
    
    vserver object-store-server group create -vserver fsxn-svm-demo -name s3group -users s3user -policies FullAccess
    
    vserver object-store-server bucket create -vserver fsxn-svm-demo -bucket fsxn-ontap -type nas -nas-path /vol1

    Figure: Jupyter Lab terminal output

  7. Run the following commands to retrieve the endpoint IP address and the credentials for the FSxN private S3 bucket.

    network interface show -vserver fsxn-svm-demo -lif nfs_smb_management_1
    
    set adv
    
    vserver object-store-server user show

  8. Keep the endpoint IP address and credentials for later use.

    Figure: Jupyter Lab terminal
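The management IP address from step 2 and the SVM endpoint IP address from step 7 can also be read through the FSx API. A minimal sketch, assuming the `OntapConfiguration.Endpoints.Management` field returned by `describe_file_systems` and the `Endpoints.Nfs` field returned by `describe_storage_virtual_machines`; the assumption that the SVM's NFS endpoint address is the same as the nfs_smb_management_1 LIF is ours, and the helper names are hypothetical. The S3 access and secret keys are not returned by these calls and still have to be read from ONTAP as shown above.

```python
"""Sketch: look up the FSxN management and SVM endpoint IPs with boto3."""
from typing import Any


def management_ip(fsx: Any, file_system_id: str) -> str:
    """First management endpoint IP of the file system (the ssh target in step 5)."""
    fs = fsx.describe_file_systems(FileSystemIds=[file_system_id])["FileSystems"][0]
    return fs["OntapConfiguration"]["Endpoints"]["Management"]["IpAddresses"][0]


def svm_nfs_ip(fsx: Any, file_system_id: str, svm_name: str) -> str:
    """First NFS endpoint IP of the named SVM (assumed to match nfs_smb_management_1)."""
    resp = fsx.describe_storage_virtual_machines(
        Filters=[{"Name": "file-system-id", "Values": [file_system_id]}]
    )
    # Pagination (NextToken) is ignored: a demo file system holds only a few SVMs.
    for svm in resp["StorageVirtualMachines"]:
        if svm["Name"] == svm_name:
            return svm["Endpoints"]["Nfs"]["IpAddresses"][0]
    raise LookupError(f"No SVM named {svm_name!r} on {file_system_id}")


# Example usage (requires boto3 and AWS credentials; the file system ID is hypothetical):
# import boto3
# fsx = boto3.client("fsx")
# print(management_ip(fsx, "fs-0123456789abcdef0"))
# print(svm_nfs_ip(fsx, "fs-0123456789abcdef0", "fsxn-svm-demo"))
```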

Client Configuration

  1. In the SageMaker Notebook instance, create a new Jupyter notebook.

    Figure: Open a new Jupyter notebook

  2. Use the code below as a workaround to upload files to the FSxN private S3 bucket.
    For a more comprehensive example, refer to this notebook:
    fsxn_demo.ipynb

    # Setup configurations
    # -------- Manual configurations --------
    seed: int = 77                                                  # Random seed
    bucket_name: str = 'fsxn-ontap'                                 # The bucket name in ONTAP
    aws_access_key_id: str = '<Your ONTAP bucket key id>'           # Get this credential from ONTAP
    aws_secret_access_key: str = '<Your ONTAP bucket access key>'   # Get this credential from ONTAP
    fsx_endpoint_ip: str = '<Your FSxN IP address>'                 # Get this IP address from FSxN
    # -------- Manual configurations --------
    
    # Workaround
    ## Permission patch: mount the volume behind the bucket over NFS and open up its permissions
    !mkdir -p /home/ec2-user/SageMaker/vol1
    !sudo mount -t nfs $fsx_endpoint_ip:/vol1 /home/ec2-user/SageMaker/vol1
    !sudo chmod 777 /home/ec2-user/SageMaker/vol1
    
    ## Authentication for FSxN as a Private S3 Bucket
    ## (this overwrites the default AWS CLI profile on the notebook instance)
    !aws configure set aws_access_key_id $aws_access_key_id
    !aws configure set aws_secret_access_key $aws_secret_access_key
    
    ## Upload file to the FSxN Private S3 Bucket
    # Path relative to /home/ec2-user/SageMaker; it is also used as the object key
    local_file_path: str = '<Your local file path>'
    
    # --only-show-errors suppresses progress output but still reports failures
    !aws s3 cp --only-show-errors --endpoint-url http://$fsx_endpoint_ip /home/ec2-user/SageMaker/$local_file_path s3://$bucket_name/$local_file_path
    
    # Read data from FSxN Private S3 bucket
    ## Initialize an S3 resource
    import boto3
    from botocore.config import Config
    
    # Get session info
    region_name = boto3.session.Session().region_name
    
    # --- Start integrating SageMaker with FSxN ---
    # This is the only code change needed to use FSxN from SageMaker:
    # point boto3 at the FSxN endpoint and authenticate with the ONTAP credentials.
    s3_resource = boto3.resource(
        's3',
        region_name=region_name,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        use_ssl=False,                                # the object store server was created with HTTP only
        endpoint_url=f'http://{fsx_endpoint_ip}',
        config=Config(
            signature_version='s3v4',
            s3={'addressing_style': 'path'}           # path-style URLs: http://<ip>/<bucket>/<key>
        )
    )
    # --- End integrating SageMaker with FSxN ---
    
    ## Read file byte content (the key is the path used in the upload above)
    bucket = s3_resource.Bucket(bucket_name)
    
    binary_data: bytes = bucket.Object(local_file_path).get()['Body'].read()
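The notebook code above uses the boto3 resource interface. If you prefer the lower-level client interface, the following minimal sketch builds an equivalent client with the same settings (plain HTTP endpoint, SigV4 signing, path-style addressing) and reads an object with `get_object`. The helper names are hypothetical; boto3 and botocore are imported only inside `make_fsxn_s3_client`, so the other helpers work without them.

```python
"""Sketch: a boto3 S3 *client* (rather than resource) pointed at FSxN private S3."""
from typing import Any, Dict, Optional


def fsxn_s3_settings(endpoint_ip: str, key_id: str, secret: str) -> Dict[str, Any]:
    """Connection keyword arguments shared by boto3.client('s3') and boto3.resource('s3')."""
    return {
        "endpoint_url": f"http://{endpoint_ip}",  # the object store server listens on HTTP
        "use_ssl": False,
        "aws_access_key_id": key_id,              # ONTAP S3 user credentials
        "aws_secret_access_key": secret,
    }


def make_fsxn_s3_client(endpoint_ip: str, key_id: str, secret: str,
                        region_name: Optional[str] = None) -> Any:
    """Create a boto3 S3 client using SigV4 and path-style addressing."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        region_name=region_name,
        config=Config(signature_version="s3v4", s3={"addressing_style": "path"}),
        **fsxn_s3_settings(endpoint_ip, key_id, secret),
    )


def read_object_bytes(client: Any, bucket: str, key: str) -> bytes:
    """Download one object and return its full content as bytes."""
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()


# Example usage inside the notebook (uses the variables defined above):
# client = make_fsxn_s3_client(fsx_endpoint_ip, aws_access_key_id, aws_secret_access_key)
# data = read_object_bytes(client, bucket_name, local_file_path)
```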

This concludes the integration between FSxN and the SageMaker instance.

Useful debugging checklist

  • Ensure that the SageMaker Notebook instance and FSxN file system are in the same VPC.

  • Remember to run the set adv command on ONTAP to switch to the advanced privilege level before running vserver object-store-server user show.
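The first checklist item can be verified from code. A minimal sketch, assuming boto3 clients for SageMaker, EC2, and FSx: it reads the notebook's `SubnetId` from `describe_notebook_instance`, resolves that subnet's `VpcId` with `describe_subnets`, and compares it with the `VpcId` reported by `describe_file_systems`. The function name and IDs are hypothetical.

```python
"""Sketch: check that the SageMaker notebook and the FSxN file system share a VPC."""
from typing import Any


def same_vpc(sagemaker: Any, ec2: Any, fsx: Any,
             notebook_name: str, file_system_id: str) -> bool:
    """Return True when the notebook's subnet and the file system are in the same VPC."""
    nb = sagemaker.describe_notebook_instance(NotebookInstanceName=notebook_name)
    subnet_id = nb.get("SubnetId")
    if subnet_id is None:
        return False  # notebook was created without a VPC subnet
    nb_vpc = ec2.describe_subnets(SubnetIds=[subnet_id])["Subnets"][0]["VpcId"]
    fs_vpc = fsx.describe_file_systems(FileSystemIds=[file_system_id])["FileSystems"][0]["VpcId"]
    return nb_vpc == fs_vpc


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
# import boto3
# print(same_vpc(boto3.client("sagemaker"), boto3.client("ec2"), boto3.client("fsx"),
#                "fsxn-demo-notebook", "fs-0123456789abcdef0"))
```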

FAQ (As of Sep 27, 2023)

Q: Why am I getting the error "An error occurred (NotImplemented) when calling the CreateMultipartUpload operation: The s3 command you requested is not implemented" when uploading files to FSxN?

A: As a private S3 bucket, FSxN supports uploading files of up to 100 MB. With the S3 protocol, files larger than 100 MB are split into 100 MB parts and uploaded through the CreateMultipartUpload operation, which the current FSxN private S3 implementation does not support.
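One workaround to consider is raising the client-side multipart threshold so that files below the limit are always sent with a single PutObject request. A minimal sketch with boto3's `TransferConfig`, assuming the 100 MB limit described above (taken here as 100 MiB); the helper names are hypothetical, and files at or above the limit are rejected up front instead of being split.

```python
"""Sketch: upload to FSxN private S3 without triggering CreateMultipartUpload."""
import os
from typing import Any

SINGLE_PART_LIMIT_BYTES = 100 * 1024 * 1024  # assumption: the 100 MB limit from the FAQ


def check_single_part_size(path: str, limit: int = SINGLE_PART_LIMIT_BYTES) -> int:
    """Return the file size, or raise ValueError if it is too large for one PutObject."""
    size = os.path.getsize(path)
    if size >= limit:
        raise ValueError(f"{path} is {size} bytes; single-part uploads must be under {limit} bytes")
    return size


def upload_single_part(bucket: Any, path: str, key: str,
                       limit: int = SINGLE_PART_LIMIT_BYTES) -> None:
    """Upload with a boto3 Bucket resource, keeping the transfer below the multipart threshold."""
    check_single_part_size(path, limit)
    from boto3.s3.transfer import TransferConfig  # imported lazily; requires boto3

    # Files smaller than multipart_threshold are uploaded with a single PutObject call.
    bucket.upload_file(Filename=path, Key=key,
                       Config=TransferConfig(multipart_threshold=limit))


# Example usage inside the notebook (uses the `bucket` resource created above):
# upload_single_part(bucket, "/home/ec2-user/SageMaker/data.csv", "data.csv")
```

For uploads through the AWS CLI, the analogous setting is `multipart_threshold` in the `s3` section of the CLI configuration, for example `aws configure set default.s3.multipart_threshold 100MB`.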

Q: Why am I getting the error "An error occurred (AccessDenied) when calling the PutObject operations: Access Denied" when uploading files to FSxN?

A: To access the FSxN private S3 bucket from a SageMaker Notebook instance, switch the AWS credentials to the FSxN credentials. Granting write permission, however, requires a workaround: mount the volume that backs the bucket over NFS and run the chmod shell command to change its permissions.

Q: How can I integrate the FSxN private S3 bucket with other SageMaker ML services?

A: Unfortunately, the SageMaker service SDKs do not provide a way to specify the endpoint of a private S3 bucket. As a result, FSxN S3 is not compatible with services such as SageMaker Data Wrangler, SageMaker Clarify, AWS Glue, Amazon Athena, SageMaker AutoML, and others.