Cloud Sync technical FAQ

Contributors netapp-bcammett

This FAQ can help if you’re just looking for a quick answer to a question.

Getting started

The following questions relate to getting started with Cloud Sync.

How does Cloud Sync work?

Cloud Sync uses the NetApp data broker software to sync data from a source to a target (this is called a sync relationship).

The data broker controls the sync relationships between your sources and targets. After you set up a sync relationship, Cloud Sync analyzes your source system and breaks it up into multiple replication streams that it pushes to your selected target.

After the initial copy, the service syncs any changed data based on the schedule that you set.

How does the 14-day free trial work?

The 14-day free trial starts when you sign up for the Cloud Sync service. You’re not subject to NetApp charges for Cloud Sync relationships you create for 14 days. However, all resource charges for any data broker that you deploy still apply.

How much does Cloud Sync cost?

There are two types of costs associated with using Cloud Sync: service charges and resource charges.

Service charges

For pay-as-you-go pricing, Cloud Sync service charges are hourly, based on the number of sync relationships that you create.

Cloud Sync licenses are also available through your NetApp representative. Each license enables 20 sync relationships for 12 months.
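
As a rough illustration of the license sizing above, the following Python sketch computes how many licenses would cover a given number of sync relationships. It assumes that licenses can be combined in units of 20 relationships, which this FAQ doesn’t state explicitly; the function name is hypothetical.

```python
import math

# From this FAQ: each license enables 20 sync relationships for 12 months.
RELATIONSHIPS_PER_LICENSE = 20


def licenses_needed(relationships: int) -> int:
    """Return how many licenses cover the given number of sync relationships.

    Assumption: licenses can be stacked, each one adding 20 relationships.
    """
    if relationships < 0:
        raise ValueError("relationships must be non-negative")
    return math.ceil(relationships / RELATIONSHIPS_PER_LICENSE)


print(licenses_needed(45))  # 3
```

Confirm actual license terms with your NetApp representative.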

Resource charges

The resource charges are related to the compute and storage costs for running the data broker in the cloud.

How is Cloud Sync billed?

There are two ways to pay for sync relationships after your 14-day free trial ends. The first option is to subscribe from AWS or Azure, which enables you to pay as you go or to pay annually. The second option is to purchase licenses directly from NetApp.

Can I use Cloud Sync outside the cloud?

Yes, you can use Cloud Sync in a non-cloud architecture. The source and target can reside on-premises and so can the data broker.

Note the following key points about using Cloud Sync outside of the cloud:

  • For on-premises synchronization, a private Amazon S3 bucket is available through NetApp StorageGRID.

  • The data broker does need an internet connection to communicate with the Cloud Sync service.

  • If you don’t purchase a license directly from NetApp, you will need an AWS or Azure account for the PAYGO Cloud Sync service billing.

How do I access Cloud Sync?

Go to the Cloud Sync page on NetApp Cloud Central and click Start Free Trial. Log in or sign up to Cloud Central. After you’ve authenticated, you’re ready to get started using the Cloud Sync service.

Supported sources and targets

The following questions relate to the sources and targets that are supported in a sync relationship.

Which sources and targets does Cloud Sync support?

Cloud Sync supports many different types of sync relationships. View the entire list.

What versions of NFS and SMB does Cloud Sync support?

Cloud Sync supports NFS version 3 and later, and SMB version 1 and later.

When Amazon S3 is the target, can the data be tiered to a specific S3 storage class?

Yes, you can choose a specific S3 storage class when Amazon S3 is the target:

  • Standard (this is the default class)

  • Intelligent-Tiering

  • Standard-Infrequent Access

  • One Zone-Infrequent Access

  • Glacier

  • Glacier Deep Archive
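
For reference when scripting against Amazon S3 directly, the sketch below maps the class names listed above to the storage class values used by the Amazon S3 API. The mapping and the helper name are illustrative only; this FAQ doesn’t say how Cloud Sync itself names these classes in its settings.

```python
# Display names from this FAQ mapped to Amazon S3 API StorageClass values.
S3_STORAGE_CLASSES = {
    "Standard": "STANDARD",  # the default class
    "Intelligent-Tiering": "INTELLIGENT_TIERING",
    "Standard-Infrequent Access": "STANDARD_IA",
    "One Zone-Infrequent Access": "ONEZONE_IA",
    "Glacier": "GLACIER",
    "Glacier Deep Archive": "DEEP_ARCHIVE",
}


def s3_storage_class(display_name: str = "Standard") -> str:
    """Return the S3 API value for a display name (hypothetical helper)."""
    try:
        return S3_STORAGE_CLASSES[display_name]
    except KeyError:
        raise ValueError(f"unsupported storage class: {display_name!r}") from None


print(s3_storage_class("Glacier Deep Archive"))  # DEEP_ARCHIVE
```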

What about storage tiers for Azure Blob storage?

You can choose a specific Azure Blob storage tier when a Blob container is the target:

  • Hot storage

  • Cool storage

Networking

The following questions relate to networking requirements for Cloud Sync.

What are the networking requirements for Cloud Sync?

The Cloud Sync environment requires that the data broker is connected with the source and the target through the selected protocol (NFS, SMB, EFS) or object storage API (Amazon S3, Azure Blob, IBM Cloud Object Storage).

In addition, the data broker needs an outbound internet connection over port 443 so it can communicate with the Cloud Sync service and contact a few other services and repositories.

For more details:

Data brokers require internet access. We don’t support a proxy server when deploying the data broker in Azure or in Google Cloud Platform.
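
The outbound port 443 requirement can be spot-checked from the data broker host with a small script. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and this FAQ doesn’t list the specific endpoints, so the host to test is left to the caller.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

For example, can_connect("example.com") checks outbound HTTPS reachability to a placeholder host. A successful TCP connection only shows that the port is open; it doesn’t verify TLS or service availability.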

Data synchronization

The following questions relate to how data synchronization works.

How often does synchronization occur?

The default schedule is set for daily synchronization. After the initial synchronization, you can:

  • Modify the sync schedule to your desired number of days, hours, or minutes

  • Disable the sync schedule

  • Delete the sync schedule (no data will be lost; only the sync relationship will be removed)

What is the minimum sync schedule?

The minimum schedule that can be configured in Cloud Sync is 5 minutes.
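
If you generate schedules from a script, a check like the following can reject intervals below the 5-minute minimum before they reach the service. It’s a sketch only: the function and constant names are hypothetical, and it doesn’t call any Cloud Sync API.

```python
from datetime import timedelta

# Minimum sync schedule stated in this FAQ.
MIN_SCHEDULE = timedelta(minutes=5)


def validate_schedule(days: int = 0, hours: int = 0, minutes: int = 0) -> timedelta:
    """Build a sync interval and reject anything shorter than the minimum."""
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if interval < MIN_SCHEDULE:
        raise ValueError(f"sync interval {interval} is below the minimum of {MIN_SCHEDULE}")
    return interval


print(validate_schedule(days=1))  # the default daily schedule: 1 day, 0:00:00
```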

Does the data broker retry when a file fails to sync? Or does it time out?

The data broker doesn’t time out when a single file fails to transfer. Instead, the data broker retries 3 times before skipping the file. The retry value is configurable in the settings for a sync relationship.
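
The retry-then-skip behavior can be pictured with the following sketch. It is not the data broker’s implementation: the function name is hypothetical, and it assumes that “retries 3 times” means one initial attempt followed by up to three retries.

```python
from typing import Callable


def transfer_with_retries(transfer: Callable[[], None], retries: int = 3) -> bool:
    """Try a transfer, retrying on failure; return False if the file is skipped.

    Assumption: one initial attempt plus up to `retries` further attempts.
    """
    for _attempt in range(1 + retries):
        try:
            transfer()
            return True
        except OSError:
            # Transfer failed; fall through to the next attempt, if any remain.
            continue
    return False  # all attempts failed, so the caller skips this file
```

A caller would log the skipped file and move on to the next one rather than aborting the whole sync.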

What if I have a very large dataset?

If a single directory contains 600,000 files or more, contact us so we can help you configure the data broker to handle the payload. We might need to add memory to the data broker machine.
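
To find out whether a source has directories at or above that size, a scan like the one below can help when the source is mounted on a host where Python is available. This is a sketch with hypothetical names; it counts only the files directly inside each directory, not files in subdirectories.

```python
import os
from typing import Iterator, Tuple

# Threshold from this FAQ: 600,000 files or more in a single directory.
LARGE_DIR_THRESHOLD = 600_000


def find_large_directories(root: str, threshold: int = LARGE_DIR_THRESHOLD) -> Iterator[Tuple[str, int]]:
    """Yield (path, file_count) for directories holding at least `threshold` files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) >= threshold:
            yield dirpath, len(filenames)
```

For example, list(find_large_directories("/mnt/source")) returns the directories worth mentioning when you contact support; the mount path here is a placeholder.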

Security

The following questions relate to security.

Is Cloud Sync secure?

Yes. All Cloud Sync service networking connectivity is done using Amazon Simple Queue Service (SQS).

All communication between the data broker and Amazon S3, Azure Blob, Google Cloud Storage, and IBM Cloud Object Storage is done through the HTTPS protocol.

If you’re using Cloud Sync with on-premises (source or destination) systems, here are a few recommended connectivity options:

  • An AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect connection, which is non-internet routed (and can only communicate with the cloud networks that you specify)

  • A VPN connection between your on-premises gateway device and your cloud networks

  • For extra secure data transfer with S3 buckets, Azure Blob storage, or Google Cloud Storage, an Amazon Private S3 Endpoint, Azure Virtual Network service endpoints, or Private Google Access may be established.

Any of these methods establishes a secure connection between your on-premises NAS servers and a Cloud Sync data broker.

Is data encrypted by Cloud Sync?

Cloud Sync supports data-in-flight encryption between source and target NFS servers. Learn more.

Encryption is not supported with SMB.

Permissions

The following question relates to data permissions.

Are the data permissions synced to the target location?

Cloud Sync copies permissions on both NFS and SMB data to the target location.

Performance

The following questions relate to Cloud Sync performance.

What does the progress indicator for a sync relationship represent?

The sync relationship shows the throughput of the data broker’s network adapter. If you accelerated sync performance by using multiple data brokers, then the throughput is the sum of all traffic. This throughput refreshes every 20 seconds.

I’m experiencing performance issues. Can we limit the number of concurrent transfers?

The data broker can sync 4 files at a time. If you have very large files (multiple TBs each), it can take a long time to complete the transfer process and performance might be impacted.

Limiting the number of concurrent transfers can help. Contact us for help.
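
The general idea of capping concurrent transfers looks like the sketch below, which uses a bounded thread pool. It only illustrates the concept: the names are hypothetical, and the data broker’s actual concurrency setting is changed with help from support, not through this code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# From this FAQ: the data broker can sync 4 files at a time.
DEFAULT_CONCURRENCY = 4


def sync_files(paths: Iterable[str], copy_file: Callable[[str], str],
               max_concurrent: int = DEFAULT_CONCURRENCY) -> List[str]:
    """Run copy_file on every path with at most max_concurrent transfers in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order and re-raises any exception from copy_file.
        return list(pool.map(copy_file, paths))
```

Lowering max_concurrent trades overall parallelism for less contention when individual files are very large.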

Why am I experiencing low performance with Cloud Volumes Service for AWS?

When you sync data to or from a cloud volume, you might experience failures and performance issues if the level of performance for the cloud volume is Standard.

Change the Service level to Premium or Extreme to enhance the sync performance.

How many data brokers are required?

There isn’t a simple answer to this question. You need to observe performance and adjust accordingly.

When you create a new relationship, you start with a single data broker (unless you selected an existing data broker that belongs to an accelerated sync relationship). This data broker might show certain performance characteristics that may or may not fit the data sync requirements.

A single data broker might underperform even when bandwidth and capacity are available, or it might overburden the source and target. For example, one data broker might be too much for a Cloud Volumes Service source with a 1 TB Premium tier, because the data broker can reach 100 MB/s in some setups while the Cloud Volumes Service allows 64 MB/s at most.

Given that, you can always accelerate sync performance, which adds an additional data broker to share the load of that relationship (and any other relationships that the original data broker in the group handles).
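
To make the example figures concrete, the sketch below treats the slower side as the bottleneck and estimates how long a full copy would take. It assumes a sustained rate and 1 TB = 1,000,000 MB; the function names are hypothetical, and real throughput varies.

```python
def effective_throughput_mb_s(broker_mb_s: float, storage_limit_mb_s: float) -> float:
    """The slower of the two sides bounds the transfer rate."""
    return min(broker_mb_s, storage_limit_mb_s)


def transfer_hours(size_mb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_mb at a sustained rate of rate_mb_s."""
    return size_mb / rate_mb_s / 3600


# Figures from the example: a 100 MB/s data broker and a 64 MB/s volume limit.
rate = effective_throughput_mb_s(100, 64)
print(rate)  # 64
print(round(transfer_hours(1_000_000, rate), 2))  # 4.34
```

In this case, adding a second data broker wouldn’t help, because the storage limit is the bottleneck, not the data broker.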

Deleting

The following questions relate to deleting sync relationships and data from sources and targets.

What happens if I delete my Cloud Sync relationship?

Deleting a relationship stops all future data syncs and terminates payment. Any data that was synced to the target remains as-is.

What happens if I delete something from my source server? Is it removed from the target too?

By default, if you have an active sync relationship, an item deleted on the source server is not deleted from the target during the next synchronization. However, the sync settings for each relationship include an option that makes Cloud Sync delete files in the target location if they were deleted from the source.

What happens if I delete something from my target? Is it removed from my source too?

If an item is deleted from the target, it will not be removed from the source. The relationship is one-way, from source to target. On the next sync cycle, Cloud Sync compares the source to the target, identifies that the item is missing, and copies it again from the source to the target.
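
The one-way behavior described in the two answers above can be modeled with simple set operations. This is a simplified sketch, not Cloud Sync’s algorithm: it compares file names only (ignoring changed content), the names are hypothetical, and it treats any target file that is absent from the source as “deleted from the source.”

```python
from typing import Set, Tuple


def plan_sync(source: Set[str], target: Set[str],
              delete_on_target: bool = False) -> Tuple[Set[str], Set[str]]:
    """Return (files to copy to the target, files to delete from the target)."""
    to_copy = source - target  # missing on the target, so copied again
    to_delete = target - source if delete_on_target else set()
    return to_copy, to_delete


# "b.txt" was removed from the target; "old.txt" was removed from the source.
source = {"a.txt", "b.txt"}
target = {"a.txt", "old.txt"}
print(plan_sync(source, target))  # ({'b.txt'}, set())
print(plan_sync(source, target, delete_on_target=True))  # ({'b.txt'}, {'old.txt'})
```

Nothing is ever planned against the source, which reflects the source-to-target direction of the relationship.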

Data broker deep dive

The following question relates to the data broker.

Can you explain the architecture of the data broker?

Sure. Here are the most important points:

  • The data broker is a Node.js application running on a Linux host.

  • Cloud Sync deploys the data broker as follows:

    • AWS: From an AWS CloudFormation template

    • Azure: From Azure Resource Manager

    • Google: From Google Cloud Deployment Manager

    • If you use your own Linux host, you need to manually install the software

  • The data broker software automatically upgrades itself to the latest version.

  • The data broker uses AWS SQS as a reliable and secure communication channel and for control and monitoring. SQS also provides a persistency layer.

  • You can add data brokers to a relationship to increase transfer speed and add high availability. There is service resiliency if one data broker fails.