Cloud Sync technical FAQ
Contributors Download PDF of this page
This FAQ can help if you’re just looking for a quick answer to a question.
The following questions relate to getting started with Cloud Sync.
How does Cloud Sync work?
Cloud Sync uses the NetApp data broker software to sync data from a source to a target (this is called a sync relationship).
The data broker controls the sync relationships between your sources and targets. After you set up a sync relationship, Cloud Sync analyzes your source system and breaks it up into multiple replication streams to push to your selected target data.
After the initial copy, the service syncs any changed data based on the schedule that you set.
How does the 14-day free trial work?
The 14-day free trial starts when you sign up for the Cloud Sync service. You’re not subject to NetApp charges for Cloud Sync relationships you create for 14 days. However, all resource charges for any data broker that you deploy still applies.
How much does Cloud Sync cost?
There are two types of costs associated with using Cloud Sync: service charges and resource charges.
For pay-as-you-go pricing, Cloud Sync service charges are hourly, based on the number of sync relationships that you create.
Cloud Sync licenses are also available through your NetApp representative. Each license enables 20 sync relationships for 12 months.
|Cloud Sync relationships are free for Cloud Volumes Service and Azure NetApp Files.|
The resource charges are related to the compute and storage costs for running the data broker in the cloud.
How is Cloud Sync billed?
There are two ways to pay for sync relationships after your 14-day free trial ends. The first option is to subscribe from AWS or Azure, which enables you to pay-as-you-go or to pay annually. The second option is to purchase licenses directly from NetApp.
Can I use Cloud Sync outside the cloud?
Yes, you can use Cloud Sync in a non-cloud architecture. The source and target can reside on-premises and so can the data broker.
Note the following key points about using Cloud Sync outside of the cloud:
For on-premises synchronization, a private Amazon S3 bucket is available through NetApp StorageGRID.
The data broker does need an internet connection to communicate with the Cloud Sync service.
If you don’t purchase a license directly from NetApp, you will need an AWS or Azure account for the PAYGO Cloud Sync service billing.
How do I access Cloud Sync?
Cloud Sync is available from Cloud Manager in the Sync tab.
Supported sources and targets
The following questions related to the source and targets that are supported in a sync relationship.
Which sources and targets does Cloud Sync support?
Cloud Sync supports many different types of sync relationships. View the entire list.
What versions of NFS and SMB does Cloud Sync support?
Cloud Sync supports NFS version 3 and later, and SMB version 1 and later.
When Amazon S3 is the target, can the data be tiered to a specific S3 storage class?
Yes, you can choose a specific S3 storage class when AWS S3 is the target:
Standard (this is the default class)
One Zone-Infrequent Access
Glacier Deep Archive
What about storage tiers for Azure Blob storage?
You can choose a specific Azure Blob storage tier when a Blob container is the target:
Do you support Google Cloud storage tiers?
Yes, you can choose a specific storage class when a Google Cloud Storage bucket is the target:
The following questions relate to networking requirements for Cloud Sync.
What are the networking requirements for Cloud Sync?
The Cloud Sync environment requires that the data broker is connected with the source and the target through the selected protocol (NFS, SMB, EFS) or object storage API (Amazon S3, Azure Blob, IBM Cloud Object Storage).
In addition, the data broker needs an outbound internet connection over port 443 so it can communicate with the Cloud Sync service and contact a few other services and repositories.
For more details, review networking requirements.
Can I use a proxy server with the data broker?
Cloud Sync supports proxy servers with or without basic authentication. If you specify a proxy server when you deploy a data broker, all HTTP and HTTPS traffic from the data broker is routed through the proxy. Note that non-HTTP traffic such as NFS or SMB can’t be routed through a proxy server.
The only proxy server limitation is when using data-in-flight encryption with an NFS or Azure NetApp Files sync relationship. The encrypted data is sent over HTTPS and isn’t routable through a proxy server.
The following questions relate to how data synchronization works.
How often does synchronization occur?
The default schedule is set for daily synchronization. After the initial synchronization, you can:
Modify the sync schedule to your desired number of days, hours, or minutes
Disable the sync schedule
Delete the sync schedule (no data will be lost; only the sync relationship will be removed)
What is the minimum sync schedule?
You can schedule a relationship to sync data as often as every 1 minute.
Does the data broker retry when a file fails to sync? Or does it timeout?
The data broker doesn’t timeout when a single file fails to transfer. Instead, the data broker retries 3 times before skipping the file. The retry value is configurable in the settings for a sync relationship.
What if I have a very large dataset?
If a single directory contains 600,000 files or more, contact us so that we can help you configure the data broker to handle the payload. We might need to add additional memory to the data broker machine.
Note that there’s no limit to the total number of files in the mount point. The extra memory is required for large directories with 600,000 files or more, regardless of their level in the hierarchy (top directory or subdirectory).
The following questions related to security.
Is Cloud Sync secure?
Yes. All Cloud Sync service networking connectivity is done using Amazon Simple Queue Service (SQS).
All communication between the data broker and Amazon S3, Azure Blob, Google Cloud Storage, and IBM Cloud Object Storage is done through the HTTPS protocol.
If you’re using Cloud Sync with on-premises (source or destination) systems, here’s a few recommended connectivity options:
An AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect connection, which is non-internet routed (and can only communicate with the cloud networks that you specify)
A VPN connection between your on-premises gateway device and your cloud networks
For extra secure data transfer with S3 buckets, Azure Blob storage, or Google Cloud Storage, an Amazon Private S3 Endpoint, Azure Virtual Network service endpoints, or Private Google Access may be established.
Any of these methods establishes a secure connection between your on-premises NAS servers and a Cloud Sync data broker.
Is data encrypted by Cloud Sync?
Cloud Sync supports data-in-flight encryption between source and target NFS servers. Learn more.
Encryption is not supported with SMB.
When an Amazon S3 bucket is the target in a sync relationship, you can choose whether to enable data encryption using AWS KMS encryption or AES-256 encryption.
The following questions relate to data permissions.
Are SMB data permissions synced to the target location?
You can set up Cloud Sync to preserve access control lists (ACLs) between a source SMB share and a target SMB share. Or you can manually copy the ACLs yourself. Learn how to copy ACLs between SMB shares.
Are NFS data permissions synced to the target location?
Cloud Sync automatically copies NFS permissions between NFS servers as follows:
NFS version 3: Cloud Sync copies the permissions and the user group owner.
NFS version 4: Cloud Sync copies the ACLs.
Object storage metadata
Cloud Sync copies object storage metadata from the source to the target for the following types of sync relationships:
Amazon S3 → Amazon S3 1
Amazon S3 → StorageGRID
StorageGRID → Amazon S3
StorageGRID → StorageGRID
StorageGRID → Google Cloud Storage
Google Cloud Storage → StorageGRID 1
Google Cloud Storage → IBM Cloud Object Storage 1
Google Cloud Storage → Amazon S3 1
Amazon S3 → Google Cloud Storage
IBM Cloud Object Storage → Google Cloud Storage
StorageGRID → IBM Cloud Object Storage
IBM Cloud Object Storage → StorageGRID
IBM Cloud Object Storage → IBM Cloud Object Storage
1 For these sync relationships, you need to modify the data broker’s configuration in order to copy metadata.
The following questions relate to Cloud Sync performance.
What does the progress indicator for a sync relationship represent?
The sync relationship shows the throughput of the data broker’s network adapter. If you accelerated sync performance by using multiple data brokers, then the throughput is the sum of all traffic. This throughput refreshes every 20 seconds.
I’m experiencing performance issues. Can we limit the number of concurrent transfers?
The data broker can sync 4 files at a time. If you have very large files (multiple TBs each), it can take a long time to complete the transfer process and performance might be impacted.
Limiting the number of concurrent transfers can help. Contact us for help.
Why am I experiencing low performance with Azure NetApp Files?
When you sync data to or from Azure NetApp Files, you might experience failures and performance issues if the disk service level is Standard.
Change the service level to Premium or Ultra to enhance the sync performance.
Why am I experiencing low performance with Cloud Volumes Service for AWS?
When you sync data to or from a cloud volume, you might experience failures and performance issues if the level of performance for the cloud volume is Standard.
Change the Service level to Premium or Extreme to enhance the sync performance.
How many data brokers are required?
When you create a new relationship, you start with a single data broker (unless you selected an existing data broker that belongs to an accelerated sync relationship). In many cases, a single data broker can meet the performance requirements for a sync relationship. If it doesn’t, you can accelerate sync performance by adding additional data brokers. But you should first check other factors that can impact sync performance.
Multiple factors can impact data transfer performance. The overall sync performance might be impacted due to network bandwidth, latency, and network topology, as well as the data broker VM specs and storage system performance. For example, a single data broker in a sync relationship can reach 100 MB/s, while disk throughput on the target might only allow 64 MB/s. As a result, the data broker keeps trying to copy the data, but the target can’t meet the performance of the data broker.
So be sure to check the performance of your networking and the disk throughput on the target.
Then you can consider accelerating sync performance by adding an additional data broker to share the load of that relationship. Learn how to accelerate sync performance.
The following questions relate to deleting sync relationships and data from sources and targets.
What happens if I delete my Cloud Sync relationship?
Deleting a relationship stops all future data syncs and terminates payment. Any data that was synced to the target remains as-is.
What happens if I delete something from my source server? Is it removed from the target too?
By default, if you have an active sync relationship, the item deleted on the source server is not deleted from the target during the next synchronization. But there is an option in the sync settings for each relationship, where you can define that Cloud Sync will delete files in the target location if they were deleted from the source.
What happens if I delete something from my target? Is it removed from my source too?
If an item is deleted from the target, it will not be removed from the source. The relationship is one-way—from source to target. On the next sync cycle, Cloud Sync compares the source to the target, identifies that the item is missing, and Cloud Sync copies it again from the source to the target.
Data broker deep dive
The following question relates to the data broker.
Can you explain the architecture of the data broker?
Sure. Here are the most important points:
The data broker is a node.js application running on a Linux host.
Cloud Sync deploys the data broker as follows:
AWS: From an AWS CloudFormation template
Azure: From Azure Resource Manager
Google: From Google Cloud Deployment Manager
If you use your own Linux host, you need to manually install the software
The data broker software automatically upgrades itself to the latest version.
The data broker uses AWS SQS as a reliable and secure communication channel and for control and monitoring. SQS also provides a persistency layer.
You can add additional data brokers to a relationship to increase transfer speed and add high availability. There is service resiliency if one data broker fails.