Data fabric powered by NetApp for big data architecture

Contributors

The data fabric powered by NetApp simplifies and integrates data management across cloud and on-premises environments to accelerate digital transformation.

The data fabric powered by NetApp delivers consistent and integrated data management services and applications (building blocks) for data visibility and insights, data access and control, and data protection and security, as shown in the figure below.

Error: Missing Graphic Image

Proven data fabric customer use cases

The data fabric powered by NetApp provides the following nine proven use cases for customers:

  • Accelerate analytics workloads

  • Accelerate DevOps transformation

  • Build cloud hosting infrastructure

  • Integrate cloud data services

  • Protect and secure data

  • Optimize unstructured data

  • Gain data center efficiencies

  • Deliver data insights and control

  • Simplify and automate

This document covers two of the nine use cases (along with their solutions):

  • Accelerate analytics workloads

  • Protect and secure data

NetApp NFS direct access

The NetApp NFS direct access (formerly known as NetApp In-Place Analytics Module) (shown in the figure below) allows customers to run big data analytics jobs on their existing or new NFSv3 or NFSv4 data without moving or copying the data. It prevents multiple copies of data and eliminates the need to sync the data with a source. For example, in the financial sector, the movement of data from one place to another place must meet legal obligations, which is not an easy task. In this scenario, the NetApp NFS direct access analyzes the financial data from its original location. Another key benefit is that using the NetApp NFS direct access simplifies protecting Hadoop data by using native Hadoop commands and enables data protection workflows leveraging NetApp’s rich data management portfolio.

Error: Missing Graphic Image

The NetApp NFS direct access provides two kinds of deployment options for Hadoop/Spark clusters:

  • By default, the Hadoop/Spark clusters use Hadoop Distributed File System (HDFS) for data storage and the default file system. The NetApp NFS direct access can replace the default HDFS with NFS storage as the default file system, enabling direct analytics operations on NFS data.

  • In another deployment option, the NetApp NFS direct access supports configuring NFS as additional storage along with HDFS in a single Hadoop/Spark cluster. In this case, the customer can share data through NFS exports and access it from the same cluster along with HDFS data.

The key benefits of using the NetApp NFS direct access include:

  • Analyzes the data from its current location, which prevents the time- and performance-consuming task of moving analytics data to a Hadoop infrastructure such as HDFS.

  • Reduces the number of replicas from three to one.

  • Enables users to decouple the compute and storage to scale them independently.

  • Provides enterprise data protection by leveraging the rich data management capabilities of ONTAP.

  • Is certified with the Hortonworks data platform.

  • Enables hybrid data analytics deployments.

  • Reduces the backup time by leveraging dynamic multithread capability.

Building blocks for big data

The data fabric powered by NetApp integrates data management services and applications (building blocks) for data access, control, protection, and security, as shown in the figure below.

Error: Missing Graphic Image

The building blocks in the figure above include:

  • NetApp NFS direct access. Provides the latest Hadoop and Spark clusters with direct access to NetApp NFS volumes without additional software or driver requirements.

  • NetApp Cloud Volumes ONTAP and Cloud Volume Services. Software-defined connected storage based on ONTAP running in Amazon Web Services (AWS) or Azure NetApp Files (ANF) in Microsoft Azure cloud services.

  • NetApp SnapMirror technology. Provides data protection capabilities between on-premises and ONTAP Cloud or NPS instances.

  • Cloud service providers. These providers include AWS, Microsoft Azure, Google Cloud, and IBM Cloud.

  • PaaS. Cloud-based analytics services such as Amazon Elastic MapReduce (EMR) and Databricks in AWS as well as Microsoft Azure HDInsight and Azure Databricks.