Check that your Linux host is ready to install NetApp Data Classification
Before installing NetApp Data Classification manually on a Linux host, optionally run a script on the host to verify that all the prerequisites are in place for installing Data Classification. You can run this script on a Linux host in your network, or on a Linux host in the cloud. The host can be connected to the internet, or the host can reside in a site that doesn't have internet access (a dark site).
There is also a prerequisite test script that is part of the Data Classification installation script. The script described here is specifically designed for users who want to verify the Linux host independently of running the Data Classification installation script.
Getting Started
You'll perform the following tasks:

- Optionally, install a Console agent if you don't already have one installed. You can run the test script without a Console agent installed, but the script checks connectivity between the Console agent and the Data Classification host machine, so having a Console agent is recommended.
- Prepare the host machine and verify that it meets all the requirements.
- Enable outbound internet access from the Data Classification host machine.
- Verify that all required ports are enabled on all systems.
- Download and run the Prerequisites test script.
Create a Console agent
A Console agent is required before you can install and use Data Classification. You can, however, run the Prerequisites script without a Console agent.
You can install the Console agent on-premises on a Linux host in your network or on a Linux host in the cloud. Some users planning to install Data Classification on-prem may also choose to install the Console agent on-prem.
To create a Console agent in your cloud provider environment, see creating a Console agent in AWS, creating a Console agent in Azure, or creating a Console agent in GCP.
You'll need the IP address or host name of the Console agent system when running the Prerequisites script. You'll have this information if you installed the Console agent in your premises. If the Console agent is deployed in the cloud, you can find this information from the Console: select the Help icon then Support then Console agent.
Verify host requirements
Data Classification software must run on a host that meets specific operating system requirements, RAM requirements, software requirements, and so on.
- Data Classification is not supported on a host that is shared with other applications - the host must be a dedicated host.
- When building the host system in your premises, you can choose among these system sizes depending on the size of the dataset that you plan to have Data Classification scan:

System size | CPU | RAM (swap memory must be disabled) | Disk |
---|---|---|---|
Extra Large | 32 CPUs | 128 GB RAM | 1 TiB SSD on /, or 100 GiB available on /opt; 895 GiB available on /var/lib/docker; 5 GiB on /tmp; for Podman, 30 GB on /var/tmp |
Large | 16 CPUs | 64 GB RAM | 500 GiB SSD on /, or 100 GiB available on /opt; 400 GiB available on /var/lib/docker (for Podman, /var/lib/containers); 5 GiB on /tmp; for Podman, 30 GB on /var/tmp |
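As a quick sanity check before running the full Prerequisites script, a short shell sketch like the following (an illustration, not part of the NetApp tooling) can compare the host against the "Large" profile; the thresholds are taken from the sizing table above.

```shell
# Illustrative check against the "Large" profile (16 CPUs, 64 GB RAM,
# 500 GiB on /). Thresholds come from the sizing table above.
cpus=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_gb=$((mem_kb / 1024 / 1024))
root_gib=$(df -BG --output=size / | tail -1 | tr -dc '0-9')

echo "CPUs: $cpus (need 16)"
echo "RAM:  ${mem_gb} GB (need 64)"
echo "/ :   ${root_gib} GiB (need 500)"

# Swap must be disabled; /proc/swaps has only a header line when it is.
if [ "$(wc -l < /proc/swaps)" -gt 1 ]; then
    echo "WARNING: swap is enabled - disable it before installing"
fi
```

The Prerequisites script performs these checks (and more) itself; this is only a fast pre-check.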
- When deploying a compute instance in the cloud for your Data Classification installation, it's recommended that you use a system that meets the "Large" system requirements above:
  - Amazon Elastic Compute Cloud (Amazon EC2) instance type: "m6i.4xlarge". See additional AWS instance types.
  - Azure VM size: "Standard_D16s_v3". See additional Azure instance types.
  - GCP machine type: "n2-standard-16". See additional GCP instance types.
- UNIX folder permissions: The following minimum UNIX permissions are required:

Folder | Minimum Permissions |
---|---|
/tmp | rwxrwxrwt |
/opt | rwxr-xr-x |
/var/lib/docker | rwx------ |
/usr/lib/systemd/system | rwxr-xr-x |
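To compare the actual modes on your host against this table, a sketch like the following (illustrative only) prints each folder's current permissions:

```shell
# Print the current mode of each folder from the permissions table above;
# folders that don't exist yet are flagged so you can create them.
perm_report=$(for dir in /tmp /opt /var/lib/docker /usr/lib/systemd/system; do
    if [ -d "$dir" ]; then
        printf '%-26s %s\n' "$dir" "$(stat -c '%a %A' "$dir")"
    else
        printf '%-26s missing\n' "$dir"
    fi
done)
echo "$perm_report"
```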
- Operating system:
  - The following operating systems require using the Docker container engine:
    - Red Hat Enterprise Linux versions 7.8 and 7.9
    - Ubuntu 22.04 (requires Data Classification version 1.23 or greater)
    - Ubuntu 24.04 (requires Data Classification version 1.23 or greater)
  - The following operating systems require using the Podman container engine, and they require Data Classification version 1.30 or greater:
    - Red Hat Enterprise Linux versions 8.8, 8.10, 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, and 9.6
- Advanced Vector Extensions (AVX2) must be enabled on the host system.
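One way to confirm the AVX2 requirement is to look for the flag in /proc/cpuinfo (a hedged sketch; this applies to x86 Linux hosts):

```shell
# Check whether the CPU advertises the avx2 flag in /proc/cpuinfo.
if grep -q -m1 avx2 /proc/cpuinfo; then
    avx2_status="supported"
else
    avx2_status="not supported"
fi
echo "AVX2: $avx2_status"
```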
- Red Hat Subscription Management: The host must be registered with Red Hat Subscription Management. If it's not registered, the system can't access repositories to update required third-party software during installation.
- Additional software: You must install the following software on the host before you install Data Classification:
  - Depending on the OS you are using, install one of these container engines:
    - Docker Engine version 19.3.1 or greater. View installation instructions.
    - Podman version 4 or greater. To install Podman, enter:
      sudo yum install podman netavark -y
  - Python version 3.6 or greater. View installation instructions.
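A quick way to confirm the installed versions before running the Prerequisites script (illustrative; compare the output against the minimums listed above):

```shell
# Report the version of each prerequisite tool, or note that it's absent.
versions=$(for cmd in docker podman python3; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd: $("$cmd" --version 2>&1 | head -1)"
    else
        echo "$cmd: not installed"
    fi
done)
echo "$versions"
```

Only one of Docker or Podman is needed, per the OS requirements above.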
- NTP considerations: NetApp recommends configuring the Data Classification system to use a Network Time Protocol (NTP) service. The time must be synchronized between the Data Classification system and the Console agent system.
- Firewalld considerations: If you are planning to use firewalld, we recommend that you enable it before installing Data Classification. Run the following commands to configure firewalld so that it is compatible with Data Classification:

  firewall-cmd --permanent --add-service=http
  firewall-cmd --permanent --add-service=https
  firewall-cmd --permanent --add-port=80/tcp
  firewall-cmd --permanent --add-port=8080/tcp
  firewall-cmd --permanent --add-port=443/tcp
  firewall-cmd --reload

  If you're planning to use additional Data Classification hosts as scanner nodes (in a distributed model), add these rules to your primary system at this time:

  firewall-cmd --permanent --add-port=2377/tcp
  firewall-cmd --permanent --add-port=7946/udp
  firewall-cmd --permanent --add-port=7946/tcp
  firewall-cmd --permanent --add-port=4789/udp

  Note that you must restart Docker or Podman whenever you enable or update firewalld settings.
Enable outbound internet access from Data Classification
Data Classification requires outbound internet access. If your virtual or physical network uses a proxy server for internet access, ensure that the Data Classification instance has outbound internet access to contact the following endpoints.
Note: This section is not required for host systems installed in sites without internet connectivity.
Endpoints | Purpose |
---|---|
https://api.console.netapp.com | Communication with the Console service, which includes NetApp accounts. |
https://netapp-cloud-account.auth0.com | Communication with the Console website for centralized user authentication. |
https://support.compliance.api.console.netapp.com/ | Provides access to software images, manifests, and templates, and to send logs and metrics. |
https://support.compliance.api.console.netapp.com/ | Enables NetApp to stream data from audit records. |
https://github.com/docker | Provides prerequisite packages for Docker installation. |
http://packages.ubuntu.com/ | Provides prerequisite packages for Ubuntu installation. |
Verify that all required ports are enabled
You must ensure that all required ports are open for communication between the Console agent, Data Classification, Active Directory, and your data sources.
Connection Type | Ports | Description |
---|---|---|
Console agent <> Data Classification | 8080 (TCP), 443 (TCP), and 80 | The firewall or routing rules for the Console agent must allow inbound and outbound traffic over port 443 to and from the Data Classification instance. |
Console agent <> ONTAP cluster (NAS) | 443 (TCP) | The Console discovers ONTAP clusters using HTTPS. If you use custom firewall policies, the Console agent host must allow outbound HTTPS access through port 443. If the Console agent is in the cloud, all outbound communication is allowed by the predefined firewall or routing rules. |
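From the Console agent host, a quick port probe like this sketch can verify the path to the Data Classification instance (illustrative; the address is a placeholder, and nc must be available):

```shell
# Probe the ports from the table above. Replace HOST with the IP address
# or host name of your Data Classification instance (placeholder below).
HOST=127.0.0.1
port_report=$(for port in 80 443 8080; do
    if nc -z -w 3 "$HOST" "$port" 2>/dev/null; then
        echo "port $port open on $HOST"
    else
        echo "port $port unreachable on $HOST"
    fi
done)
echo "$port_report"
```

The Prerequisites script performs its own connectivity test when you supply the Console agent address, so this is just an early check.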
Run the Data Classification prerequisites script
Follow these steps to run the Data Classification Prerequisites script.

Watch this video to see how to run the Prerequisites script and interpret the results.

- Verify that your Linux system meets the host requirements.
- Verify that the system has the two prerequisite software packages installed (Docker Engine or Podman, and Python 3).
- Make sure you have root privileges on the Linux system.
- Download the Data Classification Prerequisites script from the NetApp Support Site. The file you should select is named standalone-pre-requisite-tester-<version>.
- Copy the file to the Linux host you plan to use (using scp or another method).
- Assign permissions to run the script:

  chmod +x standalone-pre-requisite-tester-v1.25.0

- Run the script using the following command:

  ./standalone-pre-requisite-tester-v1.25.0 <--darksite>

  Add the "--darksite" option only if you are running the script on a host that doesn't have internet access. Certain prerequisite tests are skipped when the host is not connected to the internet.
- The script prompts you for the IP address of the Data Classification host machine. Enter the IP address or host name.
- The script prompts whether you have an installed Console agent:
  - Enter N if you do not have an installed Console agent.
  - Enter Y if you do, and then enter the IP address or host name of the Console agent so the test script can test that connectivity.
- The script runs a variety of tests on the system and displays results as it progresses. When it finishes, it writes a log of the session to a file named prerequisites-test-<timestamp>.log in the directory /opt/netapp/install_logs.
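After a run, the newest log can be located with a snippet like this (illustrative):

```shell
# Show the most recent prerequisites-test log, if any exists yet.
log_dir=/opt/netapp/install_logs
latest=$(ls -t "$log_dir"/prerequisites-test-*.log 2>/dev/null | head -1)
if [ -n "$latest" ]; then
    echo "Latest log: $latest"
else
    echo "No prerequisites-test logs found in $log_dir"
fi
```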
If all the prerequisite tests ran successfully, you can install Data Classification on the host when you are ready.
If any issues were discovered, they are categorized as "Recommended" or "Required" to be fixed. Recommended issues are typically items that would make the Data Classification scanning and categorizing tasks run slower. These items do not need to be corrected - but you may want to address them.
If you have any "Required" issues, you should fix the issues and run the Prerequisites test script again.