NetApp Data Classification

Check that your Linux host is ready to install NetApp Data Classification

Contributors netapp-ahibbard

Before installing NetApp Data Classification manually on a Linux host, optionally run a script on the host to verify that all the prerequisites are in place for installing Data Classification. You can run this script on a Linux host in your network, or on a Linux host in the cloud. The host can be connected to the internet, or the host can reside in a site that doesn't have internet access (a dark site).

There is also a prerequisite test script that is part of the Data Classification installation script. The script described here is specifically designed for users who want to verify the Linux host independently of running the Data Classification installation script.

Getting Started

You'll perform the following tasks.

  1. Optionally, install a Console agent if you don't already have one. You can run the test script without a Console agent, but the script checks connectivity between the Console agent and the Data Classification host machine, so installing a Console agent first is recommended.

  2. Prepare the host machine and verify that it meets all the requirements.

  3. Enable outbound internet access from the Data Classification host machine.

  4. Verify that all required ports are enabled on all systems.

  5. Download and run the Prerequisite test script.

Create a Console agent

A Console agent is required before you can install and use Data Classification. You can, however, run the Prerequisites script without a Console agent.

You can install the Console agent on-premises on a Linux host in your network or on a Linux host in the cloud. Some users planning to install Data Classification on-prem may also choose to install the Console agent on-prem.

To create a Console agent in your cloud provider environment, see creating a Console agent in AWS, creating a Console agent in Azure, or creating a Console agent in GCP.

You'll need the IP address or host name of the Console agent system when running the Prerequisites script. You'll have this information if you installed the Console agent on your premises. If the Console agent is deployed in the cloud, you can find this information in the Console: select the Help icon, then Support, then Console agent.

Verify host requirements

Data Classification software must run on a host that meets specific operating system requirements, RAM requirements, software requirements, and so on.

  • Data Classification is not supported on a host that is shared with other applications. The host must be dedicated to Data Classification.

  • When building the host system on your premises, choose one of these system sizes based on the size of the dataset that you plan to have Data Classification scan.

    System size: Extra Large
      CPU: 32 CPUs
      RAM (swap memory must be disabled): 128 GB RAM
      Disk:
        • 1 TiB SSD on /, or 100 GiB available on /opt
        • 895 GiB available on /var/lib/docker
        • 5 GiB on /tmp
        • For Podman, 30 GB on /var/tmp

    System size: Large
      CPU: 16 CPUs
      RAM (swap memory must be disabled): 64 GB RAM
      Disk:
        • 500 GiB SSD on /, or 100 GiB available on /opt
        • 400 GiB available on /var/lib/docker or for Podman /var/lib/containers
        • 5 GiB on /tmp
        • For Podman, 30 GB on /var/tmp

  • When deploying a compute instance in the cloud for your Data Classification installation, it's recommended that you use a system that meets the "Large" system requirements above.

  • UNIX folder permissions: The following minimum UNIX permissions are required:

    Folder                     Minimum permissions
    /tmp                       rwxrwxrwt
    /opt                       rwxr-xr-x
    /var/lib/docker            rwx------
    /usr/lib/systemd/system    rwxr-xr-x

  • Operating system:

    • The following operating systems require using the Docker container engine:

      • Red Hat Enterprise Linux version 7.8 and 7.9

      • Ubuntu 22.04 (requires Data Classification version 1.23 or greater)

      • Ubuntu 24.04 (requires Data Classification version 1.23 or greater)

    • The following operating systems require using the Podman container engine, and they require Data Classification version 1.30 or greater:

      • Red Hat Enterprise Linux version 8.8, 8.10, 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, and 9.6.

    • Advanced Vector Extensions (AVX2) must be enabled on the host system.

  • Red Hat Subscription Management: The host must be registered with Red Hat Subscription Management. If it's not registered, the system can't access repositories to update required 3rd-party software during installation.

  • Additional software: You must install the following software on the host before you install Data Classification:

    • Depending on the OS you are using, you'll need to install one of the container engines:

      • Docker Engine version 19.3.1 or greater. View installation instructions.

      • Podman version 4 or greater. To install Podman, run: sudo yum install podman netavark -y

    • Python version 3.6 or greater. View installation instructions.

  • NTP considerations: NetApp recommends configuring the Data Classification system to use a Network Time Protocol (NTP) service. The time must be synchronized between the Data Classification system and the Console agent system.

  • Firewalld considerations: If you are planning to use firewalld, we recommend that you enable it before installing Data Classification. Run the following commands to configure firewalld so that it is compatible with Data Classification:

    firewall-cmd --permanent --add-service=http
    firewall-cmd --permanent --add-service=https
    firewall-cmd --permanent --add-port=80/tcp
    firewall-cmd --permanent --add-port=8080/tcp
    firewall-cmd --permanent --add-port=443/tcp
    firewall-cmd --reload

    If you're planning to use additional Data Classification hosts as scanner nodes (in a distributed model), add these rules to your primary system at this time:

    firewall-cmd --permanent --add-port=2377/tcp
    firewall-cmd --permanent --add-port=7946/udp
    firewall-cmd --permanent --add-port=7946/tcp
    firewall-cmd --permanent --add-port=4789/udp
    firewall-cmd --reload

    Note that you must restart Docker or Podman whenever you enable or update firewalld settings.
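
The sizing, swap, AVX2, and folder permission requirements above can be spot-checked by hand before you run the full prerequisites script. The following is a minimal sketch and is not part of NetApp's tooling: the function names (size_class, swap_disabled, has_avx2, mode_at_least) are hypothetical, and it assumes that "minimum permissions" means every listed permission bit must be set on the folder.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of NetApp tooling.

# size_class CPUS RAM_GB: print which documented system size the host meets.
size_class() {
  cpus=$1
  ram_gb=$2
  if [ "$cpus" -ge 32 ] && [ "$ram_gb" -ge 128 ]; then
    echo "extra-large"
  elif [ "$cpus" -ge 16 ] && [ "$ram_gb" -ge 64 ]; then
    echo "large"
  else
    echo "too-small"
  fi
}

# swap_disabled SWAP_TOTAL_KB: succeed when no swap is configured.
swap_disabled() {
  [ "$1" -eq 0 ]
}

# has_avx2 CPU_FLAGS: succeed when the space-separated flag list contains avx2.
has_avx2() {
  case " $1 " in
    *" avx2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# mode_at_least ACTUAL REQUIRED: both are octal modes such as 1777 or 755.
# Assumption: "minimum" means every required bit is also set in ACTUAL.
mode_at_least() {
  [ $(( 0$1 & 0$2 )) -eq $(( 0$2 )) ]
}

# Gathering live values on a Linux host (GNU coreutils assumed):
#   size_class "$(nproc)" "$(awk '/^MemTotal:/ {printf "%d", $2/1048576}' /proc/meminfo)"
#   swap_disabled "$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)" && echo "swap off"
#   has_avx2 "$(grep -m1 '^flags' /proc/cpuinfo)" && echo "AVX2 present"
#   mode_at_least "$(stat -c '%a' /tmp)" 1777 && echo "/tmp ok"
```

Note that MemTotal in /proc/meminfo reports usable memory, which is typically slightly below the installed RAM, so a strict comparison against 64 or 128 may need some tolerance. The octal values 1777, 755, and 700 correspond to rwxrwxrwt, rwxr-xr-x, and rwx------ in the table above.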

Enable outbound internet access from Data Classification

Data Classification requires outbound internet access. If your virtual or physical network uses a proxy server for internet access, ensure that the Data Classification instance has outbound internet access to contact the following endpoints.

Tip: This section is not required for host systems installed in sites without internet connectivity.

Endpoints:
  https://api.console.netapp.com
Purpose: Communication with the Console service, which includes NetApp accounts.

Endpoints:
  https://netapp-cloud-account.auth0.com
  https://auth0.com
Purpose: Communication with the Console website for centralized user authentication.

Endpoints:
  https://support.compliance.api.console.netapp.com/
  https://hub.docker.com
  https://auth.docker.io
  https://registry-1.docker.io
  https://index.docker.io/
  https://dseasb33srnrn.cloudfront.net/
  https://production.cloudflare.docker.com/
Purpose: Provides access to software images, manifests, and templates, and is used to send logs and metrics.

Endpoints:
  https://support.compliance.api.console.netapp.com/
Purpose: Enables NetApp to stream data from audit records.

Endpoints:
  https://github.com/docker
  https://download.docker.com
Purpose: Provides prerequisite packages for Docker installation.

Endpoints:
  http://packages.ubuntu.com/
  http://archive.ubuntu.com
Purpose: Provides prerequisite packages for Ubuntu installation.
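
To confirm outbound access to the endpoints listed above, you can probe each one from the Data Classification host. The sketch below is an illustration, not the method the prerequisites script uses: url_host, check_endpoint, and report_endpoints are hypothetical helper names, and it assumes curl is installed. Any HTTP response, including an error status, is treated as proof that the endpoint is reachable.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of NetApp tooling.

# url_host URL: print the host portion of an http(s) URL.
url_host() {
  rest=${1#*://}      # drop the scheme
  rest=${rest%%/*}    # drop any path
  echo "${rest%%:*}"  # drop any port
}

# check_endpoint URL: succeed if any HTTP response arrives within 10 seconds.
check_endpoint() {
  curl --silent --output /dev/null --max-time 10 --head "$1"
}

# report_endpoints URL...: print one OK/FAIL line per endpoint.
report_endpoints() {
  for url in "$@"; do
    if check_endpoint "$url"; then
      echo "OK $(url_host "$url")"
    else
      echo "FAIL $(url_host "$url")"
    fi
  done
}

# Example (run on a host that is expected to have internet access):
#   report_endpoints https://api.console.netapp.com \
#     https://auth.docker.io https://registry-1.docker.io
```

If the network uses a proxy server, curl honors the https_proxy and http_proxy environment variables, so set them before running the check.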

Verify that all required ports are enabled

You must ensure that all required ports are open for communication between the Console agent, Data Classification, Active Directory, and your data sources.

Connection type: Console agent <> Data Classification
Ports: 8080 (TCP), 443 (TCP), 80, and 9000
Description:
  • The firewall or routing rules for the Console agent must allow inbound and outbound traffic over port 443 to and from the Data Classification instance.
  • Make sure port 8080 is open so you can see the installation progress in the Console.
  • If a firewall is used on the Linux host, port 9000 is required for internal processes within an Ubuntu server.

Connection type: Console agent <> ONTAP cluster (NAS)
Ports: 443 (TCP)
Description: The Console discovers ONTAP clusters using HTTPS. If you use custom firewall policies, the Console agent host must allow outbound HTTPS access through port 443. If the Console agent is in the cloud, all outbound communication is allowed by the predefined firewall or routing rules.
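
A quick way to test the ports above is to attempt a TCP connection from one system to the other. The following sketch is an assumption-laden illustration: port_open and check_ports are hypothetical helpers, the probe relies on bash's /dev/tcp feature and the coreutils timeout command being available, and the address shown in the example is a placeholder.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of NetApp tooling.

# port_open HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Assumes bash (built with /dev/tcp support) and coreutils timeout exist.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_ports HOST PORT...: print one open/closed line per port.
check_ports() {
  host=$1
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "open $host:$port"
    else
      echo "closed $host:$port"
    fi
  done
}

# Example, run from the Data Classification host against the Console agent
# (192.0.2.10 is a placeholder address):
#   check_ports 192.0.2.10 443
```

A "closed" result can also mean that nothing is listening on the port yet. For example, port 8080 on the Data Classification host is refused until the software is installed, even when the firewall allows it, so this check is most meaningful against a service that is already running.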

Run the Data Classification prerequisites script

Follow these steps to run the Data Classification prerequisites script.

Watch this video to see how to run the Prerequisites script and interpret the results.

Before you begin
  • Verify that your Linux system meets the host requirements.

  • Verify that the system has the two prerequisite software packages installed (Docker Engine or Podman, and Python 3).

  • Make sure you have root privileges on the Linux system.

Steps
  1. Download the Data Classification Prerequisites script from the NetApp Support Site. The file you should select is named standalone-pre-requisite-tester-<version>.

  2. Copy the file to the Linux host you plan to use (using scp or some other method).

  3. Assign permissions to run the script. The examples here use version v1.25.0; substitute the version you downloaded.

    chmod +x standalone-pre-requisite-tester-v1.25.0

  4. Run the script using the following command.

     ./standalone-pre-requisite-tester-v1.25.0 [--darksite]

    Add the --darksite option only if you are running the script on a host that doesn't have internet access. Certain prerequisite tests are skipped when the host is not connected to the internet.

  5. The script prompts you for the IP address of the Data Classification host machine.

    • Enter the IP address or host name.

  6. The script prompts whether you have an installed Console agent.

    • Enter N if you do not have an installed Console agent.

    • Enter Y if you have an installed Console agent, then enter the IP address or host name of the Console agent so the test script can test connectivity to it.

  7. The script runs a variety of tests on the system and displays results as it progresses. When it finishes, it writes a log of the session to a file named prerequisites-test-<timestamp>.log in the directory /opt/netapp/install_logs.
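
Because each run writes a new timestamped log, it can help to locate the most recent one after the script finishes. The sketch below uses a hypothetical helper named latest_log and assumes that the newest file by modification time is the log from the latest run; the timestamp format in the file name is not relied on.

```shell
#!/bin/sh
# Sketch only: latest_log is a hypothetical helper, not part of NetApp tooling.

# latest_log [DIR]: print the most recently modified prerequisites log in DIR.
# DIR defaults to the log directory named in the steps above.
latest_log() {
  dir=${1:-/opt/netapp/install_logs}
  ls -t "$dir"/prerequisites-test-*.log 2>/dev/null | head -n 1
}

# Example: page through the newest log (root privileges may be required):
#   less "$(latest_log)"
```

The function prints nothing when the directory contains no matching log, so check for empty output before passing the result to another command.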

Result

If all the prerequisites tests ran successfully, you can install Data Classification on the host when you are ready.

If any issues were discovered, they are categorized as "Recommended" or "Required" to be fixed. Recommended issues are typically items that would make the Data Classification scanning and categorizing tasks run slower. These items do not need to be corrected, but you may want to address them.

If you have any "Required" issues, you should fix the issues and run the Prerequisites test script again.