Skip to main content
Cloud Insights

Configuring the ONTAP SVM Data Collector

Contributors netapp-alavoie

Workload Security uses data collectors to collect file and user access data from devices.

Before you begin

  • This data collector is supported with the following:

    • Data ONTAP 9.2 and later versions. For best performance, use a Data ONTAP version greater than 9.13.1.

    • SMB protocol version 3.1 and earlier.

    • NFS protocol version 4.0 and earlier

    • Flexgroup is supported from ONTAP 9.4 and later versions

    • ONTAP Select is supported

  • Only data type SVMs are supported. SVMs with infinite volumes are not supported.

  • SVM has several sub-types. Of these, only default, sync_source, and sync_destination are supported.

  • An Agent must be configured before you can configure data collectors.

  • Make sure that you have a properly configured User Directory Connector, otherwise events will show encoded user names and not the actual name of the user (as stored in Active Directory) in the “Activity Forensics” page.

  • For optimal performance, you should configure the FPolicy server to be on the same subnet as the storage system.

  • You must add an SVM using one of the following two methods:

    • By Using Cluster IP, SVM name, and Cluster Management Username and Password. This is the recommended method.

      • SVM name must be exactly as is shown in ONTAP and is case-sensitive.

    • By Using SVM Vserver Management IP, Username, and Password

    • If you are not able or not willing to use the full Administrator Cluster/SVM Management Username and Password, you can create a custom user with lesser privileges as mentioned in the “A note about permissions” section below. This custom user can be created for either SVM or Cluster access.

      • o You can also use an AD user with a role that has at least the permissions of csrole as mentioned in “A note about permissions” section below. Also refer to the ONTAP documentation.

  • Ensure the correct applications are set for the SVM by executing the following command:

    clustershell::> security login show -vserver <vservername> -user-or-group-name <username>

Example output:
SVM Command Output Example

  • Ensure that the SVM has a CIFS server configured:
    clustershell::> vserver cifs show

    The system returns the Vserver name, CIFS server name and additional fields.

  • Set a password for the SVM vsadmin user. If using custom user or cluster admin user, skip this step.
    clustershell::> security login password -username vsadmin -vserver svmname

  • Unlock the SVM vsadmin user for external access. If using custom user or cluster admin user, skip this step.
    clustershell::> security login unlock -username vsadmin -vserver svmname

  • Ensure the firewall-policy of the data LIF is set to ‘mgmt’ (not ‘data’). Skip this step if using a dedicated management lif to add the SVM.
    clustershell::> network interface modify -lif <SVM_data_LIF_name> -firewall-policy mgmt

  • When a firewall is enabled, you must have an exception defined to allow TCP traffic for the port using the Data ONTAP Data Collector.

    See Agent requirements for configuration information. This applies to on-premise Agents and Agents installed in the Cloud.

  • When an Agent is installed in an AWS EC2 instance to monitor a Cloud ONTAP SVM, the Agent and Storage must be in the same VPC. If they are in separate VPCs, there must be a valid route between the VPC’s.

A Note About Permissions

Permissions when adding via Cluster Management IP:

If you cannot use the Cluster management administrator user to allow Workload Security to access the ONTAP SVM data collector, you can create a new user named “csuser” with the roles as shown in the commands below. Use the username “csuser” and password for “csuser” when configuring the Workload Security data collector to use Cluster Management IP.

To create the new user, log in to ONTAP with the Cluster management Administrator username/password, and execute the following commands on the ONTAP server:

security login role create -role csrole -cmddirname DEFAULT -access readonly
security login role create -role csrole -cmddirname "vserver fpolicy" -access all
security login role create -role csrole -cmddirname "volume snapshot" -access all -query "-snapshot cloudsecure_*"
security login role create -role csrole -cmddirname "event catalog" -access all
security login role create -role csrole -cmddirname "event filter" -access all
security login role create -role csrole -cmddirname "event notification destination" -access all
security login role create -role csrole -cmddirname "event notification" -access all
security login role create -role csrole -cmddirname "security certificate" -access all
security login create -user-or-group-name csuser -application ontapi -authmethod password -role csrole
security login create -user-or-group-name csuser -application ssh -authmethod password -role csrole

Permissions when adding via Vserver Management IP:

If you cannot use the Cluster management administrator user to allow Workload Security to access the ONTAP SVM data collector, you can create a new user named “csuser” with the roles as shown in the commands below. Use the username “csuser” and password for “csuser” when configuring the Workload Security data collector to use Vserver Management IP.

To create the new user, log in to ONTAP with the Cluster management Administrator username/password, and execute the following commands on the ONTAP server. For ease, copy these commands to a text editor and replace the <vservername> with your Vserver name before and executing these commands on ONTAP:

security login role create -vserver <vservername> -role csrole -cmddirname DEFAULT -access none
security login role create -vserver <vservername> -role csrole -cmddirname "network interface" -access readonly
security login role create -vserver <vservername> -role csrole -cmddirname version -access readonly
security login role create -vserver <vservername> -role csrole -cmddirname volume -access readonly
security login role create -vserver <vservername> -role csrole -cmddirname vserver -access readonly
security login role create -vserver <vservername> -role csrole -cmddirname "vserver fpolicy" -access all
security login role create -vserver <vservername> -role csrole -cmddirname "volume snapshot" -access all
security login create -user-or-group-name csuser -application ontapi -authmethod password -role csrole -vserver <vservername>

Permissions for ONTAP Autonomous Ransomware Protection

If you are using cluster administration credentials, no new permissions are needed.

If you are using a custom user (for example, csuser) with permissions given to the user, then follow the steps below to give permissions to Workload Security to collect ARP related information from ONTAP.

For csuser with cluster credentials, do the following from the ONTAP command line:

security login rest-role create -role arwrole -api /api/storage/volumes -access readonly -vserver <cluster_name>
security login rest-role create -api /api/security/anti-ransomware -access readonly  -role arwrole -vserver <cluster_name>
security login create -user-or-group-name csuser -application http -authmethod password -role arwrole

Permissions for ONTAP Access Denied

If the Data Collector is added using cluster administration credentials, no new permissions are needed.

If the Collector is added using a custom user (for example, csuser) with permissions given to the user, follow the steps below to give Workload Security the necessary permission to register for Access Denied events with ONTAP.

For csuser with cluster credentials, execute the following commands from the ONTAP command line. Note that csrestrole is custom role and csuser is ontap custom user.

 security login rest-role create -role csrestrole -api /api/protocols/fpolicy -access all -vserver <cluster_name>
 security login create -user-or-group-name csuser -application http -authmethod password -role csrestrole

For csuser with SVM credentials, execute the following commands from the ONTAP command line:

 security login rest-role create -role csrestrole -api /api/protocols/fpolicy -access all -vserver <svm_name>
 security login create -user-or-group-name csuser -application http -authmethod password -role csrestrole -vserver <svm_name>

For more information, read about Integration with ONTAP Access Denied

Configure the data collector

Steps for Configuration
  1. Log in as Administrator or Account Owner to your Cloud Insights environment.

  2. Click Workload Security > Collectors > +Data Collectors

    The system displays the available Data Collectors.

  3. Hover over the NetApp SVM tile and click *+Monitor.

    The system displays the ONTAP SVM configuration page. Enter the required data for each field.

Configuration

Field

Description

Name

Unique name for the Data Collector

Agent

Select a configured agent from the list.

Connect via Management IP for:

Select either Cluster IP or SVM Management IP

Cluster / SVM Management IP Address

The IP address for the cluster or the SVM, depending on your selection above.

SVM Name

The Name of the SVM (this field is required when connecting via Cluster IP)

Username

User name to access the SVM/Cluster
When adding via Cluster IP the options are:
1. Cluster-admin
2. ‘csuser’
3. AD-user having similar role as csuser.
When adding via SVM IP the options are:
4. vsadmin
5. ‘csuser’
6. AD-username having similar role as csuser.

Password

Password for the above user name

Filter Shares/Volumes

Choose whether to include or exclude Shares / Volumes from event collection

Enter complete share names to exclude/include

Comma-separated list of shares to exclude or include (as appropriate) from event collection

Enter complete volume names to exclude/include

Comma-separated list of volumes to exclude or include (as appropriate) from event collection

Monitor Folder Access

When checked, enables events for folder access monitoring. Note that folder create/rename and delete will be monitored even without this option selected. Enabling this will increase the number of events monitored.

Set ONTAP Send Buffer size

Sets the ONTAP Fpolicy send buffer size. If an ONTAP version prior to 9.8p7 is used and performance issue is seen, then the ONTAP send buffer size can be altered to get improved ONTAP performance. Contact NetApp Support if you do not see this option and wish to explore it.

After you finish
  • In the Installed Data Collectors page, use the options menu on the right of each collector to edit the data collector. You can restart the data collector or edit data collector configuration attributes.

The following is recommended for Metro Cluster:

  1. Connect two data collectors, one to the source SVM and another to the destination SVM.

  2. The data collectors should be connected by Cluster IP.

  3. At any moment of time, one data collector should be in running, another will be in error.

    The current ‘running’ SVM’s data collector will show as Running. The current ‘stopped’ SVM’s
    data collector will show as Error.

  4. Whenever there is a switchover, the state of the data collector will change from ‘running’ to ‘error’ and vice versa.

  5. It will take up to two minutes for the data collector to move from Error state to Running state.

Service Policy

If using service policy from ONTAP version 9.9.1, in order to connect to the Data Source Collector, the data-fpolicy-client service is required along with the data service data-nfs, and/or data-cifs.

Example:

Testcluster-1::*> net int service-policy create -policy only_data_fpolicy -allowed-addresses 0.0.0.0/0 -vserver aniket_svm
-services data-cifs,data-nfs,data,-core,data-fpolicy-client
(network interface service-policy create)

In versions of ONTAP prior to 9.9.1, data-fpolicy-client need not be set.

Play-Pause Data Collector

2 new operations are now shown on kebab menu of collector (PAUSE and RESUME).

If the Data Collector is in Running state, you can Pause collection. Open the "three dots" menu for the collector and select PAUSE. While the collector is paused, no data is gathered from ONTAP, and no data is sent from the collector to ONTAP. This means no Fpolicy events will flow from ONTAP to the data collector, and from there to Cloud Insights.

Note that if any new volumes, etc. are created on ONTAP while the collector is Paused, Workload Security won’t gather the data and those volumes, etc. will not be reflected in dashboards or tables.

Keep the following in mind:

  • Snapshot purge won’t happen as per the settings configured on a paused collector.

  • EMS events (like ONTAP ARP) won’t be processed on a paused collector. This means if ONTAP identifies a ransomware attack, Cloud Insights Workload Security won’t be able to acquire that event.

  • Health notifications emails will NOT be sent for a paused collector.

  • Manual or Automatic actions (such as Snapshot or User Blocking) will not be supported on a paused collector.

  • On agent or collector upgrades, agent VM restarts/reboots, or agent service restart, a paused collector will remain in Paused state.

  • If the data collector is in Error state, the collector cannot be changed to Paused state. The Pause button will be enabled only if the state of the collector is Running.

  • If the agent is disconnected, the collector cannot be changed to Paused state. The collector will go into Stopped state and the Pause button will be disabled.

Troubleshooting

Known problems and their resolutions are described in the following table.

In the case of an error, click on more detail in the Status column for detail about the error.

CS Data Collector Error

Problem: Resolution:

Data Collector runs for some time and stops after a random time, failing with: "Error message: Connector is in error state. Service name: audit. Reason for failure: External fpolicy server overloaded."

The event rate from ONTAP was much higher than what the Agent box can handle. Hence the connection got terminated.

Check the peak traffic in CloudSecure when the disconnection happened. This you can check from the CloudSecure > Activity Forensics > All Activity page.

If the peak aggregated traffic is higher than what the Agent Box can handle, then please refer to the Event Rate Checker page on how to size for Collector deployment in an Agent Box.

If the Agent was installed in the Agent box prior to 4 March 2021, run the following commands in the Agent box:

echo 'net.core.rmem_max=8388608' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 2097152 8388608' >> /etc/sysctl.conf
sysctl -p

Restart the collector from the UI after resizing.

Collector reports Error Message: “No local IP address found on the connector that can reach the data interfaces of the SVM”.

This is most likely due to a networking issue on the ONTAP side. Please follow these steps:

1. Ensure that there are no firewalls on the SVM data lif or the management lif which are blocking the connection from the SVM.

2. When adding an SVM via a cluster management IP, please ensure that the data lif and management lif of the SVM are pingable from the Agent VM. In case of issues, check the gateway, netmask and routes for the lif.

You can also try logging in to the cluster via ssh using the cluster management IP, and ping the Agent IP. Make sure that the agent IP is pingable:

network ping -vserver <vserver name> -destination <Agent IP> -lif <Lif Name> -show-detail

If not pingable, make sure the network settings in ONTAP are correct, so that the Agent machine is pingable.

3. If you have tried connecting via Cluster IP and it is not working, try connecting directly via SVM IP. Please see above for the steps to connect via SVM IP.

4. While adding the collector via SVM IP and vsadmin credentials, check if the SVM Lif has Data plus Mgmt role enabled. In this case ping to the SVM Lif will work, however SSH to the SVM Lif will not work.
If yes, create an SVM Mgmt Only Lif and try connecting via this SVM management only Lif.

5. If it is still not working, create a new SVM Lif and try connecting through that Lif. Make sure that the subnet mask is correctly set.

6. Advanced Debugging:
a) Start a packet trace in ONTAP.
b) Try to connect a data collector to the SVM from CloudSecure UI.
c) Wait till the error appears. Stop the packet trace in ONTAP.
d) Open the packet trace from ONTAP. It is available at this location

https://<cluster_mgmt_ip>/spi/<clustername>/etc/log/packet_traces/

e) Make sure there is a SYN from ONTAP to the Agent box.
f) If there is no SYN from ONTAP then it is an issue with firewall in ONTAP.
g) Open the firewall in ONTAP, so that ONTAP is able to connect the agent box.

7. If it is still not working, please consult the networking team to make sure that no external firewall is blocking the connection from ONTAP to the Agent box.

8. Verify that port 7 is open.

9. If none of the above solves the issue, open a case with Netapp Support for further assistance.

Message: "Failed to determine ONTAP type for [hostname: <IP Address>. Reason: Connection error to Storage System <IP Address>: Host is unreachable (Host unreachable)"

1. Verify that the correct SVM IP Management address or Cluster Management IP has been provided.
2. SSH to the SVM or the Cluster to which you are intending to connect. Once you are connected ensure that the SVM or the Cluster name is correct.

Error Message: "Connector is in error state. Service.name: audit. Reason for failure: External fpolicy server terminated."

1. It is most likely that a firewall is blocking the necessary ports in the agent machine. Verify the port range 35000-55000/tcp is opened for the agent machine to connect from the SVM. Also ensure that there are no firewalls enabled from the ONTAP side blocking communication to the agent machine.

2. Type the following command in the Agent box and ensure that the port range is open.

sudo iptables-save | grep 3500*

Sample output should look like:

-A IN_public_allow -p tcp -m tcp --dport 35000 -m conntrack -ctstate NEW -j ACCEPT

3. Login to SVM, enter the following commands and check that no firewall is set to block the communication with ONTAP.

system services firewall show
system services firewall policy show

Check firewall commands on the ONTAP side.

4. SSH to the SVM/Cluster which you want to monitor. Ping the Agent box from the SVM data lif (with CIFS, NFS protocols support) and ensure that ping is working:

network ping -vserver <vserver name> -destination <Agent IP> -lif <Lif Name> -show-detail

If not pingable, make sure the network settings in ONTAP are correct, so that the Agent machine is pingable.

5.If a single SVM is added twice added to a tenant via 2 data collectors, then this error will be shown. Delete one of the data collectors thru the UI. Then restart the other data collector thru the UI. Then the data collector will show “RUNNING” status and will start receiving events from SVM.

Basically, in a tenant, 1 SVM should be added only once, via 1 data collector. 1 SVM should not added twice via 2 data collectors.

6. In instances where the same SVM was added in two different Workload Security environments (tenants), the last one will always succeed. The second collector will configure fpolicy with its own IP address and kick out the first one. So the collector in the first one will stop receiving events and its "audit" service will enter into error state.
To prevent this, configure each SVM on a single environment.


7. This error may also occur if service policies are not configured correctly. With ONTAP 9.8 or later, in order to connect to the Data Source Collector, the data-fpolicy-client service is required along with the data service data-nfs, and/or data-cifs. Additionally, the data-fpolicy-client service must be associated with the data lif(s) for the monitored SVM.

No events seen in activity page.

1. Check if ONTAP collector is in “RUNNING” state. If yes, then ensure that some cifs events are being generated on the cifs client VMs by opening some files.

2. If no activities are seen, please login to the SVM and enter the following command.
<SVM>event log show -source fpolicy
Please ensure that there are no errors related to fpolicy.

3. If no activities are seen, please login to the SVM. Enter the following command
<SVM>fpolicy show
Check if the fpolicy policy named with prefix “cloudsecure_” has been set and status is “on”. If not set, then most likely the Agent is unable to execute the commands in the SVM. Please ensure all the prerequisites as described in the beginning of the page have been followed.

SVM Data Collector is in error state and Errror message is “Agent failed to connect to the collector”

1. Most likely the Agent is overloaded and is unable to connect to the Data Source collectors.
2. Check how many Data Source collectors are connected to the Agent.
3. Also check the data flow rate in the “All Activity” page in the UI.
4. If the number of activities per second is significantly high, install another Agent and move some of the Data Source Collectors to the new Agent.

SVM Data Collector shows error message as "fpolicy.server.connectError: Node failed to establish a connection with the FPolicy server "12.195.15.146" ( reason: "Select Timed out")"

Firewall is enabled in SVM/Cluster. So fpolicy engine is unable to connect to fpolicy server.
CLIs in ONTAP which can be used to get more information are:

event log show -source fpolicy which shows the error
event log show -source fpolicy -fields event,action,description which shows more details.

Check firewall commands on the ONTAP side.

Error Message: “Connector is in error state. Service name:audit. Reason for failure: No valid data interface (role: data,data protocols: NFS or CIFS or both, status: up) found on the SVM.”

Ensure there is an operational interface (having role as data and data protocol as CIFS/NFS.

The data collector goes into Error state and then goes into RUNNING state after some time, then back to Error again. This cycle repeats.

This typically happens in the following scenario:
1. There are multiple data collectors added.
2. The data collectors which show this kind of behavior will have 1 SVM added to these data collectors. Meaning 2 or more data collectors are connected to 1 SVM.
3. Ensure 1 data collector connects to only 1 SVM.
4. Delete the other data collectors which are connected to the same SVM.

Connector is in error state. Service name: audit. Reason for failure: Failed to configure (policy on SVM svmname. Reason: Invalid value specified for 'shares-to-include' element within 'fpolicy.policy.scope-modify: "Federal'

The share names need to be given without any quotes. Edit the ONTAP SVM DSC configuration to correct the share names.

Include and exclude shares is not intended for a long list of share names. Use filtering by volume instead if you have a large number of shares to include or exclude.

There are existing fpolicies in the Cluster which are unused. What should be done with those prior to installation of Workload Security?

It is recommended to delete all existing unused fpolicy settings even if they are in disconnected state. Workload Security will create fpolicy with the prefix "cloudsecure_". All other unused fpolicy configurations can be deleted.

CLI command to show fpolicy list:

fpolicy show

Steps to delete fpolicy configurations:

fpolicy disable -vserver <svmname> -policy-name <policy_name>
fpolicy policy scope delete -vserver <svmname> -policy-name <policy_name>
fpolicy policy delete -vserver <svmname> -policy-name <policy_name>
fpolicy policy event delete -vserver <svmname> -event-name <event_list>
fpolicy policy external-engine delete -vserver <svmname> -engine-name <engine_name>

After enabling Workload Security, ONTAP performance is impacted: Latency becomes sporadically high, IOPs become sporadically low.

While using ONTAP with Workload Security sometimes latency issues can be seen in ONTAP. There are a number of possible reasons for this as noted in the following: 1372994, 1415152, 1438207, 1479704, 1354659. All of these issues are fixed in ONTAP 9.13.1 and later; it is strongly recommended to use one of these later versions.

Data collector is in error, shows this error message.
“Error: Connector is in error state. Service name: audit. Reason for failure: Failed to configure policy on SVM svm_test. Reason: Missing value for zapi field: events. “

Start with a new SVM with only NFS service configured.
Add an ONTAP SVM data collector in Workload Security. CIFS is configured as an allowed protocol for the SVM while adding the ONTAP SVM Data Collector in Workload Security.
Wait until the Data collector in Workload Security shows an error.
Since the CIFS server is NOT configured on the SVM, this error as shown in the left is shown by Workload Security.
Edit the ONTAP SVM data collector and un-check CIFs as allowed protocol. Save the data collector. It will start running with only NFS protocol enabled.

Data Collector shows the error message:
“Error: Failed to determine the health of the collector within 2 retries, try restarting the collector again (Error Code: AGENT008)”.

1. On the Data Collectors page, scroll to the right of the data collector giving the error and click on the 3 dots menu. Select Edit.
Enter the password of the data collector again.
Save the data collector by pressing on the Save button.
Data Collector will restart and the error should be resolved.

2. The Agent machine may not enough CPU or RAM headroom, that is why the DSCs are failing.
Please check the number of Data Collectors which are added to the Agent in the machine.
If it is more than 20, please increase the CPU and RAM capacity of the Agent machine.
Once the CPU and RAM is increased, the DSCs will get into Initializing and then to Running state automatically.
Look into the sizing guide on this page.

If you are still experiencing problems, reach out to the support links mentioned in the Help > Support page.