Solaris
Configuration topics specific to the Solaris OS.
Solaris NFS mount options
The following table lists the Solaris NFS mount options for a single instance.
File type | Mount options |
---|---|
ADR Home | |
Control files | |
The use of llock has been proven to dramatically improve performance in customer environments by removing the latency associated with acquiring and releasing locks on the storage system. Use this option with care in environments in which numerous servers are configured to mount the same file systems and Oracle is configured to open the same databases from more than one of these servers. Although this is a highly unusual configuration, it is used by a small number of customers. If an instance is accidentally started a second time, data corruption can occur because Oracle is unable to detect the lock files on the foreign server. NFS locks do not offer protection in any case; as with NFS version 3, they are advisory only.

Because the llock and forcedirectio parameters are mutually exclusive, it is important that filesystemio_options=setall is present in the init.ora file so that directio is used. Without this parameter, host OS buffer caching is used and performance can be adversely affected.
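For illustration only, the following sketch shows how these options might come together for a single-instance datafile mount. The export path, mount point, and rsize/wsize values are assumptions; the authoritative option lists are those in the table above.

```
# Illustrative single-instance datafile mount (placeholder server, paths, and sizes).
# llock is used rather than forcedirectio; direct I/O is then enabled at the
# Oracle level through filesystemio_options=setall in init.ora.
mount -F nfs -o rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=262144,wsize=262144,llock \
    nfs-server:/vol/oradata /u02/oradata
```

The init.ora entry itself is simply filesystemio_options=setall.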
The following table lists the Solaris NFS RAC mount options.
File type | Mount options |
---|---|
ADR Home | |
Control files | |
CRS/Voting | |
Dedicated ORACLE_HOME | |
Shared ORACLE_HOME | |
The primary difference between single-instance and RAC mount options is the addition of noac and forcedirectio to the mount options. This addition has the effect of disabling the host OS caching, which enables all instances in the RAC cluster to have a consistent view of the state of the data. Although using the init.ora parameter filesystemio_options=setall has the same effect of disabling host caching, it is still necessary to use noac and forcedirectio.

The reason actimeo=0 is required for shared ORACLE_HOME deployments is to facilitate consistency of files such as Oracle password files and spfiles. If each instance in a RAC cluster has a dedicated ORACLE_HOME, this parameter is not required.
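As with the single-instance sketch above, the following is illustrative only. The host name, paths, and exact option lists (including suid and the rsize/wsize values) are assumptions; the table above remains authoritative.

```
# Illustrative RAC datafile mount: noac and forcedirectio disable host caching
# so that every instance sees a consistent view of the data.
mount -F nfs -o rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=262144,wsize=262144,noac,forcedirectio \
    nfs-server:/vol/oradata /u02/oradata

# Illustrative shared ORACLE_HOME mount: actimeo=0 keeps files such as password
# files and spfiles consistent across the cluster nodes.
mount -F nfs -o rw,bg,hard,vers=3,proto=tcp,timeo=600,rsize=262144,wsize=262144,actimeo=0,suid \
    nfs-server:/vol/orabin /u01/app/oracle
```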
Solaris UFS mount options
NetApp strongly recommends using the logging mount option so that data integrity is preserved in the case of a Solaris host crash or the interruption of FC connectivity. The logging mount option also preserves the usability of Snapshot backups.
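For reference, a minimal sketch of a UFS mount with logging enabled follows; the device and mount point are placeholders.

```
# UFS mount with the logging option enabled (placeholder device and mount point).
mount -F ufs -o logging /dev/dsk/c2t0d1s6 /u02/oradata
```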
Solaris ZFS
Solaris ZFS must be installed and configured carefully to deliver optimum performance.
mvector
Solaris 11 included a change in how it processes large I/O operations, which can result in severe performance problems on SAN storage arrays. The problem is documented in NetApp bug report 630173, "Solaris 11 ZFS Performance Regression."
This is not an ONTAP bug. It is a defect in Solaris, tracked as Solaris bugs 7199305 and 7082975.
You can consult Oracle Support to find out if your version of Solaris 11 is affected, or you can test the workaround by changing zfs_mvector_max_size to a smaller value.
You can do this by running the following command as root:
[root@host1 ~]# echo "zfs_mvector_max_size/W 0t131072" |mdb -kw
If any unexpected problems arise from this change, it can be easily reversed by running the following command as root:
[root@host1 ~]# echo "zfs_mvector_max_size/W 0t1048576" |mdb -kw
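Before and after applying the workaround, the current value can be inspected with a read-only mdb query; this sketch assumes the same variable name used above.

```
# Print the current value of zfs_mvector_max_size in decimal (read-only query).
echo "zfs_mvector_max_size/D" | mdb -k
```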
Kernel
Reliable ZFS performance requires a Solaris kernel patched against LUN alignment problems. The fix was introduced with patch 147440-19 in Solaris 10 and with SRU 10.5 for Solaris 11. Only use Solaris 10 and later with ZFS.
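As a quick sketch of how to confirm the patch level (output formats vary by release):

```
# Solaris 10: list installed revisions of kernel patch 147440 (need -19 or later).
showrev -p | grep 147440

# Solaris 11: display the installed SRU via the "entire" package version.
pkg info entire
```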
LUN configuration
To configure a LUN, complete the following steps:
- Create a LUN of type solaris.
- Install the appropriate Host Utility Kit (HUK) specified by the NetApp Interoperability Matrix Tool (IMT).
- Follow the instructions in the HUK exactly as described. The basic steps are outlined below, but refer to the latest documentation for the proper procedure.
  - Run the host_config utility to update the sd.conf/ssd.conf file. Doing so allows the SCSI drivers to correctly discover ONTAP LUNs.
  - Follow the instructions given by the host_config utility to enable multipath input/output (MPIO).
  - Reboot. This step is required so that any changes are recognized across the system.
- Partition the LUNs and verify that they are properly aligned. See "Appendix B: WAFL Alignment Verification" for instructions on how to directly test and confirm alignment; a quick alignment check is also sketched after this list.
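The following is a quick, informal check that a slice starts on a 4K boundary (the device name is a placeholder); the appendix procedure remains the authoritative verification.

```
# Print the VTOC; with 512-byte sectors, a slice whose first sector is evenly
# divisible by 8 starts on a 4K boundary.
prtvtoc /dev/rdsk/c2t0d1s2
```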
zpools
A zpool should only be created after the steps in the LUN configuration section have been performed. If the procedure is not done correctly, serious performance degradation can result from I/O misalignment. Optimum performance on ONTAP requires I/O to be aligned to a 4K boundary on a drive. The file systems created on a zpool use an effective block size that is controlled through a parameter called ashift, which can be viewed by running the command zdb -C.

The value of ashift defaults to 9, which means 2^9, or 512 bytes. For optimum performance, the ashift value must be 12 (2^12=4K). This value is set at the time the zpool is created and cannot be changed, which means that data in zpools with an ashift other than 12 should be migrated by copying the data to a newly created zpool.

After creating a zpool, verify the value of ashift before proceeding. If the value is not 12, the LUNs were not discovered correctly. Destroy the zpool, verify that all steps shown in the relevant Host Utilities documentation were performed correctly, and recreate the zpool.
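As a sketch (the pool and device names are assumptions), creation and verification might look like the following:

```
# Create the zpool only after the Host Utilities configuration steps are complete.
zpool create oradata c2t0d1

# Every vdev should report ashift: 12; if not, destroy and re-create the pool.
zdb -C oradata | grep ashift
```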
zpools and Solaris LDOMs
Solaris LDOMs create an additional requirement for making sure that I/O alignment is correct. Although a LUN might be properly discovered as a 4K device, a virtual vdsk device on an LDOM does not inherit the configuration from the I/O domain. The vdsk based on that LUN defaults back to a 512-byte block.
An additional configuration file is required. First, the individual LDOMs must be patched for Oracle bug 15824910 to enable the additional configuration options. This patch has been ported into all currently used versions of Solaris. Once the LDOM is patched, it is ready for configuration of the new, properly aligned LUNs as follows:
- Identify the LUN or LUNs to be used in the new zpool. In this example, it is the c2d1 device.

        [root@LDOM1 ~]# echo | format
        Searching for disks...done

        AVAILABLE DISK SELECTIONS:
            0. c2d0 <Unknown-Unknown-0001-100.00GB>
               /virtual-devices@100/channel-devices@200/disk@0
            1. c2d1 <SUN-ZFS Storage 7330-1.0 cyl 1623 alt 2 hd 254 sec 254>
               /virtual-devices@100/channel-devices@200/disk@1

- Retrieve the vdc instance of the devices to be used for a ZFS pool (a helper for listing these instances is sketched after this procedure):

        [root@LDOM1 ~]# cat /etc/path_to_inst
        #
        # Caution! This file contains critical kernel state
        #
        "/fcoe" 0 "fcoe"
        "/iscsi" 0 "iscsi"
        "/pseudo" 0 "pseudo"
        "/scsi_vhci" 0 "scsi_vhci"
        "/options" 0 "options"
        "/virtual-devices@100" 0 "vnex"
        "/virtual-devices@100/channel-devices@200" 0 "cnex"
        "/virtual-devices@100/channel-devices@200/disk@0" 0 "vdc"
        "/virtual-devices@100/channel-devices@200/pciv-communication@0" 0 "vpci"
        "/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"
        "/virtual-devices@100/channel-devices@200/network@1" 1 "vnet"
        "/virtual-devices@100/channel-devices@200/network@2" 2 "vnet"
        "/virtual-devices@100/channel-devices@200/network@3" 3 "vnet"
        "/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc"   << We want this one

- Edit /platform/sun4v/kernel/drv/vdc.conf:

        block-size-list="1:4096";

  This means that device instance 1 is assigned a block size of 4096.

  As an additional example, assume vdsk instances 1 through 6 need to be configured for a 4K block size and /etc/path_to_inst reads as follows:

        "/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc"
        "/virtual-devices@100/channel-devices@200/disk@2" 2 "vdc"
        "/virtual-devices@100/channel-devices@200/disk@3" 3 "vdc"
        "/virtual-devices@100/channel-devices@200/disk@4" 4 "vdc"
        "/virtual-devices@100/channel-devices@200/disk@5" 5 "vdc"
        "/virtual-devices@100/channel-devices@200/disk@6" 6 "vdc"
- The final vdc.conf file should contain the following:

        block-size-list="1:4096","2:4096","3:4096","4:4096","5:4096","6:4096";

Caution: The LDOM must be rebooted after vdc.conf is configured and the vdsk is created. This step cannot be avoided. The block size change only takes effect after a reboot. Proceed with zpool configuration and ensure that ashift is properly set to 12 as described previously.
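As a small convenience (a hypothetical helper, not part of the documented procedure), the vdc instance numbers that need entries in vdc.conf can be listed directly from /etc/path_to_inst:

```
# List the instance number of every vdc device; each needs an "<instance>:4096"
# entry in the block-size-list line of /platform/sun4v/kernel/drv/vdc.conf.
grep '"vdc"' /etc/path_to_inst | awk '{print $2}'
```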
ZFS Intent Log (ZIL)
Generally, there is no reason to locate the ZFS Intent Log (ZIL) on a different device. The log can share space with the main pool. The primary use of a separate ZIL is with plain physical drives that lack the write caching found in modern storage arrays.
logbias
Set the logbias parameter on ZFS file systems hosting Oracle data.
zfs set logbias=throughput <filesystem>
Using this parameter reduces overall write levels. Under the defaults, written data is committed first to the ZIL and then to the main storage pool. This default approach is appropriate for a plain drive configuration that includes an SSD-based ZIL device and spinning media for the main storage pool, because it allows a commit to occur in a single I/O transaction on the lowest latency media available.

When using a modern storage array that includes its own caching capability, this approach is not generally necessary. Under rare circumstances, it might be desirable to commit a write with a single transaction to the log, such as with a workload that consists of highly concentrated, latency-sensitive random writes. There are consequences in the form of write amplification because the logged data is eventually written to the main storage pool, resulting in a doubling of the write activity.
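The current setting can be confirmed with zfs get; the file system name shown is a placeholder.

```
# Verify the logbias setting on the file system hosting Oracle data.
zfs get logbias oradata/datafiles
```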
Direct I/O
Many applications, including Oracle products, can bypass the host buffer cache by enabling direct I/O. This strategy does not work as expected with ZFS file systems. Although the host buffer cache is bypassed, ZFS itself continues to cache data. This behavior can lead to misleading results when using tools such as fio or sio to perform performance tests, because it is difficult to predict whether I/O is reaching the storage system or whether it is being cached locally within the OS. It also makes it very difficult to use such synthetic tests to compare ZFS performance to that of other file systems. As a practical matter, there is little to no difference in file system performance under real user workloads.
Multiple zpools
Snapshot-based backups, restores, clones, and archiving of ZFS-based data must be performed at the level of the zpool and typically require multiple zpools. A zpool is analogous to an LVM disk group and should be configured using the same rules. For example, a database is probably best laid out with the datafiles residing on zpool1 and the archive logs, control files, and redo logs residing on zpool2. This approach permits a standard hot backup in which the database is placed in hot backup mode, followed by a snapshot of zpool1. The database is then removed from hot backup mode, the log archive is forced, and a snapshot of zpool2 is created. A restore operation requires unmounting the ZFS file systems and offlining the zpool in its entirety, followed by a SnapRestore restore operation. The zpool can then be brought online again and the database recovered.
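The following sketch outlines that sequence for the zpool1/zpool2 layout described above. The SQL*Plus commands are standard, but the snapshot steps are placeholders because the storage-side commands depend on the ONTAP tooling in use.

```
# Hot backup sketch for the two-zpool layout (snapshot steps are placeholders).
sqlplus -s / as sysdba <<EOF
alter database begin backup;
EOF

# ... create a storage snapshot of the volume(s) backing zpool1 here ...

sqlplus -s / as sysdba <<EOF
alter database end backup;
alter system archive log current;
EOF

# ... create a storage snapshot of the volume(s) backing zpool2 here ...
```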
filesystemio_options
The Oracle parameter filesystemio_options works differently with ZFS. If setall or directio is used, write operations are synchronous and bypass the OS buffer cache, but reads are buffered by ZFS. This behavior causes difficulties in performance analysis because I/O is sometimes intercepted and serviced by the ZFS cache, making storage latency and total I/O less than it might appear to be.
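When analyzing performance, it can help to confirm how the parameter is actually set on the running instance; a minimal check follows.

```
# Display the current filesystemio_options setting on the running instance.
sqlplus -s / as sysdba <<EOF
show parameter filesystemio_options
EOF
```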