How ONTAP handles multi-byte file, directory, and qtree names

Contributors

Beginning with ONTAP 9.5, support for 4-byte UTF-8 encoded names enables the creation and display of file, directory, and tree names that include Unicode supplementary characters outside the Basic Multilingual Plane (BMP). In earlier releases, these supplementary characters did not display correctly in multiprotocol environments.

To enable support for 4-byte UTF-8 encoded names, a new utf8mb4 language code is available for the vserver and volume command families.

  • You must create a new volume in one of the following ways:

  • Setting the volume -language option explicitly: volume create -language utf8mb4 {…}

  • Inheriting the volume -language option from an SVM that has been created with or modified for the option: vserver [create|modify] -language utf8mb4 {…}``volume create {…}

  • You cannot modify existing volumes for utf8mb4 support; you must create a new utf8mb4-ready volume, and then migrate the data using client-based copy tools.

    You can update SVMs for utf8mb4 support, but existing volumes retain their original language codes.

    Note

    LUN names with 4-byte UTF-8 characters are not currently supported.

  • Unicode character data is typically represented in Windows file systems applications using the 16-bit Unicode Transformation Format (UTF-16) and in NFS file systems using the 8-bit Unicode Transformation Format (UTF-8).

    In releases prior to ONTAP 9.5, names including UTF-16 supplementary characters that were created by Windows clients were correctly displayed to other Windows clients but were not translated correctly to UTF-8 for NFS clients. Similarly, names with UTF-8 supplementary characters by created NFS clients were not translated correctly to UTF-16 for Windows clients.

  • When you create file names on systems running ONTAP 9.4 or earlier that contain valid or invalid supplementary characters, ONTAP rejects the file name and returns an invalid file name error.

    To avoid this issue, use only BMP characters in file names and avoid using supplementary characters, or upgrade to ONTAP 9.5 or later.

Beginning with ONTAP 9, Unicode characters are allowed in qtree names.

  • You can use either the volume qtree command family or System Manager to set or modify qtree names.

  • qtree names can include multi-byte characters in Unicode format, such as Japanese and Chinese characters.

  • In releases before ONTAP 9.5, only BMP characters (that is, those that could be represented in 3 bytes) were supported.

    Note

    In releases before ONTAP 9.5, the junction-path of the qtree’s parent volume can contain qtree and directory names with Unicode characters. The volume show command displays these names correctly when the parent volume has a UTF-8 language setting. However, if the parent volume language is not one of the UTF-8 language settings, some parts of the junction-path are displayed using a numeric NFS alternate name.

  • In 9.5 and later releases, 4-byte characters are supported in qtree names, provided that the qtree is in a volume enabled for utf8mb4.