Skip to main content
NetApp Solutions
简体中文版经机器翻译而成,仅供参考。如与英语版出现任何冲突,应以英语版为准。

使用SnapCenter的向量数据库保护

贡献者

本节介绍如何使用NetApp SnapCenter为矢量数据库提供数据保护。

使用NetApp SnapCenter保护向量数据库。

例如、在电影制作行业、客户通常拥有视频和音频文件等关键嵌入式数据。由于硬盘故障等问题而丢失这些数据可能会对其运营产生重大影响、从而可能危及数百万美元的企业。我们遇到过宝贵的内容丢失的情况、导致了重大中断和财务损失。因此、确保这些重要数据的安全性和完整性在该行业中至关重要。
在本节中、我们将深入探讨SnapCenter如何保护驻留在ONTAP中的向量数据库数据和Milvus数据。在本示例中、我们会使用从NFS ONTAP卷(vol1)派生的NAS存储分段(milvusdbvol1)存储客户数据、并使用单独的NFS卷(v/tordbpv)存储Milvus集群配置数据。请检查 "此处" 适用于SnapCenter备份工作流

  1. 设置要用于执行SnapCenter命令的主机。

    图中显示了输入/输出对话框或表示已写入内容

  2. 安装和配置存储插件。从添加的主机中、选择"More Options (更多选项)"。导航到并从中选择已下载的存储插件 "NetApp 自动化商店"。安装插件并保存配置。

    图中显示了输入/输出对话框或表示已写入内容

  3. 设置存储系统和卷:在"存储系统"下添加存储系统、然后选择SVM (Storage Virtual Machine)。在此示例中、我们选择了"vs_nidia"。

    图中显示了输入/输出对话框或表示已写入内容

  4. 为向量数据库建立一个资源、其中包含备份策略和自定义快照名称。

    • 使用默认值启用一致性组备份、并在文件系统不一致的情况下启用SnapCenter。

    • 在存储占用空间部分中、选择与向量数据库客户数据和Milvus集群数据关联的卷。在我们的示例中、这些卷分别为"vol1"和"v∶v∶ordbpv"。

    • 创建用于矢量数据库保护的策略并使用该策略保护矢量数据库资源。

      图中显示了输入/输出对话框或表示已写入内容

  5. 使用Python脚本将数据插入S3 NAS分段。在本示例中、我们修改了Milvus提供的备份脚本、即"prepy_data_NetApp.py"、并执行了"ync"命令从操作系统中刷新数据。

    root@node2:~# python3 prepare_data_netapp.py
    
    === start connecting to Milvus     ===
    
    
    === Milvus host: localhost         ===
    
    Does collection hello_milvus_netapp_sc_test exist in Milvus: False
    
    === Create collection `hello_milvus_netapp_sc_test` ===
    
    
    === Start inserting entities       ===
    
    Number of entities in hello_milvus_netapp_sc_test: 3000
    
    === Create collection `hello_milvus_netapp_sc_test2` ===
    
    Number of entities in hello_milvus_netapp_sc_test2: 6000
    root@node2:~# for i in 2 3 4 5 6   ; do ssh node$i "hostname; sync; echo 'sync executed';" ; done
    node2
    sync executed
    node3
    sync executed
    node4
    sync executed
    node5
    sync executed
    node6
    sync executed
    root@node2:~#
  6. 验证S3 NAS存储分段中的数据。在本示例中、时间戳为"2024-04-08 21:22"的文件是由"prepy_data_NetApp.py"脚本创建的。

    root@node2:~# aws s3 ls --profile ontaps3  s3://milvusdbvol1/ --recursive | grep '2024-04-08'
    
    <output content removed to save page space>
    2024-04-08 21:18:14       5656 stats_log/448950615991000809/448950615991000810/448950615991001854/100/1
    2024-04-08 21:18:12       5654 stats_log/448950615991000809/448950615991000810/448950615991001854/100/448950615990800869
    2024-04-08 21:18:17       5656 stats_log/448950615991000809/448950615991000810/448950615991001872/100/1
    2024-04-08 21:18:15       5654 stats_log/448950615991000809/448950615991000810/448950615991001872/100/448950615990800876
    2024-04-08 21:22:46       5625 stats_log/448950615991003377/448950615991003378/448950615991003385/100/1
    2024-04-08 21:22:45       5623 stats_log/448950615991003377/448950615991003378/448950615991003385/100/448950615990800899
    2024-04-08 21:22:49       5656 stats_log/448950615991003408/448950615991003409/448950615991003416/100/1
    2024-04-08 21:22:47       5654 stats_log/448950615991003408/448950615991003409/448950615991003416/100/448950615990800906
    2024-04-08 21:22:52       5656 stats_log/448950615991003408/448950615991003409/448950615991003434/100/1
    2024-04-08 21:22:50       5654 stats_log/448950615991003408/448950615991003409/448950615991003434/100/448950615990800913
    root@node2:~#
  7. 使用"ilvusdb"资源中的一致性组(CG)快照启动备份

    图中显示了输入/输出对话框或表示已写入内容

  8. 为了测试备份功能、我们会在备份过程后添加一个新表、或者从NFS (S3 NAS存储分段)中删除一些数据。

    在此测试中、假设有人在备份后创建了新的、不必要的或不适当的集合。在这种情况下,我们需要在添加新集合之前将引导程序数据库还原到其状态。例如、已插入新集合、例如"hello milvus_NetApp_sc_testnew"和"hello milvus_NetApp_sc_testnew2"。

    root@node2:~# python3 prepare_data_netapp.py
    
    === start connecting to Milvus     ===
    
    
    === Milvus host: localhost         ===
    
    Does collection hello_milvus_netapp_sc_testnew exist in Milvus: False
    
    === Create collection `hello_milvus_netapp_sc_testnew` ===
    
    
    === Start inserting entities       ===
    
    Number of entities in hello_milvus_netapp_sc_testnew: 3000
    
    === Create collection `hello_milvus_netapp_sc_testnew2` ===
    
    Number of entities in hello_milvus_netapp_sc_testnew2: 6000
    root@node2:~#
  9. 从上一个快照执行S3 NAS存储分段的完全还原。

    图中显示了输入/输出对话框或表示已写入内容

  10. 使用Python脚本验证"hello milvus_NetApp_sc_test"和"hello milvus_NetApp_sc_test2"集合中的数据。

    root@node2:~# python3 verify_data_netapp.py
    
    === start connecting to Milvus     ===
    
    
    === Milvus host: localhost         ===
    
    Does collection hello_milvus_netapp_sc_test exist in Milvus: True
    {'auto_id': False, 'description': 'hello_milvus_netapp_sc_test', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}]}
    Number of entities in Milvus: hello_milvus_netapp_sc_test : 3000
    
    === Start Creating index IVF_FLAT  ===
    
    
    === Start loading                  ===
    
    
    === Start searching based on vector similarity ===
    
    hit: id: 2998, distance: 0.0, entity: {'random': 0.9728033590489911}, random field: 0.9728033590489911
    hit: id: 1262, distance: 0.08883658051490784, entity: {'random': 0.2978858685751561}, random field: 0.2978858685751561
    hit: id: 1265, distance: 0.09590047597885132, entity: {'random': 0.3042039939240304}, random field: 0.3042039939240304
    hit: id: 2999, distance: 0.0, entity: {'random': 0.02316334456872482}, random field: 0.02316334456872482
    hit: id: 1580, distance: 0.05628091096878052, entity: {'random': 0.3855988746044062}, random field: 0.3855988746044062
    hit: id: 2377, distance: 0.08096685260534286, entity: {'random': 0.8745922204004368}, random field: 0.8745922204004368
    search latency = 0.2832s
    
    === Start querying with `random > 0.5` ===
    
    query result:
    -{'random': 0.6378742006852851, 'embeddings': [0.20963514, 0.39746657, 0.12019053, 0.6947492, 0.9535575, 0.5454552, 0.82360446, 0.21096309], 'pk': 0}
    search latency = 0.2257s
    
    === Start hybrid searching with `random > 0.5` ===
    
    hit: id: 2998, distance: 0.0, entity: {'random': 0.9728033590489911}, random field: 0.9728033590489911
    hit: id: 747, distance: 0.14606499671936035, entity: {'random': 0.5648774800635661}, random field: 0.5648774800635661
    hit: id: 2527, distance: 0.1530652642250061, entity: {'random': 0.8928974315571507}, random field: 0.8928974315571507
    hit: id: 2377, distance: 0.08096685260534286, entity: {'random': 0.8745922204004368}, random field: 0.8745922204004368
    hit: id: 2034, distance: 0.20354536175727844, entity: {'random': 0.5526117606328499}, random field: 0.5526117606328499
    hit: id: 958, distance: 0.21908017992973328, entity: {'random': 0.6647383716417955}, random field: 0.6647383716417955
    search latency = 0.5480s
    Does collection hello_milvus_netapp_sc_test2 exist in Milvus: True
    {'auto_id': True, 'description': 'hello_milvus_netapp_sc_test2', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': True}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}]}
    Number of entities in Milvus: hello_milvus_netapp_sc_test2 : 6000
    
    === Start Creating index IVF_FLAT  ===
    
    
    === Start loading                  ===
    
    
    === Start searching based on vector similarity ===
    
    hit: id: 448950615990642008, distance: 0.07805602252483368, entity: {'random': 0.5326684390871348}, random field: 0.5326684390871348
    hit: id: 448950615990645009, distance: 0.07805602252483368, entity: {'random': 0.5326684390871348}, random field: 0.5326684390871348
    hit: id: 448950615990640618, distance: 0.13562293350696564, entity: {'random': 0.7864676926688837}, random field: 0.7864676926688837
    hit: id: 448950615990642314, distance: 0.10414951294660568, entity: {'random': 0.2209597460821181}, random field: 0.2209597460821181
    hit: id: 448950615990645315, distance: 0.10414951294660568, entity: {'random': 0.2209597460821181}, random field: 0.2209597460821181
    hit: id: 448950615990640004, distance: 0.11571306735277176, entity: {'random': 0.7765521996186631}, random field: 0.7765521996186631
    search latency = 0.2381s
    
    === Start querying with `random > 0.5` ===
    
    query result:
    -{'embeddings': [0.15983285, 0.72214717, 0.7414838, 0.44471496, 0.50356466, 0.8750043, 0.316556, 0.7871702], 'pk': 448950615990639798, 'random': 0.7820620141382767}
    search latency = 0.3106s
    
    === Start hybrid searching with `random > 0.5` ===
    
    hit: id: 448950615990642008, distance: 0.07805602252483368, entity: {'random': 0.5326684390871348}, random field: 0.5326684390871348
    hit: id: 448950615990645009, distance: 0.07805602252483368, entity: {'random': 0.5326684390871348}, random field: 0.5326684390871348
    hit: id: 448950615990640618, distance: 0.13562293350696564, entity: {'random': 0.7864676926688837}, random field: 0.7864676926688837
    hit: id: 448950615990640004, distance: 0.11571306735277176, entity: {'random': 0.7765521996186631}, random field: 0.7765521996186631
    hit: id: 448950615990643005, distance: 0.11571306735277176, entity: {'random': 0.7765521996186631}, random field: 0.7765521996186631
    hit: id: 448950615990640402, distance: 0.13665105402469635, entity: {'random': 0.9742541034109935}, random field: 0.9742541034109935
    search latency = 0.4906s
    root@node2:~#
  11. 验证数据库中是否不再存在不必要或不适当的收集。

    root@node2:~# python3 verify_data_netapp.py
    
    === start connecting to Milvus     ===
    
    
    === Milvus host: localhost         ===
    
    Does collection hello_milvus_netapp_sc_testnew exist in Milvus: False
    Traceback (most recent call last):
      File "/root/verify_data_netapp.py", line 37, in <module>
        recover_collection = Collection(recover_collection_name)
      File "/usr/local/lib/python3.10/dist-packages/pymilvus/orm/collection.py", line 137, in __init__
        raise SchemaNotReadyException(
    pymilvus.exceptions.SchemaNotReadyException: <SchemaNotReadyException: (code=1, message=Collection 'hello_milvus_netapp_sc_testnew' not exist, or you can pass in schema to create one.)>
    root@node2:~#

总之、使用NetApp的SnapCenter保护矢量数据库数据以及驻留在ONTAP中的Milvus数据为客户带来了巨大的优势、尤其是在数据完整性至关重要的行业、例如电影制作。SnapCenter能够创建一致的备份并执行完整数据恢复、从而确保关键数据(例如嵌入式视频和音频文件)不会因硬盘故障或其他问题而丢失。这不仅可以防止运营中断、还可以防止出现重大财务损失。

在本节中、我们演示了如何配置SnapCenter以保护驻留在ONTAP中的数据、包括设置主机、安装和配置存储插件以及使用自定义快照名称为矢量数据库创建资源。此外、我们还展示了如何使用一致性组快照执行备份并验证S3 NAS存储分段中的数据。

此外、我们还模拟了备份后创建不必要或不适当的收集的情形。在这种情况下、SnapCenter能够从先前的快照执行完全还原、从而确保向量数据库可以还原到添加新集合之前的状态、从而保持数据库的完整性。这种将数据还原到特定时间点的功能对客户来说非常重要、可以确保他们的数据不仅安全、而且维护正确。因此、NetApp的SnapCenter产品可为客户提供强大可靠的解决方案来实现数据保护和管理。