Milvus与Amazon FSx ONTAP for NetApp ONTAP—文件和对象双重性
本节讨论了使用Amazon FSx ONTAP为适用于NetApp的向量数据库解决方案设置的Milvus集群。
Milvus与Amazon FSx ONTAP for NetApp ONTAP—文件和对象双重性
在本节中、我们需要在云中部署向量数据库的原因以及在Docker容器中的Amazon FSx ONTAP for NetApp ONTAP中部署向量数据库(Milvus独立)的步骤。
在云中部署矢量数据库可提供多项显著优势、尤其是对于需要处理高维度数据和执行相似性搜索的应用程序。首先、基于云的部署提供可扩展性、支持轻松调整资源、以满足不断增长的数据量和查询负载的需求。这样可以确保数据库在保持高性能的同时高效地处理不断增长的需求。其次、云部署可提供高可用性和灾难恢复、因为可以在不同地理位置之间复制数据、从而最大限度地降低数据丢失的风险、并确保即使在意外事件发生时也能持续提供服务。第三、它不仅经济高效、因为您只需为所使用的资源付费、而且可以根据需要进行扩展或缩减、从而无需在前期投入大量硬件。最后、在云中部署矢量数据库可以增强协作、因为数据可以从任何位置访问和共享、从而促进基于团队的工作和数据驱动的决策。请检查在此验证中使用的Milvus独立架构以及Amazon FSx ONTAP for NetApp ONTAP。
-
创建Amazon FSx ONTAP for NetApp ONTAP实例、并记下VPC、VPC安全组和子网的详细信息。创建EC2实例时需要此信息。有关详细信息、请单击此处- https://us-east-1.console.aws.amazon.com/fsx/home?region=us-east-1#file-system-create
-
创建EC2实例、确保VPC、安全组和子网与Amazon FSx ONTAP for NetApp ONTAP实例的VPC、安全组和子网匹配。
-
使用命令"apt-get install NFS-common"安装NFS-common、并使用"sudo apt-get update"更新软件包信息。
-
创建一个挂载文件夹、并在此文件夹上挂载Amazon FSx ONTAP for NetApp ONTAP。
ubuntu@ip-172-31-29-98:~$ mkdir /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$ sudo mount 172.31.255.228:/vol1 /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$ df -h /home/ubuntu/milvusvectordb Filesystem Size Used Avail Use% Mounted on 172.31.255.228:/vol1 973G 126G 848G 13% /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$
-
使用"apt-get install"安装Docker和Docker准备。
-
基于Docker compose.yaml文件设置Milvus群集,该文件可从Milvus网站下载。
root@ip-172-31-22-245:~# wget https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml -O docker-compose.yml --2024-04-01 14:52:23-- https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml <removed some output to save page space>
-
在Docker compose .yml文件的"volumes"部分中、将NetApp NFS挂载点映射到相应的Milvus容器路径、尤其是etd、minio和standalone.Check中的路径 "附录D:dkder-compose.yml" 有关yml更改的详细信息
-
验证已挂载的文件夹和文件。
ubuntu@ip-172-31-29-98:~/milvusvectordb$ ls -ltrh /home/ubuntu/milvusvectordb total 8.0K -rw-r--r-- 1 root root 1.8K Apr 2 16:35 s3_access.py drwxrwxrwx 2 root root 4.0K Apr 4 20:19 volumes ubuntu@ip-172-31-29-98:~/milvusvectordb$ ls -ltrh /home/ubuntu/milvusvectordb/volumes/ total 0 ubuntu@ip-172-31-29-98:~/milvusvectordb$ cd ubuntu@ip-172-31-29-98:~$ ls docker-compose.yml docker-compose.yml~ milvus.yaml milvusvectordb vectordbvol1 ubuntu@ip-172-31-29-98:~$
-
从包含docker-compose.yml文件的目录中运行docker-compose up -d。
-
检查Milvus容器的状态。
ubuntu@ip-172-31-29-98:~$ sudo docker-compose ps Name Command State Ports ---------------------------------------------------------------------------------------------------------------------------------------------------------- milvus-etcd etcd -advertise-client-url ... Up (healthy) 2379/tcp, 2380/tcp milvus-minio /usr/bin/docker-entrypoint ... Up (healthy) 0.0.0.0:9000->9000/tcp,:::9000->9000/tcp, 0.0.0.0:9001->9001/tcp,:::9001->9001/tcp milvus-standalone /tini -- milvus run standalone Up (healthy) 0.0.0.0:19530->19530/tcp,:::19530->19530/tcp, 0.0.0.0:9091->9091/tcp,:::9091->9091/tcp ubuntu@ip-172-31-29-98:~$ ubuntu@ip-172-31-29-98:~$ ls -ltrh /home/ubuntu/milvusvectordb/volumes/ total 12K drwxr-xr-x 3 root root 4.0K Apr 4 20:21 etcd drwxr-xr-x 4 root root 4.0K Apr 4 20:21 minio drwxr-xr-x 5 root root 4.0K Apr 4 20:21 milvus ubuntu@ip-172-31-29-98:~$
-
为了验证矢量数据库及其在Amazon FSx ONTAP for NetApp ONTAP中的数据的读写功能、我们使用了Python Milvus SDK和PyMilvus提供的示例程序。使用"apt-get install python3-NumPy python3-pip"安装必要的软件包、并使用"pip3 install pymilvus"安装PyMilvus。
-
验证矢量数据库中Amazon FSx ONTAP for NetApp ONTAP的数据写入和读取操作。
root@ip-172-31-29-98:~/pymilvus/examples# python3 prepare_data_netapp_new.py === start connecting to Milvus === === Milvus host: localhost === Does collection hello_milvus_ntapnew_sc exist in Milvus: True === Drop collection - hello_milvus_ntapnew_sc === === Drop collection - hello_milvus_ntapnew_sc2 === === Create collection `hello_milvus_ntapnew_sc` === === Start inserting entities === Number of entities in hello_milvus_ntapnew_sc: 9000 root@ip-172-31-29-98:~/pymilvus/examples# find /home/ubuntu/milvusvectordb/ … <removed content to save page space > … /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c/part.1 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920/xl.meta
-
使用verify_data_netapp.py脚本检查读取操作。
root@ip-172-31-29-98:~/pymilvus/examples# python3 verify_data_netapp.py === start connecting to Milvus === === Milvus host: localhost === Does collection hello_milvus_ntapnew_sc exist in Milvus: True {'auto_id': False, 'description': 'hello_milvus_ntapnew_sc', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'enable_dynamic_field': False} Number of entities in Milvus: hello_milvus_ntapnew_sc : 9000 === Start Creating index IVF_FLAT === === Start loading === === Start searching based on vector similarity === hit: id: 2248, distance: 0.0, entity: {'random': 0.2777646777746381}, random field: 0.2777646777746381 hit: id: 4837, distance: 0.07805602252483368, entity: {'random': 0.6451650959930306}, random field: 0.6451650959930306 hit: id: 7172, distance: 0.07954417169094086, entity: {'random': 0.6141351712303128}, random field: 0.6141351712303128 hit: id: 2249, distance: 0.0, entity: {'random': 0.7434908973629817}, random field: 0.7434908973629817 hit: id: 830, distance: 0.05628090724349022, entity: {'random': 0.8544487225667627}, random field: 0.8544487225667627 hit: id: 8562, distance: 0.07971227169036865, entity: {'random': 0.4464554280115878}, random field: 0.4464554280115878 search latency = 0.1266s === Start querying with `random > 0.5` === query result: -{'random': 0.6378742006852851, 'embeddings': [0.3017092, 0.74452263, 0.8009826, 0.4927033, 0.12762444, 0.29869467, 0.52859956, 0.23734547], 'pk': 0} search latency = 0.3294s === Start hybrid searching with `random > 0.5` === hit: id: 4837, distance: 0.07805602252483368, entity: {'random': 0.6451650959930306}, random field: 0.6451650959930306 hit: id: 7172, distance: 0.07954417169094086, entity: {'random': 0.6141351712303128}, random field: 0.6141351712303128 hit: id: 515, distance: 0.09590047597885132, entity: {'random': 0.8013175797590888}, random field: 0.8013175797590888 hit: id: 2249, distance: 0.0, entity: {'random': 0.7434908973629817}, random field: 0.7434908973629817 hit: id: 830, distance: 0.05628090724349022, entity: {'random': 0.8544487225667627}, random field: 0.8544487225667627 hit: id: 1627, distance: 0.08096684515476227, entity: {'random': 0.9302397069516164}, random field: 0.9302397069516164 search latency = 0.2674s Does collection hello_milvus_ntapnew_sc2 exist in Milvus: True {'auto_id': True, 'description': 'hello_milvus_ntapnew_sc2', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': True}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'enable_dynamic_field': False}
-
如果客户希望通过S3协议访问(读取)在矢量数据库中测试的AI工作负载NFS数据、可以使用简单的Python程序进行验证。例如、本节开头的图片中提到的对其他应用程序中的图像进行相似性搜索。
root@ip-172-31-29-98:~/pymilvus/examples# sudo python3 /home/ubuntu/milvusvectordb/s3_access.py -i 172.31.255.228 --bucket milvusnasvol --access-key PY6UF318996I86NBYNDD --secret-key hoPctr9aD88c1j0SkIYZ2uPa03vlbqKA0c5feK6F OBJECTS in the bucket milvusnasvol are : *************************************** … <output content removed to save page space> … bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/0/448789845791411917/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/1/448789845791411918/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/100/448789845791411913/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/101/448789845791411914/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/102/448789845791411915/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/103/448789845791411916/1c48ab6e-1546-4503-9084-28c629216c33/part.1 volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/103/448789845791411916/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/101/448789845791411921/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/102/448789845791411922/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c/part.1 volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791211880/448789845791211881/448789845791411889/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791211880/448789845791211881/448789845791411889/100/448789845791411912/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611920/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611920/100/448789845791411919/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611939/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411926/xl.meta *************************************** root@ip-172-31-29-98:~/pymilvus/examples#
本节有效地演示了客户如何利用Amazon的NetApp FSx ONTAP进行NetApp ONTAP数据存储、在Docker容器中部署和运行独立的Milvus设置。通过这种设置、客户可以在可扩展且高效的Docker容器环境中利用向量数据库的强大功能来处理高维数据和执行复杂查询。通过创建Amazon FSx ONTAP for NetApp ONTAP实例并匹配EC2实例、客户可以确保最佳的资源利用率和数据管理。成功验证了矢量数据库中FSx ONTAP的数据写入和读取操作、为客户提供了可靠且一致的数据操作保障。此外、通过S3协议列出(读取) AI工作负载中的数据、还增强了数据可访问性。因此、这一全面的流程为客户提供了一个强大而高效的解决方案、用于管理其大规模数据运营、并充分利用Amazon FSx ONTAP for NetApp ONTAP的功能。