Milvus 與Amazon FSx ONTAP for NetApp ONTAP - 檔案與物件二元性
本節討論使用Amazon FSx ONTAP為NetApp提供向量資料庫解決方案的 milvus 叢集設定。
Milvus 與Amazon FSx ONTAP for NetApp ONTAP – 檔案與物件二元性
在本節中,我們將介紹為什麼需要在雲端部署向量資料庫,以及在 Docker 容器中的Amazon FSx ONTAP for NetApp ONTAP中部署向量資料庫(milvus 獨立版)的步驟。
在雲端部署向量資料庫有幾個顯著的好處,特別是對於需要處理高維度資料和執行相似性搜尋的應用程式。首先,基於雲端的部署提供了可擴展性,允許輕鬆調整資源以適應不斷增長的資料量和查詢負載。這確保資料庫能夠有效地處理增加的需求,同時保持高效能。其次,雲端部署提供了高可用性和災難復原,因為資料可以在不同的地理位置複製,最大限度地降低資料遺失的風險,並確保即使在意外事件期間也能持續提供服務。第三,它具有成本效益,因為您只需為您使用的資源付費,並且可以根據需求擴大或縮小規模,從而無需在硬體上進行大量的前期投資。最後,在雲端部署向量資料庫可以增強協作,因為可以從任何地方存取和共享數據,從而促進基於團隊的工作和數據驅動的決策。請使用Amazon FSx ONTAP for NetApp ONTAP檢查此驗證中所使用的 milvus 獨立架構。
-
為NetApp ONTAP實例建立Amazon FSx ONTAP ,並記下 VPC、VPC 安全性群組和子網路的詳細資訊。建立 EC2 執行個體時需要此資訊。您可以在此處找到更多詳細資訊 - https://us-east-1.console.aws.amazon.com/fsx/home?region=us-east-1#file-system-create
-
建立一個 EC2 實例,確保 VPC、安全性群組和子網路與Amazon FSx ONTAP for NetApp ONTAP執行個體的 VPC、安全性群組和子網路相符。
-
使用指令“apt-get install nfs-common”安裝 nfs-common,並使用“sudo apt-get update”更新套件資訊。
-
建立一個掛載資料夾並在其上掛載適用於NetApp ONTAP 的Amazon FSx ONTAP 。
ubuntu@ip-172-31-29-98:~$ mkdir /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$ sudo mount 172.31.255.228:/vol1 /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$ df -h /home/ubuntu/milvusvectordb Filesystem Size Used Avail Use% Mounted on 172.31.255.228:/vol1 973G 126G 848G 13% /home/ubuntu/milvusvectordb ubuntu@ip-172-31-29-98:~$
-
使用“apt-get install”安裝 Docker 和 Docker Compose。
-
根據 docker-compose.yaml 檔案建立 Milvus 集群,該檔案可以從 Milvus 網站下載。
root@ip-172-31-22-245:~# wget https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml -O docker-compose.yml --2024-04-01 14:52:23-- https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml <removed some output to save page space>
-
在 docker-compose.yml 檔案的「volumes」部分中,將NetApp NFS 掛載點對應到對應的 Milvus 容器路徑,具體在 etcd、minio 和 standalone 中。檢查"附錄 D:docker-compose.yml"有關 yml 更改的詳細信息
-
驗證已安裝的資料夾和檔案。
ubuntu@ip-172-31-29-98:~/milvusvectordb$ ls -ltrh /home/ubuntu/milvusvectordb total 8.0K -rw-r--r-- 1 root root 1.8K Apr 2 16:35 s3_access.py drwxrwxrwx 2 root root 4.0K Apr 4 20:19 volumes ubuntu@ip-172-31-29-98:~/milvusvectordb$ ls -ltrh /home/ubuntu/milvusvectordb/volumes/ total 0 ubuntu@ip-172-31-29-98:~/milvusvectordb$ cd ubuntu@ip-172-31-29-98:~$ ls docker-compose.yml docker-compose.yml~ milvus.yaml milvusvectordb vectordbvol1 ubuntu@ip-172-31-29-98:~$
-
從包含 docker-compose.yml 檔案的目錄執行「docker-compose up -d」。
-
檢查 Milvus 容器的狀態。
ubuntu@ip-172-31-29-98:~$ sudo docker-compose ps Name Command State Ports ---------------------------------------------------------------------------------------------------------------------------------------------------------- milvus-etcd etcd -advertise-client-url ... Up (healthy) 2379/tcp, 2380/tcp milvus-minio /usr/bin/docker-entrypoint ... Up (healthy) 0.0.0.0:9000->9000/tcp,:::9000->9000/tcp, 0.0.0.0:9001->9001/tcp,:::9001->9001/tcp milvus-standalone /tini -- milvus run standalone Up (healthy) 0.0.0.0:19530->19530/tcp,:::19530->19530/tcp, 0.0.0.0:9091->9091/tcp,:::9091->9091/tcp ubuntu@ip-172-31-29-98:~$ ubuntu@ip-172-31-29-98:~$ ls -ltrh /home/ubuntu/milvusvectordb/volumes/ total 12K drwxr-xr-x 3 root root 4.0K Apr 4 20:21 etcd drwxr-xr-x 4 root root 4.0K Apr 4 20:21 minio drwxr-xr-x 5 root root 4.0K Apr 4 20:21 milvus ubuntu@ip-172-31-29-98:~$
-
為了驗證Amazon FSx ONTAP for NetApp ONTAP中向量資料庫及其資料的讀寫功能,我們使用了 Python Milvus SDK 和來自 PyMilvus 的範例程式。使用「apt-get install python3-numpy python3-pip」安裝必要的軟體包,並使用「pip3 install pymilvus」安裝 PyMilvus。
-
驗證向量資料庫中Amazon FSx ONTAP for NetApp ONTAP的資料寫入和讀取操作。
root@ip-172-31-29-98:~/pymilvus/examples# python3 prepare_data_netapp_new.py === start connecting to Milvus === === Milvus host: localhost === Does collection hello_milvus_ntapnew_sc exist in Milvus: True === Drop collection - hello_milvus_ntapnew_sc === === Drop collection - hello_milvus_ntapnew_sc2 === === Create collection `hello_milvus_ntapnew_sc` === === Start inserting entities === Number of entities in hello_milvus_ntapnew_sc: 9000 root@ip-172-31-29-98:~/pymilvus/examples# find /home/ubuntu/milvusvectordb/ … <removed content to save page space > … /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c/part.1 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925/xl.meta /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920 /home/ubuntu/milvusvectordb/volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920/xl.meta
-
使用verify_data_netapp.py腳本檢查讀取操作。
root@ip-172-31-29-98:~/pymilvus/examples# python3 verify_data_netapp.py === start connecting to Milvus === === Milvus host: localhost === Does collection hello_milvus_ntapnew_sc exist in Milvus: True {'auto_id': False, 'description': 'hello_milvus_ntapnew_sc', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'enable_dynamic_field': False} Number of entities in Milvus: hello_milvus_ntapnew_sc : 9000 === Start Creating index IVF_FLAT === === Start loading === === Start searching based on vector similarity === hit: id: 2248, distance: 0.0, entity: {'random': 0.2777646777746381}, random field: 0.2777646777746381 hit: id: 4837, distance: 0.07805602252483368, entity: {'random': 0.6451650959930306}, random field: 0.6451650959930306 hit: id: 7172, distance: 0.07954417169094086, entity: {'random': 0.6141351712303128}, random field: 0.6141351712303128 hit: id: 2249, distance: 0.0, entity: {'random': 0.7434908973629817}, random field: 0.7434908973629817 hit: id: 830, distance: 0.05628090724349022, entity: {'random': 0.8544487225667627}, random field: 0.8544487225667627 hit: id: 8562, distance: 0.07971227169036865, entity: {'random': 0.4464554280115878}, random field: 0.4464554280115878 search latency = 0.1266s === Start querying with `random > 0.5` === query result: -{'random': 0.6378742006852851, 'embeddings': [0.3017092, 0.74452263, 0.8009826, 0.4927033, 0.12762444, 0.29869467, 0.52859956, 0.23734547], 'pk': 0} search latency = 0.3294s === Start hybrid searching with `random > 0.5` === hit: id: 4837, distance: 0.07805602252483368, entity: {'random': 0.6451650959930306}, random field: 0.6451650959930306 hit: id: 7172, distance: 0.07954417169094086, entity: {'random': 0.6141351712303128}, random field: 0.6141351712303128 hit: id: 515, distance: 0.09590047597885132, entity: {'random': 0.8013175797590888}, random field: 0.8013175797590888 hit: id: 2249, distance: 0.0, entity: {'random': 0.7434908973629817}, random field: 0.7434908973629817 hit: id: 830, distance: 0.05628090724349022, entity: {'random': 0.8544487225667627}, random field: 0.8544487225667627 hit: id: 1627, distance: 0.08096684515476227, entity: {'random': 0.9302397069516164}, random field: 0.9302397069516164 search latency = 0.2674s Does collection hello_milvus_ntapnew_sc2 exist in Milvus: True {'auto_id': True, 'description': 'hello_milvus_ntapnew_sc2', 'fields': [{'name': 'pk', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': True}, {'name': 'random', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'var', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'embeddings', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'enable_dynamic_field': False}
-
如果客戶想要透過 S3 協定存取(讀取)向量資料庫中測試的 NFS 資料以用於 AI 工作負載,則可以使用簡單的 Python 程式進行驗證。一個例子可以是來自另一個應用程式的圖像的相似性搜索,如本節開頭的圖片中提到的那樣。
root@ip-172-31-29-98:~/pymilvus/examples# sudo python3 /home/ubuntu/milvusvectordb/s3_access.py -i 172.31.255.228 --bucket milvusnasvol --access-key PY6UF318996I86NBYNDD --secret-key hoPctr9aD88c1j0SkIYZ2uPa03vlbqKA0c5feK6F OBJECTS in the bucket milvusnasvol are : *************************************** … <output content removed to save page space> … bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/0/448789845791411917/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/1/448789845791411918/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/100/448789845791411913/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/101/448789845791411914/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/102/448789845791411915/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/103/448789845791411916/1c48ab6e-1546-4503-9084-28c629216c33/part.1 volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611920/103/448789845791411916/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/0/448789845791411924/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/1/448789845791411925/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411920/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/101/448789845791411921/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/102/448789845791411922/xl.meta volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/b3def25f-c117-4fba-8256-96cb7557cd6c/part.1 volumes/minio/a-bucket/files/insert_log/448789845791611912/448789845791611913/448789845791611939/103/448789845791411923/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791211880/448789845791211881/448789845791411889/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791211880/448789845791211881/448789845791411889/100/448789845791411912/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611920/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611920/100/448789845791411919/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611939/100/1/xl.meta volumes/minio/a-bucket/files/stats_log/448789845791611912/448789845791611913/448789845791611939/100/448789845791411926/xl.meta *************************************** root@ip-172-31-29-98:~/pymilvus/examples#
本節有效地展示了客戶如何在 Docker 容器中部署和操作獨立的 Milvus 設置,並利用 Amazon 的NetApp FSx ONTAP進行NetApp ONTAP資料儲存。此設定允許客戶利用向量資料庫的強大功能來處理高維資料和執行複雜查詢,所有這些都可以在可擴展且高效的 Docker 容器環境中完成。透過建立適用於NetApp ONTAP執行個體和符合的 EC2 執行個體的Amazon FSx ONTAP ,客戶可以確保最佳的資源利用率和資料管理。 FSx ONTAP在向量資料庫中資料寫入和讀取操作的成功驗證為客戶提供了可靠、一致的資料操作的保證。此外,透過 S3 協定列出(讀取)來自 AI 工作負載的資料的能力增強了資料可存取性。因此,這項全面的流程為客戶提供了一個強大且高效的解決方案,用於管理他們的大規模資料操作,並利用了 Amazon FSx ONTAP for NetApp ONTAP的功能。