Vector Database Use Cases
This section provides an overview of the use cases for the NetApp vector database solution.
In this section, we discuss two use cases: Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) and the NetApp IT chatbot.
Retrieval Augmented Generation (RAG) with Large Language Models (LLMs)
Retrieval-augmented generation, or RAG, is a technique for enhancing the accuracy and reliability of Large Language Models, or LLMs, by augmenting prompts with facts fetched from external sources. In a traditional RAG deployment, vector embeddings are generated from an existing dataset and then stored in a vector database, often referred to as a knowledgebase. Whenever a user submits a prompt to the LLM, a vector embedding representation of the prompt is generated, and the vector database is searched using that embedding as the search query. This search operation returns similar vectors from the knowledgebase, which are then fed to the LLM as context alongside the original user prompt. In this way, an LLM can be augmented with additional information that was not part of its original training dataset.
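The following is a minimal sketch of this flow in Python. It assumes a Milvus collection named knowledgebase that stores a text field alongside each embedding and uses a sentence-transformers embedding model; the llm_generate callable stands in for whatever LLM endpoint the deployment exposes. All of these names are illustrative, not part of any specific product.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Connect to the vector database and load an embedding model.
client = MilvusClient(uri="http://localhost:19530")  # assumed Milvus endpoint
model = SentenceTransformer("all-MiniLM-L6-v2")      # assumed embedding model

def answer_with_rag(prompt: str, llm_generate, top_k: int = 5) -> str:
    # 1. Generate a vector embedding representation of the user's prompt.
    query_vector = model.encode(prompt).tolist()

    # 2. Search the knowledgebase for the most similar stored vectors.
    hits = client.search(
        collection_name="knowledgebase",  # assumed collection name
        data=[query_vector],
        limit=top_k,
        output_fields=["text"],
    )

    # 3. Feed the retrieved passages to the LLM as context with the prompt.
    context = "\n".join(hit["entity"]["text"] for hit in hits[0])
    augmented = f"Context:\n{context}\n\nQuestion: {prompt}"
    return llm_generate(augmented)
```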
The NVIDIA Enterprise RAG LLM Operator is a useful tool for implementing RAG in the enterprise. This operator can be used to deploy a full RAG pipeline, which can be customized to use either Milvus or pgvector as the vector database for storing knowledgebase embeddings. Refer to the documentation for details.
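If pgvector is chosen instead, the knowledgebase is an ordinary PostgreSQL table with a vector column. The sketch below reuses the embedding model from the previous example and assumes a hypothetical table named knowledgebase with text and embedding columns; the connection string is a placeholder.

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# Assumed connection string and table layout; adjust for your deployment.
with psycopg.connect("dbname=rag user=postgres") as conn:
    register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values
    query_vector = np.asarray(model.encode(prompt))  # model from previous sketch
    # "<->" is pgvector's distance operator; nearest neighbors are returned first.
    rows = conn.execute(
        "SELECT text FROM knowledgebase ORDER BY embedding <-> %s LIMIT 5",
        (query_vector,),
    ).fetchall()
    context = "\n".join(text for (text,) in rows)
```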
NetApp has validated an enterprise RAG architecture powered by the NVIDIA Enterprise RAG LLM Operator alongside NetApp storage. Refer to our blog post for more information and to see a demo. Figure 1 provides an overview of this architecture.
Figure 1) Enterprise RAG powered by NVIDIA NeMo Microservices and NetApp
NetApp IT chatbot use case
NetApp's chatbot is another real-time use case for the vector database. In this instance, the NetApp Private OpenAI Sandbox provides an effective, secure, and efficient platform for handling queries from NetApp's internal users. By combining stringent security protocols, efficient data management, and sophisticated AI processing, it delivers high-quality, precise responses tailored to each user's role and responsibilities in the organization, with access gated by SSO authentication. This architecture highlights the potential of merging advanced technologies to create user-focused, intelligent systems.
The use case can be divided into four primary sections.
User Authentication and Verification:
- User queries first go through the NetApp Single Sign-On (SSO) process to confirm the user's identity (an illustrative token check follows this list).
- After successful authentication, the system checks the VPN connection to ensure secure data transmission.
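The source does not describe how the SSO check is implemented. Purely as an illustration, a gateway in front of the chatbot could validate a signed SSO token before accepting a query; the issuer, audience, and signing key below are hypothetical.

```python
import jwt  # PyJWT

def verify_sso_token(token: str, public_key: str) -> dict:
    # Reject the request unless the token is signed by the SSO provider
    # and intended for this application (issuer/audience are hypothetical).
    return jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience="netaichat",          # hypothetical audience
        issuer="https://sso.example",  # hypothetical issuer
    )
```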
Data Transmission and Processing:
- Once the VPN is validated, the data is sent to MariaDB through the NetAIChat or NetAICreate web applications. MariaDB is a fast and efficient database system used to manage and store user data (a sketch of this step follows the list).
- MariaDB then sends the information to the NetApp Azure instance, which connects the user data to the AI processing unit.
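As a rough illustration of the MariaDB step, the web application might persist each query before it is forwarded for processing. The host, credentials, and table layout below are hypothetical; pymysql is used here because it works against MariaDB, though the actual driver is not specified in the source.

```python
import pymysql

# Hypothetical connection details and table layout.
conn = pymysql.connect(host="mariadb.internal", user="netai",
                       password="...", database="netaichat")
try:
    with conn.cursor() as cur:
        # Store the authenticated user's query before forwarding it onward.
        cur.execute(
            "INSERT INTO user_queries (user_id, query_text) VALUES (%s, %s)",
            (user_id, prompt),  # values assumed to come from the session
        )
    conn.commit()
finally:
    conn.close()
```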
Interaction with OpenAI and Content Filtering:
- The Azure instance sends the user's question to a content filtering system, which cleans up the query and prepares it for processing.
- The cleaned-up input is then sent to the Azure OpenAI base model, which generates a response based on the input (see the sketch after this list).
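A minimal call to an Azure OpenAI deployment looks roughly like the following; the endpoint, key, and deployment name are placeholders. Note that Azure applies its content filtering on the service side to both the prompt and the completion.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="...",                                         # placeholder
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # name of the Azure deployment, not the base model itself
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```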
Response Generation and Moderation:
- The response from the base model is first checked to ensure it is accurate and meets content standards (a minimal check is sketched below).
- After passing the check, the response is sent back to the user. This process ensures that the user receives a clear, accurate, and appropriate answer to their query.
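One concrete signal available at this stage: when Azure's content filter suppresses a completion, the choice's finish_reason is set to "content_filter", so the application can fall back to a safe reply rather than returning filtered output. A minimal check, continuing the previous sketch (the fallback wording is illustrative):

```python
choice = response.choices[0]
if choice.finish_reason == "content_filter":
    # The service filtered the completion; return a safe fallback instead.
    answer = "Sorry, I can't help with that request."
else:
    answer = choice.message.content
```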