Solution Technology
The following figure illustrates the proposed conversational AI system architecture. You can interact with the system with either speech signal or text input. If spoken input is detected, Jarvis AI-as-service (AIaaS) performs ASR to produce text for Dialog Manager. Dialog Manager remembers states of conversation, routes text to corresponding services, and passes commands to Fulfillment Engine. Jarvis NLP Service takes in text, recognizes intents and entities, and outputs those intents and entity slots back to Dialog Manager, which then sends Action to Fulfillment Engine. Fulfillment Engine consists of third-party APIs or SQL databases that answer user queries. After receiving Result from Fulfillment Engine, Dialog Manager routes text to Jarvis TTS AIaaS to produce an audio response for the end-user. We can archive conversation history, annotate sentences with intents and slots for NeMo training such that NLP Service improves as more users interact with the system.
Hardware Requirements
This solution was validated using one DGX Station and one AFF A220 storage system. Jarvis requires either a T4 or V100 GPU to perform deep neural network computations.
The following table lists the hardware components that are required to implement the solution as tested.
Hardware | Quantity |
---|---|
T4 or V100 GPU |
1 |
NVIDIA DGX Station |
1 |
Software Requirements
The following table lists the software components that are required to implement the solution as tested.
Software | Version or Other Information |
---|---|
NetApp ONTAP data management software |
9.6 |
Cisco NX-OS switch firmware |
7.0(3)I6(1) |
NVIDIA DGX OS |
4.0.4 - Ubuntu 18.04 LTS |
NVIDIA Jarvis Framework |
EA v0.2 |
NVIDIA NeMo |
nvcr.io/nvidia/nemo:v0.10 |
Docker container platform |
18.06.1-ce [e68fc7a] |