
Manage Amazon Q Business connectors

Contributors netapp-mwallis

After you create a connector for Amazon Q Business, you can view the connector details, modify the connector, integrate additional data sources, or delete the connector.

View information about a connector

You can view information about the settings for a connector and the data sources that are integrated.

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the workload factory navigation menu, select AI.

  3. Select the connector that you want to view.

  4. To view connector details, select the option button and select Manage connector.

    This page displays the published status, embedding status of the data sources, embedding mode, the list of all embedded data sources, and more.

    Use the Actions menu to make changes to the connector.

Edit a connector

You can update a connector by changing some settings, or you can add or remove data sources.

Each time you add, modify, or remove data sources from the connector, GenAI sends the data source information to Amazon Q Business so that it can be re-indexed. Syncing is incremental, so Amazon Q Business processes only the objects in your data sources (for example, files in your FSx for ONTAP volume) that have been added, modified, or deleted since the last sync.
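As a rough illustration of incremental syncing, the following Python sketch classifies files as added, modified, or deleted by comparing the results of two scans. It is not the GenAI or Amazon Q Business implementation: the `Snapshot` mapping of file paths to modification times, and the `compute_delta` and `SyncDelta` names, are hypothetical and used only to show the idea.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical scan result: file path -> last-modified time (epoch seconds).
Snapshot = Dict[str, float]


@dataclass
class SyncDelta:
    added: List[str]
    modified: List[str]
    deleted: List[str]


def compute_delta(previous: Snapshot, current: Snapshot) -> SyncDelta:
    """Classify only the files that changed between two scans."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(p for p in current if p in previous and current[p] != previous[p])
    return SyncDelta(added, modified, deleted)


if __name__ == "__main__":
    last_sync = {"/vol1/a.docx": 100.0, "/vol1/b.pdf": 200.0, "/vol1/c.txt": 300.0}
    this_sync = {"/vol1/a.docx": 100.0, "/vol1/b.pdf": 250.0, "/vol1/d.md": 400.0}
    # a.docx is unchanged, so it does not appear in any list of the delta.
    print(compute_delta(last_sync, this_sync))
```

Only the files listed in the delta would need re-indexing; unchanged files are skipped, which is what keeps each sync limited to changed content.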

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the Knowledge bases & Connectors inventory page, select the connector that you want to update.

  3. Select the option button and select Manage connector.

    This page displays the published status, embedding status of the data sources, embedding mode, the list of all embedded data sources, and more.

  4. Select the Actions menu and select Edit connector.

  5. On the Edit connector page, you can change the connector name, description, embedding model, whether data guardrails are enabled, and the snapshot policy for the volume that contains the connector.

    Note Every data source scan, which includes embedding, incurs a cost. If you enable data guardrails after a connector has been created, the data source is scanned again and incurs additional costs.
  6. Select Save after you have made changes.

Add additional data sources to a connector

You can embed additional data sources in your connector to populate it with additional organization data.

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the Knowledge bases & Connectors inventory page, select the connector where you want to add the data source.

  3. Select the option button and select Add data source.

  4. Select the type of data source you want to add:

    • Add FSx for ONTAP file system (use files from an existing FSx for ONTAP volume)

    • Add file system (use files from a generic SMB or NFS share)

Add an FSx for ONTAP file system
  1. Select a file system: Select the FSx for ONTAP file system where your data source files reside and select Next.

  2. Select a volume: Select the volume on which your data source files reside and select Next.

    When selecting files stored using the SMB protocol, you'll need to enter the Active Directory information, which includes the domain, IP address, user name, and password.

  3. Select a data source: Select the location where you saved the files, which can be an entire volume or a specific folder or subfolder in the volume, and select Next.

  4. Configurations: Configure how the data source ingests information from your files, and which files it includes in scans:

    • Define data source: In the Chunking strategy section, define how the GenAI engine splits data source content into chunks when the data source is integrated with a knowledge base. You can choose one of the following strategies:

      • Multi-sentence chunking: Organizes information from your data source into sentence-defined chunks. You can choose how many sentences make up each chunk (up to 100).

      • Overlap-based chunking: Organizes information from your data source into character-defined chunks that can overlap neighboring chunks. You can choose the size of each chunk in characters and how much each chunk overlaps with adjacent chunks. Chunk size can range from 50 to 3000 characters, and overlap can range from 1% to 99%.

        Note Choosing a high overlap percentage can greatly increase storage requirements with only slight improvements in retrieval accuracy.
    • File filtering: Configure which files are included in scans:

      • In the File types support section, choose to either include all types of files, or select individual file types for inclusion in the data source scans.

        If you include images or PDF files, BlueXP workload factory for GenAI parses text in the images (including images in PDF documents), and this incurs a higher cost.

        When you include text data from images, GenAI cannot mask personally identifiable information (PII) in the image text when that scanned text is sent from your environment to AWS. However, after the data is stored, all PII is masked in the GenAI database.

        Note Your choice to include image files in scans is related to the knowledge base chat model. If you include image files in scans, the chat model must support images. If image file types are selected here, you cannot switch the knowledge base to a chat model that does not support image files.
      • In the File modification time filter section, choose to enable or disable inclusion of files based on their modification time. If you enable modification time filtering, select a date range from the list.

        Note If you include files based on a modification date range, any file that has not been modified within the date range you specify is excluded from periodic scans as soon as it falls outside the range, and the data source no longer includes it.
  5. In the Permission aware section, which is available only when the data source you selected is on a volume that uses the SMB protocol, you can enable or disable permission-aware responses:

    • Enabled: Users of the chatbot who access this knowledge base will only get responses to queries from data sources to which they have access.

    • Disabled: Users of the chatbot will receive responses using content from all integrated data sources.

  6. Select Add to add this data source to your connector.
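The multi-sentence chunking strategy described above can be pictured with the following Python sketch, which groups consecutive sentences into chunks of a chosen size. It assumes a naive regular-expression sentence splitter and a lower limit of one sentence per chunk; neither is taken from workload factory, and the function names are hypothetical.

```python
import re
from typing import List

MAX_SENTENCES_PER_CHUNK = 100  # documented upper limit

# Naive splitter (assumption): a sentence ends at ".", "!" or "?" followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def multi_sentence_chunks(text: str, sentences_per_chunk: int) -> List[str]:
    """Group consecutive sentences into chunks of at most sentences_per_chunk."""
    if not 1 <= sentences_per_chunk <= MAX_SENTENCES_PER_CHUNK:
        raise ValueError("sentences_per_chunk must be between 1 and 100")
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


if __name__ == "__main__":
    doc = "Volumes store data. Snapshots protect it. Connectors index it."
    for chunk in multi_sentence_chunks(doc, 2):
        print(chunk)
```

A real sentence splitter would also need to handle abbreviations such as "e.g.", which this regular expression treats as sentence boundaries.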

Add a generic NFS file system
  1. Select a file system: Enter the IP address or FQDN of the file system host where your data source files reside, choose the NFS protocol for the network share, and select Next.

  2. Select a data source: Select the location where you saved the files, which can be the entire export or a specific folder or subfolder in the export, and select Next.

    Note In some cases, you might need to enter the NFS export name manually and select Retrieve directories to display the available directories. You can choose to select the entire export, or only specific folders from the export.
  3. Configurations: Configure how the data source ingests information from your files, and which files it includes in scans:

    • Define data source: In the Chunking strategy section, define how the GenAI engine splits data source content into chunks when the data source is integrated with a knowledge base. You can choose one of the following strategies:

      • Multi-sentence chunking: Organizes information from your data source into sentence-defined chunks. You can choose how many sentences make up each chunk (up to 100).

      • Overlap-based chunking: Organizes information from your data source into character-defined chunks that can overlap neighboring chunks. You can choose the size of each chunk in characters and how much each chunk overlaps with adjacent chunks. Chunk size can range from 50 to 3000 characters, and overlap can range from 1% to 99%.

        Note Choosing a high overlap percentage can greatly increase storage requirements with only slight improvements in retrieval accuracy.
    • File filtering: Configure which files are included in scans:

      • In the File types support section, choose to either include all types of files, or select individual file types for inclusion in the data source scans.

        If you include images or PDF files, BlueXP workload factory for GenAI parses text in the images (including images in PDF documents), and this incurs a higher cost.

        When you include text data from images, GenAI cannot mask personally identifiable information (PII) in the image text when that scanned text is sent from your environment to AWS. However, after the data is stored, all PII is masked in the GenAI database.

        Note Your choice to include image files in scans is related to the knowledge base chat model. If you include image files in scans, the chat model must support images. If image file types are selected here, you cannot switch the knowledge base to a chat model that does not support image files.
      • In the File modification time filter section, choose to enable or disable inclusion of files based on their modification time. If you enable modification time filtering, select a date range from the list.

        Note If you include files based on a modification date range, any file that has not been modified within the date range you specify is excluded from periodic scans as soon as it falls outside the range, and the data source no longer includes it.
  4. Select Add data source to add this data source to your connector.
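The overlap-based chunking strategy can be sketched the same way. In this hypothetical Python example, each new chunk starts `chunk_size - overlap` characters after the previous one, and the documented limits (50 to 3000 characters, 1% to 99% overlap) are enforced as input checks. It illustrates the technique only and is not the GenAI engine's code.

```python
from typing import List

MIN_CHUNK_CHARS, MAX_CHUNK_CHARS = 50, 3000  # documented chunk-size range
MIN_OVERLAP_PCT, MAX_OVERLAP_PCT = 1, 99      # documented overlap range


def overlap_chunks(text: str, chunk_size: int, overlap_pct: int) -> List[str]:
    """Split text into fixed-size character chunks; each chunk repeats
    about overlap_pct percent of the previous chunk's characters."""
    if not MIN_CHUNK_CHARS <= chunk_size <= MAX_CHUNK_CHARS:
        raise ValueError("chunk_size must be between 50 and 3000 characters")
    if not MIN_OVERLAP_PCT <= overlap_pct <= MAX_OVERLAP_PCT:
        raise ValueError("overlap_pct must be between 1 and 99")
    overlap = chunk_size * overlap_pct // 100
    stride = chunk_size - overlap  # at least 1 within the ranges above
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += stride
    return chunks


if __name__ == "__main__":
    text = "x" * 10_000
    for pct in (10, 50, 90):
        stored = sum(len(c) for c in overlap_chunks(text, 500, pct))
        print(f"{pct}% overlap: {stored / len(text):.1f}x the original size")
```

Because each chunk advances by only (1 - p) of its length, a document much longer than one chunk is stored as roughly N / (1 - p) characters for N original characters and overlap fraction p: about twice the original at 50% overlap and about ten times at 90%. This is the storage growth behind the note about high overlap percentages.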

Add a generic SMB file system
  1. Select file system:

    1. Enter the IP address or FQDN of the file system host where your data source files reside.

    2. Choose the SMB protocol for the network share.

    3. Enter the Active Directory information, which includes the domain, IP address, user name, and password.

    4. Select Next.

  2. Select a data source: Select the location where you saved the files, which can be the entire share or a specific folder or subfolder in the share, and select Next.

    Note In some cases, you might need to enter the SMB share name manually and select Retrieve directories to display the available directories. You can choose to select the entire share, or only specific folders from the share.
  3. Configurations: Configure how the data source ingests information from your files, and which files it includes in scans:

    • Define data source: In the Chunking strategy section, define how the GenAI engine splits data source content into chunks when the data source is integrated with a knowledge base. You can choose one of the following strategies:

      • Multi-sentence chunking: Organizes information from your data source into sentence-defined chunks. You can choose how many sentences make up each chunk (up to 100).

      • Overlap-based chunking: Organizes information from your data source into character-defined chunks that can overlap neighboring chunks. You can choose the size of each chunk in characters and how much each chunk overlaps with adjacent chunks. Chunk size can range from 50 to 3000 characters, and overlap can range from 1% to 99%.

        Note Choosing a high overlap percentage can greatly increase storage requirements with only slight improvements in retrieval accuracy.
    • Permission aware: Enable or disable permission-aware responses:

      • Enabled: Users of the chatbot who access this knowledge base will only get responses to queries from data sources to which they have access.

      • Disabled: Users of the chatbot will receive responses using content from all integrated data sources.

    • File filtering: Configure which files are included in scans:

      • In the File types support section, choose to either include all types of files, or select individual file types for inclusion in the data source scans.

        If you include images or PDF files, BlueXP workload factory for GenAI parses text in the images (including images in PDF documents), and this incurs a higher cost.

        When you include text data from images, GenAI cannot mask personally identifiable information (PII) in the image text when that scanned text is sent from your environment to AWS. However, after the data is stored, all PII is masked in the GenAI database.

        Note Your choice to include image files in scans is related to the knowledge base chat model. If you include image files in scans, the chat model must support images. If image file types are selected here, you cannot switch the knowledge base to a chat model that does not support image files.
      • In the File modification time filter section, choose to enable or disable inclusion of files based on their modification time. If you enable modification time filtering, select a date range from the list.

        Note If you include files based on a modification date range, any file that has not been modified within the date range you specify is excluded from periodic scans as soon as it falls outside the range, and the data source no longer includes it.
  4. Select Add data source to add this data source to your connector.

Result

The data source is integrated into your connector.
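The two file filters available when you add a data source (file types and modification time) combine as a simple inclusion test. The following Python sketch assumes that a selected date range means "modified within the last N days before the scan" and that file types are matched by file extension; both are assumptions for illustration, and `include_in_scan` is a hypothetical name. It shows why a file that was once included drops out of later periodic scans when it is not modified again.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath
from typing import Optional, Set


def include_in_scan(
    path: str,
    modified_at: datetime,
    scan_time: datetime,
    allowed_extensions: Optional[Set[str]],
    max_age: Optional[timedelta],
) -> bool:
    """Hypothetical inclusion test combining both file filters.

    allowed_extensions=None models "include all file types";
    max_age=None models a disabled modification-time filter.
    """
    # File types support: compare the lower-cased extension, e.g. ".pdf".
    suffix = PurePosixPath(path).suffix.lower()
    if allowed_extensions is not None and suffix not in allowed_extensions:
        return False
    # Modification time filter: the window is measured back from this scan,
    # so an unchanged file eventually falls outside it.
    if max_age is not None and modified_at < scan_time - max_age:
        return False
    return True


if __name__ == "__main__":
    modified = datetime(2024, 5, 20, tzinfo=timezone.utc)
    last_30_days = timedelta(days=30)
    for scan in (datetime(2024, 6, 1, tzinfo=timezone.utc),
                 datetime(2024, 7, 1, tzinfo=timezone.utc)):
        # prints "2024-06-01 True" and then "2024-07-01 False"
        print(scan.date(), include_in_scan("/share/plan.docx", modified, scan, {".docx"}, last_30_days))
```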

Synchronize your data sources with a connector

Data sources are synchronized with the associated connector automatically once a day so that any data source changes are reflected in Amazon Q Business. If you make changes to any of your data sources and you'd like to synchronize (scan) the data immediately, you can perform an on-demand synchronization.

Syncing is incremental, so Amazon Q Business only processes the objects in your data sources that have been added, modified, or deleted since the last sync.

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the Knowledge bases & Connectors inventory page, select the connector that you want to synchronize.

  3. Select the option button and select Manage connector.

  4. Select the Actions menu and select Scan now.

    You'll see a message that your data sources are being scanned, and a final message when the scan is complete.

Result

The connector is synchronized with the attached data sources and Amazon Q Business will start using the newest information from your data sources.

Pause or resume a scheduled synchronization

If you want to pause or resume the next synchronization (scan) of the data sources, you can do so at any time. You might need to pause the next scheduled synchronization if you are going to make changes to a data source and don't want the synchronization happening during the change window.

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the Knowledge bases & Connectors inventory page, select the connector for which you want to pause or resume scans.

  3. Select the option button and select Manage connector.

  4. Select the Actions menu and select Scan > Pause scheduled scan or Scan > Resume scheduled scan.

    You'll see a message that the next scheduled scan has either been paused or resumed.

Delete a connector

If you no longer need a connector, you can delete it. When you delete a connector, it is removed from workload factory and the volume that contains the connector is deleted. Deleting a connector is not reversible.

When you delete a connector, also disassociate it from any agents it is associated with so that all of the connector's resources are fully deleted.

Steps
  1. Log in to workload factory using one of the console experiences.

  2. From the Knowledge bases & Connectors inventory page, select the connector that you want to delete.

  3. Select the option button and select Manage connector.

  4. Select the Actions menu and select Delete connector.

  5. In the Delete connector dialog, confirm that you want to delete it and select Delete.

Result

The connector is removed from workload factory and its associated volume is deleted.