Data Collections in Pega GenAI Knowledge Buddy

Use data collections to group your data sources into separate collections, which provides greater flexibility and control over data organization.

Data collection

In previous versions of Pega GenAI Knowledge Buddy™, all data was loaded into a single database table, with each piece of content identified by a unique object ID. This setup prevented the same content from existing more than once in the system, even if it came from different data sources.

Data collections remove this limitation by letting users create separate collections that act like individual tables, each containing its own set of data sources and content. As a result, the same content can exist in multiple collections without conflict because each collection has its own unique identity.

Note: A single semantic query cannot span multiple collections, which helps ensure data isolation when you need it.
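
The difference between the two storage models, and the one-collection scope of a query, can be sketched in a few lines of Python. This is an illustrative model only, not Knowledge Buddy's actual storage or API: the `CollectionStore` class, its `add` and `query` methods, the sample IDs, and the word-overlap scoring that stands in for semantic similarity are all assumptions made for the example.

```python
from collections import defaultdict


class CollectionStore:
    """Toy in-memory store; hypothetical, not a Pega API."""

    def __init__(self):
        # Each collection behaves like its own table: object ID -> text.
        self._tables = defaultdict(dict)

    def add(self, collection, object_id, text):
        table = self._tables[collection]
        if object_id in table:
            # Object IDs must be unique, but only within one collection.
            raise ValueError(f"{object_id!r} already exists in {collection!r}")
        table[object_id] = text

    def query(self, collection, question):
        """Search exactly one collection; no query spans several collections."""
        words = set(question.lower().split())
        hits = []
        for object_id, text in self._tables.get(collection, {}).items():
            # Word overlap stands in for semantic similarity in this sketch.
            score = len(words & set(text.lower().split()))
            if score > 0:
                hits.append((score, object_id))
        return [object_id for score, object_id in sorted(hits, reverse=True)]


store = CollectionStore()
store.add("Production", "KB-1", "how to reset a customer password")
store.add("Test", "KB-1", "how to reset a customer password")  # same ID, no conflict
store.add("Payroll", "PAY-7", "salary bands for customer support staff")

# Only content in the named collection is searched.
print(store.query("Production", "reset password"))
```

Because the collection name is a required argument of `query`, payroll content in this sketch can never surface in a query against the Production collection.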

The following figure shows the Data collection landing page in Knowledge Buddy:

Figure: Data collections in the Buddy portal

Key benefits

A data collection provides the following benefits for Knowledge Buddy:

  • Test different chunking algorithms or chunk sizes: Create separate collections to test different chunking methods or chunk sizes for content without having to overwrite existing data.
  • Separate production and test data: Keep your production and test data completely separate to prevent any accidental mixing of data.
  • Isolate sensitive data: Collections provide a clear boundary for isolating sensitive data, such as payroll information, from other data sources that the system should not combine in the same semantic query.
  • Improve performance: Segmenting data into smaller collections lets semantic queries run faster because there is less data to search through.
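
To make the first benefit concrete, the following sketch chunks the same content with two different settings and keeps each result under its own collection name, so neither overwrites the other. It assumes simple fixed-size character chunking with overlap; the `chunk_text` helper, the collection names, and the sizes are hypothetical, and the chunking methods that Knowledge Buddy actually offers may differ.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper for illustration; not Knowledge Buddy's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


article = "Reset the password from the account settings page."

# The same content chunked two ways, one result per (hypothetical) collection.
collections = {
    "Chunking-Small": chunk_text(article, chunk_size=20, chunk_overlap=5),
    "Chunking-Large": chunk_text(article, chunk_size=40, chunk_overlap=10),
}

for name, chunks in collections.items():
    print(name, len(chunks), "chunks")
```

With overlap, the tail of one chunk is repeated at the head of the next, which is why comparing chunk sizes side by side in separate collections is useful before settling on one configuration.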

Setting up a Data collection

The process of setting up a data collection involves the following steps:

  1. Create a new collection

    Specify details such as the name, description, chunking settings, and access permissions for the new collection.
    The following figure shows an example of a CustomerService data collection.
    Figure: The Create data collection window

Under Advanced settings, you can configure the content processing options for the data collection. By default, all data sources in the collection inherit these settings, but you can override them for individual data sources. You can configure the following parameters:

  • Chunking method, chunk size, and chunk overlap
  • Content level attribution: Select an analyzer to apply auto attribution to each content item.
  • Chunk level attribution: Select an analyzer to apply auto attribution to each chunk after chunking completes.
  • Embedding attributes: Select one or more attributes to embed when ingesting content into the vector store.
  • Auto filtering attribution: Select an analyzer to apply auto attribution filtering during Buddy search.
Note: You can create new analyzers in Prediction Studio.
  2. Create data sources and assign them to collections

    Create data sources (for example, knowledge articles and documents) and assign them to the desired collection category, as shown in the following figure:
    Figure: The Create data source window
    Note: You can assign a collection only when you create the data source; the system does not allow you to change the collection later. If you assign a data source to the wrong collection, create a new data source with the correct data collection.
  3. Associate Knowledge Buddies with collections

    During the Knowledge Buddy setup, select which collections and data sources you want the Buddy to use for its semantic queries.
    Figure: Collection and data sources when creating a Knowledge Buddy
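
The rules in the first two steps (data sources inherit the collection's settings unless overridden, and a data source cannot be moved to another collection) can be modeled as a small sketch. This is not Pega's data model: the class names, the `effective_settings` method, and the setting values such as `"fixed"`, `1000`, and `100` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProcessingSettings:
    chunking_method: str
    chunk_size: int
    chunk_overlap: int


@dataclass(frozen=True)
class DataCollection:
    name: str
    settings: ProcessingSettings


@dataclass(frozen=True)  # frozen: the collection cannot be reassigned later
class DataSource:
    name: str
    collection: DataCollection
    overrides: Optional[dict] = None

    def effective_settings(self):
        # Start from the collection defaults, then apply per-source overrides.
        return replace(self.collection.settings, **(self.overrides or {}))


customer_service = DataCollection(
    "CustomerService",
    ProcessingSettings(chunking_method="fixed", chunk_size=1000, chunk_overlap=100),
)
articles = DataSource("KnowledgeArticles", customer_service)
manuals = DataSource("ProductManuals", customer_service, overrides={"chunk_size": 500})

print(articles.effective_settings())  # inherits every collection setting
print(manuals.effective_settings())   # same settings except chunk_size

# A data source cannot be moved; create a new one in the right collection instead.
test_collection = DataCollection("Test", customer_service.settings)
try:
    articles.collection = test_collection
except AttributeError as error:  # FrozenInstanceError subclasses AttributeError
    print("cannot reassign:", type(error).__name__)
manuals_in_test = DataSource("ProductManuals", test_collection, overrides={"chunk_size": 500})
```

Modeling the data source as immutable mirrors the constraint described in the note under the data source step: a wrong assignment is corrected by creating a new data source, not by editing the existing one.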
