
Content ingestion for Knowledge Buddy

Data ingestion is the process of importing and processing assorted data files from multiple sources into a storage or computing system, such as a data warehouse or database, where users or systems can access and analyze them.

If you use Pega GenAI Knowledge Buddy™ with Pega Knowledge™, you can also include PDFs and other text-formatted attachments for content ingestion.

Content ingestion using a KM article and REST API

Steps to ingest a KM article into Knowledge Buddy:

When you receive an article, you use the Buddy Ingestion API (a REST API) to ingest it as new content or to update existing content. Next, you break the article down into smaller chunks. Finally, you generate embeddings for each chunk through the Pega GenAI™ gateway and store them in the database.

For example, you receive an article, "How to add, change, or update your address." You use the Buddy Ingestion API to ingest the article into Knowledge Buddy. Next, you break down the article into smaller chunks. Then, you generate embeddings for each of these chunks and store this information in your database to make it accessible for users who search for information on customer service in e-commerce.
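The receive, chunk, embed, and store flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Knowledge Buddy's implementation. The word-based chunker, its size and overlap values, the in-memory `store` dictionary, and the hash-based `embed` function are all hypothetical. The `embed` function is an offline stand-in for a call to the Pega GenAI™ gateway and produces no meaningful embeddings.

```python
import hashlib
import math


def embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for the Pega GenAI gateway embedding call.

    Derives a deterministic unit vector from a SHA-256 hash so the sketch runs
    offline; the values carry no semantic meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dims]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this chunk reached the end of the text
            break
    return chunks


def ingest(store: dict, object_id: str, text: str, size: int = 50, overlap: int = 10) -> int:
    """Create or replace the chunks stored under object_id; return the chunk count."""
    chunks = chunk_words(text, size, overlap)
    store[object_id] = [(chunk, embed(chunk)) for chunk in chunks]
    return len(chunks)


# Usage: ingest a short article under a hypothetical objectID.
store = {}
article_text = ("To add, change, or update your address, open your account "
                "profile and edit the address section.")
print(ingest(store, "KC-0039", article_text))
```

Keying the store by objectID means that re-ingesting an updated article replaces its old chunks instead of adding duplicates. This mirrors the create-or-update behavior described for the Buddy Ingestion API.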

[Figure: Steps to ingest a KM article]
Note: Although ingested content might contain videos and images, Knowledge Buddy cannot answer questions about images or videos, and it cannot return or generate images or videos in its responses.

Sample JSON structures for ingesting content with REST API

The following are examples of JSON structures that you can use to ingest content with the REST API:

Example 1:

[Figure: API example 1, sample JSON structure]

Example 2:

[Figure: API example 2, sample JSON structure]

The JSON structures include the following properties:

objectID: The objectID is the most important property. It is a single value that must be pushed into the database, and it serves as the key for the content, whether the content is a webpage, a URL, a document type, or a reference such as KC-0039.

dataSource: The dataSource is the name of the data source defined in Knowledge Buddy. If you push an objectID to a data source that you do not have access to, the objectID is rejected.

title: The title is also important and should accurately reflect the ingested content.

chunkingMethod: The chunkingMethod determines the size of the chunks and the overlap between them, for example, the default chunking size and overlap.

roles: Roles are mandatory for securing content. Every piece of content pushed into the system must have an associated role.

text: The text property is the array of text values that the REST API pushes to Knowledge Buddy.

attributes: The attributes (global attributes) are name-value pairs that you can add as needed, and you can use multiple attributes. The REST API automatically applies these attributes to all text chunks.
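The following Python sketch gives a rough illustration of how these properties fit together. It assembles a request body and applies the rules stated above: an objectID is required, every piece of content needs at least one role, and content for a data source that you cannot access is rejected. Several details are assumptions, including the function names, the `"default"` chunkingMethod value, the plain-string role names, and the sample values. The actual schema that the Buddy Ingestion API accepts might differ.

```python
import json


def build_payload(object_id, data_source, title, text, roles,
                  chunking_method="default", attributes=None):
    """Assemble a hypothetical ingestion request body from the properties above."""
    if not object_id:
        raise ValueError("objectID is required; it is the key for the content")
    if not data_source:
        raise ValueError("dataSource is required")
    if not roles:
        raise ValueError("every piece of content needs at least one role")
    return {
        "objectID": object_id,
        "dataSource": data_source,
        "title": title,
        "chunkingMethod": chunking_method,  # "default" is an assumed value
        "roles": list(roles),               # plain role names are an assumption
        "text": list(text),                 # array of text values
        "attributes": list(attributes or []),
    }


def is_accepted(payload, accessible_sources):
    """Mimic the rule that content for an inaccessible data source is rejected."""
    return payload["dataSource"] in accessible_sources


payload = build_payload(
    object_id="KC-0039",
    data_source="Knowledge_CustomerService",     # hypothetical data source
    title="How to add, change, or update your address",
    text=["Open your account profile and edit the address section."],
    roles=["CustomerServiceAgent"],              # hypothetical role name
)
print(json.dumps(payload, indent=2))
print(is_accepted(payload, {"Knowledge_CustomerService"}))
```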

In addition, you can add your own attributes at the content level for each piece of content you ingest. For example, you can push the category or URL of an article as the content-level attribute.

When you tag an article or put an article in a category, that category becomes the value for the "category" attribute, as shown in the following sample code:

"attributes": [
  {
    "name": "category",
    "values": [
      {
        "value": "{{category}}"
      }
    ]
  }
]

When you ingest content from Pega Knowledge, these steps are automated for you: the title, article ID, article URL, and other values are pushed as attributes. If you ingest your own content, you can choose which values to push as part of your ingestion process.
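The following Python sketch shows one way to turn an article's metadata into the nested attributes shape from the sample, where each attribute carries a name and a list of values. In the sample, {{category}} appears to be a placeholder that is filled with the article's category. The article record, its field names, and the title, articleID, and url attribute names are hypothetical.

```python
import json


def to_attributes(values_by_name: dict[str, list[str]]) -> list[dict]:
    """Convert {name: [values]} into the nested attributes shape shown in the sample."""
    return [
        {"name": name, "values": [{"value": v} for v in values]}
        for name, values in values_by_name.items()
    ]


# Hypothetical article record; field and attribute names are assumptions.
article = {
    "id": "KC-0039",
    "title": "How to add, change, or update your address",
    "url": "https://example.com/kb/KC-0039",
    "category": "Customer service",
}

attributes = to_attributes({
    "category": [article["category"]],
    "title": [article["title"]],
    "articleID": [article["id"]],
    "url": [article["url"]],
})
print(json.dumps(attributes, indent=2))
```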

Methods to have Knowledge Buddy ingest Pega Knowledge articles

You can use the following methods to have Knowledge Buddy ingest content from Pega Knowledge articles:

1. Publish content 

By default, when a knowledge article is published in the Pega Knowledge portal, Knowledge Buddy automatically ingests the article based on the content type selected for the knowledge article.

When you create a new content type in the Pega Knowledge portal, a corresponding data source named Knowledge_{ContentType ID} is created for Pega Knowledge Buddy. For example, a content type named Smartphones corresponds to the data source Knowledge_Smartphones.
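The following Python sketch shows the naming rule and the publish-based ingestion described above. The Article class and the routing function are hypothetical. Only the Knowledge_{ContentType ID} pattern and the Resolved-Published status come from this topic.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical, minimal view of a Pega Knowledge article."""
    article_id: str
    content_type: str
    status: str


def data_source_for(content_type_id: str) -> str:
    """Apply the Knowledge_{ContentType ID} naming pattern."""
    return f"Knowledge_{content_type_id}"


def route_published(articles: list[Article]) -> dict[str, list[str]]:
    """Group published articles by the data source of their content type."""
    routed: dict[str, list[str]] = {}
    for article in articles:
        if article.status != "Resolved-Published":
            continue  # only published articles are ingested
        routed.setdefault(data_source_for(article.content_type), []).append(article.article_id)
    return routed


print(route_published([
    Article("KC-0001", "Smartphones", "Resolved-Published"),
    Article("KC-0002", "Smartphones", "Draft"),  # hypothetical unpublished status
]))
```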

2. Bulk publish content

On the Content landing page, you can filter content by category, and then select multiple articles. To publish all the selected articles, use the Actions list, select the Change status option, and then select Resolved-Published.

[Figure: Bulk publish content]

3. Sync all

If you update to a new build, or if you want Knowledge Buddy to ingest all published content at once, use the Sync all action. This action ingests all published content and creates the data source.

On the Taxonomy landing page, on the Article synchronization tab, click Sync all to synchronize all categories with articles.

Select the checkbox to re-index the article text in Knowledge Buddy. 

[Figure: Sync all]

4. Article attachments

You can create an Attachments article by selecting the Article type is an attachment article only checkbox in the Display settings section when you create a knowledge article. You can select this checkbox for any content type or category.

Note: Depending on your organizational needs, you can configure particular content types so that the Article type is an attachment article only checkbox is selected by default. To do this, in the Pega Knowledge portal, navigate to Configurations > Content types, and then edit the chosen content type.
[Figure: The Display settings section with the Article type is an attachment article only checkbox]

Unlike a regular knowledge article, an Attachments article does not have an article body. Instead, you upload a single file, which is attached to the article. When you publish the article, the system automatically ingests the attached content into Knowledge Buddy, based on the content type that you selected for the knowledge article.

When you create an Attachments article, use one of the following recommended file types:

  1. Microsoft Word files
  2. PDF
  3. Text content, which can include HTML or Markdown format
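A client-side pre-check for an Attachments article might look like the following Python sketch. It enforces the single-file rule and reports whether the file is one of the recommended types. Mapping the recommended types to the extensions .doc, .docx, .pdf, .txt, .html, .htm, and .md is an assumption. Because the list is a recommendation, a file outside it is flagged rather than rejected.

```python
from pathlib import Path

# Assumed extensions for the recommended types: Word files, PDF, and text
# content that can include HTML or Markdown.
RECOMMENDED_EXTENSIONS = {".doc", ".docx", ".pdf", ".txt", ".html", ".htm", ".md"}


def is_recommended_attachment(filename: str) -> bool:
    """Return True if the file extension matches one of the assumed recommended types."""
    return Path(filename).suffix.lower() in RECOMMENDED_EXTENSIONS


def check_attachment(files: list[str]) -> bool:
    """Enforce the single-file rule; report whether the file is a recommended type."""
    if len(files) != 1:
        raise ValueError("an Attachments article holds exactly one file")
    return is_recommended_attachment(files[0])


print(check_attachment(["address-change-policy.pdf"]))
print(check_attachment(["store-map.png"]))  # flagged: images are not recommended
```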

Knowledge Buddy is currently unable to answer questions concerning images or video, and it cannot produce images or video as an output to a question.

[Figure: An Attachments article created in the Pega Knowledge portal]
Note: Every single piece of content that you push to the system must have a role associated with it.

You have reached the end of this topic. What have you learned?

  • How to ingest content using a KM article and REST API.
  • How Knowledge Buddy can ingest Pega Knowledge articles.
