The Pega Knowledge Loader
The Pega Knowledge Loader is a feature that enables you to pull content from external repositories into a Pega GenAI Knowledge Buddy™. You can extend the framework to ingest content from multiple sources and enhance the extraction and chunking of data. The Knowledge Loader helps you automate the ingestion and management of external content and complements the existing content management features in Pega Knowledge Management.
The Pega Knowledge Loader works out of the box with SharePoint and Confluence; you can also use an external integrator.
Before you can use the Knowledge Loader, you must create a data collection and assign a data source to it. The data collection holds the content that the Knowledge Loader ingests, which a Knowledge Buddy can then use to answer questions.
SharePoint loader
To use content that you store on a SharePoint site, you must first build a SharePoint loader from the Pega Knowledge Loader portal. The following table lists the properties that you must provide when you create the SharePoint loader:
| Property | Mandatory | Note | Example |
|---|---|---|---|
| Collection | YES | Specify the collection in which content ingestion should occur. The collection must already exist in the Knowledge Buddy application. | Knowledge |
| DataSource | YES | Specify the data source that corresponds to the collection you indicated. The data source must already exist in the Knowledge Buddy application. | ProductGuide |
| Role | YES | Specify the role for which the content should be available. | KnowledgeBuddy:Public |
| SiteName | YES | Specify the SharePoint site from which you want to ingest data. | https://example.sharepoint.com/sites/Guide |
| Resources | YES | Specify the folder path from which you want to begin data ingestion. To indicate the root folder, enter only a forward slash (/). | /Shared Documents/Guides |
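The required properties can be pictured as a small configuration object. The following Java sketch is hypothetical: the `SharePointLoaderConfig` class and its validation rules are illustrative assumptions, not part of the Knowledge Loader. It only checks that each mandatory value is present and, as an assumed convention, that `Resources` starts with a forward slash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the mandatory SharePoint loader properties.
// Field names mirror the table; the validation rules are illustrative assumptions.
final class SharePointLoaderConfig {
    final String collection;
    final String dataSource;
    final String role;
    final String siteName;
    final String resources;

    SharePointLoaderConfig(String collection, String dataSource, String role,
                           String siteName, String resources) {
        this.collection = collection;
        this.dataSource = dataSource;
        this.role = role;
        this.siteName = siteName;
        this.resources = resources;
    }

    // Returns a list of problems; an empty list means the configuration passed these checks.
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        requireValue(problems, "Collection", collection);
        requireValue(problems, "DataSource", dataSource);
        requireValue(problems, "Role", role);
        requireValue(problems, "SiteName", siteName);
        requireValue(problems, "Resources", resources);
        // Assumed convention: folder paths start at the site root, so "/" alone is the root folder.
        if (resources != null && !resources.isBlank() && !resources.startsWith("/")) {
            problems.add("Resources must start with '/'");
        }
        return problems;
    }

    // Records a problem when a mandatory value is missing or blank.
    private static void requireValue(List<String> problems, String name, String value) {
        if (value == null || value.isBlank()) {
            problems.add(name + " is mandatory");
        }
    }
}
```

With the values from the Example column, `validate()` returns an empty list.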
You can also choose whether the SharePoint loader should include subfolders, and you can set the following optional properties:
| Property | Mandatory | Note | Example |
|---|---|---|---|
| File names to include | NO | Control data inclusion based on file name. | |
| File names to exclude | NO | Control data exclusion based on file name. | |
| File types to include | NO | Control data inclusion based on file type. | .pdf,.docx |
| Attributes | NO | Custom attributes created in SharePoint that you want the Knowledge Loader to ingest. | Creator,Tag |
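To illustrate how the optional filters might combine, the following Java sketch applies comma-separated settings such as `.pdf,.docx` to a file name. The `FileFilter` class is hypothetical, and its rules (exact, case-insensitive matching; exclusion overriding inclusion; an empty setting meaning no restriction) are assumptions rather than documented Knowledge Loader behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical filter for the optional include/exclude settings.
// Assumptions: exact, case-insensitive matches; exclusion wins; blank settings mean "no restriction".
final class FileFilter {
    private final List<String> namesToInclude;
    private final List<String> namesToExclude;
    private final List<String> typesToInclude;

    FileFilter(String namesToInclude, String namesToExclude, String typesToInclude) {
        this.namesToInclude = parse(namesToInclude);
        this.namesToExclude = parse(namesToExclude);
        this.typesToInclude = parse(typesToInclude);
    }

    // Splits a comma-separated setting such as ".pdf,.docx"; a blank setting yields an empty list.
    private static List<String> parse(String setting) {
        if (setting == null || setting.isBlank()) {
            return List.of();
        }
        return Arrays.stream(setting.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Decides whether a file with the given name should be ingested.
    boolean accepts(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (namesToExclude.contains(name)) {
            return false; // exclusion takes precedence over any inclusion rule
        }
        if (!namesToInclude.isEmpty() && !namesToInclude.contains(name)) {
            return false; // a non-empty name list acts as an allow-list
        }
        if (!typesToInclude.isEmpty()) {
            int dot = name.lastIndexOf('.');
            String extension = dot >= 0 ? name.substring(dot) : "";
            return typesToInclude.contains(extension);
        }
        return true;
    }
}
```

For example, `new FileFilter("", "draft.docx", ".pdf,.docx")` accepts `Guide.pdf` but rejects both `draft.docx` (excluded by name) and `notes.txt` (not an included type).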
When you are ready, click Submit to create the SharePoint loader. This creates two background jobs: the first extracts the list of files from SharePoint, and the second extracts those files and ingests them into Knowledge Buddy, or updates them when needed. Both jobs run at regular intervals, as configured in the job scheduler.
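The division of work between the two background jobs can be sketched as a listing stage that queues files and an ingestion stage that processes the queue. In this hypothetical Java sketch, `TwoStageLoader` and its change detection by last-modified timestamp are assumptions; the actual jobs are configured in the job scheduler, and their update logic is not described here.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical two-stage pipeline mirroring the two background jobs.
// Change detection by last-modified timestamp is an assumption, not documented behavior.
final class TwoStageLoader {
    static final class FileEntry {
        final String path;
        final Instant lastModified;

        FileEntry(String path, Instant lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    private final Queue<FileEntry> pending = new ArrayDeque<>();
    // Last version ingested per path, standing in for what the destination already holds.
    private final Map<String, Instant> ingested = new HashMap<>();

    // Job 1: queue the current file listing (in practice, fetched from SharePoint).
    void listFiles(List<FileEntry> listing) {
        pending.addAll(listing);
    }

    // Job 2: drain the queue, ingesting new files and re-ingesting changed ones.
    // Returns the number of files ingested or updated in this run.
    int ingestPending() {
        int processed = 0;
        while (!pending.isEmpty()) {
            FileEntry entry = pending.poll();
            Instant previous = ingested.get(entry.path);
            if (previous == null || entry.lastModified.isAfter(previous)) {
                // Placeholder for extracting the file and pushing it to Knowledge Buddy.
                ingested.put(entry.path, entry.lastModified);
                processed++;
            }
        }
        return processed;
    }
}
```

Keeping the listing and ingestion stages separate means a slow extraction run does not block the next listing, and unchanged files are skipped on later runs.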
Customizations
You can apply the following customizations to the Knowledge Loader to better match the needs of your organization:
- Sourcing from any type of repository: The framework supports pulling data from repositories other than SharePoint. To use another repository, create a subclass of the PegaKnowledgeLoaderWorkRepository class for the new repository type, and then implement the data pages and activities that fetch files and folders from that repository.
- Pushing to any destination: By default, the framework pushes data to Knowledge Buddy, but you can extend it to push data to other destinations by overriding specific activities.
- Decoding file content: By default, the framework uses Apache Tika to decode file content. To use a different decoding framework, override the provided extension points.
- Additional attributes: The framework can ingest additional custom attributes from the source repository. To handle these attributes, extend the relevant data pages and activities.
- Job schedulers: The Knowledge Loader includes job schedulers that periodically ingest new files and check for updates. You can customize the scheduling intervals and the logic for handling updates.
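The customization points above follow a familiar pluggable-pipeline pattern. The following Java sketch is only an analogy: in Pega you customize the loader through class subclassing, data pages, and activities, not Java interfaces, and every name here (`FileSource`, `ContentDecoder`, `Destination`, `LoaderPipeline`) is hypothetical. To stay dependency-free, the sketch swaps in a plain UTF-8 decoder where the framework would use Apache Tika.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analogy for the extension points; all names are illustrative.
interface FileSource {          // sourcing from any type of repository
    Map<String, byte[]> fetchFiles();
}

interface ContentDecoder {      // decoding file content (Apache Tika in the framework)
    String decode(String fileName, byte[] content);
}

interface Destination {         // pushing to any destination (Knowledge Buddy by default)
    void push(String fileName, String text);
}

final class LoaderPipeline {
    private final FileSource source;
    private final ContentDecoder decoder;
    private final Destination destination;

    LoaderPipeline(FileSource source, ContentDecoder decoder, Destination destination) {
        this.source = source;
        this.decoder = decoder;
        this.destination = destination;
    }

    // Fetches every file, decodes it, and pushes the text; returns how many files were pushed.
    int run() {
        int count = 0;
        for (Map.Entry<String, byte[]> file : source.fetchFiles().entrySet()) {
            String text = decoder.decode(file.getKey(), file.getValue());
            destination.push(file.getKey(), text);
            count++;
        }
        return count;
    }
}

// Stand-in source that keeps files in memory, in insertion order.
final class InMemorySource implements FileSource {
    private final Map<String, byte[]> files = new LinkedHashMap<>();

    InMemorySource add(String name, String body) {
        files.put(name, body.getBytes(StandardCharsets.UTF_8));
        return this;
    }

    public Map<String, byte[]> fetchFiles() {
        return files;
    }
}

// Stand-in decoder: plain UTF-8 text, swapped in for Apache Tika.
final class Utf8Decoder implements ContentDecoder {
    public String decode(String fileName, byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Stand-in destination that records what it receives.
final class CollectingDestination implements Destination {
    final List<String> received = new ArrayList<>();

    public void push(String fileName, String text) {
        received.add(fileName + ": " + text);
    }
}
```

Because each stage sits behind its own interface, replacing the decoder (for example, with a lambda) leaves the source and destination untouched, which is the kind of separation the overridable extension points are meant to provide.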