
Setting up data ingestion

8 Tasks

40 mins

Beginner | Pega Customer Decision Hub 8.7 | English
Verify the version tags to ensure that you are consuming the intended content, or complete the latest version.

Scenario

U+ Bank has decided to improve the experience of its customers with predictive and adaptive analytical models that drive personalized decisions. Following the Pega-recommended approach, the project team performed the data mapping workshop and mapped their existing data model to the financial services customer analytical data model. The U+ Bank data warehouse team prepared the customer and account data files along with the manifest files and developed the technical infrastructure to upload the data daily to a repository that Pega Customer Decision Hub™ can access.

The system architect prepared the following integration artifacts to process the data files from the repository:

Rule type       | Rule name                | Description
Case type       | DataIngestion            | Case type to ingest data with a defined process
Process flow    | UploadToStaging          | Process flow for loading data from the repository to a staging table
Process flow    | UploadToXCar             | Process flow for loading data from the staging tables to the customer data model
Property        | ProcessType              | Process type property that is used in the process flow (CustomerDataIngestion or AccountDataIngestion)
Property        | DataFlowClassName        | Property that holds the class of the data flow to execute (UBank-Data-Customer or UBank-Data-Accounts)
Property        | DataFlowNameStaging      | Property that holds the name of the data flow to execute for the staging load (CustomerFromRepoToStaging or AccountFromRepoToStaging)
Property        | DataFlowNameXCar         | Property that holds the name of the data flow to execute for the XCar load (CustomerFromStagingToXCar or AccountFromStagingToXCar)
Property        | ManifestDetails          | Single page property of the manifest class
Activity        | ExecuteDataIngestion     | Activity that executes the data flows for each process type
Parse XML       | XCarManifestParser       | Reads the manifest file and maps its nodes to properties in Pega Platform
Service package | UBankDataFileProcessing  | Service package that determines the security scheme and access for the services in the package
Service file    | UBankProcessManifestFile | Service file that invokes the Parse XML rule
File listener   | CustomerDataFileListener | File listener to process customer records
File listener   | AccountDataFileListener  | File listener to process account records

As a system architect, review the integration artifacts.

Note: The goal of this exercise is to demonstrate an end-to-end process to ingest data into Customer Decision Hub without creating technical artifacts. In a real-life scenario, the rules and configurations depend on the project requirements. The technical integration artifacts that system architects typically create are already configured for you in this exercise.

As a decisioning architect, your role is to prepare the data set and data flow artifacts required to populate customer and account tables in Customer Decision Hub.

After the creation of the artifacts, as a system architect, activate the file listeners to import the data into Customer Decision Hub.

Use the following credentials to log in to the exercise system:

Role                  | User name            | Password
System architect      | SystemArchitect      | rules
Decisioning architect | DecisioningArchitect | rules

Your assignment consists of the following tasks:

Task 1: Confirm there is no data in the customer and account tables

As a decisioning architect, confirm that there is no customer or account data in Customer Decision Hub. Use the Customer and Account data sets to clear any test data from Customer Decision Hub.

Note: The exercise system contains customer and account data from a Monte Carlo data set.

Task 2: Review the integration artifacts and configure file listener

As a system architect, review the integration artifacts that are ready to ingest data from the repository. The manifest files uploaded to the repository have the following structure:

Manifests

 

Note: For the purposes of the exercise, all files are uploaded to the defaultstore (a system-managed temporary file storage) repository. In a real-life scenario, all files are typically stored in a repository such as AWS S3.
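
For illustration only, a manifest of this kind might look similar to the following sketch. The element names and values here are hypothetical; the actual structure is whatever the XCarManifestParser Parse XML rule expects and maps to the ManifestDetails page property.

    <!-- Hypothetical example; the real element names are defined by XCarManifestParser -->
    <Manifest>
      <DataFileName>CustomerDataIngest.csv</DataFileName>
      <RecordCount>1000</RecordCount>
      <CreatedOn>01/01/2023 00:00</CreatedOn>
    </Manifest>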

Task 3: Configure the file listeners

As a system architect, review and configure the file listeners (CustomerDataFileListener and AccountDataFileListener).

Task 4: Review the data ingestion case type

As a system architect, review the prepared case design to understand the process for data ingestion.

Flow

 

Task 5: Create the data sets for data ingestion

As a decisioning architect, create the source data sets (file) to ingest the data files from the repository and destination data sets (Decision Data Store) to save the data to a staging table.

Rule type | Rule name        | Description
Data set  | CustomerDataFile | File data set for importing customer data from the repository
Data set  | CustomerStaging  | Cassandra data set to stage the data for verification
Data set  | AccountDataFile  | File data set for importing account data from the repository
Data set  | AccountStaging   | Cassandra data set to stage the data for verification

Note: For the purposes of this exercise, the AccountDataFile and AccountStaging data sets are preconfigured for you. You configure the artifacts to load the customer data.

Task 6: Create the data flows for data ingestion

As a decisioning architect, create the data flows that ingest the data from the repository to the staging data sets. Then, create the data flows to ingest the data from the staging data sets to the customer and account database tables.

Rule type | Rule name                 | Description
Data flow | CustomerFromRepoToStaging | Data flow for importing customer data from the CustomerDataFile data set to the CustomerStaging data set
Data flow | AccountFromRepoToStaging  | Data flow for importing account data from the AccountDataFile data set to the AccountStaging data set
Data flow | CustomerFromStagingToXCar | Data flow for importing customer data from the CustomerStaging data set to the Customer table
Data flow | AccountFromStagingToXCar  | Data flow for importing account data from the AccountStaging data set to the Account table

Note: For the purposes of this exercise, AccountFromRepoToStaging and AccountFromStagingToXCar data flows are preconfigured for you. You configure the artifacts to load the customer data.
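
Taken together, the customer ingestion path that you build in Tasks 5 and 6 moves data through the following stages; the account path is the preconfigured equivalent:

    CustomerDataFile (File data set reading from the repository)
      → CustomerFromRepoToStaging → CustomerStaging (Decision Data Store data set)
      → CustomerFromStagingToXCar → Customer (customer database table)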

Task 7: Enable the file listeners

As a system architect, start the customer and account file listeners to initiate the data ingestion process.

Task 8: Confirm the data is ingested

As a decisioning architect, confirm that customer and account data are populated in Customer Decision Hub.

 

You must initiate your own Pega instance to complete this Challenge.

Initialization may take up to 5 minutes, so please be patient.

Challenge Walkthrough

Detailed Tasks

1 Confirm there is no data in the customer and account tables

  1. On the exercise system landing page, click Pega CRM suite to log in to Customer Decision Hub.
  2. Log in as the decisioning architect with User name DecisioningArchitect and Password rules.
  3. In the header of Customer Decision Hub, in the search field, enter Customer, and then click the search icon.
    1. In the fourth filter list, select Exact Match.
    2. In the list of results, select the Customer data set with the Applies to class UBank-Data-Customer.
      SearchCustomerDataset
  4. In the upper-right corner, click Run to open the Run Data Set: Customer window.
  5. In the Run Data Set: Customer window, in the Operation list, select Truncate.
  6. In the upper-right, click Run to truncate the customer table.
  7. Close the Status Page window, and then close the Run Data Set: Customer window.
  8. In the lower-left corner, click Back to PM Marketing Studio Home to return to Customer Decision Hub.
    TruncateCustomerDataset
  9. Repeat steps 3–8 for the Account data set to truncate the data.
    SearchAccountDataset
  10. In the navigation pane of Customer Decision Hub, click Data > Profile Data Sources to view data sources.
    SelectPDS
  11. On the Data sets tab of the Profile Data Sources page, click Customer to open the Customer data set.
  12. On the Data Set: Customer page, click the Records tab to confirm that there are no items.
    NoItem
  13. Optional: Repeat steps 10–12 for the Account data set to confirm that there are no items.
  14. In the upper-right corner, click DA, and then click Log off to log out of Customer Decision Hub.

2 Review the integration artifacts and configure file listener

  1. Log in to Dev Studio as the system architect with User name SystemArchitect and Password rules.
  2. In the navigation pane of Dev Studio, click Records > Integration-Resources > Service Package to open the UBankDataFileProcessing service package.
    UBankDataFileProcessing
    Tip: This rule defines the security scheme and access for the services defined within the service package.
  3. In the Edit Service Package: UBank Data File Processing service package rule, on the Context tab, review the following settings:
    1. In the Context section, in the Service access group field, confirm that the entry is CDH:CDHAdmins.
    2. In the Context section, confirm that the Requires authentication checkbox is clear.
    3. In the Methods section, in the Service type list, confirm that the value is Rule-Service-File.
    4. In the Method name column, click UBankProcessManifestFile to open the service file rule used for processing the manifest files.
      SP
  4. In the Service File: UBank Process Manifest File service file rule, on the Service tab, in the Primary page section, confirm that the entry in the Primary page class field is UBank-CDH-Work-DataIngestion.
    SFService
  5. On the Method tab, in the Processing options section, in the Processing method list, confirm that the selection is file at a time.
    SFMethod
  6. On the Request tab, in the Parse segments section, confirm the following settings:
    1. Confirm that the entry in the Map to field is XML ParseRule.
    2. Confirm that the entry in the Map to key field is XCarManifestParser manifest.
      SFRequest
  7. On the Request tab, in the Processing epilog section, verify that the entry in the Final activity field is svcAddWorkObject, and then review the following parameters:
    1. Confirm that the entry in the FlowType field is pyStartCase.
    2. Confirm that the entry in the Organization field is UBank.
    3. Confirm that the entry in the workPage field is pyWorkPage.
    4. Confirm that the SkipCreateView checkbox is selected.
      CreateCase
  8. On the Request tab, in the Parse segments section, in the Map to key field, click the Open icon to open the XCarManifestParser rule.
    ParseSegment
    Tip: The Parse XML rule maps the manifest file attributes to Pega clipboard pages.
  9. In the Parse XML: XCar Manifest Parser rule form, on the Mapping tab, confirm that the defined elements match the XML tags in the manifest file.
    Note: The sample manifest file is provided in the following screenshot for your convenience.
    ParseXML

3 Configure the file listeners

  1. In the navigation pane of Dev Studio, click Records > Integration-Resources > File Listener to list the file listeners.
  2. In the list of file listener instances, click CustomerDataFileListener to open the file listener that is used to process the CustomerDataIngest.csv file.
    CustomerDataFileListener
  3. In the Edit File Listener: Customer Data File Listener rule, on the Properties tab, configure the following settings:
    1. In the Listener nodes section, clear the Block startup checkbox.
    2. In the Startup option list, confirm that the selection is Run on all nodes.
      ListenerNode
       
      Caution: In a real-life scenario, you select NodeClassification based startup in the Startup option list and set the Node type to BackgroundProcessing. An actual environment has multiple nodes that handle different processes, and file listeners typically run on the dedicated node that performs such background processing. Because the exercise system has a single node, it is unnecessary to change this setting.
    3. In the Source properties section, confirm that the entry in the Source location field is =D_EnvSettings.RepoURL.
    4. In the Source name mask field, confirm that the value is CustomerDataIngestManifest*.xml.
      DSS
      Tip: As a best practice, use application settings or dynamic system settings for the Source location field. Setting the source location dynamically through parameters allows you to configure different paths for the location in different environments.
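      Note: With this mask, for example, a manifest file named CustomerDataIngestManifest.xml (or a hypothetical timestamped variant such as CustomerDataIngestManifest_20230101.xml) triggers the listener, whereas the CustomerDataIngest.csv data file itself does not; the data file is read later by the CustomerDataFile data set.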
  4. In the Listener properties section, verify the following settings:
    1. In the Service package list, confirm that the selection is UBankDataFileProcessing.
    2. In the Service class list, confirm that the selection is Default.
    3. In the Service method list, confirm that the selection is UBankProcessManifestFile.
  5. In the upper-right corner, click Save.
  6. Click Test connectivity to confirm that the file is accessible.
    ListenerProperties
  7. In the File Listener Connectivity Test window, click Close.
    ConnectTest
  8. Repeat steps 2–7 for the AccountDataFileListener.
  9. In the Listener nodes section, clear the Block startup checkbox.
  10. In the Startup option list, select Run on all nodes.
  11. In the upper-right corner, click Save.

4 Review the data ingestion case type

  1. In the navigation pane of Dev Studio, click Case types > Data Ingestion to open the Data Ingestion case type.
    CaseType
    Note: The Data Ingestion case type exists for the purposes of this exercise and skips several important validation and cleanup steps. In a real-life scenario, the process includes additional steps to ensure that the files are processed without errors, and the staging tables are typically cleared. For more information, see the Pega Customer Decision Hub Implementation Guide.
  2. In the Case life cycle section, in the Upload To Staging stage, click CONFIGURE PROCESS to edit the case type.
    CaseLifecycle
  3. In the toolbar of the Edit case type: Data Ingestion work area, click Open process to open the process details.
    Processflow
  4. On the flow diagram, right-click the Process Type? decision shape, and then select Open Decision to open the decision table that initiates the corresponding data flows.
    Processtype
  5. In the DataIngestionProcessType decision table, review the values in the DataFlowNameStaging and DataFlowNameXCar columns.
    DecisionTable
    Note: These are the names of the data flows that are used to move data from the repository to the staging tables and then to the customer and account database tables. An illustrative breakdown of the table follows these steps.
  6. In the lower-left corner, click the SA icon, and then select Log off to log out of Dev Studio.
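
For reference, based on the data flow names that you create in Task 6, the DataIngestionProcessType decision table likely resolves to rows similar to the following; this breakdown is illustrative, and the exact values are defined in the preconfigured rule:

    ProcessType           | DataFlowNameStaging       | DataFlowNameXCar
    CustomerDataIngestion | CustomerFromRepoToStaging | CustomerFromStagingToXCar
    AccountDataIngestion  | AccountFromRepoToStaging  | AccountFromStagingToXCar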

5 Create the data sets for data ingestion

  1. Log in to Customer Decision Hub as the decisioning architect with User name DecisioningArchitect and Password rules.
  2. In the upper-right corner, click DA > Switch apps > Customer Decision Hub to switch to Dev Studio.
    SwitchApp
    Caution: Ensure that you select the Customer Decision Hub application that is not currently selected; this selection switches you to Dev Studio.
  3. In the header of Dev Studio, click Create > Data Model > Data Set to create a new data set.
    CreateDS
  4. On the Create Data Set tab, create a new data set:
    1. In the Label field, enter CustomerDataFile.
    2. In the Type list, select File.
    3. In the Apply to field, enter or select UBank-Data-Customer.
    4. In the Add to ruleset list, select CDH-Rules.
    5. In the upper-right corner, click Create and open.
      CustomerDataFile
  5. In the Edit Data Set: CustomerDataFile tab, in the Configuration section, set up the manifest file:
    1. In the Repository configuration field, enter or select defaultstore.
    2. In the Path section, select Use a manifest file to import data.
    3. In the Manifest file path field, enter or select /CustomerDataIngestManifest.xml.
  6. Click Preview file to preview the manifest file.
    CustomerDataFileSettings
  7. In the File preview window, confirm that the content of the manifest file is visible.
    FilePreview
    Note: The manifest and data files are already uploaded to the repository for this exercise. In a real-life scenario, the files are uploaded at regular intervals with proper naming conventions. The file path in the manifest file must be relative to the Repository configuration of the data set; for example, if the manifest references CustomerDataIngest.csv without a folder prefix, the data file is expected at the root of the defaultstore repository.
  8. On the Data file tab, confirm that the CustomerDataIngest.csv contents are visible.
    DataPreview
  9. Close the File preview window.
  10. In the File configuration section, in the File type list, select CSV, and then click Configure automatically to complete the data mapping to the properties in the customer class (an illustrative data file excerpt follows these steps).
  11. In the Date Time format field, enter MM/dd/yyyy HH:mm
  12. In the Date format field, enter MM/dd/yyyy
    ConfigureFileDS
  13. In the upper-right corner, click Save.
  14. In the header of Dev Studio, click Create > Data Model > Data Set to create a new data set.
  15. On the Create Data Set tab, configure the new data set:
    1. In the Label field, enter CustomerStaging.
    2. In the Type list, select Decision Data Store.
    3. In the Apply to field, enter or select UBank-Data-Customer.
    4. In the Add to ruleset list, select CDH-Rules.
    5. In the upper-right corner, click Create and open.
      CustomerStagingDS
    6. In the upper-right corner, click Save.
      CustomerStagingDSCreate
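
As an orientation for the file configuration in this task, the data file is a CSV whose header row matches properties of the UBank-Data-Customer class and whose date and date-time values use the formats configured above. The excerpt below is purely hypothetical; the column names and values do not come from the exercise files:

    Header row (hypothetical): CustomerID,FirstName,DateOfBirth,LastActivityDateTime
    Data row (hypothetical):   C-1001,Sara,03/15/1985,01/10/2023 14:30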

6 Create the data flows for data ingestion

  1. In the header of Dev Studio, click Create > Data Model > Data Flow to create a new data flow.
  2. In the Create Data Flow workspace, configure the new data flow:
    1. In the Label field, enter CustomerFromRepoToStaging.
    2. In the Apply to field, enter or select UBank-Data-Customer.
    3. In the Add to ruleset list, select CDH-Rules.
  3. In the upper-right corner, click Create and open.
    CustomerFromRepoToStaging
  4. On the canvas, double-click the first component to modify the source of the CustomerFromRepoToStaging data flow.
    CustomerFromRepoToStaging1
  5. In the Source configurations window, set up the input data:
    1. In the Source list, select Data set.
    2. In the Data set field, enter or select CustomerDataFile.
    3. Click Submit.
      DataflowSource
  6. On the canvas, double-click the second component to modify the destination of the CustomerFromRepoToStaging data flow.
    CustomerFromRepoToStaging2
  7. In the Destination configurations window, set up the output data:
    1. In the Destination list, select Data set.
    2. In the Data set field, enter or select CustomerStaging.
    3. Click Submit.
      DataflowDestination
  8. In the upper-right corner, click Save.
  9. In the header of Dev Studio, click Create > Data Model > Data Flow to create a new data flow.
  10. In the Create Data Flow tab, create a new data flow:
    1. In the Label field, enter CustomerFromStagingToXCar.
    2. In the Apply to field, enter or select UBank-Data-Customer.
    3. In the Add to ruleset list, select CDH-Rules.
  11. In the upper-right corner, click Create and open.
  12. On the canvas, double-click the first component to modify the source of the CustomerFromStagingToXCar data flow.
  13. In the Source configurations window, set up the input data:
    1. In the Source list, select Data set.
    2. In the Data set field, enter or select CustomerStaging.
    3. Click Submit.
      DataflowSource2
  14. On the canvas, double-click the second component to modify the destination of the CustomerFromStagingToXCar data flow.
  15. In the Destination configurations window, set up the output data:
    1. In the Destination list, select Data set.
    2. In the Data set field, enter or select Customer.
    3. In the Save options, select Insert new and overwrite existing records.
      DataflowDestination2
    4. Click Submit.
  16. In the upper-right corner, click Save.
  17. In the lower-left corner, click the DA icon, and then select Log off to log out of Dev Studio.

7 Enable the file listeners

  1. Log in to Dev Studio as the system architect with User name SystemArchitect and Password rules.
  2. In the header of Dev Studio, click Dev Studio > Admin Studio to switch workspaces.
    AdminStudio
  3. In the navigation pane of Admin Studio, click Resources > Listeners to view listeners.
    Listeners
  4. On the Listeners page, in the Start / restart listener list, select Start new instance.
    StartNewInstance
    1. In the empty field, enter or select CustomerDataFileListener.
      EnableFL
    2. Click Apply.
  5. Repeat the substeps of step 4 to enable the AccountDataFileListener.
  6. In the Active listeners section, click the Refresh icon.
    Refresh
  7. In the lower-left corner, click the SA icon, and then select Log off to log out of Admin Studio.
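
At a high level, starting the listeners triggers the following chain, summarized from the artifacts reviewed in this challenge:

    CustomerDataFileListener detects a file matching CustomerDataIngestManifest*.xml in the repository
      → the UBankProcessManifestFile service file parses the manifest with XCarManifestParser and creates a DataIngestion case (svcAddWorkObject)
      → the case runs the UploadToStaging process (CustomerFromRepoToStaging: CustomerDataFile to CustomerStaging)
      → and then the UploadToXCar process (CustomerFromStagingToXCar: CustomerStaging to the Customer table)

The AccountDataFileListener drives the equivalent account path.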

8 Confirm the data is ingested

  1. Log in as the decisioning architect with User name DecisioningArchitect and Password rules.
  2. In the navigation pane of Customer Decision Hub, click Data > Profile Data Sources to view data sources.
  3. On the Profile Data Sources page, on the Data sets tab, click Customer to open the customer data set.
  4. On the Data Set: Customer page, confirm that the records are imported.
    CustomerRecords
  5. Optional: Repeat steps 3–4 for the Account data set to confirm that the system imported the account records.

