Building models with Pega machine learning


7 Tasks

5 mins

Visible to: All users
Beginner Pega Customer Decision Hub 8.6 English
This content is now archived and is no longer updated.


U+ Bank uses artificial intelligence (AI) to determine which credit card offer to show a customer on the bank’s website. To reduce the number of clients that leave the bank, the business wants to leverage the historical data that the bank has collected on customers that have churned in the past to predict which customers are likely to leave the bank in the near future. The bank wants to show the potential churners a retention offer instead of a credit card offer.

As a data scientist, your task is to create a predictive model that predicts churn. You decide to create the model using Pega machine learning.

Use the following credentials to log in to the exercise system:

Role: Data Scientist
User name: DataScientist
Password: rules



Your assignment consists of the following tasks:

Task 1: Create a new predictive model

Create a new predictive model, ChurnPML, using the Churn Modeling template in the Retention category.

Task 2: Prepare the data

Load the data set using the historical_data.csv file. Set the type of predictors that have no predictive power, like CustomerID and OfficePhone, to Not used. Create a uniform sample that uses 100% of the data. Retain 20% for the test set and 20% for the validation set. In the Outcome definition, use a Binary outcome type and Segment as the outcome field. Map the values of the outcome field to the outcome category.

Task 3: Analyze the data

Change the role for predictors that have a grouped performance lower than 52 to ignored. Examine the trends exhibited by the best performing predictors. Create a virtual field by combining several numerical predictors and examine the trend exhibited by this new predictor.

Task 4: Develop predictive models

For predictor grouping, use the best predictor of each group. Create a new decision tree model of type ID3.

Task 5: Analyze the models

Compare the scores of the three models. Pay particular attention to Discrimination.

Task 6: Select model

Select the Regression model. Make sure all predictors are mapped to customer properties. Reclassify the classes into a loyal class and a churned class. Save the model.

Task 7: Test the model

Run the model using the Troy data transform. Troy has a high churn risk. Re-run the model using the Barbara data transform. Barbara has a low churn risk. Finally, run the model on the CustomerBatch data set and notice the number of customers that are predicted to churn.

Challenge Walkthrough

Detailed Tasks

1 Create a predictive model

  1. Download and extract the historical_data.csv file.
  2. Log in as a Data Scientist with user name DataScientist and password rules.
  3. In the navigation pane on the left, click Intelligence > Prediction Studio > Models.
  4. Click New > Predictive model.
  5. In the New predictive model dialog box, in the Name field, enter ChurnPML.
  6. In the Category list, select Retention.
  7. In the Template list, select Churn Modeling.
  8. Click Start.

2 Prepare the data

  1. In the Source selection section, click Choose File and select the historical_data.csv file.
  2. Check the data, and then click Next.
  3. For the CustomerID and ACCOUNT_ID fields, change the type to Not used.
  4. Click Next.
  5. In the Outcome definition section, in the Model type list, select Binary.
  6. In the Outcome field to predict, select Segment.
  7. In the Outcome category list for churned, select churned.
  8. In the Outcome category list for loyal, select loyal.
  9. Check the number of cases in the development, validation, and test sets.
    Outcome categories
  10. Click Next.
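
The split that this task sets up can be pictured outside Prediction Studio. The following is a minimal Python sketch, not Pega's sampling implementation: the function name `split_cases`, the fixed seed, and the resulting 60/20/20 proportions (implied by retaining 20% each for validation and test) are assumptions made for illustration.

```python
import random

def split_cases(case_ids, validation_share=0.2, test_share=0.2, seed=42):
    """Shuffle case IDs and split them into development, validation, and test sets."""
    ids = list(case_ids)
    # Uniform sampling: every case is equally likely to land in any set.
    random.Random(seed).shuffle(ids)
    n_validation = int(len(ids) * validation_share)
    n_test = int(len(ids) * test_share)
    validation = ids[:n_validation]
    test = ids[n_validation:n_validation + n_test]
    development = ids[n_validation + n_test:]  # everything that is not held out
    return development, validation, test

# 1,000 hypothetical case IDs stand in for the rows of historical_data.csv.
development, validation, test = split_cases(range(1000))
print(len(development), len(validation), len(test))  # 600 200 200
```

The three sets are disjoint, so a case used to develop a model never appears in the data used to validate or test it.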

3 Analyze the data

  1. In the list of predictors, click RiskScore and examine the grouping for this predictor.
  2. Click Cancel.
  3. Click New virtual field.
  4. In the Virtual field dialog box, in the Name field, enter Income*CLV.
  5. Click Fields, select Income, and then click Insert.
  6. Repeat this field selection step and build up the expression: Income * {CLV_VALUE}.
  7. Click Save & close.
  8. Confirm that the newly created predictor outperforms the two original predictors.
    Virtual field
  9. Click Next.
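
To see why a combined field can outperform its parts, the comparison can be sketched in plain Python. This is an illustration under assumptions, not Pega's calculation: the six rows are made-up toy values, performance is taken to be the area under the curve (AUC) on a 50 to 100 scale, and the values are not grouped into bins the way Prediction Studio groups them.

```python
def auc(values, labels):
    """Chance that a random positive case has a higher value than a random negative case."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    # Count a win for each positive/negative pair where the positive is higher; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def performance(values, labels):
    """AUC on a 50-100 scale, regardless of whether high or low values indicate churn."""
    a = auc(values, labels)
    return 100 * max(a, 1 - a)

# Made-up toy data: 1 = churned, 0 = loyal.
income  = [20, 80, 30, 90, 40, 60]
clv     = [90, 20, 40, 80, 30, 70]
churned = [1, 1, 1, 0, 1, 0]

# The virtual field is the row-by-row product of the two predictors.
income_x_clv = [i * c for i, c in zip(income, clv)]

for name, values in [("Income", income), ("CLV", clv), ("Income*CLV", income_x_clv)]:
    print(name, performance(values, churned))
# Income 87.5
# CLV 75.0
# Income*CLV 100.0
```

In this toy data, neither predictor separates the two classes on its own, but their product does, which is the kind of effect to look for when you compare the virtual field with Income and CLV in the predictor list.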

4 Develop predictive models

  1. In the Model development section, select Use best of each group.
    Predictor grouping
  2. Click Next.
  3. In the Model creation section, select Decision tree from the Create model list.
    Create model
  4. Click Create model.
  5. Select ID3.
  6. Click Create.
  7. Examine the model created.
  8. Click Submit to save the model.
    List of models
  9. Click Next.
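
ID3 builds a tree by repeatedly splitting on the attribute with the highest information gain, that is, the largest drop in entropy of the outcome. The sketch below shows that textbook criterion only; it is not Pega's implementation, and the field names `RiskBand` and `Channel` and the four rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical rows: RiskBand separates the classes perfectly, Channel not at all.
rows = [
    {"RiskBand": "high", "Channel": "web",    "Segment": "churned"},
    {"RiskBand": "high", "Channel": "branch", "Segment": "churned"},
    {"RiskBand": "low",  "Channel": "web",    "Segment": "loyal"},
    {"RiskBand": "low",  "Channel": "branch", "Segment": "loyal"},
]

gains = {a: information_gain(rows, a, "Segment") for a in ("RiskBand", "Channel")}
best = max(gains, key=gains.get)  # the attribute ID3 would split on first
print(gains, best)
```

A full ID3 implementation applies the same choice recursively to each branch until the cases in a branch share one outcome or no attributes remain.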

5 Analyze the models

  1. Ensure that the check boxes next to all models are selected.
  2. Click Analyze charts.
  3. Select the Discrimination tab and examine the results.
    Note: The regression model outperforms both decision tree models because it has the largest area under the curve (AUC). However, before you choose a model, also consider the number of predictors that the model requires. In some circumstances, you might select a lower-performing model that uses fewer predictors. All models perform very well, with an AUC of around 90.
  4. In the upper left, click the arrow next to Model analysis charts.
  5. Click Next. Here, you can analyze the score distribution.
  6. Click Next. Here, you can analyze class comparison.
  7. Click Next.
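
The trade-off between AUC and the number of predictors can be written down as a simple rule. The following Python sketch is one possible rule, not a Pega feature: the function `pick_model`, the tolerance of one AUC point, and every number in the `models` list are made up for illustration and are not results from this exercise.

```python
def pick_model(models, tolerance=1.0):
    """Return the model with the fewest predictors whose AUC is within `tolerance` of the best."""
    best_auc = max(m["auc"] for m in models)
    candidates = [m for m in models if best_auc - m["auc"] <= tolerance]
    # Prefer fewer predictors; break ties by the higher AUC.
    return min(candidates, key=lambda m: (m["predictors"], -m["auc"]))

# Made-up values for three hypothetical models.
models = [
    {"name": "Regression",      "auc": 91.0, "predictors": 12},
    {"name": "Decision tree 1", "auc": 90.5, "predictors": 7},
    {"name": "Decision tree 2", "auc": 88.0, "predictors": 5},
]

print(pick_model(models)["name"])                 # Decision tree 1
print(pick_model(models, tolerance=0.0)["name"])  # Regression
```

With a tolerance of zero, the rule reduces to choosing the model with the highest AUC.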

6 Select the model

  1. Select the Regression model.
  2. Click Finish.
  3. On the Model tab, in the Expected score distribution section, click between Result6 and Result7 in the score distribution chart.
  4. Under Classification groups, rename Class 1-6 to loyal and Class 7-10 to churned.
    Class names
  5. Click the Mapping tab and ensure all predictors are mapped to the appropriate customer fields.
  6. Click Save.
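
Clicking between Result6 and Result7 divides the ten score bands into two classes. As a minimal sketch of that mapping, assuming the bands are numbered 1 to 10 and using a hypothetical `classify` function:

```python
def classify(band, boundary=6):
    """Map a score band (1-10) to a class: bands up to `boundary` are loyal, the rest churned."""
    if not 1 <= band <= 10:
        raise ValueError(f"band must be between 1 and 10, got {band}")
    return "loyal" if band <= boundary else "churned"

print([classify(b) for b in (1, 6, 7, 10)])
# ['loyal', 'loyal', 'churned', 'churned']
```

Moving the boundary changes how many customers fall into the churned class, so it controls how many customers receive the retention offer instead of a credit card offer.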

7 Test the model

  1. In the top right, click Run.
  2. In the Run predictive model dialog box, in the Inputs section, select data transform Troy as the data source.
    Inputs Troy
  3. Click Run and scroll down to the Outputs section. Verify that the result for Troy is churned.
    Results Troy
  4. Re-run the model with data transform Barbara as the data source.
  5. Verify that the result for Barbara is loyal.
    Results Barbara
  6. On the Batch run tab, select CustomerBatch as the data source.
  7. Click Re-run.
  8. For the output, select Results.
    Batch result
  9. Notice that the model predicts that roughly 30% of the 10K customers in the data set are likely to churn.
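
The share of predicted churners in a batch result is a count per class divided by the total. A short Python sketch, in which the function `class_shares` and the ten-row `batch` list are hypothetical stand-ins for the actual CustomerBatch output:

```python
from collections import Counter

def class_shares(results):
    """Return the fraction of results that fall into each class."""
    counts = Counter(results)
    total = len(results)
    return {label: count / total for label, count in counts.items()}

# Hypothetical stand-in for the batch output: one predicted class per customer.
batch = ["churned"] * 3 + ["loyal"] * 7

shares = class_shares(batch)
print(f"{shares['churned']:.0%}")  # 30%
```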
