This content is now archived and is no longer updated. Progress is not calculated. Pega Cloud instances are disabled, and badges are no longer awarded.

Creating a Pega predictive model

Introduction

Build predictive models on historical customer data in Prediction Studio, using a wizard that guides you through five successive steps: data preparation, data analysis, model development, model analysis, and model selection.


Transcript

This demo shows you how to build a predictive model using Pega machine learning. Myco, a telecom service provider, will use this model in its new retention strategy.

The build consists of five steps: data preparation, data analysis, model development, model analysis, and model selection.


In the ‘Data preparation’ step, you select the data source, construct the sample, and define the outcome of the model. The data source can be a CSV file, a database table, a data flow, or a data set.

The fields in the data source can be set to Numeric, Categorical or Not used.


By default, all fields are considered potential predictors. When selecting predictors, apply domain knowledge. For example, the customer ID is an arbitrary identifier and has no influence on the behavior being predicted; for such fields, change the type to ‘Not used’.
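The field typing described above can be sketched in Python as follows. This is a minimal illustration, not Prediction Studio's implementation: the function name assign_field_types, the rule that a field is Numeric when every value parses as a number, and the analyst-supplied not_used set are all assumptions.

```python
from enum import Enum


class FieldType(Enum):
    NUMERIC = "Numeric"
    CATEGORICAL = "Categorical"
    NOT_USED = "Not used"


def _is_number(value: str) -> bool:
    """Return True if the raw string value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def assign_field_types(rows: list[dict[str, str]], not_used: set[str]) -> dict[str, FieldType]:
    """Assign a type to every field in a non-empty list of records.

    Hypothetical rule: fields the analyst lists in `not_used` (such as an ID)
    are excluded; a field whose values all parse as numbers is Numeric;
    every other field is Categorical.
    """
    types: dict[str, FieldType] = {}
    for field in rows[0]:
        if field in not_used:
            types[field] = FieldType.NOT_USED
        elif all(_is_number(row[field]) for row in rows):
            types[field] = FieldType.NUMERIC
        else:
            types[field] = FieldType.CATEGORICAL
    return types


# Illustrative records; the field names are hypothetical.
rows = [
    {"customer_id": "1001", "plan": "gold", "day_minutes": "180.5"},
    {"customer_id": "1002", "plan": "basic", "day_minutes": "95.0"},
]
print(assign_field_types(rows, not_used={"customer_id"}))
```

Without the not_used entry, customer_id would be typed Numeric because its values parse as numbers, which is why excluding it has to be the analyst's decision.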

If the data contains a relatively small number of cases, use 100% of the records; if the data source is large, a sample is sufficient. In this step you also define the hold-out sets used for validation and testing during model development.
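A rough sketch of sampling and hold-out construction follows, assuming the records fit in memory as a Python list. The function sample_and_split and its default fractions (20% validation, 20% test) are hypothetical choices for illustration; in the wizard you define the percentages yourself.

```python
import random


def sample_and_split(records, sample_fraction=1.0, validation_fraction=0.2,
                     test_fraction=0.2, seed=42):
    """Randomly sample records, then partition the sample into disjoint
    development, validation and test sets (hypothetical defaults)."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if validation_fraction + test_fraction >= 1:
        raise ValueError("hold-out sets must leave cases for development")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    # Sampling 100% of the records simply shuffles them.
    sample = rng.sample(records, round(len(records) * sample_fraction))
    n_val = round(len(sample) * validation_fraction)
    n_test = round(len(sample) * test_fraction)
    validation = sample[:n_val]
    test = sample[n_val:n_val + n_test]
    development = sample[n_val + n_test:]
    return development, validation, test


# Small data set: use all records. For a large one, pass e.g. sample_fraction=0.1.
dev, val, test = sample_and_split(list(range(1000)))
print(len(dev), len(val), len(test))  # 600 200 200
```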


The final step of data preparation is to define the outcome to be predicted. Start by selecting the corresponding outcome field; in this case, the field is churn. Churn is a binary outcome (a customer either churns or does not), so a Scoring model is appropriate. In contrast, a Spectrum model predicts a continuous outcome, so only a numeric outcome field can be selected for it. Here you also specify how to differentiate between good and bad behavior.
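Turning the churn field into a binary good/bad label might look like the sketch below. It assumes that churned customers count as bad behavior; define_binary_outcome and the "yes"/"no" values are hypothetical.

```python
def define_binary_outcome(rows, outcome_field, bad_values):
    """Encode a binary outcome as 1 (bad behavior) or 0 (good behavior).

    Which value counts as "bad" is the analyst's choice and is passed in.
    """
    observed = {row[outcome_field] for row in rows}
    if len(observed) != 2:
        raise ValueError(
            f"{outcome_field!r} has {len(observed)} distinct values; "
            "a Scoring model needs a binary outcome"
        )
    bad = set(bad_values)
    # With two observed values, a non-empty proper subset names exactly one.
    if not bad or not bad < observed:
        raise ValueError("bad_values must name exactly one of the two observed values")
    return [1 if row[outcome_field] in bad else 0 for row in rows]


rows = [{"churn": "yes"}, {"churn": "no"}, {"churn": "no"}]
print(define_binary_outcome(rows, "churn", bad_values={"yes"}))  # [1, 0, 0]
```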


It is worthwhile to verify that the distribution of customers in the development data set is similar to that of the whole sample.
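One simple way to perform this check, assuming "distribution" refers to the proportion of bad cases, is to compare that proportion in the development set with the one in the whole sample. The sketch assumes labels are already encoded as 1 (bad) and 0 (good); its 2-percentage-point tolerance is an arbitrary illustrative value.

```python
def bad_rate(labels):
    """Share of cases labelled 1 (bad)."""
    return sum(labels) / len(labels)


def distributions_match(development_labels, sample_labels, tolerance=0.02):
    """True if the development set's bad rate is within `tolerance`
    (absolute difference) of the whole sample's bad rate."""
    return abs(bad_rate(development_labels) - bad_rate(sample_labels)) <= tolerance


sample_labels = [1] * 20 + [0] * 80       # 20% bad in the whole sample
development_labels = [1] * 12 + [0] * 48  # 20% bad in the development set
print(distributions_match(development_labels, sample_labels))  # True
```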

In the ‘Data analysis’ step, you analyze the individual predictors. For fields with a very high performance, the Role is set to VALUE, which protects models from accidentally using predictors that might be directly correlated with the outcome. You can also engineer features into a better predictor by creating a ‘New virtual field’. This is a fundamental step towards building good models.

For example, total_minutes is a virtual field that sums the day, evening, and night Minutes fields.
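A virtual field is a derived column. The following sketch computes total_minutes for each record; the source field names day_minutes, evening_minutes, and night_minutes are assumptions, since the demo does not show the exact names.

```python
# Assumed names of the three source fields.
MINUTE_FIELDS = ("day_minutes", "evening_minutes", "night_minutes")


def add_total_minutes(rows, parts=MINUTE_FIELDS):
    """Return new records that carry a derived total_minutes field;
    the input records are left unchanged."""
    return [
        {**row, "total_minutes": sum(float(row[field]) for field in parts)}
        for row in rows
    ]


rows = [{"day_minutes": "100", "evening_minutes": "50.5", "night_minutes": "20"}]
print(add_total_minutes(rows)[0]["total_minutes"])  # 170.5
```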


The performance of this new predictor is higher than that of the individual fields. Data analysis creates a binned, ordinal view of individual predictors.

Both Binning and Granularity are automatically set but can be manually adjusted.
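To illustrate what a binned, ordinal view of a predictor means, here is a sketch of equal-frequency binning that reports the value range, case count, and bad rate of each bin. Prediction Studio's own binning algorithm is not described here, so this is a stand-in technique in which n_bins plays roughly the role of granularity. In this simple version, tied values may be split across adjacent bins.

```python
def equal_frequency_bins(values, labels, n_bins=4):
    """Sort cases by predictor value, cut them into n_bins groups of nearly
    equal size, and report each bin's range, case count and bad rate."""
    pairs = sorted(zip(values, labels))
    bins = []
    for i in range(n_bins):
        start = i * len(pairs) // n_bins
        end = (i + 1) * len(pairs) // n_bins
        chunk = pairs[start:end]
        if not chunk:  # happens when there are more bins than cases
            continue
        bad = sum(label for _, label in chunk)
        bins.append({
            "low": chunk[0][0],
            "high": chunk[-1][0],
            "count": len(chunk),
            "bad_rate": bad / len(chunk),
        })
    return bins


# Illustrative toy data, not from the demo.
total_minutes = [120, 340, 95, 410, 260, 180, 390, 150]
churned = [0, 1, 0, 1, 0, 0, 1, 0]
for b in equal_frequency_bins(total_minutes, churned):
    print(b)
```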

As part of model development, grouping predictors and selecting among them is automated. When multiple predictors are correlated, using them all in the machine learning process leads to unnecessary model complexity. Best practice, and the default setting, is to select the best-performing predictor in each group.
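The grouping and selection logic can be approximated with greedy correlation grouping, shown below as a stand-in rather than Prediction Studio's actual algorithm. Predictors are visited in order of performance (for example, AUC); each unassigned predictor becomes a group leader and absorbs the remaining predictors whose absolute Pearson correlation with it meets a threshold. The 0.8 threshold, the function names, and the performance numbers are assumptions, and every column must have non-zero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_and_select(columns, performance, threshold=0.8):
    """Greedy correlation grouping; returns {group leader: members}.

    Each leader is its group's best-performing predictor, so the keys of
    the result are the selected predictors.
    """
    remaining = sorted(columns, key=lambda name: performance[name], reverse=True)
    groups = {}
    while remaining:
        leader = remaining.pop(0)
        members = [leader] + [
            name for name in remaining
            if abs(pearson(columns[leader], columns[name])) >= threshold
        ]
        remaining = [name for name in remaining if name not in members]
        groups[leader] = members
    return groups


# Illustrative toy data and performance scores.
columns = {
    "day_minutes": [1, 2, 3, 4, 5],
    "total_minutes": [2, 4, 6, 8, 10.5],
    "age": [30, 22, 41, 35, 28],
}
performance = {"day_minutes": 0.62, "total_minutes": 0.70, "age": 0.55}
groups = group_and_select(columns, performance)
print(list(groups))  # ['total_minutes', 'age']
```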

Prediction Studio provides a rich model factory supporting industry-standard models. You can create four types of models: Regression models, Decision tree models, Bivariate models, and Genetic algorithm models. By default, a Regression model and a Decision tree model are created automatically. These models are highly transparent; Bivariate models and Genetic algorithm models have a lower transparency score.


The purpose of model analysis is to select the best model for your use case. In the ‘Score comparison’ step, you can compare the scores generated by the models in terms of behavior, lift, gains, and discrimination. Prediction Studio uses the Area Under the Curve (AUC) to measure the performance of predictors and models. AUC measures how well a model discriminates between good and bad cases. Its value ranges from 50%, which corresponds to a random distribution, to 100%, which corresponds to perfect discrimination.
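AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen bad case receives a higher score than a randomly chosen good case, with ties counting as half. The sketch below assumes that higher scores indicate a higher likelihood of bad behavior and that labels are 1 (bad) and 0 (good). It is quadratic in the number of cases, which is fine for illustration but not for large data sets. A raw value below 0.5 would mean the ranking is inverted.

```python
def auc(scores, labels):
    """AUC as the probability that a random bad case (label 1) scores
    higher than a random good case (label 0); ties count as half."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    if not bad or not good:
        raise ValueError("AUC needs at least one bad and one good case")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(f"AUC = {auc(scores, labels):.0%}")  # AUC = 75%
```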

In the ‘Score distribution’ step, the model scores are segmented based on a method you select. A typical example divides the scores into deciles: 10 classes with an equal number of cases.
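A decile segmentation like the one described can be sketched by ranking the cases by score and cutting the ranking into equal-sized classes. The function name score_classes is hypothetical; class 1 holds the lowest scores.

```python
def score_classes(scores, n_classes=10):
    """Assign each case a class from 1 to n_classes so that the classes
    contain (nearly) equal numbers of cases; class 1 has the lowest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    classes = [0] * len(scores)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(scores) + 1
    return classes


scores = [i / 20 for i in range(20)]
classes = score_classes(scores)
print(classes)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10]
```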


The ‘Score distribution’ settings give several methods for defining these segments.

In the ‘Class comparison’ step, you can analyze and compare models after the score distribution has been adjusted.

Finally, you select the model that best fits your needs and specify the context in which to save it. Models are saved as rules in the ‘Apply to’ class. Before you save the model, check the mapping of the predictors to the properties of your ‘Apply to’ class. Properties that exist and have a name similar to a predictor field name are mapped automatically. You can also create any missing properties.
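Automatic mapping by name similarity can be approximated with Python's difflib, as in the sketch below. The normalization rule (lowercase, letters and digits only), the 0.8 similarity cutoff, and the example predictor and property names are assumptions; predictors mapped to None are candidates for the missing properties you can create.

```python
import difflib


def _normalize(name):
    """Lowercase and keep only letters and digits, so that total_minutes
    and TotalMinutes compare as equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_predictors(predictors, properties, cutoff=0.8):
    """Map each predictor to its most similar property, or to None when no
    property name is similar enough (a candidate missing property)."""
    by_normalized = {_normalize(p): p for p in properties}
    mapping = {}
    for predictor in predictors:
        match = difflib.get_close_matches(
            _normalize(predictor), list(by_normalized), n=1, cutoff=cutoff
        )
        mapping[predictor] = by_normalized[match[0]] if match else None
    return mapping


mapping = map_predictors(
    ["total_minutes", "plan_type", "contract_length"],
    ["TotalMinutes", "PlanType", "Age"],
)
missing = [p for p, prop in mapping.items() if prop is None]
print(mapping)
print(missing)  # ['contract_length']
```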


If needed, you can adjust the score distribution segments by clicking on the original score distribution chart. The model can now be saved and is ready for use in a decisioning strategy.

You have reached the end of this demo. It showed you how to create a predictive model in Prediction Studio.

