Monte Carlo data sets
The Monte Carlo data set is a type of data source that uses predefined functions to generate random data. You can use a Monte Carlo data set to create sample data in the absence of real data.
When you start implementing Pega Customer Decision Hub™, data is typically not available. Understanding the data model shipped as part of your industry, identifying additional requirements, and preparing production-like data takes time and effort when no data integrations are in place. The lack of production-quality sample data typically delays starting your implementation and causes frequent errors in development.
In many organizations, production data is inaccessible or so heavily scrubbed that it is unusable for testing. Most developers can create test data for their own unit tests, but many organizations lack the ability to create and apply the large volumes of data that broader tests need. Unit testing engagement policies, arbitration logic, and custom strategy extensions requires varied test cases that cover edge scenarios. So how can you test your development when you have insufficient data or no data at all? Monte Carlo data sets can help. With minimal effort, you can populate millions of rows of customer data or generate supporting data for testing.
Start by defining the number of records in the Size field, which is the number of unique records that are created when the Monte Carlo data set runs.
Under Advanced Configuration, you can optionally define a Seed value for the Monte Carlo data set. The system uses the seed to initialize the random value generator for all randomly populated values, such as names and numbers. A specific seed always produces the same records. To generate different random values each time you run the Monte Carlo data set, leave this field empty.
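The effect of a seed can be illustrated outside the product with a short Python sketch. It is only an analogy built on Python's standard random module, not the generator that the Monte Carlo data set uses; the function generate_records and the field names are hypothetical.

```python
import random


def generate_records(size, seed=None):
    """Return `size` sample records; a fixed seed makes the output repeatable."""
    # With seed=None, Python seeds the generator from system entropy,
    # so each run produces different values.
    rng = random.Random(seed)
    first_names = ["Ana", "Ben", "Chloe", "Dev"]  # hypothetical sample values
    return [
        {
            "CustomerID": f"C-{i}",
            "FirstName": rng.choice(first_names),
            "Score": rng.randint(100, 999),
        }
        for i in range(1, size + 1)
    ]


# Two runs with the same seed yield identical records.
same_seed_matches = generate_records(5, seed=42) == generate_records(5, seed=42)
```

Running the function twice with the same seed returns the same list, which mirrors what the Seed field does. Omitting the seed corresponds to leaving the field empty.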
In the Define fields section, add all fields for which you want to generate data.
Next, select the Method that you want to use for populating the data for a specific Field.
The Monte Carlo method enables you to generate random data for most of your development and testing needs. In the Value list, you select from available functions and can provide additional arguments when required. When choosing from the list, you can see sample outputs for each function.
Hover over the question mark icon to see the usage description for each function.
Some frequently used Monte Carlo methods are:
Purpose | Method name |
---|---|
To generate consecutive identifiers | ConsecutiveIds.nextRowID(Text,Text) |
To generate name and demographic data | Name.prefix, Name.firstName, Name.lastName, Address.streetAddress, Address.city, Address.country |
To generate a random decimal | Number.randomGaussian(Double,Double) |
To generate a random integer | Number.digits(Integer) |
To generate a random defined text | Options.options(Text,Text,Text,Text) |
To generate a random true/false | Bool.bool |
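To give a feel for the kind of values these functions produce, the following Python sketch defines rough stand-ins for a few of them. The stand-ins are hypothetical and do not reproduce the Pega functions: the meaning of each argument is assumed here (a text prefix, a digit count, a mean and a standard deviation), so check the usage description in the Value list for the actual behavior.

```python
import itertools
import random

rng = random.Random(7)         # fixed seed so the sketch is repeatable
_counter = itertools.count(1)  # backs the consecutive identifiers


def next_row_id(prefix):
    """Stand-in for a consecutive identifier; the prefix argument is assumed."""
    return f"{prefix}{next(_counter)}"


def digits(n):
    """Stand-in for a random integer with exactly n digits (assumed meaning)."""
    low = 10 ** (n - 1)
    return rng.randint(low, 10 ** n - 1)


def options(*choices):
    """Stand-in for picking one of the supplied text values at random."""
    return rng.choice(choices)


def random_gaussian(mean, std_dev):
    """Stand-in for a normally distributed decimal (assumed argument order)."""
    return rng.gauss(mean, std_dev)


def bool_value():
    """Stand-in for a random true/false value."""
    return rng.random() < 0.5


sample = {
    "CustomerID": next_row_id("C-"),
    "Tier": options("Gold", "Silver", "Bronze"),
    "Score": digits(3),
    "Balance": random_gaussian(5000.0, 1200.0),
    "OptedIn": bool_value(),
}
```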
When the Monte Carlo method is not adequate, you can use different methods to generate data, including:
- Decision tables
- Scorecards
- Decision trees
- Map values
- Predictive models
- Expressions
The system generates the field values sequentially, in the order in which they are defined in the Monte Carlo data set. Therefore, you can populate a field using the Monte Carlo method, and then use the result of that field as input for another field. For example, in the following figure, you populate the .MKTCLVValue field with a three-digit integer in step 12, and then you use the outcome of this field as input for an expression condition to calculate the .CLV field in step 13.
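The ordering rule can be sketched in Python as an ordered list of field definitions, where each definition can read the fields populated before it. This is an illustration of the idea only. The thresholds and the High/Medium/Low labels are hypothetical, because the actual expression condition is not given here.

```python
import random


def three_digit_integer(rng, record):
    """Populate a field with a random three-digit integer."""
    return rng.randint(100, 999)


def clv_from_value(rng, record):
    """Derive CLV from the previously generated MKTCLVValue field."""
    value = record["MKTCLVValue"]  # already set, because it is defined earlier
    # Hypothetical thresholds, chosen only for illustration.
    if value >= 700:
        return "High"
    if value >= 400:
        return "Medium"
    return "Low"


# Order matters: CLV reads MKTCLVValue, so it must come second.
FIELDS = [
    ("MKTCLVValue", three_digit_integer),
    ("CLV", clv_from_value),
]


def generate_record(rng):
    """Populate the fields one by one, in definition order."""
    record = {}
    for name, populate in FIELDS:
        record[name] = populate(rng, record)
    return record


example = generate_record(random.Random(1))
```

Swapping the two entries in FIELDS would raise a KeyError, which is why a dependent field must be defined after the field it reads.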
Monte Carlo data sets also support populating Groups (Page Lists) of entities in a primary entity.
For example, you can populate a customer record, and then set fields for multiple accounts (.Accounts page list) of the same customer. In the following preview screen, the first customer record has two accounts with associated account data.
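The resulting shape resembles a customer record that holds a list of embedded account records. The Python sketch below shows that structure under stated assumptions: the function generate_customer, the account fields, and the limit of three accounts are hypothetical and are not taken from the product.

```python
import random


def generate_customer(rng, customer_number, max_accounts=3):
    """Return one customer with a nested list of one to max_accounts accounts."""
    accounts = [
        {
            "AccountID": f"A-{customer_number}-{n}",
            "Type": rng.choice(["Checking", "Savings", "Credit card"]),
            "Balance": round(rng.uniform(0, 10_000), 2),
        }
        # The number of accounts is itself drawn at random per customer.
        for n in range(1, rng.randint(1, max_accounts) + 1)
    ]
    return {"CustomerID": f"C-{customer_number}", "Accounts": accounts}


rng = random.Random(7)
customers = [generate_customer(rng, i) for i in range(1, 4)]
```

Each entry in customers carries its own Accounts list, comparable to the .Accounts page list on the customer record.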
When running the Pega Customer Decision Hub setup wizard, you can pick from various industry templates that contain preconfigured best practices for the data model and Next-Best-Action Designer configurations. Some of these templates come with a Monte Carlo data set example to help you populate customer data for the available contexts. Save the SampleCustomerDataRecord Monte Carlo data set into the customer class of your implementation application, and then apply any customizations for your use case.
By using data flows, you can also manipulate the data that Monte Carlo data sets generate and write it to other data sources, such as database tables, stream data sets, and decision data sets (such as Cassandra). You can also use the generated data as input for predictive models and interaction history.
Challenge
Tip: To practice what you have learned in this topic, consider taking the Populating sample data using Monte Carlo data sets challenge.