Best practices for data models

When Pega Platform™ was previously known as PegaRULES Process Commander (PRPC), developers configured data classes as abstract instead of concrete. In Java, an abstract class can be inherited but not instantiated. PRPC did not prevent creating a page of an abstract class on the Pega Clipboard by using a Data Transform or activity.

In Pega Platform, the main distinction between abstract and concrete data classes is that an abstract data class does not define keys. A class without keys cannot be stored in a database table because every table requires a primary key.
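The abstract-versus-concrete distinction can be sketched outside of Pega Platform. The following Python example is only an analogy, not Pega code: the class names DataRecord and Vendor and the key() method are hypothetical. It shows an abstract class that declares no key and cannot be instantiated, and a concrete class that defines a key and could therefore map to a table row.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class DataRecord(ABC):
    """Abstract data class (hypothetical): declares no key of its own."""

    @abstractmethod
    def key(self) -> str:
        """Return the value that identifies this record in its table."""


@dataclass
class Vendor(DataRecord):
    """Concrete data class (hypothetical): defines a key, so it can be stored."""

    vendor_id: str
    name: str

    def key(self) -> str:
        return self.vendor_id


def try_instantiate_abstract() -> bool:
    """Return True if the abstract class could be instantiated."""
    try:
        DataRecord()  # raises TypeError because key() is abstract
        return True
    except TypeError:
        return False
```

Unlike this sketch, PRPC did not block the creation of a page of an abstract class, as noted earlier.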

Beginning with version 7.3, the Rule-Obj-Class rule form in Pega Platform includes an option to generate a unique ID. If no key is specified, Pega Platform uses the .pyGUID property as the generated key. Because .pyGUID values are unique, application code can control the primary key of an instance instead of relying on the database to prevent duplication.
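As a rough illustration of an application-generated key, the following Python sketch assigns a GUID to a record before it is saved. It is an assumption-laden stand-in: Pega Platform's actual generation logic is not described here, the functions new_guid and create_record are hypothetical, and a random (version 4) UUID from the Python standard library is used in place of whatever the platform does internally.

```python
import uuid


def new_guid() -> str:
    """Generate a random GUID in upper-case, hyphenated form (assumed format)."""
    return str(uuid.uuid4()).upper()


def create_record(fields: dict) -> dict:
    """Copy the fields and assign a pyGUID key if the caller did not supply one."""
    record = dict(fields)
    record.setdefault("pyGUID", new_guid())
    return record
```

Because the key is assigned before the record is saved, the application does not need the database to generate the key.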

The following figure shows the section of a Rule-Obj-Class rule form where you can select a checkbox to configure .pyGUID as the primary key before the system generates a local source. When a Data Type is defined in App Studio, this checkbox is selected automatically.

pyGUID configured as the primary key for a Rule-Obj-Class.

The CustomerData schema removes the need for a database table to include Pega-specific columns such as .pzPvStream and .pzInsKey. When a data class is stored in the CustomerData schema, code must not reference instances by .pzInsKey. Instead, reference the instance by its keys, such as .pyGUID.

A data instance does not need to reference a Case by using the .pzInsKey value of a Case. It can reference a Case by its key, .pyID.

Pega Platform does not allow defining .pzInsKey as a property-type relevant record.

Concrete data classes

A concrete data class represents something that you persist outside of a Case, for example, something that a Case can reference. The term "referential data" applies to the type of data that generally has no references or very few references to other data. You can think of referential data as long-lived data.

Avoid embedding referential data in the BLOB of a Case unless it is necessary. One reason to copy data is that its values are transient, meaning they only last for a short time. For example, the price of a product can change over time. When copying the price data, it also makes sense to copy the reason for the copy. The AsOfDate of a price record can help you find the historical price in the future by performing a database lookup.

Copying a computed and used price is beneficial for performance reasons because the system does not have to perform the same calculations used to compute its value. This approach is comparable to an invoice with numerous line items.

When there are no calculations for the system to perform, it makes sense to perform a lookup (SOR pattern) or join from the case to the referential data instead of embedding it as a snapshot pattern in the BLOB of the case.
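The difference between the lookup (SOR) pattern and the snapshot pattern can be sketched as follows. This Python example is illustrative only: the PRICE_HISTORY table, the LineItem class, and both functions are hypothetical, and an in-memory dictionary stands in for the database table that holds the referential price data.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical referential price data keyed by (product ID, effective date).
PRICE_HISTORY = {
    ("P-100", date(2024, 1, 1)): 10.00,
    ("P-100", date(2024, 6, 1)): 12.50,
}


def lookup_price(product_id: str, on_date: date) -> float:
    """Lookup (SOR) pattern: find the price in effect on the given date."""
    candidates = [
        (effective, price)
        for (pid, effective), price in PRICE_HISTORY.items()
        if pid == product_id and effective <= on_date
    ]
    if not candidates:
        raise KeyError(f"No price for {product_id} on {on_date}")
    return max(candidates)[1]  # latest effective date wins


@dataclass(frozen=True)
class LineItem:
    """Snapshot pattern: the price is copied together with its AsOfDate."""

    product_id: str
    quantity: int
    unit_price: float
    as_of_date: date

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price


def snapshot_line_item(product_id: str, quantity: int, on_date: date) -> LineItem:
    """Copy the price once so that later price changes do not affect the item."""
    return LineItem(product_id, quantity, lookup_price(product_id, on_date), on_date)
```

The snapshot keeps the value that was actually used, while the stored date still allows a later lookup of the historical price.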

As a best practice, ensure that referential concrete class instances are packageable and deployable without any prerequisites or limitations.

It does not make sense for a referential data class to have an embedded page list (field group list) where each page in the list references a historical data instance. The list grows indefinitely over time, which is problematic. This outcome is similar to the open/closed principle but applies to persistence instead of inheritance. A true concrete instance can remain "closed" to modification. In contrast, an instance that contains an embedded page list (field group list) that can grow indefinitely requires continuous modification of its BLOB.
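One way to keep a referential instance "closed" is to store each historical entry as its own record that points back to the parent by key. The following Python sketch assumes hypothetical Product and PriceChange classes; it is not a Pega construct, only an illustration of the persistence idea described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Product:
    """Referential instance: it holds no growing list, so it is rarely rewritten."""

    product_id: str
    name: str


@dataclass(frozen=True)
class PriceChange:
    """Historical record stored separately; it references its parent by key."""

    product_id: str
    effective: date
    price: float


def history_for(product_id: str, changes: list) -> list:
    """Return the history for one product, oldest first, without touching Product."""
    matching = (c for c in changes if c.product_id == product_id)
    return sorted(matching, key=lambda c: c.effective)
```

Adding a new PriceChange record leaves the Product instance untouched, whereas appending to an embedded list would rewrite the parent each time.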

Another example is when a historical data type has several scalar properties. For example, multiple Case Types can be defined that strictly involve that data type. Additionally, those Cases and that historical data type are virtually synonymous.

Data classes enable the reuse of rules, such as views and data transforms. They minimize the number of case-level properties that the system creates.

Reports that include properties within the embedded page's data class are slightly more complex. To expose properties within the embedded page, use "Optimize for reporting." The exposed column is displayed on the External Mapping tab of the class rule.

pyGUID implementation deep dive

pyGUID (Globally Unique Identifier) is an automatically generated identifier for Pega Data Records. Each data object in App Studio includes a pyGUID property as its primary key to ensure uniqueness and support consistent data modeling.

Use pyGUID in these scenarios:

You require an autogenerated, globally unique primary key.
Application logic controls the key instead of the database.
You need a simplified schema without Pega-specific columns.
You must implement a CustomerData schema.
You must ensure consistent IDs across distributed systems.

Use custom keys in these scenarios:

Guarantee uniqueness with existing business identifiers (for example, Social Security number or employee ID).
Provide human-readable identifiers.
Integrate with legacy systems that require specific formats.
Enable manual reference or search by identifiers for business users.

Benefits for CustomerData schema

The CustomerData schema eliminates the need for Pega-specific columns such as pzPvStream and pzInsKey. Data instances are identified by keys such as pyGUID, which provides:

A portable database schema.
Easier integration with external systems.
Reduced complexity and improved maintainability.

Practical differences: pyGUID versus pzInsKey

Review how the modern approach using pyGUID compares to the traditional method using pzInsKey, with a focus on portability, consistency, and integration.

pzInsKey (traditional)

MYAPP-DATA-CUSTOMER 20230515T154530.123 GMT

Tied to a Pega instance
Timestamp-based and harder for integrations
Less portable

pyGUID (modern)

9C0ADF3E-3019-4CCB-8FB9-600A7FE8E923

Universal and instance-independent
Consistent format
Suitable for APIs and distributed systems
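The format difference between the two sample identifiers above matters when an API has to validate what it receives. The following Python sketch is a simplified illustration: legacy_handle only mimics the shape of the sample pzInsKey value (class name, a space, then the key), which is an assumption and not a statement of how Pega Platform builds handles, and is_guid is a hypothetical helper built on the standard uuid module.

```python
import uuid


def legacy_handle(class_name: str, key_value: str) -> str:
    """Build a handle shaped like the sample pzInsKey value (assumed shape)."""
    return f"{class_name.upper()} {key_value}"


def is_guid(value: str) -> bool:
    """Return True only for a canonical, hyphenated GUID string."""
    try:
        return str(uuid.UUID(value)).upper() == value.upper()
    except ValueError:
        return False
```

A GUID can be checked with one format rule, while a handle that embeds a class name and key values depends on the application that produced it.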

Enterprise-level versus application-level data classes

Define every data class that is central to how an organization does business in the Enterprise layer. If a data class is central to an organization, any related applications that undergo development in the future should use the same class. It does not make sense for two Pega applications in the same organization to share information through integration and Data Transforms. Instead, the data class should be the same. This approach is no different than expecting that every application uses Pega data classes such as Data-Party.

Suppose an application has additional properties that an organization wants to add to an enterprise-level data class, and those properties are inapplicable to other applications. In that case, that application can inherit the enterprise-level data class and add those properties directly. According to the Open/Closed principle, anything that works well at the enterprise level should work equally well at the application level from the perspective of the enterprise application. The enterprise application does not care what specifically occurs at the application level; it only cares that functionality "X" runs correctly and successfully.
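The inheritance approach described above can be sketched in Python. The class names EnterpriseCustomer and LendingCustomer, their properties, and the enterprise_report function are all hypothetical; the example only illustrates that enterprise-level logic keeps working when an application extends the class.

```python
from dataclasses import dataclass


@dataclass
class EnterpriseCustomer:
    """Enterprise-level data class shared by every application (hypothetical)."""

    customer_id: str
    name: str

    def display_label(self) -> str:
        return f"{self.name} ({self.customer_id})"


@dataclass
class LendingCustomer(EnterpriseCustomer):
    """Application-level class: inherits and adds an application-only property."""

    credit_score: int = 0


def enterprise_report(customers: list) -> list:
    """Enterprise functionality that relies only on the enterprise-level class."""
    return [c.display_label() for c in customers]
```

The enterprise-level function accepts instances of either class without modification, which is the behavior the Open/Closed principle expects.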

It is also fine for an application to define data classes that are specific to that application, but this should occur only when no other application has a reason to use the same data class. Data classes that meet this definition are rare and not the norm.

Data class names

It is common for a Case to capture data by using a Data Type that is synonymous with the class of the Case Type. In general, a Case name should be an action and noun or noun and action. The reasoning is that it is possible for multiple Case Types to process the same noun, such as "Vendor." Vendor Enrollment and Truck Request are two Case Types that operate on corresponding Data Objects named "Truck Vendor" and "Truck Request."

A data class represents both a noun and an object, so it is unnecessary to append the word "details" to the noun; encapsulating details or information about itself is what the object already does. Avoid this redundancy in your naming conventions. The word "details" does not add value. On the contrary, it raises questions about whether Truck and TruckDetails are two distinct objects.

It is not a best practice to repeat the class name of a property within its own name. For example, you do not name an MDC-Data-DeliveryRequest property DeliveryRequestDate. You instead name the property RequestDate.

Data-Party- versus PegaData- classes

The main difference between these two class types is their purpose and how they relate to case processing:

Data-Party- classes represent individuals or entities that actively participate in a Case. These classes provide base functionality for managing case participants and their interactions within your application. They enable polymorphic behavior based on the class that defines the work party. Use these classes when you need to track roles, permissions, and participant-specific behaviors throughout case processing.
PegaData- classes manage general data. They store information that supports your application but does not involve Case participation. These classes provide standard Data Objects for common Data Types to support consistency and reusability across applications.

When to use Data-Party- classes

Use Data-Party- classes to:

Manage Case participants who play an active role in Case processing, such as customers submitting requests, agents processing claims, or operators performing system tasks.
Track user roles and permissions to define and enforce access levels and features.
Monitor participant interactions, including actions, decisions, and communications.
Implement multi-party workflows that involve coordination among participants with different responsibilities.

Example scenario

In a loan application process, use Data-Party-Person to represent the loan applicant, Data-Party-Operator for the loan officer reviewing the application, and Data-Party-Org for the employer providing employment verification. Each party has specific roles, permissions, and interactions in the Case Lifecycle.

When to use PegaData- classes

Use PegaData- classes to:

Store supporting information that does not represent an active case participant, such as contact details, addresses, or reference data.
Apply standard Data Objects for common Data Types to reduce development time and ensure compatibility.
Maintain reusable data for reference across multiple Cases or applications.
Separate data concerns by distinguishing between “who is involved” (parties) and “what information is needed” (data).

Example scenario

In the same loan application process, use PegaData-Contact to store the applicant’s phone numbers and email addresses, and PegaData-Address for residential and mailing addresses. This information supports the case but does not represent an active participant in the workflow.

Avoiding data redundancy

Data integrity is about maintaining a "single source of truth," meaning that you do not store the same data in two locations. This principle is the reasoning behind database normalization techniques, whose first and foremost goal is to "…free the collection of relations from undesirable insertion, update, and deletion dependencies" (Edgar F. Codd, 1970).

Consider what might happen if the system stores the same data or a derivative of that data, for example, the Promoter Score Category, in multiple places and the original data changes. Maintenance becomes dramatically more difficult and complex if an application must keep track of and update every location where the system stores a value whenever that value changes. For example, a Rule-Declare-Trigger that must update multiple locked instances simultaneously can cause many problems.
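One way to avoid storing a derivative such as the Promoter Score Category is to compute it from the original value whenever it is read. The following Python sketch assumes a hypothetical SurveyResponse class and uses the commonly cited Net Promoter Score bands (9 to 10 Promoter, 7 to 8 Passive, 0 to 6 Detractor); the thresholds are an assumption and are not taken from this topic.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """Stores only the original score; the category is never persisted."""

    promoter_score: int  # assumed range: 0 to 10

    @property
    def promoter_score_category(self) -> str:
        """Derive the category on read so that it cannot drift from the score."""
        if self.promoter_score >= 9:
            return "Promoter"
        if self.promoter_score >= 7:
            return "Passive"
        return "Detractor"
```

When the score changes, the category follows automatically, and there is no second location to update.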

This Topic is available in the following Module:

Designing Data Models and reporting strategies v1
