Best practices for data models

When Pega Platform™ was known as PegaRULES Process Commander (PRPC), it was essential to configure a data class as abstract instead of concrete. The terms "abstract" and "concrete" come from the Java programming language. An abstract class can only be inherited from; the system cannot construct an instance of it directly. However, PRPC did not prevent you from directly creating a page of an abstract class on the Pega Clipboard by using a Data Transform or activity.
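
The abstract/concrete distinction can be illustrated outside Pega. The following is a minimal Python sketch that uses the standard abc module, offered only as an analogy. The class names Party and Vendor and the helper can_construct_party are hypothetical, and nothing here is a Pega API.

```python
from abc import ABC, abstractmethod


class Party(ABC):
    """Abstract: meant to be inherited from, not constructed directly."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def key(self) -> str:
        """Concrete subclasses must say how an instance is identified."""


class Vendor(Party):
    """Concrete: defines a key, so instances can be constructed."""

    def __init__(self, name: str, vendor_id: str) -> None:
        super().__init__(name)
        self.vendor_id = vendor_id

    def key(self) -> str:
        return self.vendor_id


def can_construct_party() -> bool:
    """Return True if the abstract class can be instantiated directly."""
    try:
        Party("Acme")
    except TypeError:
        return False
    return True
```

In this sketch the language itself rejects direct construction of the abstract class, which is stricter than the PRPC behavior described above.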

Currently, the main distinction in Pega Platform between abstract and concrete data classes is that an abstract data class does not define keys. A data class that does not define keys cannot be stored in a database table: no keys, no database storage. Every database table requires a primary key. 

Beginning with version 7.3, the Rule-Obj-Class rule in Pega Platform can generate a unique ID. If no key is specified, Pega Platform uses .pyGUID as the property that contains the generated key. It is virtually impossible to duplicate a .pyGUID value. This feature has the advantage of allowing application code to control the primary key of an instance instead of depending on the database to prevent key duplication. As a result, application code can define a Save Plan that persists multiple data instances simultaneously, where every instance references an instance of a different class.
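
The benefit of application-generated keys can be sketched in a few lines. The following Python example is an illustration under stated assumptions, not Pega code: uuid4 stands in for a .pyGUID value, and Invoice, LineItem, new_guid, and save_all are hypothetical names. Because the keys exist before anything is saved, the line items can already reference the invoice, and all instances can be persisted in one pass.

```python
import uuid
from dataclasses import dataclass, field


def new_guid() -> str:
    """Generate a key in application code (stand-in for a .pyGUID value)."""
    return str(uuid.uuid4())


@dataclass
class Invoice:
    customer: str
    guid: str = field(default_factory=new_guid)


@dataclass
class LineItem:
    invoice_guid: str  # reference to an Invoice that is not saved yet
    amount: float
    guid: str = field(default_factory=new_guid)


def save_all(store: dict, instances: list) -> None:
    """Hypothetical 'save plan': persist every instance in one pass."""
    for instance in instances:
        store[(type(instance).__name__, instance.guid)] = instance


invoice = Invoice(customer="Acme")
items = [LineItem(invoice.guid, 10.0), LineItem(invoice.guid, 5.5)]
store: dict = {}
save_all(store, [invoice, *items])
```

With database-assigned keys, the invoice would have to be saved first so that its key could be read back before the line items were created.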

The following figure shows the section of a Rule-Obj-Class rule form where you can configure .pyGUID as the primary key before the system generates a local source. When the system defines a data type in App Studio, this checkbox selection is active.

pyGUID configured as the primary key for a Rule-Obj-Class

The CustomerData schema eliminates the need for a database table to contain Pega-specific columns such as .pzPvStream and .pzInsKey. If you store a data class in the CustomerData schema, you must avoid code that expects to reference instances of that class by pzInsKey. Instead, you must reference the instance by its keys, such as .pyGUID. A data instance does not need to reference a Case by using the .pzInsKey value of a Case but can reference a Case by its key, .pyID. It is important to note that Pega does not allow the definition of .pzInsKey as a property-type relevant record.

Concrete data classes

The definition of concrete is very clear: concrete refers to something that you persist outside of a Case, for example, something that a Case can reference. The term "referential data" applies to the type of data that generally has no references or very few references to other data. You can think of referential data as long-lived data. 

It is recommended to avoid embedding referential data in the BLOB of a Case unless it is necessary. One reason to copy data is that its values are transient, meaning they only last for a short time. For example, the price of a product can change over time. When copying the price data, it also makes sense to copy the reason for the copy. The AsOfDate of a price record can help you find the historical price in the future by performing a database lookup.

Copying a computed and used price is beneficial for performance reasons because the system does not have to perform the same calculations used to compute its value. This approach is comparable to an invoice with numerous line items.

When there are no calculations for the system to perform, it makes sense to perform a lookup (SOR pattern) or join from the case to the referential data instead of embedding it as a snapshot pattern in the BLOB of the case.
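
The difference between the two patterns can be sketched as follows. This Python example is illustrative only: PRICE_HISTORY, lookup_price, PriceSnapshot, and take_snapshot are hypothetical names, and the in-memory list stands in for a database table of price records. The lookup function models the SOR pattern, and the snapshot keeps the copied price together with the date that explains the copy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical system of record: (product, effective date, price).
PRICE_HISTORY = [
    ("WIDGET", date(2024, 1, 1), 10.00),
    ("WIDGET", date(2024, 6, 1), 12.50),
]


def lookup_price(product: str, as_of: date) -> Optional[float]:
    """SOR pattern: find the price in effect on the given date."""
    candidates = [(eff, price) for prod, eff, price in PRICE_HISTORY
                  if prod == product and eff <= as_of]
    # The latest effective date on or before as_of wins.
    return max(candidates)[1] if candidates else None


@dataclass
class PriceSnapshot:
    """Snapshot pattern: the copied value plus the date it was valid."""
    product: str
    price: float
    as_of_date: date


def take_snapshot(product: str, as_of: date) -> PriceSnapshot:
    price = lookup_price(product, as_of)
    if price is None:
        raise LookupError(f"no price for {product} as of {as_of}")
    return PriceSnapshot(product, price, as_of)
```

A case that only displays the current price could call the lookup each time, while a case that must preserve what was charged would store the snapshot.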

As a best practice, ensure that referential concrete class instances are packageable and deployable without any prerequisites or limitations.

It does not make sense for a referential data class to have an embedded page list (field group list) where each page in the list references a historical data instance. The list grows indefinitely over time, which is problematic. This concern is analogous to the open/closed principle but applies to persistence instead of inheritance. A true concrete instance can remain "closed" to modification. In contrast, an instance that contains an embedded page list (field group list) that can grow indefinitely requires continuous modification of its BLOB.
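
One possible alternative, which the text above implies but does not spell out, is to store each historical record as its own instance that points back to the referential instance. The following Python sketch assumes that design. Product, PriceRecord, and prices_for are hypothetical names, and the frozen dataclass only models the idea of an instance that stays "closed" after it is saved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Product:
    """Referential instance: stays 'closed' once saved."""
    product_id: str
    name: str


@dataclass(frozen=True)
class PriceRecord:
    """Historical instance stored on its own, pointing back at the product."""
    product_id: str
    as_of_date: date
    price: float


def prices_for(history: list, product_id: str) -> list:
    """Find a product's history by lookup instead of an embedded list."""
    return [r for r in history if r.product_id == product_id]


product = Product("WIDGET", "Widget")
history = [
    PriceRecord("WIDGET", date(2024, 1, 1), 10.00),
    PriceRecord("WIDGET", date(2024, 6, 1), 12.50),
    PriceRecord("GADGET", date(2024, 1, 1), 99.00),
]
```

Adding a new price appends a record to the history and leaves the product instance untouched.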

Abstract data classes

Another example is when a historical data type has several scalar properties. For example, multiple Case Types can be defined that strictly involve that data type. Additionally, those Cases and that historical data type are virtually synonymous.

Data classes enable the reuse of rules, such as views and data transforms. They minimize the number of case-level properties that the system creates.

Reports that include properties within the embedded page's data class are slightly more complex. To expose properties within the embedded page, use "Optimize for reporting." The exposed column is displayed on the External Mapping tab of the class rule.

Enterprise-level versus application-level data classes

Define every data class that is central to how an organization does business in the Enterprise layer. If a data class is central to an organization, any related applications that undergo development in the future should use the same class. It does not make sense for two Pega applications in the same organization to share information through integration and Data Transforms. Instead, the data class should be the same. This approach is no different than expecting that every application uses Pega data classes such as Data-Party.

Suppose an organization wants to add properties to an enterprise-level data class for one application, and those properties are inapplicable to other applications. In that case, that application can inherit the enterprise-level data class and add those properties directly. According to the Open/Closed principle, anything that works well at the enterprise level should work equally well at the application level from the perspective of the enterprise application. The enterprise application does not care what specifically occurs at the application level; it only cares that functionality "X" runs correctly and successfully.
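
The inheritance arrangement described above can be sketched in Python. This is an analogy under stated assumptions rather than Pega configuration: EnterpriseVendor, TruckVendor, fleet_size, and enterprise_report are hypothetical names. The application-level class only adds a property, so logic written against the enterprise class keeps working unchanged.

```python
from dataclasses import dataclass


@dataclass
class EnterpriseVendor:
    """Hypothetical enterprise-level data class."""
    vendor_id: str
    name: str

    def display_label(self) -> str:
        return f"{self.name} ({self.vendor_id})"


@dataclass
class TruckVendor(EnterpriseVendor):
    """Hypothetical application-level class: adds properties only."""
    fleet_size: int = 0


def enterprise_report(vendors: list) -> list:
    """Enterprise-level logic that knows only the enterprise class."""
    return [v.display_label() for v in vendors]
```

The report function never inspects fleet_size, which mirrors the point that the enterprise layer only cares that its own functionality runs correctly.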

It is also fine for an application to define data classes that are specific to that application, but this should occur only when no other application has a reason to use the same data class. Data classes that meet this definition are rare and not the norm.

Data class names

It is common for a Case to capture data by using a data type synonymous with the class of the Case Type. In general, a Case name should be an action + noun or noun + action. The reasoning is that it is possible for multiple Case Types to process the same noun, such as "Vendor." Vendor Enrollment and TruckRequest are two Case Types that operate on corresponding data objects named "Truck Vendor" and "Truck Request."

Because a data class represents both a noun and an object, it is unnecessary to append the word "details" to the noun because that is what the object does; it encapsulates details or information about itself. So, avoid redundancy in your naming conventions. The word "details" does not add value. Just the opposite; that word calls into question whether Truck and TruckDetails are two different objects.

It is not a best practice to repeat the class name of a property within its own name. For example, do not name an MDC-Data-DeliveryRequest property DeliveryRequestDate. Instead, name the property RequestDate.

Avoiding data redundancy

Data integrity is about maintaining a "single source of truth," meaning that you do not store the same data in two locations. This principle is the reasoning behind database normalization techniques. The first and foremost goal of these techniques is to "…free the collection of relations from undesirable insertion, update, and deletion dependencies" (Edgar F. Codd, 1970).

Consider what might happen if the system stores the same data or a derivative of that data, for example, the Promoter Score Category, in multiple places and the original data changes. The difficulty and complexity of maintenance tasks increase dramatically if an application must keep track of and update every location where the system stores a value whenever the value changes. For example, imagine a Rule-Declare-Trigger that deals with multiple locked instances simultaneously; it can cause many problems.
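
One way to avoid storing a derivative is to compute it on read. The following Python sketch assumes that approach. SurveyResponse is a hypothetical class, and the thresholds follow the usual Net Promoter convention (9 to 10 promoter, 7 to 8 passive, 0 to 6 detractor), which is an assumption because the text does not define the categories.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    promoter_score: int  # the single stored value, 0 through 10

    @property
    def promoter_score_category(self) -> str:
        """Derived on read, so it can never disagree with the score."""
        if self.promoter_score >= 9:
            return "Promoter"
        if self.promoter_score >= 7:
            return "Passive"
        return "Detractor"
```

Because only the score is stored, changing it requires one update, and no second location has to be found and kept in step.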
