Data flows
Data flows are scalable and resilient data pipelines that you can use to ingest, process, and move data from one or more sources to one or more destinations. Each data flow consists of components that transform data in the pipeline and enrich data processing with event strategies, decision strategies, and text analysis. The components run concurrently, handling data from the source through to the destination.
You can create data flows that process data in different ways:
Batch data flows go over a finite set of data and eventually complete processing. Batch data flows are mainly used for processing large volumes of data.
Real-time data flows go over an infinite set of data. Real-time data flows are always active and continue to process incoming stream data and requests.
Single case data flows are executed on request, with the data flow source set to abstract. Single case data flows are mostly used to process inbound data.
Create a data flow to process and move data between data sources. Customize your data flow by adding data flow shapes and by referencing other business rules to do more complex data operations. For example, a simple data flow can move data from a single data set, apply a filter, and save the results in a different data set. More complex data flows can be sourced by other data flows, apply strategies for data processing, open a case, or trigger an activity as the outcome of the data flow.
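To make the pipeline idea concrete, the following sketch mimics the simple example above in plain Java: read records from a source, apply a filter, and save the surviving records to a destination. This is only an analogy for what a data flow run does; the Customer record, the age condition, and the in-memory lists are illustrative assumptions, not Pega APIs.

import java.util.List;
import java.util.stream.Collectors;

public class SimpleFlowSketch {

    // Illustrative record standing in for a row in a data set.
    record Customer(String id, int age) {}

    public static void main(String[] args) {
        // Source: a finite "data set" of records; a batch run would read these from storage.
        List<Customer> source = List.of(
                new Customer("C-1", 17),
                new Customer("C-2", 34),
                new Customer("C-3", 52));

        // Filter shape: keep only the records that meet a condition.
        // Destination: collect the results; a real data flow would write them to another data set.
        List<Customer> destination = source.stream()
                .filter(c -> c.age() >= 18)
                .collect(Collectors.toList());

        destination.forEach(c -> System.out.println("Saved " + c.id()));
    }
}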
A large number of data flow runs that are active at the same time can deplete your system resources. To ensure efficient data flow processing, you can configure dynamic system settings to limit the number of concurrent active data flow runs for a node type.
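The specific dynamic system settings are product-defined and not reproduced here; the sketch below only illustrates the general idea of capping concurrent work per node, using a plain Java semaphore. The limit of four active runs and the simulated processing are assumptions for the example, not the product's actual mechanism.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ConcurrentRunLimiter {

    // Cap on simultaneously active runs for this node; the value 4 is illustrative.
    private static final Semaphore slots = new Semaphore(4);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < 10; i++) {
            int runId = i;
            pool.submit(() -> {
                try {
                    slots.acquire();                 // block until a slot is free
                    try {
                        System.out.println("Run " + runId + " started");
                        Thread.sleep(500);           // stand-in for the actual processing work
                    } finally {
                        slots.release();             // free the slot for the next queued run
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
    }
}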
Real-time event processing with Kafka
Pega Customer Service™ and Pega Customer Decision Hub™ include the default Event Stream service to process real-time events. If required, you can instead configure Pega Marketing or Pega Customer Decision Hub to switch to an external Apache Kafka cluster and take advantage of the high performance and scalability that Kafka offers.
Events provide a mechanism for responding to real-time marketing opportunities. An event is initiated by external or internal systems and can trigger the execution of a campaign. For example, when a customer who has a checking account with UPlus Bank accesses the bank's ATM, the action is recognized by the Event Stream service, which triggers the campaign to which the event is mapped. As a result, the ATM screen shows the customer an offer for a new credit card that UPlus Bank wants to advertise. By default, the event processing is handled by the Event Stream service.
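As a rough illustration of how an external system might feed such an event into a Kafka topic for real-time processing, the following sketch uses the standard Kafka Java producer client. The topic name atm-events, the broker address, and the JSON payload fields are assumptions made for the example and do not reflect any product-defined schema.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AtmEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources flushes and closes the producer on exit.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String customerId = "CUST-1001";
            String event = "{\"eventType\":\"ATM_ACCESS\",\"customerId\":\"" + customerId + "\"}";
            // Keying by customer ID keeps events for the same customer on one partition.
            producer.send(new ProducerRecord<>("atm-events", customerId, event));
        }
    }
}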
Each queue processor rule automatically generates a stream data set and a corresponding data flow. The stream data set publishes messages to and consumes messages from the Stream service, and the data flow manages the message subscription to ensure that the messages are processed.
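Behind the scenes, this publish-and-subscribe pattern resembles a standard Kafka consumer loop. The sketch below, built on the Kafka Java consumer client, shows a subscriber polling a topic and handing each message to processing logic; the topic name, group ID, and broker address are assumptions for the example, and in practice the generated data flow handles these details for you.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class EventSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed broker address
        props.put("group.id", "event-subscriber-demo");              // consumer group tracks offsets per subscriber
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("atm-events"));   // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Hand each message to the processing logic; failures could be retried or dead-lettered.
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}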
You can view the data flows that correspond to the queue processor rules in your system on the QueueProcessors landing page in Admin Studio. From this landing page, you can open the data flow that is associated with each queue processor rule to monitor and diagnose its background processing. You can also trace, enable, and disable your queue processor rules from the landing page, or perform the same actions through REST APIs.
Queue processor rules rely on the DSM service. A Stream node is automatically configured and ready to use in Pega Cloud Services and client-managed cloud environments. For on-premises environments, ensure that you define at least one node in a cluster to be a Stream node by using the -DNodeType=Stream setting.