Data flows
Data flows are scalable and resilient data pipelines that you can use to ingest, process, and move data from one or more sources to one or more destinations. Each data flow consists of components that transform data in the pipeline and enrich data processing with event strategies, decision strategies, and text analysis. The components run concurrently to handle data, starting from the source and ending at the destination.
You can create the following types of data flows that process data:
- Batch data flows process a finite set of data and eventually complete. They are mainly used for processing large volumes of data.
- Real-time data flows process an infinite set of data. They are always active and continue to process incoming stream data and requests.
- Single-case data flows run on request, with the data flow source set to abstract. They are mostly used to process inbound data.
To process and move data between data sources, create a data flow. Customize your data flow by adding data flow shapes and by referencing other business rules to perform more complex data operations. For instance, a simple data flow can move data from a single data set, apply a filter, and save the results in a different data set. More complex data flows can use simpler data flows as their source and apply strategies to process data, open a case, or trigger an activity as the outcome.
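Data flow shapes are configured in Dev Studio rather than written as code, but the pipeline concept maps closely to a source-filter-destination sequence. The following plain Java sketch is purely illustrative; the record and field names are assumptions and do not represent Pega APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SimpleDataFlowSketch {
    // Illustrative record standing in for an entry in a source data set.
    record Customer(String id, int age) {}

    public static void main(String[] args) {
        // Source: a finite set of records, analogous to a batch data flow source.
        List<Customer> source = List.of(
                new Customer("C-1", 17),
                new Customer("C-2", 34),
                new Customer("C-3", 52));

        // Filter shape: keep only the records that satisfy a condition.
        List<Customer> filtered = source.stream()
                .filter(c -> c.age() >= 18)
                .collect(Collectors.toList());

        // Destination: save the results to a different data set (here, just print).
        filtered.forEach(c -> System.out.println("Saved " + c.id()));
    }
}
```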
A large number of simultaneous data flow runs can deplete your system resources. To ensure efficient data flow processing, configure dynamic system settings to limit the number of concurrent active data flow runs for each node type.
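Dynamic system settings are data instances rather than code. As a rough illustration only, a setting that caps concurrent batch runs might be recorded along these lines; the owning ruleset, setting purpose, and value shown here are placeholders, so consult the product documentation for the exact setting names that apply to your node types.

```
Owning ruleset:   Pega-DecisionEngine
Setting purpose:  dataflow/batch/maxActiveRuns   (illustrative name only)
Value:            5
```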
Real-time event processing with Kafka
Pega Customer Service™ and Pega Customer Decision Hub™ come with the default Event Stream service to process real-time events. If necessary, you can also take advantage of the high performance and scalability of Apache Kafka by configuring Pega Customer Decision Hub (formerly Pega Marketing) to use an external Kafka cluster.
Events offer a mechanism for responding to real-time marketing opportunities. External or internal systems can initiate an event and trigger a campaign run. For instance, when a customer with a checking account at UPlus Bank uses the bank's ATM, the Event Stream service recognizes the action and triggers the campaign mapped to the event. As a result, the ATM screen displays an offer for a new credit card that UPlus Bank wants to advertise. By default, the Event Stream service handles event processing.
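As an illustration of how an external system might initiate such an event, the following sketch publishes a message to a Kafka topic with the standard Java Kafka client. The broker address, topic name, and payload format are assumptions; the actual event schema and topic depend on your Event Stream service or external Kafka configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AtmEventPublisher {
    public static void main(String[] args) {
        // Hypothetical broker address; replace with the brokers configured for
        // your Event Stream service or external Kafka cluster.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.example.com:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Illustrative event payload: a customer uses an ATM, keyed by customer ID.
        String payload = "{\"eventType\":\"ATMWithdrawal\",\"customerId\":\"C-1001\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic name for ATM events.
            producer.send(new ProducerRecord<>("atm-events", "C-1001", payload));
            producer.flush();
        }
    }
}
```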
Queue processors and data flows
When you create a queue processor rule, the system automatically creates a stream data set and a corresponding data flow. The stream data set sends messages to and receives messages from the Stream service, and the data flow handles message subscription to ensure message processing.
You can access the data flows that correspond to the queue processor rules in your system on the Queue processors landing page in Admin Studio. On this page, you can open the data flow associated with each queue processor rule to monitor and diagnose background processes. You can also trace, enable, or disable your queue processor rules by using REST APIs.
Queue processor rules depend on the Stream service. A Stream node is automatically configured and available for use in Pega Cloud® Services and client-managed cloud environments. For on-premises environments, make sure to designate at least one node in the cluster as a Stream node by using the -DNodeType=Stream setting.
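For on-premises environments, the node type is typically supplied as a JVM system property when the node starts. A minimal illustration follows; the exact startup script or application-server mechanism varies by environment.

```
# Add to the JVM options of each node that should act as a Stream node
# (the exact startup script varies by application server):
-DNodeType=Stream
```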