Data flows

Data flows are scalable and resilient data pipelines that you can use to ingest, process, and move data from one or more sources to one or more destinations. Each data flow consists of components that transform data in the pipeline and enrich data processing with event strategies, strategies, and text analysis. The components run concurrently to handle data, starting from the source and ending at the destination.

You can create the following types of data flows:

Batch data flows go over a finite set of data and eventually complete processing. Batch data flows are mainly used for processing large volumes of data.
Real-time data flows go over an infinite set of data. Real-time data flows are always active and continue to process incoming stream data and requests.
Single-case data flows are run on request, with the data flow source set to abstract. Single-case data flows are mostly used to process inbound data.

Note: Data flow runs that you initiate on the Data Flows landing page run in the access group context. These data flows always use the checked-in instance of the data flow rule and the referenced rules. If you want to do a test run, you can use a checked-out instance of the data flow rule.

Create a data flow to process and move data between data sources. Customize your data flow by adding data flow shapes and by referencing other business rules to do more complex data operations. For example, a simple data flow can move data from a single data set, apply a filter, and save the results in a different data set. More complex data flows can serve as the source for other data flows, apply strategies to process data, open a case, or trigger an activity as the outcome.
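The simple case described above (read from one data set, apply a filter, save to another data set) can be sketched conceptually in Python. This is an illustration of the pipeline idea only, not Pega code: the names filter_shape and run_data_flow, and the use of plain lists of dictionaries as data sets, are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict


def filter_shape(records: Iterable[Record],
                 predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Hypothetical filter shape: pass through only matching records."""
    return (record for record in records if predicate(record))


def run_data_flow(source: Iterable[Record],
                  predicate: Callable[[Record], bool],
                  destination: List[Record]) -> int:
    """Move records from source to destination through the filter.

    Returns the number of records saved to the destination.
    """
    saved = 0
    for record in filter_shape(source, predicate):
        destination.append(record)
        saved += 1
    return saved


# Source data set (assumed sample data for the illustration).
customers = [
    {"id": 1, "balance": 250},
    {"id": 2, "balance": 40},
    {"id": 3, "balance": 900},
]

# Destination data set starts empty.
high_balance: List[Record] = []
saved = run_data_flow(customers, lambda r: r["balance"] >= 100, high_balance)
```

Because the source here is a finite list, the run completes, which corresponds to the batch case. Passing a generator that never ends would model the real-time case, where the flow stays active.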

Having a large number of data flow runs active at the same time can deplete your system resources. To ensure efficient data flow processing, you can configure dynamic system settings to limit the number of concurrent active data flow runs for a node type.
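To show why a cap on concurrent runs protects system resources, the following Python sketch models a limiter that starts runs until a maximum is reached and holds the rest until capacity frees up. It does not show how the platform enforces the limit or which dynamic system settings are involved. The class RunLimiter, its methods, and the behavior of holding excess runs in a waiting line are assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class RunLimiter:
    """Hypothetical model of a per-node-type cap on active data flow runs."""

    def __init__(self, max_active_runs: int) -> None:
        if max_active_runs < 1:
            raise ValueError("max_active_runs must be at least 1")
        self.max_active_runs = max_active_runs
        self.active = set()
        self.waiting = deque()

    def submit(self, run_id: str) -> bool:
        """Start the run if capacity allows; otherwise hold it. Returns True if started."""
        if len(self.active) < self.max_active_runs:
            self.active.add(run_id)
            return True
        self.waiting.append(run_id)
        return False

    def complete(self, run_id: str) -> Optional[str]:
        """Mark a run as finished and start the next waiting run, if any."""
        self.active.discard(run_id)
        if self.waiting and len(self.active) < self.max_active_runs:
            next_run = self.waiting.popleft()
            self.active.add(next_run)
            return next_run
        return None


limiter = RunLimiter(max_active_runs=2)
started = [limiter.submit(run) for run in ("run-a", "run-b", "run-c")]
promoted = limiter.complete("run-a")
```

In this model, the third run waits until the first one completes, so the number of active runs never exceeds the configured maximum.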

Real-time event processing with Kafka

Pega Marketing™ and Pega Customer Decision Hub™ include the default Event Stream service to process real-time events. If required, you can also take advantage of the high performance and scalability that Apache Kafka offers by configuring Pega Marketing or Pega Customer Decision Hub to switch to an external Kafka cluster.

Events provide a mechanism for responding to real-time marketing opportunities. External or internal systems can initiate an event and trigger a campaign run. For example, when a customer with a checking account with UPlus Bank accesses the ATM of the bank, the Event Stream service recognizes the action and triggers the campaign to which the event is mapped. As a result, the ATM screen shows the customer an offer for a new credit card that UPlus Bank wants to advertise. By default, the event processing is handled by the Event Stream service.
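The mapping between an event and the campaign that it triggers can be pictured as a lookup from event name to campaign handler. The Python sketch below illustrates that idea with the ATM example. It is not the Event Stream service API: the EventRouter class, the event name ATMAccessed, and the record fields are hypothetical.

```python
from typing import Callable, Dict, Optional

Event = dict
Campaign = Callable[[Event], str]


class EventRouter:
    """Hypothetical event-to-campaign mapping, for illustration only."""

    def __init__(self) -> None:
        self._campaigns: Dict[str, Campaign] = {}

    def map_event(self, event_name: str, campaign: Campaign) -> None:
        """Map an event name to the campaign that it triggers."""
        self._campaigns[event_name] = campaign

    def handle(self, event: Event) -> Optional[str]:
        """Run the mapped campaign, or return None for unmapped events."""
        campaign = self._campaigns.get(event["name"])
        if campaign is None:
            return None
        return campaign(event)


def credit_card_campaign(event: Event) -> str:
    """Assumed campaign outcome: an offer shown on the ATM screen."""
    return "Offer for customer {}: new credit card".format(event["customer_id"])


router = EventRouter()
router.map_event("ATMAccessed", credit_card_campaign)

offer = router.handle({"name": "ATMAccessed", "customer_id": "C-1001"})
ignored = router.handle({"name": "BranchVisit", "customer_id": "C-1001"})
```

An event without a mapped campaign is ignored in this sketch. How the actual service treats unmapped events is not covered by this topic.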

A queue processor rule automatically generates a stream data set and a corresponding data flow. The stream data set sends messages to and receives messages from the Stream service. The data flow manages the subscription to those messages to ensure that they are processed.
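The division of work between the stream data set (sending and receiving messages) and the data flow (subscribing and ensuring that each message is processed) can be modeled with an in-memory queue. The Python sketch below is a conceptual stand-in only: StreamDataSet and run_subscription are hypothetical names, and the standard library queue takes the place of the Stream service.

```python
import queue
from typing import Callable, List, Optional

Message = dict


class StreamDataSet:
    """Hypothetical stand-in for a stream data set backed by an in-memory queue."""

    def __init__(self) -> None:
        self._messages: "queue.Queue[Message]" = queue.Queue()

    def send(self, message: Message) -> None:
        """Publish a message to the stream."""
        self._messages.put(message)

    def receive(self) -> Optional[Message]:
        """Return the next message, or None if the stream is currently empty."""
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None


def run_subscription(stream: StreamDataSet,
                     handler: Callable[[Message], None]) -> int:
    """Hypothetical data flow: consume every available message and process it."""
    processed = 0
    while True:
        message = stream.receive()
        if message is None:
            break
        handler(message)
        processed += 1
    return processed


stream = StreamDataSet()
for item_id in (1, 2, 3):
    stream.send({"item": item_id})

handled: List[int] = []
processed = run_subscription(stream, lambda m: handled.append(m["item"]))
```

This sketch stops when the queue is empty so that it terminates. A real-time data flow would instead remain active and wait for further messages.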

You can view the data flows that correspond to the queue processor rules in your system on the QueueProcessors landing page in Admin Studio. On this landing page, you can open the data flow that is associated with each queue processor rule to monitor and diagnose background processes. You can also trace, enable, and disable your queue processor rules, or perform the same functions by using REST APIs.

Queue processor rules rely on the DSM service. A Stream node is automatically configured and ready to use in Pega Cloud® Services and client-managed cloud environments. For on-premises environments, ensure that you define at least one node in a cluster to be a Stream node by using the -DNodeType=Stream setting.
