Data flows

Data Flows in Pega Platform™ enable efficient movement and processing of data across systems. You can ingest data from multiple sources, transform and enrich it by using configurable components, and deliver it to one or more destinations. Each segment of a Data Flow runs concurrently to support high performance and reliability from input to output.

A Data Flow defines end-to-end record processing in Pega Platform, including the sequence of shapes, branching logic, record-level controls, delivery guarantees, and operational management. It acts as a processing pipeline that converts raw records into durable outcomes across designated destinations.

Run modes and patterns

Data Flows in Pega Platform can operate in different modes depending on the nature of your data and business requirements. Each mode addresses specific processing needs, whether you handle large batches, continuous event streams, or individual requests. By choosing the right run mode and applying proven design patterns, architects can ensure that data pipelines are both efficient and resilient.

You can create the following types of Data Flows:

Batch Data Flows
Real-time Data Flows
Single-Case Data Flows

Batch Data Flow

A Batch Data Flow processes a finite set of records and completes when all data is processed. This mode is suited for scenarios such as high-volume backfills, nightly updates, or large file ingestion where the dataset is known.

Design patterns for batch Data Flows include:

Tuning runner threads and batch size.
Setting checkpoint intervals that balance the cost of frequent checkpoint writes against the amount of work that must be repeated after a failure.

These patterns help maximize throughput and support reliable recovery. Designing idempotent updates at the destination is critical because it prevents duplicate changes if a batch is retried or resumed.
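The interaction between checkpoints and idempotent writes can be illustrated outside of Pega Platform. The following is a minimal plain-Python sketch, not Pega configuration: the record shape, the `run_batch` function, and the `checkpoint_every` parameter are all hypothetical names chosen for illustration. It assumes the destination is a simple key-value store and that a resumed run replays every record after the last saved checkpoint.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (record key, new balance); hypothetical shape


def run_batch(
    records: List[Record],
    destination: Dict[str, int],
    start_at: int = 0,
    checkpoint_every: int = 2,
) -> int:
    """Apply records from position start_at and return the last checkpoint."""
    checkpoint = start_at
    for position in range(start_at, len(records)):
        key, balance = records[position]
        # Idempotent write: set the value by key instead of incrementing it,
        # so replaying the same record leaves the destination unchanged.
        destination[key] = balance
        if (position + 1) % checkpoint_every == 0:
            checkpoint = position + 1  # every record before this index is done
    return checkpoint


records = [("A", 10), ("B", 20), ("C", 30)]
destination: Dict[str, int] = {}

# First attempt: all three records are written, but the last checkpoint
# saved covers only the first two.
saved = run_batch(records, destination)

# Resume from the checkpoint: record "C" is written a second time.
run_batch(records, destination, start_at=saved)
```

Because the write sets a value by key, replaying record "C" has no visible effect. If the write were an increment instead, the resumed run would double-count it, which is the failure that idempotent design avoids. A larger `checkpoint_every` means fewer checkpoint writes but more records replayed on resume.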

Real-time Data Flow

A real-time Data Flow processes an unbounded stream of incoming events and remains active continuously. This mode is suited for event-driven processing, streaming enrichment, and analytics that require immediate response, such as fraud detection or customer engagement tracking.

Design patterns for real-time Data Flows include:

Defining windows and joins to aggregate or correlate data over time.
Setting bounds for late-arriving events.
Propagating correlation IDs throughout the pipeline for end-to-end traceability and diagnostics.
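To make these three patterns concrete, here is a minimal plain-Python sketch of a tumbling-window sum with a lateness bound. It is a conceptual illustration only and does not represent how Pega Platform implements or configures windows. The `Event` and `WindowedSum` names, the 60-second window, and the 30-second lateness bound are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

WINDOW_SECONDS = 60     # hypothetical window size
ALLOWED_LATENESS = 30   # hypothetical bound for late events, in seconds


@dataclass
class Event:
    correlation_id: str
    event_time: int  # seconds since an arbitrary epoch
    amount: float


class WindowedSum:
    def __init__(self) -> None:
        self.totals: Dict[int, float] = defaultdict(float)
        self.ids: Dict[int, List[str]] = defaultdict(list)
        self.watermark = 0  # highest event time seen so far
        self.dropped: List[str] = []

    def add(self, event: Event) -> None:
        self.watermark = max(self.watermark, event.event_time)
        window_start = event.event_time - event.event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        # A window stops accepting events once the watermark has passed
        # its end by more than the allowed lateness.
        if self.watermark > window_end + ALLOWED_LATENESS:
            self.dropped.append(event.correlation_id)
            return
        self.totals[window_start] += event.amount
        # Keep the correlation ID with the aggregate for traceability.
        self.ids[window_start].append(event.correlation_id)


agg = WindowedSum()
for e in [
    Event("c-1", 10, 5.0),
    Event("c-2", 70, 7.0),
    Event("c-3", 50, 1.0),   # late, but within the bound
    Event("c-4", 200, 2.0),
    Event("c-5", 20, 9.0),   # beyond the bound, so it is dropped
]:
    agg.add(e)
```

Keeping the correlation IDs next to each window total means that any aggregate, and any dropped event, can be traced back to the originating events during diagnostics.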

Single-Case Data Flow

A single-Case Data Flow supports synchronous, per-request processing by using an abstract source. This mode is suited for scenarios where you need to process or analyze a single record on demand, such as evaluating a decision strategy for a specific Case or transaction. The single-Case approach provides deterministic results and low latency, making it appropriate for interactive use cases that require immediate feedback.

Key applications of Data Flows

Data Flows in Pega Platform™ address a range of enterprise data processing needs. They provide a flexible framework for building pipelines that handle complex logic, support reliability, and deliver operational visibility.

The following key scenarios show when Data Flows are the preferred solution, with practical examples for each:

Multi-stage transformations

Use Data Flows when you need to apply multiple operations to data, such as filtering, converting formats, combining records, evaluating strategies, or performing text analysis before sending it to a destination.

Example: Cleansing and enriching customer data from multiple sources before updating a master database.
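As a rough illustration of the cleansing-and-enriching example, the stages can be thought of as small functions chained together. The sketch below is plain Python, not a Pega Data Flow definition; the field names, the `SEGMENTS` lookup table, and the stage functions are hypothetical.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]

# Hypothetical reference data used by the enrichment stage.
SEGMENTS = {"1001": "gold", "1002": "silver"}


def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Filter stage: keep only records that carry an email address."""
    return (r for r in records if r.get("email"))


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Convert stage: trim and lowercase the email field."""
    return ({**r, "email": r["email"].strip().lower()} for r in records)


def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Enrich stage: attach a segment looked up by customer ID."""
    return ({**r, "segment": SEGMENTS.get(r["id"], "unknown")} for r in records)


raw = [
    {"id": "1001", "email": " Ana@Example.com "},
    {"id": "1002", "email": ""},
    {"id": "1003", "email": "bo@example.com"},
]

# Stages are composed in order: filter, then convert, then enrich.
cleaned = list(enrich(normalize(drop_incomplete(raw))))
```

Each stage takes and returns a stream of records, so stages can be reordered or added without changing the others. This is the same property that makes a shape-based pipeline easy to extend.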

Branching to multiple destinations

Data Flows can route processed data to more than one target system, which supports parallel updates and analytics.

Example: Simultaneously updating a transactional database and sending summary data to a business intelligence platform.
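The fan-out idea in this example can be sketched in plain Python as one record delivered to several writers. This is an illustration under stated assumptions, not Pega configuration: the two in-memory destinations and the `deliver` function are hypothetical, and the writers run one after another here rather than in parallel.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Transaction = Dict[str, object]

# Hypothetical in-memory stand-ins for the two target systems.
transactional_store: List[Transaction] = []
summary_by_region: Dict[str, float] = defaultdict(float)


def write_detail(txn: Transaction) -> None:
    """Destination 1: keep the full record."""
    transactional_store.append(txn)


def write_summary(txn: Transaction) -> None:
    """Destination 2: keep only a running total per region."""
    summary_by_region[str(txn["region"])] += float(txn["amount"])


DESTINATIONS: List[Callable[[Transaction], None]] = [write_detail, write_summary]


def deliver(txn: Transaction) -> None:
    # Every destination receives the same processed record.
    for write in DESTINATIONS:
        write(txn)


for txn in [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 4.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]:
    deliver(txn)
```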

Record-level resilience

Data Flows provide error handling through error ports, configurable retries, Dead Letter Queues (DLQ), and checkpoint-based recovery. These features prevent individual record failures from disrupting the entire pipeline.

Example: If a record fails validation during a batch update, it is sent to a DLQ for later review while the rest of the batch continues processing.
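The behavior described in this example can be sketched in plain Python. The code below is a conceptual illustration, not the Pega error-handling implementation: `TransientError`, `MAX_ATTEMPTS`, and `process_batch` are hypothetical names, and the sketch assumes that validation failures go straight to the DLQ while transient failures are retried first.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
MAX_ATTEMPTS = 3  # hypothetical retry limit


class TransientError(Exception):
    """A failure that may succeed on retry, such as a busy destination."""


def process_batch(
    records: List[Record],
    handler: Callable[[Record], None],
) -> Tuple[int, List[Tuple[Record, str]]]:
    """Run handler on each record; return (success count, dead letters)."""
    succeeded = 0
    dead_letters: List[Tuple[Record, str]] = []
    for record in records:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                handler(record)
                succeeded += 1
                break
            except TransientError as exc:
                if attempt == MAX_ATTEMPTS:
                    dead_letters.append((record, f"gave up: {exc}"))
            except ValueError as exc:
                # Retrying an invalid record cannot help: park it and move on.
                dead_letters.append((record, str(exc)))
                break
    return succeeded, dead_letters


failed_once = set()


def handler(record: Record) -> None:
    if not isinstance(record.get("amount"), (int, float)):
        raise ValueError("amount must be numeric")
    if record["id"] == 3 and record["id"] not in failed_once:
        failed_once.add(record["id"])
        raise TransientError("destination busy")


ok, dlq = process_batch(
    [{"id": 1, "amount": 10}, {"id": 2, "amount": "n/a"}, {"id": 3, "amount": 4.5}],
    handler,
)
```

Record 2 fails validation and lands in the dead-letter list, record 3 fails once and succeeds on retry, and the batch as a whole keeps running.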

Observable processing

Monitor pipeline health and performance at every stage with metrics for topology, throughput, latency, and component-level statistics.

Example: Use the Data Flow landing page to identify bottlenecks in a nightly batch job and optimize throughput.

Run governance and control

Manage pipeline processing with controls for concurrency limits, run priorities, pause and resume capabilities, and restart options.

Example: Prioritize real-time fraud detection pipelines over less critical batch jobs during peak system load.

Event-driven and batch processing

Ingest data from streaming sources such as Kafka topics or process large batches from files, databases, or other systems.

Example: Process real-time transaction events for risk scoring or run a scheduled batch to update policy records overnight.

Partition management and scaling

Distribute processing across multiple nodes and partitions to maximize throughput and minimize bottlenecks for high-volume or latency-sensitive applications.

Example: Scale batch processing across several nodes to complete large file ingestion faster.
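One common way to distribute records across partitions is to hash a record key, so that the same key always lands in the same partition. The plain-Python sketch below shows that general technique only; it makes no claim about how Pega Platform assigns partitions, and `partition_for` and `assign` are hypothetical helper names.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index in a stable, repeatable way."""
    # zlib.crc32 is used instead of the built-in hash(), which is
    # randomized per process for strings and would not be stable.
    return zlib.crc32(key.encode("utf-8")) % partition_count


def assign(keys: Iterable[str], partition_count: int) -> Dict[int, List[str]]:
    """Group keys by the partition they map to."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        partitions[partition_for(key, partition_count)].append(key)
    return dict(partitions)


customer_ids = [f"cust-{n}" for n in range(100)]
layout = assign(customer_ids, partition_count=4)
```

Each partition can then be processed by a different thread or node. A key that dominates the data (a "hot" key) still maps to a single partition, so the choice of partition key affects how evenly the load spreads.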

By using Data Flows for these scenarios, you can build solutions that are technically sound, scalable, and adaptable to evolving business needs.
