Background processing modernization

Pega background processing historically used agents, which poll the database and process lists of items. This approach works for simple tasks but does not scale well for the volume and speed required by modern enterprise data.

In Pega Platform™, Data Flows are the standard for background processing. This shift moves from a task-centric model (finding a task and processing it) to a pipeline-centric model (streaming a record and processing it). For the Lead System Architect, this means adopting new design practices, such as designing pipelines instead of queues.

Note: Legacy Rule-type agents (including advanced agents) are deprecated. While existing agents remain supported to provide migration time, Pega Platform no longer supports creating new instances of Rule-type agents. Migrate to job schedulers, queue processors, and Data Flows for all new implementations.

Node types and thread management

The key difference between a legacy Agent and a Data Flow is how each uses system resources.

The following table compares legacy advanced agents with batch and real-time Data Flows:

| Feature | Legacy advanced agent | Batch/real-time Data Flow |
| --- | --- | --- |
| Processing model | Polling and locking. An agent wakes up on one node, queries the database, and iterates through results. If multi-node, agents compete for the same records or require complex partitioning logic. | Partitioning and distribution. Data Flows automatically partition data (such as Kafka partitions or database segments) and distribute the workload across all available batch or real-time nodes. |
| Scalability | Vertical. To process faster, you typically increase the thread count on a single node, which can lead to resource contention. | Horizontal. To process faster, add more background processing nodes. The Data Flow manager automatically rebalances the partitions to the new nodes. In Pega Platform '25, thread management further simplifies batch Data Flow configuration by dynamically optimizing thread allocation based on system resources and workload demands. |
| State management | Stateless (mostly). Each run is independent. | Stateful. Real-time Data Flows maintain state (such as aggregations and windows) in the stream service, enabling patterns like fraud detection over a 15-minute window that agents cannot natively handle. |

Legacy agents create hot spots on specific nodes. Data Flows are designed to saturate the cluster evenly, maximizing hardware ROI.
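The idea of partitioning and rebalancing can be illustrated with a small, generic sketch. This is not Pega Platform code: the function names, node names, and the round-robin and CRC32 strategies are assumptions chosen only to show how a fixed set of partitions spreads evenly over however many nodes are available.

```python
import zlib
from collections import defaultdict


def assign_partitions(partition_count, nodes):
    """Spread partition ids across the given nodes round-robin (illustrative only)."""
    assignment = defaultdict(list)
    for partition in range(partition_count):
        assignment[nodes[partition % len(nodes)]].append(partition)
    return dict(assignment)


def partition_for(record_key, partition_count):
    """Map a record key to a stable partition using a deterministic hash."""
    return zlib.crc32(record_key.encode("utf-8")) % partition_count


PARTITIONS = 12  # hypothetical partition count

# Two nodes share the partitions; adding a third node rebalances the same partitions.
two_nodes = assign_partitions(PARTITIONS, ["batch-node-1", "batch-node-2"])
three_nodes = assign_partitions(
    PARTITIONS, ["batch-node-1", "batch-node-2", "batch-node-3"]
)

print({node: len(parts) for node, parts in two_nodes.items()})
# {'batch-node-1': 6, 'batch-node-2': 6}
print({node: len(parts) for node, parts in three_nodes.items()})
# {'batch-node-1': 4, 'batch-node-2': 4, 'batch-node-3': 4}
```

Because each record key always maps to the same partition, no two nodes compete for the same record, which is the contention that polling agents have to work around.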

Background processing tool selection

Selecting the appropriate background processing tool in Pega Platform is important for building scalable and maintainable enterprise solutions. A common misconception is that job schedulers can replace all legacy agents. In practice, the choice depends on the processing pattern and business requirements.

Job schedulers: Time-based automation

Job schedulers are best for automating recurring, time-based tasks that do not involve large volumes of individual records.

Examples include:

  • System maintenance routines.
  • Triggering daily reports.
  • Initiating nightly batch processes.

Job schedulers work well when tasks are predictable and periodic, which ensures that essential system activities occur without manual intervention.
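As a rough illustration of time-based triggering, the following sketch computes when a daily job should next start. It is a generic example, not a Pega Platform API; the function name and the 02:00 run time are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, run_at):
    """Return the next datetime at which a once-a-day job should start."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


nightly = time(hour=2, minute=0)  # hypothetical nightly batch start time

print(next_daily_run(datetime(2025, 3, 10, 1, 30), nightly))  # 2025-03-10 02:00:00
print(next_daily_run(datetime(2025, 3, 10, 14, 0), nightly))  # 2025-03-11 02:00:00
```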

Queue processors: Asynchronous work item handling

Queue processors handle asynchronous processing of individual, queued work items. They replace standard Agents and provide dedicated threads for improved performance in queue-based workloads.

Examples include:

  • Sending notifications after Case approval.
  • Integrating with external fulfillment systems.
  • Performing virus scans on newly uploaded documents.

Queue processors reduce latency and avoid locking issues common with legacy agent queues.
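The queue-plus-dedicated-threads pattern can be sketched generically as follows. This is not how Pega Platform implements queue processors; `run_queue_processor`, `send_notification`, and the worker count are hypothetical names and values used only to show items being queued once and handled asynchronously by a fixed pool of workers.

```python
import queue
import threading


def run_queue_processor(items, handler, worker_count=3):
    """Process queued items on dedicated worker threads and collect the outcomes."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                return
            try:
                outcome = handler(item)
            except Exception as exc:  # a failing item must not stop the worker
                outcome = f"{item}: failed ({exc})"
            with results_lock:
                results.append(outcome)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for item in items:
        work.put(item)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for thread in threads:
        thread.join()
    return results


def send_notification(case_id):
    """Hypothetical handler standing in for a notification step."""
    return f"notified {case_id}"


done = run_queue_processor(["C-1", "C-2", "C-3", "C-4"], send_notification)
print(sorted(done))
```

Each item is taken from the queue by exactly one worker, so there is no need for workers to lock rows against each other.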

Data Flows: High-volume and real-time processing

Data Flows support high-volume batch processing, large-scale data ingestion, and real-time event stream analysis. They replace advanced Agents for scenarios involving large record sets or continuous data streams.

Examples include:

  • Overnight updates to hundreds of thousands of policy records.
  • Ingesting and transforming large files.
  • Analyzing customer interaction streams for fraud detection.

Data Flows use partitioning and parallel runs to scale across multiple nodes and support both batch and real-time patterns.
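To make the stateful, real-time pattern concrete, here is a minimal sketch of a sliding-window count per customer, in the spirit of the 15-minute fraud detection window mentioned earlier. It is a generic illustration, not Pega Platform code: the class name, the threshold of three events, and the assumption that events arrive in timestamp order per customer are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class SlidingWindowCounter:
    """Keep per-customer event timestamps and flag bursts within a time window."""

    def __init__(self, window=timedelta(minutes=15), threshold=3):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # state kept between records

    def observe(self, customer_id, timestamp):
        """Record one event; return True if the count in the window exceeds the threshold."""
        recent = self.events[customer_id]
        recent.append(timestamp)
        # Evict events that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.threshold


start = datetime(2025, 1, 1, 9, 0)
counter = SlidingWindowCounter()
flags = [
    counter.observe("CUST-1", start + timedelta(minutes=m))
    for m in (0, 2, 4, 6, 30)
]
print(flags)  # [False, False, False, True, False]
```

The fourth event is flagged because four events fall inside 15 minutes; by the fifth event the earlier ones have aged out of the window. A stateless run that sees each record in isolation cannot make this decision.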

Resilience and observability in modern background processing

When evaluating the shift from legacy agents to modern processing mechanisms in Pega Platform, the benefits go beyond performance. Enhanced observability and resilience are key to maintaining reliable enterprise systems.

Enhanced observability

Legacy agents offer limited visibility into processing activities. Troubleshooting often requires reviewing raw text logs (such as PegaRULES.log) or tracing individual executions without real-time context.

In contrast, Data Flows provide a comprehensive, visual monitoring experience:

  • The Data Flow landing page displays a real-time topology of the pipeline.
  • Architects can instantly view the number of records in motion, throughput metrics (records per second) for each component, and pinpoint bottlenecks.
  • Pega Platform tracks partitioning configurations and lifecycle events, enabling holistic monitoring and rapid diagnosis.
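The per-component throughput metric described above is simple arithmetic: records processed divided by elapsed seconds. The following sketch, which uses made-up component names and counts rather than any Pega Platform API, shows how comparing those rates points to a bottleneck.

```python
def throughput(records_processed, elapsed_seconds):
    """Records per second for one pipeline component."""
    return records_processed / elapsed_seconds


def find_bottleneck(component_stats):
    """Return the slowest component and the rate of every component."""
    rates = {
        name: throughput(count, seconds)
        for name, (count, seconds) in component_stats.items()
    }
    slowest = min(rates, key=rates.get)
    return slowest, rates


# Hypothetical counts observed over a 60-second interval.
stats = {
    "source": (60000, 60),
    "filter": (54000, 60),
    "destination": (12000, 60),
}
slowest, rates = find_bottleneck(stats)
print(slowest, rates)
# destination {'source': 1000.0, 'filter': 900.0, 'destination': 200.0}
```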

Built-in resilience: The poison pill pattern

Resilience is another area where Data Flows surpass legacy agents. In traditional agent-based processing, a single malformed or poisoned record can trigger an unhandled exception, potentially crashing the entire agent run or terminating a thread. Recovery is often manual and disruptive.

Modern Data Flows address this issue with granular, record-level error handling:

  • You can define error thresholds and retry settings at the Data Flow destination.
  • If a record fails, for example because of an invalid data format, it is routed to a dedicated error Data Set, such as a file or database table, for later review.
  • The main pipeline continues uninterrupted and processes the remaining records efficiently.
  • Checkpoint-based fault-tolerance mechanisms support at-least-once delivery guarantees to improve reliability.
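The record-level error handling described in this list can be sketched generically as follows. This is an illustration of the pattern, not Pega Platform code: `run_pipeline`, `parse_premium`, the `max_failures` threshold, and the in-memory list standing in for an error Data Set are all hypothetical.

```python
class ErrorThresholdExceeded(RuntimeError):
    """Raised when more records fail than the configured threshold allows."""


def run_pipeline(records, transform, max_failures=2):
    """Process every record; divert failures instead of aborting the run."""
    processed, error_data_set = [], []
    for record in records:
        try:
            processed.append(transform(record))
        except (ValueError, KeyError) as exc:
            # Route the bad record aside for later review and keep going.
            error_data_set.append({"record": record, "reason": str(exc)})
            if len(error_data_set) > max_failures:
                raise ErrorThresholdExceeded(
                    f"{len(error_data_set)} failed records"
                ) from exc
    return processed, error_data_set


def parse_premium(record):
    """Hypothetical transform that fails on a malformed premium value."""
    return {"policy": record["policy"], "premium": float(record["premium"])}


records = [
    {"policy": "P-1", "premium": "120.50"},
    {"policy": "P-2", "premium": "not-a-number"},  # the poison pill
    {"policy": "P-3", "premium": "88"},
]
processed, errors = run_pipeline(records, parse_premium)
print(len(processed), len(errors))  # 2 1
print(errors[0]["record"]["policy"])  # P-2
```

The malformed record ends up in the error collection while the records before and after it are processed normally. The threshold turns a flood of failures into an explicit stop rather than a silent pile of rejected records.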

By using Data Flows, architects gain speed, scalability, and the ability to monitor, diagnose, and recover from errors with minimal disruption. These capabilities support resilient and maintainable enterprise solutions in Pega Platform.


