
Using a database index for report performance 

3 Tasks

1 hr 30 mins

Visible to: All users
Advanced Pega Platform 8.6 English
Verify the version tags to ensure you are consuming the intended content, or complete the latest version of this challenge.

Scenario

Front Stage wants to investigate the impact of expanding its address search capability to include filtering on street addresses that start with a certain number. They are aware of a source of actual address information that they can use: OpenAddress. Front Stage is concerned that, as the Address table grows larger, and with greater use of address search, query performance could impact their ability to efficiently process cases.

The following table provides the credentials you need to complete the challenge.

Role      User name      Password  User type
Admin     Admin@Booking  rules     Application
pgAdmin4  pega           pega      Database
Note: This challenge can only be completed in the Linux Lite VM, because it requires database access.

Analyze the possible performance impact of expanding the address search capability to include filtering on street addresses. To do this, you first insert a large amount of test data into the FSG-Data-Address table, customerdata.fsg_data_address.

Next, you test a street filter condition to use with the Haversine formula. You do not need to modify the HaversineFormula Connect-SQL rule. Instead, execute the modified query from the PostgreSQL pgAdmin4 tool. Analyze the query by prefixing it with EXPLAIN (ANALYZE true, COSTS true, FORMAT JSON). First, execute the query and note the result. Then, add a database index to the street column, execute the query again, and compare the result with your first run.

You must initiate your own Pega instance to complete this Challenge.

Initialization may take up to 5 minutes, so please be patient.

Detailed Tasks

1 Review solution details

Consider indexing a table column that is used as a filter criterion in the SQL query.
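
The overall pattern used in this challenge can be sketched as follows. This is a sketch, assuming the customerdata.fsg_data_address table and the street filter from the tasks below; idx_street is an example index name:

```sql
-- Baseline: with no index on street, the planner must evaluate the
-- filter against every row (a sequential scan).
EXPLAIN (ANALYZE true, COSTS true, FORMAT JSON)
SELECT pyguid, street, city
FROM customerdata.fsg_data_address
WHERE street LIKE '1 ROGERS STR%';

-- Index the filter column. The text_pattern_ops operator class lets the
-- planner use the index for left-anchored LIKE patterns even when the
-- database collation is not "C".
CREATE INDEX idx_street
    ON customerdata.fsg_data_address
    USING btree (street text_pattern_ops);

-- Re-running the same EXPLAIN should now report an Index Scan or
-- Bitmap Index Scan instead of a Seq Scan.
```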

2 Configure indexing

To complete the assignment, accomplish the following tasks.

  1. Download the address data from the following link: Exercise.zip (Openaddr Collected US Northeast file).
  2. Transfer the .zip file to the /home/architect/Desktop directory.
  3. Load the .csv address data into the PostgreSQL database by using the query tool of pgAdmin4.
  4. From the Exercise files, open the fsg_data_address_import.sql file and execute it.
  5. Execute the following query to verify that all the records loaded successfully.
    SELECT COUNT(*) FROM customerdata.fsg_data_address;
  6. Instead of modifying the Haversine formula by adding a filter criterion such as AND street LIKE '1 ROGERS STR%', perform a simple query within pgAdmin4 against the address table to analyze its statistics. The following query is an example.
    EXPLAIN (ANALYZE true, COSTS true, FORMAT JSON)
    SELECT pyguid AS pyGUID,
    reference AS Reference,
    isfor AS IsFor,
    street AS Street,
    city AS City,
    state AS State,
    postalcode AS PostalCode,
    country AS Country,
    latitude AS Latitude,
    longitude AS Longitude
    FROM customerdata.fsg_data_address
    WHERE street like '1 ROGERS STR%';
     
  7. Verify and capture the statistics for the sequential scan.                                                                                                    
  8. Under pr_fsg_data_address, select Indexes.
  9. Right-click, and then select Create.
  10. Provide an index name such as idx_street or StreetIDX.
  11. On the Definition tab, select btree.
  12. Add the street column to the index.
  13. Choose an operator class such as text_pattern_ops.
  14. Repeat step 6 to see significantly faster performance due to an Index Scan being used.                                                                                                                                                                                                                   
  15. Optional: Execute the following modified version of the Haversine formula query.

    EXPLAIN (ANALYZE true, COSTS true, FORMAT json)
    SELECT pyGUID AS pyGUID,
    Reference AS Reference,
    IsFor AS IsFor,
    Street AS Street,
    City AS City,
    State AS State,
    PostalCode AS PostalCode,
    Country AS Country,
    Latitude AS Latitude,
    Longitude AS Longitude,
    Distance AS Distance
    FROM (
    SELECT z.pyguid AS pyGUID,
    z.reference AS Reference,
    z.isfor AS IsFor,
    z.street AS Street,
    z.city AS City,
    z.state AS State,
    z.postalcode AS PostalCode,
    z.country AS Country,
    z.latitude AS Latitude,
    z.longitude AS Longitude,
    p.radius,
    p.distanceunit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) AS Distance
    FROM customerdata.fsg_data_address AS z
    JOIN (
    SELECT 42.0 AS latpoint,
    -71.0 AS longpoint,
    30.0 AS radius,
    69.0 AS distanceunit ) AS p ON 1 = 1 ) AS d
    WHERE distance <= radius
    AND street LIKE '1 ROGERS STR%'
    ORDER BY distance;

     The following image depicts the explain statement execution of the modified version of the Haversine formula.

    Report Performance - Haversine formula explain
    Note: Attempting to index the latitude and longitude columns does not improve performance for this query. The distance filter compares a computed expression, not the raw column values, so a plain B-tree index created with DDL syntax, for example, CREATE INDEX idx_lat ON customerdata.fsg_data_address(latitude), is never chosen by the planner; the index is ignored.
      
  16. Verify that indexed columns result in significantly improved performance readings.
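
The note in step 15 can be verified directly. A plain B-tree index on latitude is never matched to the Haversine predicate, because the WHERE clause compares a computed expression rather than the column itself. In this sketch, idx_lat is a hypothetical index name, and the constants are the latpoint, longpoint, radius, and distanceunit values from the step 15 query:

```sql
-- Hypothetical index on a coordinate column; creating it succeeds, but
-- the planner has no way to apply it to the distance predicate below.
CREATE INDEX idx_lat ON customerdata.fsg_data_address (latitude);

EXPLAIN (ANALYZE true)
SELECT pyguid, street
FROM customerdata.fsg_data_address
WHERE 69.0 * DEGREES(ACOS(COS(RADIANS(42.0)) * COS(RADIANS(latitude))
        * COS(RADIANS(-71.0 - longitude))
        + SIN(RADIANS(42.0)) * SIN(RADIANS(latitude)))) <= 30.0;
-- The plan still shows a Seq Scan: the filter is an expression over
-- latitude and longitude, not a direct comparison on the indexed column.
```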

3 Optional: Compare performance of report definition with different sources

To complete the assignment, accomplish the following tasks.

  1. Load a large amount of data into the system.
    Report Performance - Total count
           
  2. Execute the following query to view the query plan statistics and analysis.
    Query plan:

    EXPLAIN (ANALYZE true, COSTS true, FORMAT json)
    SELECT pyguid AS pyGUID,
    reference AS Reference,
    isfor AS IsFor,
    street AS Street,
    city AS City,
    state AS State,
    postalcode AS PostalCode,
    country AS Country,
    latitude AS Latitude,
    longitude AS Longitude
    FROM pegadata.p_fsg_data_address
    WHERE street like '1 ROGERS STR%';

    Query plan output:

    [
      {
        "Plan": {
          "Node Type": "Seq Scan",
          "Relation Name": "p_fsg_data_address",
          "Alias": "p_fsg_data_address",
          "Startup Cost": 0,
          "Total Cost": 47526.55,
          "Plan Rows": 101,
          "Plan Width": 139,
          "Actual Startup Time": 128.282,
          "Actual Total Time": 417.202,
          "Actual Rows": 3,
          "Actual Loops": 1,
          "Filter": "((street)::text ~~ '1 ROGERS STR%'::text)",
          "Rows Removed by Filter": 1036761
        },
        "Planning Time": 0.241,
        "Triggers": [],
        "Execution Time": 417.292
      }
    ]
                                                                                                                                                                                                                               
    Report Performance - Before index statistics
     
  3. Run the report definition with the same filter criteria from the query plan.
    Report Performance - RD Criteria
          
  4. Verify that data retrieval preference is set to use the database (default).
    Report Performance - RD Criteria
     
  5. Start the Tracer tool and run the report definition to verify the elapsed time in the Tracer.
    Report Performance - Before index tracer
     
  6. Create the index for the required columns (for example, Street).
    Report Performance - index
      
  7. After creating the index for the column, execute the query plan again to view the statistics and analysis.
    Report Performance - after index statistics
     
  8. Start the Tracer tool and run the earlier configured report definition to view the elapsed time in the Tracer.                                                     
    Report Performance - After index tracer
           
  9. Compare the time spent and the elapsed time for the query before and after index creation (step 5 versus step 8).                                                         
  10. Next, run the report by using an Elasticsearch index. To do so, you modify the report definition and create a dedicated Elasticsearch index.
  11. Update the report definition data retrieval preference to Prefer elastic search index.                                                                                     
    Report Performance - Elastic search Retrieval
     
  12. Create a custom search property rule and add the required column/property.
  13. Create the dedicated index.                                                         
    Report Performance - custom properties
     
  14. Verify that the required dedicated index is available. If not, index or re-index to make it available.                                                                     
    Report Performance - dedicated indexes
      
  15. Start the Tracer tool, and run the report definition that uses Elasticsearch to view the elapsed time in the Tracer.
    Report Performance - Elastic search tracer
     
  16. Compare the elapsed time at the interaction end for database retrieval versus the Elasticsearch index (step 8 versus step 15).
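
To repeat the before/after database measurement, you can toggle the indexed and unindexed states with DDL in the pgAdmin4 query tool. This sketch assumes idx_street as the example index name from the previous task; the Task 3 queries run against pegadata.p_fsg_data_address:

```sql
-- Return to the baseline state for a fresh "before" measurement.
DROP INDEX IF EXISTS pegadata.idx_street;

-- Restore the indexed state for the "after" measurement.
CREATE INDEX idx_street
    ON pegadata.p_fsg_data_address
    USING btree (street text_pattern_ops);

-- Run the same EXPLAIN from step 2 after each change and compare the
-- reported "Execution Time" values.
EXPLAIN (ANALYZE true, COSTS true, FORMAT json)
SELECT pyguid, street
FROM pegadata.p_fsg_data_address
WHERE street LIKE '1 ROGERS STR%';
```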

A report definition that is sourced from an Elasticsearch index retrieves results faster than one sourced from the database.


