
Accessing PDF data with automations

5 Tasks

15 mins

Visible to: All users
Intermediate Pega Robotic Automation 22.1 Robotic Process Automation English


The billing department of the Astend Technology company receives invoices in PDF format and must pass specific data from them to another system for processing. You will create a PDF document type for the invoice so that the document is identifiable and its values are available in an automation. The automation takes a file location parameter and uses it to open the document. It then loops through a lookup table to retrieve the subtotal, sales tax, and total values, and stores them in global variables as the Double data type.

Complete the following tasks:

  1. Add the PDF file connector to Globals.
  2. Configure the PDF file document type.
  3. Configure a lookup table and global variables to store the automation data.
  4. Create an automation that opens the PDF document file and updates the lookup table.
  5. Loop through the table to extract data values for Subtotal, Sales Tax, and Total to global variables, and change the data type from String to Double. 

Starting project:

Download the following solution zip file and extract it to C:\Users\<yourLogin>\Documents\Pega Robot Studio\Projects.

Download the sample PDF file. Save it to the Desktop.

You must initiate your own Pega instance to complete this Challenge.

Initialization may take up to 5 minutes so please be patient.

Detailed Tasks

1 Add the PDF connector to Globals

  1. On the Project tab, click Globals
  2. In the Toolbox, in the search field, enter PdfConnector.
  3. Drag the PdfConnector to the Globals. 
  4. On the Property Grid of the PdfConnector, change the Name property to Invoice
  5. Save the changes on the Globals tab.

2 Add and configure a document type

  1. In the Globals, right-click the Invoice, and then select Add Document Type
    Adding the document type menu on Globals
  2. Select the PDF file sample document on the Desktop from the Open dialog box. 
  3. In the Add New Document Type window, in the Document type name field, enter Astend Invoice.
  4. In the Threshold configuration pane of the dialog box, in the Line menu, click Show to highlight lines in the preview area on the right. 
  5. Adjust the Line threshold to ensure correct line identification in the PDF file document type. The following figure shows an example of the lines:
    Line threshold configuration
  6. In the dialog box, in the Table configuration section, select Include rectangles in selection.
    Tip: This feature increases the number of tables recognized in the document, matching tables based on intersecting lines.
  7. Click Next to open the Identifiers section.
  8. In the Identifiers section, click Add > Text to configure a unique document type identifier.
    Adding identifier
  9. In the Identifier name field, enter Invoice for.
  10. In the upper-right corner, click the blue rectangle to configure the identifier. The rectangle changes to orange when activated.
  11. Draw a rectangle around the Invoice for: text. 
    Selecting the identifier
  12. Click Save, and then click Next.
  13. In the Automation values section, select Add > Table.
  14. Confirm the blue selection rectangle in the right corner is highlighted.
  15. Draw a rectangle around the Invoice for: text in the PDF image.
  16. In the Table name field, enter tblInvoice.
  17. Click Validate to confirm the selected landmark and to highlight the Invoice Item table. 
  18. In the Table fill option list, select Compact to remove empty cells and move data to the left. 
  19. In the Select Table section, click the blue rectangle to highlight.
  20. In the PDF image, click the Invoice Item table to highlight.
  21. Click Show value to see the table data output.
  22. In the last row, notice that the Total field is off by one column. Also, notice that the Subtotal and Total values have a currency symbol.
    Result of table identification
  23. On the dialog window, click OK.
  24. Click Save, then click Back twice to return to the Document tab.
  25. Click Modify, and then accept the change message.
  26. Clear the Include rectangles in selection checkbox, and then click Next twice to return to the Values tab for table definition.
  27. On the Values tab, click the Edit icon on the Invoice For table.
    Reselecting the table without rectangles
  28. In the Select Table area, click the blue rectangle, and then click the Invoice Item table.
  29. Click Show value to confirm the correct table structure, and then click OK to close the dialog window.
    Correct table structure
  30. In the Select Table section, select My table has headers in row
  31. Select Advanced column options, and then configure the columns based on the following table.
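    | Column name | New column name | Filter results | Remove spaces from lines | Remove all blank lines | Remove these characters |
    |-------------|-----------------|----------------|--------------------------|------------------------|-------------------------|
    | Col1        | Item            | True           | True                     | True                   |                         |
    | Col2        | Description     | True           | True                     | True                   | $,                      |
    | Col3        | Qty             | True           | True                     | True                   |                         |
    | Col4        | Unit Price      | True           | True                     | True                   | $,                      |
    | Col5        | Discount        | True           | True                     | True                   | $,                      |
    | Col6        | Price           | True           | True                     | True                   | $,                      |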
  32. Click Save, and then click Done to finish the Document Type configuration.
  33. Click File > Save all to save the changes made in the project. 
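To make the effect of the Compact fill option and the Advanced column options more concrete, the following Python sketch imitates them on a single in-memory row. It is an illustration under assumptions, not Pega Robot Studio's implementation: compact_row and clean_cell are hypothetical helper names, the sample row is invented, and "Remove spaces from lines" is assumed here to mean trimming leading and trailing whitespace.

```python
def compact_row(row):
    """Drop empty cells and shift the remaining values to the left,
    padding on the right so the row keeps its original width."""
    kept = [cell for cell in row if cell.strip()]
    return kept + [""] * (len(row) - len(kept))


def clean_cell(value, remove_chars="$,", trim_spaces=True):
    """Remove each listed character from a cell, then optionally trim it.
    Trimming is an assumption about what 'Remove spaces from lines' does."""
    for ch in remove_chars:
        value = value.replace(ch, "")
    return value.strip() if trim_spaces else value


# Hypothetical summary row as it might be read from the PDF table.
raw_row = ["", "", "", "", "Total", "$1,234.50"]
compacted = compact_row(raw_row)
cleaned = [clean_cell(cell) for cell in compacted]
print(cleaned)
```

Stripping the currency symbol and thousands separator at this stage is what allows the later String-to-Double conversion to succeed on values such as the total.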

3 Configure a Lookup Table and add global variables

  1. In the Globals tab, select the lktblInvoice lookup table to access its Property Grid.
  2. In the Fields property in the Property Grid, click More to open the LookupField Collection Editor.
  3. In the LookupField Collection Editor, click the Add icon to add the following fields to the lookup table:
    | FieldName   | Key   | Type          |
    |-------------|-------|---------------|
    | Key         | True  | System.Int32  |
    | Item        | False | System.String |
    | Description | False | System.String |
    | Quantity    | False | System.String |
    | Unit Price  | False | System.String |
    | Discount    | False | System.String |
    | Price       | False | System.String |
    Tip: The Description column uses the String data type, so the Total, Subtotal, and Sales Tax values read from it are converted to the Double data type later in the automation.
  4. In the Toolbox, expand the Variables section.
  5. Add three Double variables to Globals, and then enter the following names:
    • decTotal
    • decSubtotal
    • decSalesTax
  6. Select File > Save all to save the changes made in the project. 

4 Create a sub-automation to open the file location and populate the lookup table

  1. On the Project tab, click Add > Automation.
  2. In the Add new automation dialog box:
    1. In the Automation name field, enter Invoice Data Pull.
    2. Clear the Empty Automation checkbox.
    3. Click Add.
    A completed Add new automation dialog box
  3. On the Run block, add a String parameter named fileLocation.
  4. In the Globals section on the Palette, click and drag Invoice to the automation, and then add the FileName property.
  5. From the Invoice.FileName block, add the Invoice.Open method to the automation.
  6. In the Globals section on the Palette, click and drag Invoice > Astend_Invoice > tblInvoice to the automation, and then add the Table property.
  7. On the Jump to Error block, rename the errMessage parameter to Not able to access PDF document.
  8. In the Globals section on the Palette, click and drag lktblInvoice to the automation, and then add the ReplaceTableAutoKey method.
  9. Connect the design blocks as shown in the following figure.
    automation showing opening the file location and replacing the lookup table with the table data and adding an auto-key column
  10. Click Save all
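As a rough mental model of what this automation does with the table data, the Python sketch below replaces the contents of a lookup table with rows from the PDF table and adds a sequential key to each record. The function name replace_table_auto_key, the sample rows, and the starting key value of 1 are assumptions made for illustration; the actual behavior of the ReplaceTableAutoKey method is defined by Pega Robot Studio.

```python
def replace_table_auto_key(rows, key_field="Key", start=1):
    """Build a fresh list of records, each with a sequential integer key.
    The starting value is an assumption, not documented Pega behavior."""
    keyed = []
    for key, row in enumerate(rows, start=start):
        record = {key_field: key}
        record.update(row)  # copy the original columns after the key
        keyed.append(record)
    return keyed


# Hypothetical rows standing in for the tblInvoice table data.
pdf_rows = [
    {"Item": "Widget", "Description": "Sample part", "Price": "10.00"},
    {"Item": "TOTAL", "Description": "10.00"},
]

lookup = replace_table_auto_key(pdf_rows)
for record in lookup:
    print(record["Key"], record["Item"])
```

The integer key gives each row a stable identifier that a loop index can use to fetch one record at a time.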

5 Loop through the lookup table and store data to global variables

  1. In the search field on the Toolbox, enter ForLoop. Drag the ForLoop to the Designer window to add a loop to the automation.
  2. Connect lktblInvoice.ReplaceTableAutoKey design block with the ForLoop1 design block.
  3. In the Globals section of the Palette, click and drag lktblInvoice to the automation, and then add the RowCount property.
  4. Connect lktblInvoice.RowCount property output with the ForLoop1.Limit input. 
    automation section showing the configuration of the for loop iteration values
  5. In the Toolbox, expand the Variables section, and then drag the Integer to the Designer window. 
  6. On the Integer1 design block, double-click Integer1, and then enter intIndex.
  7. In the Palette, click and drag lktblInvoice, and then add the GetRecord method to the automation.
  8. On the automation surface, right-click, select Jump To > Error, and then in the errMessage parameter, enter Not able to access Data Table.
  9. In the Search field on the Toolbox, enter Switch, and then click and drag the Switch component to the automation.
  10. On the lktblInvoice.GetRecord block, draw a data connector from the Item parameter to the Input port of the Switch1 component. 
  11. On the Switch1 design block, click (Not Defined), and then enter TOTAL.
  12. On the Switch1 design block, click the Plus icon twice, and then add Invoice Subtotal and Sales Tax.
  13. On the automation, copy and paste intIndex.Value block three times.
  14. On the Switch1 block, draw a connector from each output parameter to an intIndex.Value property design block.
    Connect switch to intIndex properties
  15. In the Globals section on the Palette, click and drag lktblInvoice to the automation, and then add the GetRowColumn method.
  16. On the lktblInvoice.GetRowColumn block, click columnName, and then enter Description.
  17. In the automation, right-click lktblInvoice.GetRowColumn block, and then copy and paste the block twice.
  18. In the Search field on the Toolbox, enter change, and then click and drag the ChangeType method to the automation.
  19. On the Convert.ChangeType block, click typeCode, and then select Double.
  20. On the automation, copy the Convert.ChangeType block, and then paste it two times.
  21. On the Palette, click and drag the following variables to the automation, and then add the Value property of each.
    • decSalesTax
    • decSubtotal
    • decTotal
  22. Connect the design blocks as shown in the following image.
    automation connectors from get row column method to each change type method and then its needed variable value.
  23. Confirm the automation links from the following image.
    Image of full automation connectors complete.
  24. Click Save all to save the changes made in the automation.
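The logic that this task wires together (loop over the rows, switch on the Item value, read the Description column, and convert the text to a Double) can be sketched in Python as follows. This is a simplified illustration, not generated automation code: extract_totals and the sample records are hypothetical, and the case labels are taken from the Switch1 entries above.

```python
def extract_totals(records):
    """Loop over lookup-table records and convert the three summary
    values from text to float (the Python counterpart of Double)."""
    # Switch cases mapped to the global variable each one feeds.
    targets = {
        "TOTAL": "decTotal",
        "Invoice Subtotal": "decSubtotal",
        "Sales Tax": "decSalesTax",
    }
    result = {}
    for index in range(len(records)):        # ForLoop, Limit = RowCount
        record = records[index]              # GetRecord for the current row
        name = targets.get(record["Item"])   # Switch on the Item value
        if name is not None:
            # GetRowColumn("Description") followed by ChangeType to Double
            result[name] = float(record["Description"])
    return result


# Hypothetical records, already stripped of "$" and "," characters.
sample = [
    {"Key": 1, "Item": "Widget", "Description": "Sample part"},
    {"Key": 2, "Item": "Invoice Subtotal", "Description": "100.00"},
    {"Key": 3, "Item": "Sales Tax", "Description": "8.25"},
    {"Key": 4, "Item": "TOTAL", "Description": "108.25"},
]
print(extract_totals(sample))
```

Rows whose Item value matches none of the cases are skipped, which mirrors a Switch with no connected default output.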

Confirm your work

  1. In the Invoice Data Pull automation add two breakpoints:
    1. On the connector between tblInvoice.Table property and the lktblInvoice.ReplaceTableAutoKey method.
    2. On the connector between Exit and ExitPoint.
  2. On the automation navigation bar, click Test.
  3. From your desktop taskbar, open File Explorer.
  4. On your desktop, locate the Sample.pdf file and copy its file path.
    Invoice data pull automation input parameter window for debug
  5. In the fileLocation field on the Invoice Data Pull modal dialog box, paste or enter the Sample.pdf file location, and then click Test. The debugging starts, and the automation pauses at the first breakpoint.
  6. On the lktblInvoice.ReplaceTableAutoKey block, right-click, and then select Update lookup table.
    context menu showing Update lookup table option
  7. On the Lookup table data editor modal dialog box, confirm no data exists, and then click Cancel.
  8. On the automation debugging navigation bar, click Step In (F11) twice. The automation advances two steps.
  9. On the Lookup table data editor modal dialog box, confirm the data from the PDF exists in the lookup table, and then click Cancel.
    Lookup table editor showing data from the pdf file.
  10. On the automation debugging navigation bar, click Continue. The debugging continues, and the automation pauses at the second breakpoint.
  11. On the Menu, click Debug > Debugging tools.
  12. On the Debugging tools window, click AUTOMATION VALUES.
  13. In the Search field, enter dec. The entry filters the items and displays the three double variables with their respective values in each.
    Automation values tab filtered on dec and showing the variables with their values.
  14. On the automation debugging navigation bar, click Continue. The debugging continues, and the Test results window displays.
  15. On the Test results modal dialog box, click Done.

