PDF Document Type configuration

After adding the PdfConnector component to your solution, you are ready to create a document type from a PDF file. A document type describes a class of documents that can be identified by known text elements or by the presence of known form fields, such as business documents, travel itineraries, or government forms. When you access a PDF file in a solution, the PdfConnector uses these identifiers to recognize the document type. The values that you configure for the document type are then available for use in an automation.

It is a best practice to add the PdfConnector to a global container. Right-click the PdfConnector in the design area and select Add Document Type to open the PDF Editor and configure a PDF document type.

Screenshot showing Add Document Type in the PdfConnector context menu.

In the PDF Editor, you perform the following actions to configure the PDF document type:

  • Name the document type and confirm whitespace thresholds on the Documents tab to ensure that lines, segments, and words are detected properly in the document.
  • Create document type identifiers on the Identifiers tab to uniquely identify a document type when accessed.
  • Create data values on the Values tab for use in automations.

Documents

On the Documents tab, you name the document type and confirm the whitespace thresholds that control how lines, segments, and words are detected in the document.
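To illustrate why a whitespace threshold affects word detection, the following Python sketch groups characters on a single line into words based on the horizontal gap between them. It is a conceptual illustration only and not the PdfConnector implementation: the function name split_into_words, the word_gap parameter, and the sample coordinates are all assumptions made for this example.

```python
def split_into_words(glyphs, word_gap):
    """Group characters on one line into words.

    glyphs: list of (char, left, right) tuples sorted left to right.
    word_gap: hypothetical threshold; a horizontal gap larger than this
              starts a new word.
    """
    words = []
    current = ""
    prev_right = None
    for char, left, right in glyphs:
        # Start a new word when the gap to the previous character is too wide.
        if prev_right is not None and left - prev_right > word_gap:
            words.append(current)
            current = ""
        current += char
        prev_right = right
    if current:
        words.append(current)
    return words


# Made-up coordinates: small gaps inside "PDF" and "doc", one wide gap between.
sample_glyphs = [
    ("P", 0, 5), ("D", 5.5, 10), ("F", 10.5, 15),
    ("d", 20, 25), ("o", 25.5, 30), ("c", 30.5, 35),
]
print(split_into_words(sample_glyphs, word_gap=2))
print(split_into_words(sample_glyphs, word_gap=10))
```

With a small threshold the sample splits into two words, and with a threshold wider than the largest gap it merges into one, which is the kind of misdetection that confirming the thresholds is meant to prevent.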

Identifiers

Identifiers act as match rules for your document type. Add identifiers that uniquely identify a document as the specific type you are configuring. Identifiers can be Text or Form fields. You can use both identifier types on a single document if the document has Form fields.

A Text identifier matches text anywhere in a document, so you might need more than one Text identifier to make the match unique. For example, suppose that the document type is a Property Loss Notice. The text Policy Number might appear in multiple document types, so it is not a good choice as a document identifier. Instead, use the title of the document, Property Loss Notice. Good document identifiers are static data, such as a label or title, rather than values that change from document to document.
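The reasoning above can be sketched in Python. This is a simplified model and not the PdfConnector API: it assumes that a document type matches only when every one of its Text identifiers is found somewhere in the document text, and the identify function and sample document types are hypothetical.

```python
def identify(document_text, document_types):
    """Return the names of all document types whose identifiers all match.

    document_types: dict mapping a type name to a list of identifier strings.
    Assumption: every identifier must be contained in the document text.
    """
    return [
        name
        for name, identifiers in document_types.items()
        if all(identifier in document_text for identifier in identifiers)
    ]


# Made-up sample text for a Property Loss Notice.
loss_notice = "Property Loss Notice\nPolicy Number: PL-12345\nDate of Loss: ..."

# "Policy Number" alone is ambiguous: it matches more than one type.
ambiguous = {
    "PropertyLossNotice": ["Policy Number"],
    "PolicyRenewal": ["Policy Number"],
}
print(identify(loss_notice, ambiguous))

# Using the document title makes the match unique.
unique = {
    "PropertyLossNotice": ["Property Loss Notice"],
    "PolicyRenewal": ["Policy Renewal Form"],
}
print(identify(loss_notice, unique))
```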

Screenshot showing the identifier modification screen in the Add New Document Type wizard.

Draw a rectangle around the identifier that you are configuring, and the system populates the fields. By default, the PdfConnector uses the Contains strategy, which matches any text in the document that contains the text value that you highlighted. When you require more complex matching, you can set the matching strategy to one of the following options:

  • Exact
  • Closest
  • RegEx
  • StartsWith
  • EndsWith
  • Contains
  • NotContains
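As a rough illustration of how the strategies in this list differ, the following Python sketch applies each one to a candidate string. It is not the PdfConnector implementation. In particular, the behavior of Closest is not described here, so the sketch assumes a fuzzy similarity comparison with a hypothetical closest_threshold parameter; the matches function itself is also an invented name.

```python
import re
from difflib import SequenceMatcher


def matches(strategy, pattern, candidate, closest_threshold=0.8):
    """Return True if candidate satisfies pattern under the given strategy."""
    if strategy == "Exact":
        return candidate == pattern
    if strategy == "Closest":
        # Assumption: "Closest" is modeled as a similarity ratio check.
        ratio = SequenceMatcher(None, pattern, candidate).ratio()
        return ratio >= closest_threshold
    if strategy == "RegEx":
        return re.search(pattern, candidate) is not None
    if strategy == "StartsWith":
        return candidate.startswith(pattern)
    if strategy == "EndsWith":
        return candidate.endswith(pattern)
    if strategy == "Contains":
        return pattern in candidate
    if strategy == "NotContains":
        return pattern not in candidate
    raise ValueError(f"Unknown matching strategy: {strategy}")


title = "Property Loss Notice"
print(matches("Contains", "Loss Notice", title))
print(matches("Exact", "Loss Notice", title))
print(matches("RegEx", r"^Property\s+Loss", title))
```

Contains is the most permissive option, which is why it works as a default for static labels, while Exact or RegEx narrows the match when similar text appears elsewhere in the document.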

Values

Values define the form fields, text values, optical marks, and tables from which you want to extract data for use in an automation.

To create a Text value, you first define the landmark text, which is in a fixed position relative to the value field, by drawing a box around it. Then, you define the field value by drawing a box around the value. Because the landmark and the value are always the same distance apart, the landmark helps the system locate the value. At run time, the system matches the landmark text and then reads the data value from the area that is the predefined distance away from the landmark. In the following image, the field label is the landmark text, and the data value is the text to the right of the label.

Screenshot showing the landmark value creation screen in the Add New Document Type wizard.
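The landmark approach can be sketched in Python as follows. This is a conceptual model rather than the PdfConnector API: it assumes that the page is available as text segments with coordinates, and the Segment and ValueRegion classes, the extract_value function, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    x: float  # left edge on the page
    y: float  # top edge on the page


@dataclass(frozen=True)
class ValueRegion:
    dx: float      # horizontal offset from the landmark's left edge
    dy: float      # vertical offset from the landmark's top edge
    width: float
    height: float


def extract_value(segments, landmark, region):
    """Find the first segment containing the landmark text, then return the
    text of all segments that start inside the region relative to it."""
    anchor = next((s for s in segments if landmark in s.text), None)
    if anchor is None:
        return None
    left = anchor.x + region.dx
    top = anchor.y + region.dy
    inside = [
        s for s in segments
        if left <= s.x < left + region.width
        and top <= s.y < top + region.height
    ]
    inside.sort(key=lambda s: (s.y, s.x))  # reading order
    return " ".join(s.text for s in inside)


# Made-up page content: labels on the left, values to the right.
page = [
    Segment("Policy Number:", 50, 100),
    Segment("PL-12345", 160, 100),
    Segment("Insured:", 50, 120),
    Segment("Jane Doe", 160, 120),
]
right_of_label = ValueRegion(dx=100, dy=-2, width=150, height=10)
print(extract_value(page, "Policy Number", right_of_label))
```

Because the region is stored as an offset from the landmark rather than as absolute page coordinates, the value is still found if the label and value shift together on the page.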

As with text identifiers, the default matching strategy for a landmark value is Contains. Other options for matching text are Exact, Closest, RegEx, StartsWith, and EndsWith. By default, the PdfConnector uses the first occurrence of the matching text as the landmark for the text value. In the Advanced Landmark Identification section, you can adjust the occurrence value if your solution requires a different instance of that text. For example, a customer completes an Emergency Contacts document and returns it for processing. The document has multiple First Name fields, and each occurrence represents a different person to add as an emergency contact. You add multiple landmark values and adjust the occurrence for each one to create individual data values that you can access in the automation.
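The Emergency Contacts example can be sketched in Python. This is a simplified model, not the PdfConnector behavior: it assumes that the document is a list of text segments in reading order, that the value is the segment immediately after the landmark, and that occurrences are counted from 1. The function names and sample data are hypothetical.

```python
def nth_landmark_index(segments, landmark, occurrence=1):
    """Return the index of the nth segment containing the landmark text,
    or None if there are fewer than n matches (n is assumed 1-based)."""
    if occurrence < 1:
        raise ValueError("occurrence must be 1 or greater")
    seen = 0
    for index, text in enumerate(segments):
        if landmark in text:  # Contains, the default strategy
            seen += 1
            if seen == occurrence:
                return index
    return None


def value_after_landmark(segments, landmark, occurrence=1):
    """Return the segment that follows the nth landmark match."""
    index = nth_landmark_index(segments, landmark, occurrence)
    if index is None or index + 1 >= len(segments):
        return None
    return segments[index + 1]


# Made-up Emergency Contacts content with two First Name fields.
contacts = [
    "First Name", "Ana", "Last Name", "Lopez",
    "First Name", "Ben", "Last Name", "Okafor",
]
print(value_after_landmark(contacts, "First Name", occurrence=1))
print(value_after_landmark(contacts, "First Name", occurrence=2))
```

Each landmark value in the document type would correspond to one call with a different occurrence, producing a separate data value per contact.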

