Enhancing entity extraction with machine learning
5 Tasks
25 mins
Scenario
The U+ Air chatbot channel detects U+ Air ticket numbers using a RUTA-based model. The current entity model recognizes a ticket number in the following format: two letters followed by three digits (for example, WZ266 or AA132). In case of a rescheduled ticket or human error, the application also needs to detect unusual ticket number patterns. Running a RUTA script with machine learning can achieve this business outcome.
As a data scientist, enhance entity extraction with a machine learning model to satisfy the business requirements.
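To see why a hyphenated or three-letter ticket number slips past the default pattern, consider the following minimal Python sketch. The regular expression is an assumption that approximates the format described above (two letters followed by three digits); it is not the RUTA script that the U+ Air channel uses.

```python
import re
from typing import List

# Assumed approximation of the default format: two uppercase letters, then three digits.
DEFAULT_TICKET = re.compile(r"\b[A-Z]{2}\d{3}\b")

def find_default_tickets(message: str) -> List[str]:
    """Return every substring that matches the assumed default ticket format."""
    return DEFAULT_TICKET.findall(message)

print(find_default_tickets("I want to cancel my ticket number WZ266"))   # ['WZ266']
print(find_default_tickets("I want to cancel my ticket number WZ-266"))  # []
print(find_default_tickets("I want to cancel my ticket AAL325"))         # []
```

A fixed pattern like this one matches only the exact shape that it encodes, which is the gap that the machine learning model in this challenge is meant to close.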
Use the following credentials to log in to the exercise system:
Role | User name | Password |
---|---|---|
Data Scientist | NLPDataScientist | rules |
Application Developer | NLPApplicationDeveloper | rules |
Your assignment consists of the following tasks:
Task 1: Test the text extraction in the chatbot for an unusual ticket number
As an application developer, test the entity extraction for ticket numbers that differ from the default pattern (two letters followed by three digits). Test the chatbot with the following message: I want to cancel my ticket number WZ-266.
Task 2: Train the machine learning entity extraction model by record
As a data scientist, add training data manually by record to train the ticket_number entity extraction machine learning model so that it uses machine learning to detect entities.
Task 3: Train the machine learning entity extraction model with a dataset
Import the Airlines_entity_dataset.xlsx dataset as training data to the ticket_number entity extraction model. Train the entity model with all the available data.
Task 4: Test the entity extraction
Test the entity extraction model, and then observe the results.
Task 5: Test the text extraction in the chatbot after building the model
As an application developer, test the entity extraction in the chatbot, and then observe the results.
Challenge Walkthrough
Detailed Tasks
1 Test the text extraction in the chatbot for an unusual ticket number
- On the exercise system landing page, click Launch Pega Infinity™ to log in to App Studio.
- Log in to App Studio as an application developer:
- In the User name field, enter NLPApplicationDeveloper.
- In the Password field, enter rules.
- In the navigation pane of App Studio, click Channels to view the list of current channels.
- In the Current channel interfaces section, click the icon that represents your existing Airline Digital Messaging channel.
- In the Preview console on the right, in the Type your message here text box, enter I want to cancel my ticket number WZ-266 to test the chatbot.
- Turn on the Show analysis switch to see the details.
- Click Yes.
- Confirm that the chatbot detects the cancel ticket topic with high confidence and runs the preconfigured Cancel a ticket case type, but does not recognize the ticket number because of its unusual pattern. The chatbot requests the ticket number even though the first message already provides it.
- In the lower-left corner, click the user icon, and then select Log off to log out of App Studio.
2 Train the machine learning entity extraction model by record
- Log in to Prediction Studio as a data scientist:
- In the User name field, enter NLPDataScientist.
- In the Password field, enter rules.
- On the Predictions landing page, click Airline to open the prediction workspace.
- Click the Entities tab to view the list of entities.
- In the ticket_number row, click the Gear icon to configure the machine learning data.
- In the ticket_number dialog box, click Add training data to add a new piece of data to the ticket_number entity.
- In the text box, enter I want to cancel my reservation for ticket number JK-294.
- Click Add.
- On the right, in the preview pane, select JK-294.
- Right-click JK-294, and then select #ticket_number.
- Click Save.
- In the ticket_number row, in the Total training data column, click 0 and confirm that there is 1 reviewed feedback, as shown in the following figure:
3 Train the machine learning entity extraction model with a dataset
- Download the Airlines_entity_dataset.xlsx dataset.
- Open, and then inspect the downloaded dataset:
- Confirm that the entity in the training data is <START:ticket_number> LO-127 <END>.
Note: The ticket number has a single space in front of the two letters and a double space after the three digits.
- Close the Airlines_entity_dataset.xlsx file.
- In the list of entities, in the ticket_number row, click the Gear icon.
- In the ticket_number dialog box, click Upload to open the dataset upload dialog box.
- Click Choose File, and then select the Airlines_entity_dataset.xlsx file.
- Click Upload.
- In the ticket_number dialog box, confirm that the status of the newly added entities is Reviewed, as shown in the following figure:
Note: There are seven pages of new training data to view. You can inspect and edit each record in the same way as manually added training data.
- Click Save.
- Note the new available reviewed feedback:
- In the upper-right corner of the Airline prediction workspace, click Build to build the model.
- In the Build models dialog box, select the Airline_entities model:
- Click Build to build the model.
Note: The build process may take up to a few minutes. A green information ribbon appears at the top of the Prediction window when the process is complete. If the green information ribbon does not appear after a few minutes, in the upper-right corner of the Prediction window, click Actions > Refresh.
- After the build is complete, at the top of the Prediction window, click View build report.
- In the Model training report window, review the build result.
- Click Close to close the Model training report window.
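The dataset records inspected earlier in this task mark each entity inline as `<START:ticket_number> LO-127 <END>`. The following Python sketch shows one way to read such a record outside of Prediction Studio, for example, to check a dataset before uploading it. The function names and the sample sentence are hypothetical, and the assumption that whitespace around the entity text is insignificant is mine; how Prediction Studio itself parses the file is not covered here.

```python
import re
from typing import List, Tuple

# Matches one annotated span such as "<START:ticket_number> LO-127 <END>".
ANNOTATION = re.compile(r"<START:(?P<label>\w+)>\s*(?P<text>.*?)\s*<END>")

def extract_annotations(record: str) -> List[Tuple[str, str]]:
    """Return (entity_type, entity_text) pairs found in one training record."""
    return [(m.group("label"), m.group("text")) for m in ANNOTATION.finditer(record)]

def strip_annotations(record: str) -> str:
    """Return the plain sentence with the markup removed and whitespace normalized."""
    plain = ANNOTATION.sub(lambda m: m.group("text"), record)
    return " ".join(plain.split())

# Hypothetical record; the sentence around the entity is invented for illustration.
record = "Please cancel ticket <START:ticket_number> LO-127  <END> for me."
print(extract_annotations(record))  # [('ticket_number', 'LO-127')]
print(strip_annotations(record))    # Please cancel ticket LO-127 for me.
```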
4 Test the entity extraction
- In the upper-right corner of the Airline prediction workspace, click Test.
- In the Test prediction dialog box, in the text box, enter I want to cancel my ticket number WZ-266.
- Click the Entity tab.
- Confirm that the test correctly identifies WZ-266 even though it is not part of the training dataset.
- In the Test prediction dialog box, enter I want to cancel my ticket AAL325.
- Confirm that the test correctly identifies AAL325 as a ticket number even though it is not part of the training dataset.
- Close the Test prediction dialog box.
- In the upper-right corner of the Airline prediction workspace, click Save.
- In the lower-left corner, click the user icon, and then select Log off to log out of Prediction Studio.
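The tests in this task show the model recognizing WZ-266 and AAL325, even though neither value appears in the training data. One common way for sequence-labeling models to generalize like this is through character-level "word shape" features. The Python sketch below illustrates that idea only; the `word_shape` function is hypothetical, and nothing here describes the features that the Pega model actually uses.

```python
def word_shape(token: str) -> str:
    """Map uppercase letters to 'A', lowercase to 'a', digits to '9'; keep other characters."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A" if ch.isupper() else "a")
        else:
            shape.append(ch)
    return "".join(shape)

# Training examples and unseen test values from this challenge, plus an ordinary word.
for token in ["JK-294", "LO-127", "WZ-266", "AAL325", "ticket"]:
    print(token, "->", word_shape(token))
```

Under this illustration, JK-294, LO-127, and WZ-266 all share the shape AA-999, so a model that sees the first two in training has evidence that applies to the third.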
5 Test the text extraction in the chatbot after building the model
- Log in to App Studio as an application developer:
- In the User name field, enter NLPApplicationDeveloper.
- In the Password field, enter rules.
- In the navigation pane of App Studio, click Channels to view the list of current channels.
- In the Current channel interfaces section, click the icon that represents your existing Airline Digital Messaging channel.
- In the Preview console on the right, in the Type your message here text box, enter I want to cancel my ticket number WZ-266 to test the chatbot.
- Turn on the Show analysis switch to show the details.
- Confirm that the chatbot recognizes WZ-266 as a ticket number.
- In the upper-right corner of the Preview console, click Reset.
- In the Type your message here text box, enter I want to cancel my ticket number AAL325 to test the chatbot.
- Confirm that the chatbot recognizes AAL325 as a ticket number.
- Click Yes to initiate the routing of the case.