# Robotic Process Automation

Johann Mitloehner, 2022-10-13

## Definition

Robotic Process Automation (RPA) allows businesses to automate tasks that are typically carried out by employees.

• In contrast to traditional workflow automation based on APIs, RPA emulates human interaction with software systems, e.g. by using the web interface of an application instead of its API.

RPA is usually aimed primarily at repetitive and tedious tasks in order to

• free people for more intellectually satisfying work, or
• simply cut personnel cost.

In order to mimic human interaction with applications, RPA needs to

• work with the user interface provided by the system, such as
  • enter data in web forms
• 'understand' the system responses, such as
  • extract specific parts from the content of a web page
  • assign meaning to those parts, such as success or failure of a login
• perform actions corresponding to specific situations
  • usually pre-defined in a programmatic way, but increasingly
  • involving machine learning
• maybe interact with more than one system
  • extract and process data from one system
  • transfer data via a different interface to another system
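To make the 'assign meaning' step concrete: interpreting an extracted part of a response can be as simple as a rule over the text. The snippet below is only a toy illustration with a made-up response string, not part of any RPA toolkit:

```python
# made-up response text from a web front end after a login attempt
page = "<h1>Welcome back, Alice</h1>"

def login_succeeded(page_text):
    # assign meaning: success or failure of login, via a simple textual rule
    return "Welcome" in page_text and "Login failed" not in page_text

print(login_succeeded(page))  # True
```

In practice such hand-written rules quickly become brittle, which is where machine learning comes in.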

#### Workflow Automation

This type of automation is usually understood to involve software that accesses the back-end of a system through its API (application programming interface). In contrast, RPA accesses the system via the front end, usually a GUI (graphical user interface), closely mimicking the human/computer interaction.

Since not all systems provide an API, while all must provide a front end for their human users, there are situations where RPA is the only feasible approach to automation.

#### GUI Testing Tools

These are aimed at performing system testing with pre-defined test cases and expected results. The focus is usually not on interacting with multiple applications; however, sophisticated testing tools like the Robot Framework, which will be used here, allow for extracting data and processing it in more than one application.

#### Robot Software

This type of software is usually understood to control physical robots. RPA involves software robots, i.e. programs that do not control a physical robot but perform their actions by interacting with other software systems rather than with the physical world.

#### Artificial Intelligence

A confusing term that is hard to define, since even natural intelligence remains an elusive concept. We will use the less misleading term machine learning instead to refer to a type of decision making that sometimes seems like AI but is really just another type of software.

#### Machine Learning

This usually refers to software that 'learns' from observation, i.e. from data providing instances of situations and actions, e.g.

• credit card application data and corresponding decisions such as grant/reject
• measurements of petals and the corresponding species of Iris flowers (a classic dataset from 1936)

The system is usually understood to work on data from a large number of cases in an adaptive manner -- often, but not necessarily, working iteratively through the cases. It will (hopefully) adjust its behaviour towards optimal decisions, usually defined by minimising an error function.

Various approaches are used; connectionist models, especially deep learning neural nets, are currently very much in favour because of spectacular successes, particularly in image processing. However, there are a number of much simpler yet still useful approaches. Some of them will be discussed here, as it is feasible to apply them in RPA with limited resources in terms of time and coding skill.
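The phrase 'adjusting behaviour towards optimal decisions by minimising an error function' can be made concrete in a few lines. The following sketch fits a single parameter to made-up data by gradient descent on the mean squared error; it illustrates the principle only, not any particular library:

```python
# made-up observations; the mean squared error is minimised at their mean
data = [2.0, 4.0, 6.0]

w = 0.0                # initial guess for the parameter
for _ in range(1000):  # iterate through many small corrective steps
    # gradient of the mean squared error (1/n) * sum((w - x)^2)
    grad = sum(2 * (w - x) for x in data) / len(data)
    w -= 0.1 * grad    # step against the gradient

print(w)  # converges towards the mean of the data, 4.0
```

Each pass reduces the error a little; after enough iterations the parameter settles at the value that minimises the error function.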

#### Chatbots

RPA is usually aimed at back-office tasks, while the automated servicing of customer requests is increasingly offered using chatbot technology. Since chatbots automate front-office tasks they are often not seen as RPA; however, under the above definition chatbots qualify as RPA, since they are software robots that automate tasks otherwise carried out by employees.

### Benefits of RPA

There is always a lot of hype surrounding new concepts in management and technology. The following list of RPA benefits is somewhat conservative and also a little critical; just some food for thought.

• Cost savings. When robots do the work of people we can cut personnel cost. Euphemistically this is described as freeing time for more creative work.

• Resilience. While the human workforce is limited there can always be more software robots, the only limit being the performance of the computing hardware. Therefore, if demand suddenly increases, the robot army is instantly ready.

• Accuracy. Once a process is defined any errors left are due to design, not the software robots. Humans make mistakes, particularly in tedious tasks; (software) robots do not. Physical robots are a different matter and can make disastrous mistakes.

• Compliance. An automated process can be defined so as to be fully compliant to some regulations at all times. Humans might make exceptions that can lead to trouble; robots make no exceptions, and no trouble.

• Productivity. When measured in terms of input/output relation robots are hard to beat, especially software robots, which work 24/7 without any wear and tear. If needed they can be replicated at practically zero additional cost.

• Employee happiness. When freed from tedious tasks people can (in theory) do more creative and fulfilling things, and that may well make them happier (at least those who still have a job).

### RPA as Enterprise Software

This list is similar to the previous one but focuses on the development and deployment of software, which is often a huge and risky project. Fortunately, RPA is somewhat different from typical enterprise software projects:

• No Disruption. RPA only uses the front end of the system, so any problems caused must have been present already in day-to-day operations by human users (and have hopefully been wiped out). Deployment of RPA is less likely to cause disruption of service, whereas API-based process automation can access functions not available to users, or use them in a manner not possible through the front-end GUI, thereby causing unforeseen problems.

• Scalable. RPA software robots not only work faster than humans, they also run continuously 24/7 all year long, thereby easily meeting increased demand; and since they are just software programs we can have more than one instance running on one or more computers at the same time. The only limit is the performance of the computing hardware.

• Small Investment. Obviously this depends on the project. However, as we will see, at least some simple RPA projects can be cheap yet useful.

• Quick ROI. A simple RPA project can start generating return on investment relatively quickly since development and deployment tend to be less problematic compared to a similar process automation project based on API programming.

### Downsides of RPA

These are compared to traditional automation based on API programming i.e. using the back-end of the system. Most of these problems do not seem overwhelming or unsolvable, though.

• New Approach. Developers need to learn new methods, and in the case of the robot framework also a new language.

• Performance. Compared to API-based automation the RPA approach will tend to be slower, maybe even so much slower that it is not feasible for a particular project.

• Front-End Limitations. Remember that the front end was not designed to be used by robots, but by humans. Problems may arise that were never faced before, because human users would know how to handle exceptional situations while robots just shamble on -- remember the paint robots in car factories that turned on each other.

• Citizen Developers. Business units can now develop bots using simple end-user tools and without the need for support by an IT team, or any involvement (or even knowledge) of the IT department. This can be seen as a benefit or a nightmare, depending on where you stand.

### Examples

We will look at examples of RPA in the following areas:

• Testing. While test suites can of course be run via the API (if one is available) there may be subtle differences to actual user interaction that are not easy to cover reliably. Using RPA and the GUI closely mimics human interaction and (hopefully) bypasses those problems.

• Web Scraping. Since the Robot Framework uses XPath to access elements in HTML documents, and also allows for very simple integration of custom Python code in Robot test case files, it is easy to automatically extract content from web pages for further processing, also known as web scraping.

• Customer Service. Many customer requests fall into one of very few categories and are therefore prime candidates for automation. This is an area that applies concepts from chatbots and machine learning. We will look at a simple case study using robot testing for automation and open datasets for machine learning.
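The web scraping idea can also be tried outside the Robot Framework: Python's standard library ElementTree supports a limited subset of XPath, which is enough for small well-formed snippets (real-world pages usually require a proper HTML parser). The HTML fragment below is a made-up example echoing the sample app used later:

```python
import xml.etree.ElementTree as ET

# a tiny, well-formed HTML fragment (made up for this illustration)
html = '<html><body><h1>Credit App</h1><p><a href="clients">Clients</a></p></body></html>'
root = ET.fromstring(html)

# limited XPath support: find the first anchor element, read text and href
link = root.find(".//a")
print(link.text, link.get("href"))  # Clients clients
```

The same `//a` pattern appears later in the robot test files, where the SeleniumLibrary evaluates the XPath against the live page in the browser.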

## Getting Started with the Robot Framework

The Robot Framework is available at robotframework.org. Its main purpose is automated testing; however, combined with the Selenium browser automation library it can also be used for general process automation in the interaction with web servers.

There are various types of testing frameworks; the Robot Framework uses keyword-driven testing: the idea is that the keywords

• describe the actions that need to be performed without too much detail
• are independent of the test framework being used

The approach somewhat resembles pseudo-code in algorithm design; it can be used for both manual and automated testing.
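The keyword idea can be pictured as a mapping from human-readable phrases to actions. The toy Python sketch below only illustrates the concept; it is not how the Robot Framework is implemented:

```python
# actions, deliberately trivial for the illustration
def open_browser(url):
    return "opened " + url

def page_should_contain(text):
    return "checked for " + text

# keyword phrases map to the actions they describe
keywords = {
    "Open Browser": open_browser,
    "Page Should Contain": page_should_contain,
}

# a 'test case' is then just a sequence of (keyword, argument) steps
steps = [("Open Browser", "http://localhost:8080"),
         ("Page Should Contain", "Credit App")]
for kw, arg in steps:
    print(keywords[kw](arg))
```

Because the phrases are plain language, the same step list can serve as a script for a manual tester or as input to an automated framework.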

The following examples provide an introduction to the approach. The Robot Framework is written in Python, and we need to install some packages, and maybe Python itself as well.

### Installation

The installation can be tricky.

The robotframework GitHub site has detailed installation instructions for various operating systems.

When you install Python from the official source python.org make sure to check the little box "Add Python to the Path".

Adding Python to the PATH means that you can start Python on the command line. However, we also need the web driver scripts in the PATH. If you see something like

Driver copied to: C:\Users\ramen\bin\geckodriver.exe

WARNING: Path 'C:\Users\ramen\bin' is not in the PATH environment variable.



you need to add that directory to the PATH. Exactly how this works depends on your operating system.

The following sections address related problems:

##### Python in Linux

It is recommended to use Python in one of the popular Linux distributions where it comes with the rest of the system. Many Linux desktop components rely on Python, so usually both Python 2 and Python 3 are part of the distribution.

Linux can easily be installed alongside an already existing operating system via dual boot, i.e. you choose which system to use when you start up your computer. The installation will take maybe half an hour or so, but it can save a lot more time and frustration later. All you need is a USB stick and about 20 GB of free space on your drive. Download the installation image, put it on your stick, and boot from that. Your favorite Linux distribution web site has all the details; this author prefers Mint, but there are many others, see distrowatch.

##### Python in Other Operating Systems

Python comes in a number of distributions for various needs and operating systems:

• The primary source is python.org. This is the standard and reference CPython implementation, and it works perfectly for our purposes; it uses the pip installer which you see in all the examples.

• Another option is the Anaconda distribution which uses its own installer conda instead of pip. This comes bundled with heaps of software, including flask (but not robotframework). Work with this if you have Anaconda already installed.

Once you have Python running you should be able to use pip (or conda) to install additional Python packages, and everything should work just fine. Fingers crossed, knock on wood.

☆ Depending on your distribution/setup, you may have to use the command py instead of python to run Python scripts.

##### Python 3 vs 2

Sadly the Python developers made a decision years ago to make the new Python 3 incompatible with the older version 2. The differences are few and small, but still more than enough to cause trouble. We will continue to suffer the consequences for many years to come.

We are using Python 3 here. Depending on your distribution and operating system this may be standard; however, make sure that when you enter on the command line

python

you actually get the Python 3 interpreter prompt, not the older Python 2. Leave the interactive interpreter by entering ctrl-d.

You see python3 in all examples here since this gives us Python 3 on Linux. Otherwise, depending on your configuration, you might get Python 2.

Depending on your distribution and operating system you may not have a python3 command, so you have to use python instead.

☆ On Linux there are both versions available, since many system/desktop components depend on Python 2. Do not remove Python 2 from your Linux system.
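To verify from inside the interpreter which version actually started, you can inspect sys.version_info:

```python
import sys

# print the full version string and stop early if this is not Python 3
print(sys.version)
assert sys.version_info.major == 3, "this is Python 2, not Python 3"
```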

#### Pip - Package Installer for Python

Open a terminal window and enter the following statements on the command line. This only needs to be done once for our setup.

First we make sure to have the current version of the package installer pip:

python3 -m pip install --upgrade pip



The pip module should be part of your Python distribution; otherwise you will get an error and you have to install pip: on Debian-based Linux systems enter

sudo apt install python3-pip

Now we can install Python packages:

python3 -m pip install selenium
python3 -m pip install --upgrade robotframework-seleniumlibrary
python3 -m pip install webdrivermanager



Linux: Note the message after the webdrivermanager install about .local/bin not being in our PATH environment variable! We will need to fix this.

Current pip versions should automatically switch to user install when root permission is missing. If you get errors about permissions then add --user at the end of all the install commands, such as

python3 -m pip install selenium --user

#### Fix the PATH

Depending on your operating system environment you need to put .local/bin on your PATH. The following refers to Linux.

1. On the command line use the editor by entering the following:
    pico .bashrc

2. add another line at the end of the file (move the cursor to the end of the file, then press the Enter key for newline, and type or copy-paste the following):
    export PATH=$PATH:$HOME/.local/bin

3. Save the file and exit by using the key strokes: ctrl-O, Enter, ctrl-X
4. Do not close this terminal window!
5. Open a new terminal and continue work there.

If you get weird error messages then you messed up the .bashrc file. Go back to the first terminal, start the pico editor again, and fix the problem.

Now we can configure Firefox as our web browser: on the command line enter

webdrivermanager firefox

With the driver in place we can write a first keyword-driven test. Put the following in a robot file:

*** Settings ***
Documentation     Check for text in page
Library           SeleniumLibrary

*** Variables ***
${URL}            http://localhost:8080
${BROWSER}        Firefox

*** Test Cases ***
Valid Home Page - Keywords
    Open Browser To Home Page
    Page Should Contain    Credit App
    [Teardown]    Close Browser

*** Keywords ***
Open Browser To Home Page
    Open Browser    ${URL}    ${BROWSER}

Running the robot on this file should show:

PASS Valid Home Page - Keywords

Things to note about the code above:

• a keyword/argument format is used
• the default is two or more blanks for indent and separators
• if blanks are inconvenient we can also use the pipe character |
  • start the line with | and then
  • use | with one space left and right for indent and separators
  • a trailing pipe at the end of the line is optional
• in the Settings section we provide short documentation and request Selenium for web page processing
• in the Variables section we define the URL and the browser
  • not strictly necessary, but good practice to do this here at the top, especially in longer files
• the Test Cases section contains one or more cases
  • the first statement in that test case is defined in the Keywords section
  • the second statement contains the pre-defined keyword "Page Should Contain"
    • make sure to leave two or more blanks before the text to check for, otherwise the Robot Framework will not recognise the text as a parameter and will try to treat it as another user-defined keyword phrase
  • the Teardown statement makes sure the browser window is closed, instead of leaving it open
• in the Keywords section we define what we mean by saying "Open Browser To Home Page"
  • we use the pre-defined keyword "Open Browser"
  • and supply it with the variables from above

### Expect to Fail

Let's try a test that we expect to fail. Use your text editor and put the following in a file t3.rob:

*** Settings ***
Documentation     Check for text in page, expect fail
Library           SeleniumLibrary

*** Variables ***
${LOGIN URL}      http://localhost:8080
${BROWSER}        Firefox

*** Test Cases ***
Valid Home Page - Expect Fail
    Open Browser To Home Page
    Page Should Contain    The Credit App
    [Teardown]    Close Browser

*** Keywords ***
Open Browser To Home Page
    Open Browser    ${LOGIN URL}    ${BROWSER}

The run shows:

FAIL Valid Home Page - Expect Fail
Page should have contained text 'The Credit App' but did not.

Note how we introduced a small change in the text: our website says "Credit App", not "The Credit App". This test should fail.

Our minimal app is performing its Hello function, but now we add some more features. We are slowly approaching a small application with a database connection. Incremental development and testing will go side by side.

Let's add a link to a list of clients. Note that at this point we do not yet need an actual procedure for listing anything, just the link to it. Use your text editor to change the content of the file credit.py to the following:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def credit():
    return """<h1>Credit App</h1>
    <p><a href=clients>Clients</a>"""

Remember the option --reload which we added to the gunicorn web server command. Take a look at the terminal window that runs the gunicorn web server: you should see a line that looks like this:

... [INFO] Worker reloading: /home/.../credit.py modified

The gunicorn web server has detected the change in the source file and restarted the web app.

Our web app starts to get a little more elaborate. We now have a few distinct elements that we can check in our robot tests. We can use XPath expressions to find elements in the HTML and check their contents. Use your text editor to create a file t4.rob and put the following code into that file:

*** Settings ***
Documentation     Check shop page for header and links
Library           SeleniumLibrary

*** Variables ***
${LOGIN URL}      http://localhost:8080
${BROWSER}        Firefox

*** Test Cases ***
Valid Home Page - XPath
    Check Home Page
    [Teardown]    Close Browser

*** Keywords ***
Check Home Page
    Open Browser              ${LOGIN URL}    ${BROWSER}
    Element Should Contain    //h1    Credit App
    Element Should Contain    //a     Clients
    Element Should Contain    //a[contains(text(), "Clients")]    Clients

The run shows:

PASS Valid Home Page - XPath

Note that in the last line we need to supply the text twice although it is already obvious from the XPath expression.

Run the robot again to see that the XPath expressions actually target the link as intended:

robot t5.rob

The text output should show PASS. The test works as intended. Now we can move on to actually providing procedures for the functions, such as the list of clients. For that purpose, we choose a somewhat more elaborate route by introducing a database connection to our little sample app.

### SQLite Database Connection for the Shop App

To make our approach extensible and reasonably realistic we add more functions to the sample app instead of just providing toy examples. Playing in the sandbox can only get us so far. Fortunately there is a free open-source database management system that we can easily use in our sample app: SQLite. This DBMS is widely used since it is so easy to install and apply. It should be noted that it is also lacking in several respects, such as in terms of implementing standard SQL numeric data types. We happily accept these deficiencies since they do not bother us here (much).

See the documentation for more details about SQLite and its use in Python.

The new version of our shop app contains two new routes:

• initialise the DB table for the clients and insert some data
• list the clients

Our application is now a handsome size; make sure you do not miss anything when you copy and paste into the file credit.py:

from flask import Flask, render_template, request
import sqlite3
app = Flask(__name__)

def getconn():
    return sqlite3.connect("credit.db")

@app.route("/")
def credit():
    return """<h1>Credit App</h1>
    <ul>
    <li><a href=clients>Clients</a></li>
    <li><a href=newclient>New Client</a></li>
    <li><a href=initdb>Init DB</a></li>
    </ul>"""

@app.route('/initdb')
def initdb():
    conn = getconn()
    cur = conn.cursor()
    cur.execute("drop table if exists client")
    cur.execute("create table client " +
                "(id int primary key, lim int, sex int, edu int, mar int, age int)")
    conn.commit()
    conn.close()
    return "DB initialized."

@app.route('/clients')
def clients():
    conn = getconn()
    cur = conn.cursor()
    rows = cur.execute("select id, lim from client")
    html = "<h3>Clients</h3><table>\n"
    for row in rows:
        html += "<tr><td align=right> %d <td align=right> %.2f\n" % row
    conn.close()
    return html + "</table>\n"

@app.route('/newclient')
def newclient():
    return """<h3>New Client</h3>
    <form action=insertclient method=POST>
    <table>
    <tr><td>Client ID:<td><input type=text name=id>
    <tr><td>Credit Limit:<td><input type=text name=lim>
    <tr><td>Sex:<td><input type=text name=sex value=1>
    <tr><td>Education:<td><input type=text name=edu value=1>
    <tr><td>Marriage:<td><input type=text name=mar value=1>
    <tr><td>Age:<td><input type=text name=age value=30>
    </table><input type=submit value=OK></form>"""

@app.route('/insertclient', methods=['POST'])
def insertclient():
    id = request.form['id']
    lim = request.form['lim']
    sex = request.form['sex']
    edu = request.form['edu']
    mar = request.form['mar']
    age = request.form['age']
    conn = getconn()
    cur = conn.cursor()
    cur.execute('insert into client (id, lim, sex, edu, mar, age) ' +
                'values (?, ?, ?, ?, ?, ?)', (id, lim, sex, edu, mar, age))
    conn.commit()
    conn.close()
    return "Client inserted."

The sqlite3 connection module should be part of the Python 3 distribution; we just need to import it.

• we define a function getconn() to give a DB connection whenever we need it. SQLite stores the DB in a file in the current directory, as named in the connect() function. So we should expect a file ending in ".db" after the initdb route is executed for the first time.
• in the initdb route we use that getconn() function and get a cursor from the connection. With this cursor we can execute SQL statements.
• we drop the table if it exists so we can run this code multiple times. Note that in this interface we do not end SQL statements with ";"
• we create a simple table
• we insert a few rows into our table. For the sake of simplicity we use integers for everything; SQLite does not worry about that at all, as we will see.
• the commit() is necessary here since auto-commit is only the default in interactive use. Without it all changes would be lost when the connection is closed: SQLite supports basic transaction logic.

With the table initialised we can list its content:

• again we get a cursor from the connection
• for Select statements the execute() function returns the resulting rows
• we use printf-style formatting ("%.2f") to force two decimals in the currency values
  • without it we would see the actual float values as stored in the DB
• we make a sporting attempt at presenting the content in a nice tabular layout
• in the loop we go through the results and access the fields by index

Now we can do quite a bit of testing!

### Including Python Code in Test Files

The following test will

• insert a new client
• list all clients and check for the newly inserted one

A new client should have an ID and a credit limit; we can easily generate those in Python. We put our Python code into a file with the extension .py in the current directory. Let's call it mytools.py; create it with your text editor and put the following code into the file:

import sqlite3
import random

def get_new_client_id():
    cur = sqlite3.connect("credit.db").cursor()
    row = cur.execute("select max(id)+1 from client").fetchone()
    return "%d" % row

def get_new_client_limit():
    return "%d" % (10000 * random.randint(1, 5))

Do not try to run this file directly; it will not produce any useful results. It will work with the robot after the Library definition in the robot code file.

The code above defines two functions for creating new values which are then available in our robot test files as user-defined phrases. The underscore character translates as blank in the robot code:

get_new_client_id() becomes Get New Client Id
get_new_client_limit() becomes Get New Client Limit

Both functions return values which we can capture in the robot test file.

☆ Note that our simple solution is not thread-safe. SQLite DB files can be accessed by multiple processes; there is no locking for read access, but SQLite employs database locking for write access. Two processes running at the same time will get the same value for max(id)+1 and therefore create identical client records.

The option AUTOINCREMENT set in the SQL table create statement should guarantee unique primary keys. Another clean solution here would be sequences; sadly, SQLite does not support them. However, we could create a counter table in the DB init part:

create table mycount(n int)
insert into mycount values(0)

And then use the following statements via the Python API to get a new number:

update mycount set n = n + 1
select n from mycount

In theory, this solution should be thread-safe:

• some process A starts the update
• this should lock the DB
• another process B wants to do the same update
• since the DB is locked it has to wait
• until the second statement in A has finished and the lock is released
• now B can do its update and select

A and B should always see different values in their select results.

However, with respect to later development of the sample application we do not worry about this issue here, since we will bulk import external data.

In the Settings section of the robot test file we use the keyword Library to include the code from our tools module. Now we can do a lot of testing! We need to call initdb first, otherwise everything else will fail, since the DB table would not yet exist. Let's put this into a file t6.rob:

*** Settings ***
Documentation     Check new client insert
Library           SeleniumLibrary
Library           mytools.py

*** Variables ***
${LOGIN URL}      http://localhost:8080
${BROWSER}        Firefox
${ID}             1

*** Test Cases ***
Valid Home Page
    Open Browser    ${LOGIN URL}    ${BROWSER}

Valid Init DB
    Go To                  ${LOGIN URL}
    Click Link             //a[@href="initdb"]
    Page Should Contain    DB initialized.

Valid Insert
    ${LIM}=                Get New Client Limit
    Go To                  ${LOGIN URL}
    Click Link             //a[@href="newclient"]
    Page Should Contain    New Client
    Input Text             //input[@name="id"]     ${ID}
    Input Text             //input[@name="lim"]    ${LIM}
    Click Element          //input[@type="submit"]
    Page Should Contain    Client inserted
    Set Global Variable    ${LIM}

Valid Listing
    Go To                  ${LOGIN URL}
    Click Link             //a[@href="clients"]
    Page Should Contain    ${ID}
    Page Should Contain    ${LIM}
    [Teardown]    Close Browser

The run shows:

PASS Valid Home Page
PASS Valid Init DB
PASS Valid Insert
PASS Valid Listing

In this test file we have introduced several test cases; divide and conquer. To access the generated credit limit in more than one test we make it global. Sadly this does not work in the Variables section as one would expect; instead, we use Set Global Variable inside a test case.

The code above contains some other new features:

• the user-defined phrase Get New Client Limit returns a value which we put into a variable LIM
  • note the dollar sign and curly braces with variable names
• we do not (yet) use our Get New Client Id
  • we just initialized the DB; the table is empty, and the max(id) expression would result in 'None'
  • instead we define the client ID in the Variables section
• the XPath expression //a[@href="newclient"] finds the first a element with an attribute href equal to "newclient"
• we follow this link by using the pre-defined keyword Click Link
• the pre-defined keyword Input Text finds the form elements and enters the values
• we click on the submit button and check the response
• now we could go straight to the client listing, but instead
  • we go back to the start page and then
  • follow the link to the client listing, much like a human user would
• finally we check for the new client in the listing

This test will take a little longer; we will probably be able to see the new entry briefly showing in the form fields and the listing. Performance is not the strong point of this type of automated testing. However, it is still much faster than human testers.

Run the robot:

robot t6.rob

and observe the results; you should see PASS for all tests.

### Tasks vs Tests

The Robot Framework can easily be used for robotic process automation. This can be made explicit by using the section header Tasks instead of Test Cases; everything else works in the same way as in tests. In order to facilitate our report processing later we will just continue to use "Test". We cannot use both tasks and tests in the same robot file.

### Arguments to User-Defined Keywords

When we create a task to initialise the DB and insert a few clients we do not want to go through all the steps for inserting a new client again and again! DRY: Don't Repeat Yourself. Code duplication makes it harder to maintain code. It may well be the root of all evil (in software).

We want to define a new user keyword with the required steps in one place and then use that code for repeated application, only supplying the necessary data in each call as arguments (parameters). Put the following into a file t7.rob:

*** Settings ***
Documentation     Init DB and insert some clients
Library           SeleniumLibrary
Library           mytools.py

*** Variables ***
${LOGIN URL}      http://localhost:8080
${BROWSER}        Firefox

*** Test Cases ***
Init DB
    Open Browser           ${LOGIN URL}    ${BROWSER}
    Click Link             //a[@href="initdb"]
    Page Should Contain    DB initialized

Insert Several Clients
    Insert Client    id=1001    lim=4000
    Insert Client    id=1002    lim=8000
    Insert Client    id=2001
    Insert Client    id=2002    lim=6000
    [Teardown]    Close Browser

*** Keywords ***
Insert Client
    [Arguments]    ${id}    ${lim}=5000
    Go To                  ${LOGIN URL}
    Click Link             //a[@href="newclient"]
    Page Should Contain    New Client
    Input Text             //input[@name="id"]     ${id}
    Input Text             //input[@name="lim"]    ${lim}
    Click Element          //input[@type="submit"]
    Page Should Contain    Client inserted

The run shows:

PASS Init DB
PASS Insert Several Clients


In the Keywords section we use named arguments with default values. This allows us to call the user-defined phrase Insert Client in the Test Cases section, supplying all arguments or omitting those with defaults.

Run the robot:

robot t7.rob



and observe the results; again, all tests should PASS.

### XML Reports

The Robot Framework provides tools for generating summaries from the XML reports for each test; however, they are somewhat cumbersome, and it is easier and more flexible to go through the reports using the XML package of plain Python.

After some study of the structure of the XML files we find that the last status element in each test contains the overall status of the test; we can get that element with the expression findall("status")[-1]

Put the following code into a file xmlrep.py (identical to optional section on Automating the RPA above):

import xml.etree.ElementTree as ET
import sys

def report(fn):
    for t in ET.parse(fn).getroot().iter('test'):
        s = t.findall("status")[-1]
        res = s.get("status") + ' ' + t.get("name")
        if s.get("status") == "FAIL": res = res + ' -- ' + s.text
        print(res)

if __name__ == '__main__': report(sys.argv[1])


Here the last line is executed when the complete file is run as a script (instead of just importing the report function).

Run it on the command line to check the file output.xml in the current directory:

python3 xmlrep.py output.xml



The output should look like this:

PASS Init DB
PASS Insert Several Clients
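To see what the script relies on, here is a heavily simplified sketch of the report structure; real Robot Framework output.xml files contain many more elements and attributes, but the test/status nesting below is all that report() needs:

```python
import xml.etree.ElementTree as ET

# simplified sketch of a Robot Framework report; real files are much richer
sample = """<robot>
  <suite name="t7">
    <test name="Init DB"><status status="PASS"/></test>
    <test name="Insert Several Clients"><status status="PASS"/></test>
  </suite>
</robot>"""

root = ET.fromstring(sample)
lines = []
for t in root.iter('test'):
    s = t.findall('status')[-1]          # last status element holds the verdict
    lines.append(s.get('status') + ' ' + t.get('name'))
print('\n'.join(lines))
```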


### Running Several Test Case Files¶

The Robot Framework features the concept of a Test Suite for this purpose; however, with our XML reporting tools already available it is more convenient to run individual test case files and then process the reports.

To run several robot tests and direct the output to different XML files for later analysis we can add the XML output file to the robot call.

Use your text editor to create a script that runs several tests at once and then summarizes all the XML reports.

Put the following code into a file mytests.sh:

robot -o o4.xml t4.rob
robot -o o5.xml t5.rob
robot -o o6.xml t6.rob
for file in o4.xml o5.xml o6.xml
do
echo $file
python3 xmlrep.py $file
done


Run this script by entering the following on the command line:

bash mytests.sh
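The same loop can also be written in plain Python with the standard subprocess module. This is only a sketch mirroring mytests.sh (the t4.rob-t6.rob file names come from the earlier examples); it prints a note instead of crashing when robot is not installed:

```python
import subprocess

def commands_for(i):
    # the two commands mytests.sh runs per test file: robot, then the summarizer
    xml = 'o%d.xml' % i
    return [['robot', '-o', xml, 't%d.rob' % i],
            ['python3', 'xmlrep.py', xml]]

for i in (4, 5, 6):
    for cmd in commands_for(i):
        try:
            subprocess.run(cmd)
        except FileNotFoundError:
            print('not available here:', cmd[0])
```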

### Python built-in web server¶

If for some reason you do not want to use Flask and only need a very simple local web server that serves static HTML pages, then you do not need to write any Python code: just enter the following on the command line:

python3 -m http.server 8080 --bind 127.0.0.1



This will start a very basic web server with the current directory as root for all HTML files. Note that the port must be free at this time; if your Flask server is still running on 8080 then the above command will not work.
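Whether the port is still free can be checked from Python before starting the server; this is a small sketch using the standard socket module (the helper name is made up):

```python
import socket

def port_free(host, port):
    # binding succeeds only if no other process is listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

print(port_free('127.0.0.1', 8080))   # False while a server occupies the port
```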

### Optional: Extracting Robot Code from Notebook cells¶

This notebook (the one whose HTML version you are reading now) is by default saved as a JSON file, which means that it is relatively easy to process the contents of Jupyter notebook cells with our own Python code.

Here is a little script to

• extract the cells containing robot test code from this notebook
• execute the tests with the --quiet option
• summarize the results from the XML files
import json
import sys
import os

f = open(sys.argv[1], 'r')
data = json.load(f)

n = 1
cells = data['cells']
for x in cells:
    src = x['source']
    typ = x['cell_type']
    if typ == 'markdown' and len(src) > 1:
        if src[1].startswith('%%robot'):
            fout = open('tmp.rob', 'w')
            fout.write('\n'.join([s[:-1] for s in src[1:-1]]))
            fout.close()
            os.system('robot --quiet -o tmp' + str(n) + '.xml tmp.rob')
            n = n + 1
# report files are numbered from 1, so start the summary loop there
for i in range(1, n):
    os.system('python3 xmlrep.py tmp' + str(i) + '.xml')


## Extracting Data from Web Pages -- Web Scraping¶

The Robot Framework can also be used for web scraping, i.e. extracting data from web pages, using

• XPath locators
• test case syntax, and
• user-defined Python functions.

The Robot Framework provides functions for file access. However, it is much more convenient and versatile to define our own in plain Python.

Add the following code to the mytools.py library file:

def create_my_file(fn):
    # create an empty file, overwriting any existing content
    fp = open(fn, 'w')
    fp.close()

def append_my_file(fn, txt):
    # append one line of text to the file
    fp = open(fn, 'a')
    fp.write(txt + '\n')
    fp.close()


Now we can use these two new keywords in the following robot test file.

As a real-world practical example let's assume that we are interested in the titles of all Bond movies ever made (although the exact definition of that list is somewhat fuzzy), and after a bit of search we find a web site that has a relatively simple structure:

https://www.pocket-lint.com/tv/news/148096-james-bond-007-best-movie-viewing-order-chronological-release

After exploring the page source (right mouse button, View Page Source in Firefox) we can create our robot file:

*** Settings ***
Library             SeleniumLibrary
Library             mytools.py

*** Variables ***
${url}    https://www.pocket-lint.com/tv/news/148096-james-bond-007-best-movie-viewing-order-chronological-release

*** Test Cases ***
Start Movie List
    Open Browser              ${url}    Firefox
    Page Should Contain       James Bond movies
    Create My File            list.txt

Iterate Through Movies
    ${elements}=    Get WebElements    //h3
    FOR    ${element}    IN    @{elements}
        Append My File    list.txt    ${element.text}
    END


The robot code above

• uses the library mytools.py
• defines a URL with a conveniently simple HTML structure for our movie list: simply all level 3 headers
• in the first test case creates our output file
• in the second test case finds all h3 elements, loops through the results, and writes each title to the file
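The extraction step itself can be tried without a browser: Python's standard ElementTree supports a subset of XPath, so on a small well-formed snippet the same //h3 idea looks like this (the HTML below is made up to stand in for the movie page):

```python
import xml.etree.ElementTree as ET

# a made-up, well-formed snippet standing in for the movie page
page = """<html><body>
  <h1>James Bond movies</h1>
  <h3>Dr. No</h3>
  <h3>From Russia with Love</h3>
  <h3>Goldfinger</h3>
</body></html>"""

root = ET.fromstring(page)
titles = [h.text for h in root.findall('.//h3')]   # ElementTree needs the leading '.'
for t in titles:
    print(t)
```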

## Exercises¶

• Further study the XPath syntax and examples from additional sources

• Find external web pages with moderately complex structure

• Write more robot test files to check for elements and text content

• Specific challenges:

• relative paths
• position of elements
• wildcards
• Make sure not to run the robot too often; leave intervals of a minute or more between runs -- some websites apply automatic exclusion procedures when hit by too many requests from the same source.
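Some of these challenges can be explored offline as well, since ElementTree's XPath subset covers relative paths, position predicates, attribute predicates, and wildcards (the document below is made up for illustration):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<html><body>
  <div><p>first</p><p>second</p></div>
  <form><input name="id"/><input name="lim"/></form>
</body></html>""")

second = doc.find('.//div/p[2]').text                # position: the second p inside div
lim = doc.find('.//input[@name="lim"]').get('name')  # attribute predicate
kids = len(doc.findall('.//div/*'))                  # wildcard: all children of div
print(second, lim, kids)
```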
