Here we will use the Robot Framework and connectionist machine learning methods in a case study for a consumer credit company.
For obvious privacy reasons, training datasets with actual data are not available in this area. However, there are datasets in related areas that we can combine to create semi-synthetic data for our robots to work on.
We combine the following datasets for our sample application:
This will provide a moderately realistic scenario for robot automation.
The credit management application will be designed in Flask with a customer service interface that shows:
from robotmagic import robot
In order to apply our machine learning model, the application needs the financial data for the last 6 months; the credit card dataset provides this data for April-September 2005.
In order to model our application on a realistic scenario we implement a separate payment management application that provides the transactional data (billing, payment, and status) per month but does not include the client master data (credit limit, age, ...).
For this case study let's assume that there is no direct interface between the two systems, so the data has to be moved by a robot via the web front ends.
☆ Note that this type of separation happens quite often in practical settings; e.g., the grades for courses are computed by the teaching support system, but there is no connection to the administrative system for entering those grades into the student records -- this has to be done by manual file download and import.
Again we rely on the Python package Pandas and the SQLite API; the creation of the monthly billing records will also be done in a Python script.
The listing will be provided via the application web site:
from flask import Flask, render_template, request
import sqlite3

app = Flask(__name__)

def getconn():
    return sqlite3.connect("billing.db")

@app.route("/")
def billing():
    return """<h1>Billing App</h1>
    <ul>
    <li><a href=getbill>Get Billing Data</a></li>
    <li><a href=todo>Enter Billing Data</a></li>
    </ul>"""

@app.route("/getbill")
def getbill():
    return """<h1>Get Billing Data</h1>
    <form action=listbill method=POST>
    <table>
    <tr><td>Client ID:<td><input type=text name=cid>
    </table><input type=submit value=OK></form>"""

@app.route('/listbill', methods=['POST'])
def listbill():
    cid = request.form['cid']
    conn = getconn()
    cur = conn.cursor()
    # parameterized query: cid comes straight from the web form
    rows = cur.execute("select cid, year, month, bill, pay, stat "
                       "from bildat where cid = ? order by year, month", (cid,))
    html = "<h3>Billing Data</h3><table>\n"
    html += "<tr><th>CID<th>Year<th>Month<th>Bill<th>Pay<th>Status</tr>\n"
    for row in rows:
        html += ("<tr>"
                 "<td align=right>%d"
                 "<td align=right>%d"
                 "<td align=right>%d"
                 "<td align=right>%.2f"
                 "<td align=right>%.2f"
                 "<td align=right>%d\n") % row
    html += "</table>\n"
    conn.close()   # close the connection before returning the page
    return html
We run our billing application on the local host just like the credit app, but we need to use a different port number. Open another terminal and enter the following command:
gunicorn -b localhost:8090 --reload --access-logfile - billing:app
Of course we start testing right away:
%%robot
*** Settings ***
Library           SeleniumLibrary
*** Variables ***
${LOGIN URL}      http://localhost:8090/
${BROWSER}        Firefox
*** Test Cases ***
Valid Home Page
    Open Browser    ${LOGIN URL}    ${BROWSER}
    Page Should Contain    Billing App
Valid Form
    Click Link    //a[@href="getbill"]
    Page Should Contain    Get Billing Data
Valid Listing
    Input Text    //input[@name="cid"]    1
    Click Element    //input[@type="submit"]
    Page Should Contain    Billing Data
    Element Should Contain    //table//tr/td[1]    1
    Element Should Contain    //table//tr/td[2]    2005
    Element Should Contain    //table//tr/td[3]    4
    # XPath: td after another td with given content
    Element Should Contain    //table//tr/td[preceding::td[text()='3102.00']]    689.00
    [Teardown]    Close Browser
We generate customer records and corresponding records for billing and payment from the credit card default dataset.
!wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls
import pandas as pd
import numpy as np
df = pd.read_excel('default of credit card clients.xls', skiprows=1)
df = df.rename(columns={'ID':'CID', 'default payment next month':'DEF', 'PAY_0':'PAY_1'})
df.head()
Let's check the size of the dataset (it should contain 30,000 clients):
len(df)
We could now use our robot interface to insert billing, payment, and status records for each customer; however, since we are looking at many thousands of such records, the robot interface is not feasible -- the performance is just nowhere near sufficient. Instead, we write the records to a CSV file and bulk-load them into the database with Pandas.
This detour sounds complicated, but it is actually very fast -- far faster than inserting the records one by one.
import sqlite3

data = np.asarray(df)
conn = sqlite3.connect("credit.db")
cur = conn.cursor()
cur.execute("drop table if exists client")
conn.commit()

# write the client master data to a temporary CSV file ...
fout = open("tmpdat.csv", "w")
fout.write("id,lim,sex,edu,mar,age\n")
for x in data:
    fout.write("%d,%d,%d,%d,%d,%d\n" % tuple(x[:6]))
fout.close()

# ... and bulk-load it into the client table
pd.read_csv("tmpdat.csv").to_sql("client", conn, index=False)
conn.commit()
conn = sqlite3.connect("billing.db")
cur = conn.cursor()
cur.execute('drop table if exists bildat')
conn.commit()

# reshape the wide per-client columns into one record per client and month
fout = open('tmpdat.csv', 'w')
fout.write('cid,year,month,bill,pay,stat\n')
for x in data:
    for j in range(6):
        cid = x[0]
        year = 2005
        month = 4 + j        # start at April
        stat = x[6 + j]      # repayment status: columns 6..11
        bill = x[12 + j]     # bill amount: columns 12..17
        pay = x[18 + j]      # payment amount: columns 18..23
        fout.write('%d,%d,%d,%d,%d,%d\n' % (cid, year, month, bill, pay, stat))
fout.close()
pd.read_csv('tmpdat.csv').to_sql('bildat', conn, index=False)
conn.commit()
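The nested loop performs a wide-to-long reshape. The same transformation can also be sketched in Pandas directly; the following illustration uses a hypothetical two-month toy frame whose column names follow the credit card dataset:

```python
import pandas as pd

# toy frame: 2 clients, 2 months of status/bill/payment columns (made-up values)
df_toy = pd.DataFrame({
    "CID": [1, 2],
    "PAY_1": [0, -1], "PAY_2": [0, 0],
    "BILL_AMT1": [100, 200], "BILL_AMT2": [110, 210],
    "PAY_AMT1": [50, 60], "PAY_AMT2": [55, 65],
})

# build one long frame per month, then stack them
frames = []
for j in range(2):
    frames.append(pd.DataFrame({
        "cid": df_toy["CID"],
        "year": 2005,
        "month": 4 + j,
        "bill": df_toy["BILL_AMT%d" % (j + 1)],
        "pay": df_toy["PAY_AMT%d" % (j + 1)],
        "stat": df_toy["PAY_%d" % (j + 1)],
    }))
long = pd.concat(frames, ignore_index=True).sort_values(["cid", "month"])
print(len(long))  # → 4 (2 clients x 2 months)
```

A `long.to_sql('bildat', conn, index=False)` would then replace the CSV detour; whether this is faster than the CSV route depends on the data size.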
Check DB contents: do we have 6 monthly entries per client?
for row in cur.execute('select * from bildat where cid = 2'):
    print(row)
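Beyond eyeballing a single client, the six-entries-per-client property can be verified with one aggregate query. A sketch, run here against an in-memory table mimicking the bildat schema (against the real billing.db the query is identical):

```python
import sqlite3

# in-memory stand-in for billing.db with the bildat schema
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table bildat (cid, year, month, bill, pay, stat)")
cur.executemany("insert into bildat values (?, ?, ?, ?, ?, ?)",
                [(c, 2005, 4 + j, 0.0, 0.0, 0)
                 for c in (1, 2) for j in range(6)])

# any client with a record count other than 6 indicates a generation error
bad = cur.execute("select cid, count(*) from bildat "
                  "group by cid having count(*) != 6").fetchall()
print(bad)  # → [] means every client has exactly 6 entries
```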
Here we create cases for complaints from the consumer complaints dataset and connect them to the credit card user records.
The service application interface uses RPA to query the customer data and integrate the results with the default estimation from the machine learning module.
Given a client ID we want to collect the billing data from the web interface, combine it with the client master data, and run the default prediction on the result.
Let's add the following to our mytools.py library:
# requires: import numpy as np, sqlite3, pickle at the top of mytools.py
def predict_default(cid, billdat):
    np.set_printoptions(formatter={'all': lambda x: '%6.2f' % x})
    cur = sqlite3.connect("credit.db").cursor()
    # parameterized query: cid arrives as a string from the robot
    credat = cur.execute("select lim, sex, edu, mar, age from client "
                         "where id = ?", (cid,)).fetchone()
    inp = np.asarray([[float(x) for x in credat] + [float(x) for x in billdat]])
    clf, Xmax = pickle.load(open('ccd.pkl', 'rb'))
    pred = clf.predict_proba(inp[:1] / Xmax)
    fout = open('tmpdat.html', 'w')
    fout.write('<p><b>Client ' + cid + '</b>\n'
               + '<p> Client Data: ' + str(credat) + '\n'
               + '<p> Billing Data: ' + str(billdat) + '\n'
               + '<p> Prediction Pay/Default: %f %f\n' % tuple(pred[0]))
    fout.close()
Now we are ready to use the Robot Framework for accessing the Billing data of a given client.
To make this flexible, with the client ID as a parameter, we could e.g. define the ID in the Variables section of the robot script.
In this case the layout of the billing data is well known and fixed, so we can use indexing in the HTML table.
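How the row and column indices map to table cells can be checked without a browser. A sketch using only the standard library, on a miniature, well-formed copy of the billing table (the real page relies on the browser to close the td tags; the figures here are made up):

```python
import xml.etree.ElementTree as ET

# miniature billing table: one header row, two data rows
doc = ET.fromstring("""<table>
<tr><th>CID</th><th>Year</th><th>Month</th><th>Bill</th><th>Pay</th><th>Status</th></tr>
<tr><td>1</td><td>2005</td><td>4</td><td>3913.00</td><td>0.00</td><td>2</td></tr>
<tr><td>1</td><td>2005</td><td>5</td><td>3102.00</td><td>689.00</td><td>2</td></tr>
</table>""")

# tr[1] is the header row, so the first data row is tr[2];
# td[4] is the Bill column
print(doc.find("./tr[2]/td[4]").text)  # → 3913.00
```

The same indexing logic carries over to the XPath expressions `//table//tr[${i}]/td[${j}]` in the robot script below, which is why the row counter starts at 2.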
The FOR ... IN loop makes the code considerably more elegant -- do NOT write 6 x 3 almost identical statements using Get WebElement.
Note the multiple spaces between the elements of the FOR line; if we left only one blank between FOR and ${i}, the two would count as a single word in Robot Framework syntax and produce an error. This really takes some getting used to.
Are you wondering about the nested loop? Compare the format of the training data file with the layout of the billing data web site: the same values appear, but in a different column order.
Another typical case from practical settings: the data does not really need to be processed, just rearranged.
☆ Be very, very careful about these situations: errors do not necessarily show immediately.
Append To List lets us build up a list of values. Note the somewhat strange initialisation in the Variables section: an empty list is declared by giving no value at all.
%%robot
*** Settings ***
Library           SeleniumLibrary
Library           Collections
Library           mytools.py
*** Variables ***
${URL}            http://localhost:8090/
${BROWSER}        Firefox
@{l}=
${cid}=           100
*** Test Cases ***
Predict Client Default
    Open Browser    ${URL}    ${BROWSER}
    Click Link    //a[@href="getbill"]
    Input Text    //input[@name="cid"]    ${cid}
    Click Element    //input[@type="submit"]
    # column order stat, bill, pay (td 6, 4, 5) matches the training data
    FOR    ${j}    IN    6    4    5
        FOR    ${i}    IN    2    3    4    5    6    7
            ${s}=    Get WebElement    //table//tr[${i}]/td[${j}]
            Append To List    ${l}    ${s.text}
        END
    END
    Predict Default    ${cid}    ${l}
    [Teardown]    Close Browser
The HTML file contains the result of the prediction by the machine learning module.
We could invest in a more elaborate layout and connect to the rest of the application in various ways, such as providing a link to the result of the analysis in the Credit App.
from IPython.display import HTML
HTML(filename="tmpdat.html")
Now it's your turn to create a stunning and inspiring application of robot process automation using machine learning!
The possibilities are endless. Set yourself a challenge -- make it something to be proud of.