Case Study: Personal Finance¶

Here we will use the Robot framework and the connectionist machine learning methods in a case study for a consumer credit company.

For obvious reasons training datasets with actual data in this area are not available. However, there are some datasets in related areas that we can combine to create semi-synthetic data for our robots to work on.

We combine the following datasets for our sample application:

credit card billing and payment data (credit card default dataset)
verbal communications of customers (consumer complaints dataset; optional)

This will provide a moderately realistic scenario for robot automation.

The credit management application will be designed in Flask with a customer service interface that shows

past payments and highlights the risk estimated by the machine learning module for an individual customer to default
ranks cases as initialized by customer complaints based on the default probability

Flask Application for Credit Management¶

The application needs to

import client records
create billing and payment records for clients and periods (calendar months):
- billing amounts BILL_AMT[1-6]
- payment amounts PAY_AMT[1-6]
- payment status PAY_[1-6]
list client data
list billing and payment data for individual clients

In order to apply our machine learning model we need the financial data for the last 6 months; the credit card dataset provides this data for April-September 2005.

In order so model our application on a realistic scenario we implement a separate payment management application that provides the transactional data (billing, payment, and status) per month but does not include the client master data (credit limit, age..).

Assumptions¶

For this case study let's assume that

we have full control over the client credit app and its Sqlite DB file
but not over the billing data: we have to get that via the web interface of the billing app

☆ Note that this type of separation happens quite often in practical settings; e.g. the grades for courses are computed by the teaching support system; however, there is no connection to the administrative system for entering those grades into the student records -- this has to be done by manual file download and import.

Again we rely on Python package Pandas and the Sqlite API; the creation of the montly billing record will also be done in Python script.

The listing will be provided via the application web site. Put the code into a file named e.g. billing.py and run it in a separate terminal.

from flask import Flask, render_template, request 
import sqlite3

app = Flask(__name__)

def getconn():
    return sqlite3.connect("billing.db")

@app.route("/")
def billing():
    return """<h1>Billing App</h1>
    <ul>
    <li><a href=getbill>Get Billing Data</a></li>
    <li><a href=todo>Enter Billing Data </a></li>
    </ul>"""

@app.route("/getbill")
def getbill():
    return """<h1>Get Billing Data</h1>
    <form action=listbill method=POST>
    <table>
    <tr><td>Client ID:<td><input type=text name=cid>
    </table><input type=submit value=OK></form>"""
    
@app.route('/listbill', methods=['POST'])
def listbill():
    cid = request.form['cid']
    conn = getconn()
    cur = conn.cursor()
    rows = cur.execute("select cid, year, month, bill, pay, stat "
        + " from bildat where cid = ? order by year, month", (cid,))
    html = "<h3>Billing Data</h3><table>\n" 
    html += "<tr><th>CID<th>Year<th>Month<th>Bill<th>Pay<th>Status</tr>\n"
    for row in rows:
        html += ("<tr>" 
        + "<td align=right>%d"
        + "<td align=right>%d"
        + "<td align=right>%d"
        + "<td align=right>%.2f"
        + "<td align=right>%.2f"
        + "<td align=right>%d\n") % row
    return html + "</table>\n"
    conn.close()

app.run(host='localhost', port=8090, debug=True, use_reloader=True)

We run our billing application on the local host while also running the credit app, so we need to use a different port number.

Now we can define some tests:

*** Settings ***
Library             SeleniumLibrary

*** Variables ***
${LOGIN URL}      http://localhost:8090/
${BROWSER}        Firefox

*** Test Cases ***
Valid Home Page
    Open Browser    ${LOGIN URL}    Firefox
    Page Should Contain    Billing App
       
Valid Form
    Click Link                 //a[@href="getbill"]
    Page Should Contain        Get Billing Data
    
Valid Listing
    Input Text                 //input[@name="cid"]             1
    Click Element              //input[@type="submit"]
    Page Should Contain        Billing Data
    Element Should Contain     //table//tr/td[1]    1
    Element Should Contain     //table//tr/td[2]    2005
    Element Should Contain     //table//tr/td[3]    4 
    # XPath: td after another td with given content
    Element Should Contain     //table//tr/td[preceding::td[text()='3102.00']]    689.00
    [Teardown]      Close Browser

Import Credit Card Dataset into Flask Application¶

We generate customer records and corresponding records for billing and payment from the credit card default dataset.

In [4]:

!wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls

File ‘default of credit card clients.xls’ already there; not retrieving.

In [5]:

import pandas as pd
import numpy as np
df = pd.read_excel('default of credit card clients.xls', skiprows=(1))
df = df.rename(columns={'ID':'CID', 'default payment next month':'DEF', 'PAY_0':'PAY_1'})
df.head()

Out[5]:

	CID	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_1	PAY_2	PAY_3	PAY_4	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	DEF
0	1	20000	2	2	1	24	2	2	-1	-1	...	0	0	0	0	689	0	0	0	0	1
1	2	120000	2	2	2	26	-1	2	0	0	...	3272	3455	3261	0	1000	1000	1000	0	2000	1
2	3	90000	2	2	2	34	0	0	0	0	...	14331	14948	15549	1518	1500	1000	1000	1000	5000	0
3	4	50000	2	2	1	37	0	0	0	0	...	28314	28959	29547	2000	2019	1200	1100	1069	1000	0
4	5	50000	1	2	1	57	-1	0	-1	0	...	20940	19146	19131	2000	36681	10000	9000	689	679	0

5 rows × 25 columns

Let's check the size of the dataset:

In [6]:

len(df)

Out[6]:

Bulk Import Records via Pandas/Sqlite API¶

We could now use our robot interface to insert billing, payment, and status records for each customer: however, since we are looking at many thousands of such records the robot interface is not feasible -- the performance is just nowhere near sufficient. Instead, we

make a numpy array from the pandas dataframe
break the records into monthly entries and write to a CSV file
import the CSV file via Pandas into SQLite

This sounds complicated, but it's actually very fast, and in our Python code we have full control how we create the CSV files.

In [7]:

import sqlite3

data = np.asarray(df)

conn = sqlite3.connect("credit.db")
cur = conn.cursor()
cur.execute("drop table if exists client")
conn.commit()

fout = open("tmpdat.csv", "w")
fout.write("id,lim,sex,edu,mar,age\n")
for x in data:
    fout.write("%d,%d,%d,%d,%d,%d\n" % tuple(x[:6]))
fout.close()
pd.read_csv("tmpdat.csv").to_sql("client", conn, index = False)
conn.commit()

conn = sqlite3.connect("billing.db")
cur = conn.cursor()
cur.execute('drop table if exists bildat')
conn.commit()

fout = open('tmpdat.csv', 'w')
fout.write('cid,year,month,bill,pay,stat\n')
for x in data:
    for j in range(6): 
        cid = x[0]
        year = 2005
        month = 4 + j # start at April
        stat = x[6 + j] # skip 6 columns 
        bill = x[12 + j]
        pay =  x[18 + j]
        fout.write('%d,%d,%d,%d,%d,%d\n' % (cid, year, month, bill, pay, stat))

fout.close()        
pd.read_csv('tmpdat.csv').to_sql('bildat', conn, index = False)
conn.commit()

Check DB contents: do we have 6 monthly entries per client?

In [8]:

for row in cur.execute('select * from bildat where cid = 2'):
    print(row)

(2, 2005, 4, 2682, 0, -1)
(2, 2005, 5, 1725, 1000, 2)
(2, 2005, 6, 2682, 1000, 0)
(2, 2005, 7, 3272, 1000, 0)
(2, 2005, 8, 3455, 0, 0)
(2, 2005, 9, 3261, 2000, 2)

Optional: Generate Complaint Cases¶

Here we have the option to create cases for complaints from the consumer complaints dataset and connect them to the credit card user records. When we generate data we are in control of the parameters, e.g. the correlation between types of complaints and pay/default in the client billing data.

We can then apply the text encoding methods from the previous part to the complaints and connect them to the billing data:

train the pay/default forecast model to include the text data from complaints
apply the trained model to specific cases

☆ This is left as an exercise for those who desire an additional challenge!

Apply Machine Learning Module to Cases¶

The service application interface uses RPA to query the customer data and integrate the results with the default estimation from the machine learning module.

Given a client ID we want to

query the Credit App for the data of the client
query the Billing App for the payment and status data
apply our machine learning model to predict the default of this client
show the results in a reasonably nice layout

Let's add the following to our mytools.py library:

def predict_default(cid, billdat):
    np.set_printoptions(formatter={'all':lambda x: '%6.2f' % x})
    cur = sqlite3.connect("credit.db").cursor()
    credat = cur.execute("select lim, sex, edu, mar, age from client where id = "
                         + cid).fetchone()
    inp = np.asarray([[ float(x) for x in credat ] + [ float(x) for x in billdat ]])
    clf, Xmax = pickle.load(open('ccd.pkl', 'rb'))
    pred = clf.predict_proba(inp[:1] / Xmax)
    fout = open('tmpdat.html', 'w')
    fout.write('<p><b>Client ' + cid + '</b>\n'
        + '<p> Client Data: ' + str(credat) + '\n'
        + '<p> Billing Data: ' + str(billdat) + '\n'
        + '<p> Prediction Pay/Default: %f %f\n' % tuple(pred[0]))
    fout.close()

Now we are ready to use the Robot Framework for accessing the Billing data of a given client.

To make this flexible with the client ID as a parameter we could e.g.

dynamically generate Robot code files from a Python script (easy)
use this Robot Framework code in a larger test file as a module defined in the Keywords section (tricky)

Accessing the BIlling App Data¶

In this case the layout of the billing data is well-known and fixed so we can use indexing in the HTML table.

The FOR IN loop makes the code somewhat more elegant. Do NOT write 6 x 3 almost identical statements using Get WebElement.

Note the multiple spaces between each word in the FOR line; if we only leave one blank between FOR and ${i} this would count as single word in the Robot Framework syntax and produce an error. This really takes some getting used to.

Are you wondering about the nested loop? Compare the format of the training data file with the layout of the billing data web site:

in the training data we have pay (status) followed by bill amt and pay amt
in the billing web site we have bill, pay, status

Another typical case of a practical situation: the data does not really need to be processed, just re-arranged.

☆ Be very careful about these situations: errors do not necessarily show immediately.

Append To List lets us create a list of values. Note the somewhat strange initialisation in the Variables section: no value for empty list.

*** Settings ***
Library             SeleniumLibrary
Library             Collections
Library             mytools.py

*** Variables ***
${URL}              http://localhost:8090/
${BROWSER}          Firefox         
@{l}=
${cid}=             100

*** Test Cases ***
Predict Client Default
    Open Browser    ${URL}    Firefox
    Click Link                 //a[@href="getbill"]
    Input Text                 //input[@name="cid"]             ${cid}
    Click Element              //input[@type="submit"]
    FOR  ${j}  IN  6  4  5
        FOR  ${i}  IN  2  3  4  5  6  7
            ${s}=           Get WebElement    //table//tr[${i}]/td[${j}] 
            Append To List  ${l}  ${s.text} 
        END
    END
    Predict Default    ${cid}  ${l}
    [Teardown]      Close Browser

The HTML file contains the result of the prediction by the machine learning module.

In [9]:

from IPython.display import HTML

HTML(filename="tmpdat.html")

Out[9]:

Client 100

Client Data: (20000, 1, 2, 1, 38)

Billing Data: ['0', '0', '0', '0', '0', '-1', '17973.00', '19367.00', '19559.00', '18240.00', '17928.00', '150.00', '1699.00', '1460.00', '626.00', '1750.00', '150.00', '0.00']

Prediction Pay/Default: 0.469680 0.530320

☆ We now have robot code to perform the pay/default forecast which we run from the command line, but we could connect this feature to the rest of the application in various ways, such as

provide a link for the result of the analysis, e.g. add a forecast link in the client listing
start the robot when the link is clicked, e.g. using os.system() when running on localhost
include the forecast results directly in the response to the forecast link by reading and returning the content from tmpdat.html
make the solution multi-user (e.g. filenames based on time, client id, ..)

Your Own Project¶

Now it's your turn to create an inspiring application of robot process automation using machine learning!

Set yourself a challenge!