Case Study: Personal Finance

Here we will use the Robot framework and the connectionist machine learning methods in a case study for a consumer credit company.

For obvious reasons training datasets with actual data in this area are not available. However, there are some datasets in related areas that we can combine to create semi-synthetic data for our robots to work on.

We combine the following datasets for our sample application:

  • credit card billing and payment data (credit card default dataset)
  • verbal communications of customers (consumer complaints dataset; optional)

This will provide a moderately realistic scenario for robot automation.

The credit management application will be designed in Flask with a customer service interface that shows

  • past payments and highlights the risk estimated by the machine learning module for an individual customer to default
  • ranks cases as initialized by customer complaints based on the default probability
In [13]:
from robotmagic import robot

Flask Application for Credit Management

The application needs to

  • import client records
  • create billing and payment records for clients and periods (calendar months):
    • billing amounts BILL_AMT[1-6]
    • payment amounts PAY_AMT[1-6]
    • payment status PAY_[1-6]
  • list client data
  • list billing and payment data for individual clients

In order to apply our machine learning model we need the financial data for the last 6 months; the credit card dataset provides this data for April-September 2005.

In order so model our application on a realistic scenario we implement a separate payment management application that provides the transactional data (billing, payment, and status) per month but does not include the client master data (credit limit, age..).

Assumptions

For this case study let's assume that

  • we have full control over the client credit app and its Sqlite DB file
  • but not over the billing data: we have to get that via the web interface of the billing app

☆ Note that this type of separation happens quite often in practical settings; e.g. the grades for courses are computed by the teaching support system; however, there is no connection to the administrative system for entering those grades into the student records -- this has to be done by manual file download and import.

Again we rely on Python package Pandas and the Sqlite API; the creation of the montly billing record will also be done in Python script.

The listing will be provided via the application web site:

from flask import Flask, render_template, request 
import sqlite3

app = Flask(__name__)

def getconn():
    return sqlite3.connect("billing.db")

@app.route("/")
def billing():
    return """<h1>Billing App</h1>
    <ul>
    <li><a href=getbill>Get Billing Data</a></li>
    <li><a href=todo>Enter Billing Data </a></li>
    </ul>"""

@app.route("/getbill")
def getbill():
    return """<h1>Get Billing Data</h1>
    <form action=listbill method=POST>
    <table>
    <tr><td>Client ID:<td><input type=text name=cid>
    </table><input type=submit value=OK></form>"""

@app.route('/listbill', methods=['POST'])
def listbill():
    cid = request.form['cid']
    conn = getconn()
    cur = conn.cursor()
    rows = cur.execute("select cid, year, month, bill, pay, stat "
        + " from bildat where cid = ? order by year, month", (cid,))
    html = "<h3>Billing Data</h3><table>\n" 
    html += "<tr><th>CID<th>Year<th>Month<th>Bill<th>Pay<th>Status</tr>\n"
    for row in rows:
        html += ("<tr>" 
        + "<td align=right>%d"
        + "<td align=right>%d"
        + "<td align=right>%d"
        + "<td align=right>%.2f"
        + "<td align=right>%.2f"
        + "<td align=right>%d\n") % row
    return html + "</table>\n"
    conn.close()

We run our billing application on the local host just like the credit app, but we need to use a different port number. Open another terminal and enter the following command:

gunicorn -b localhost:8090 --reload --access-logfile -  billing:app

Of course we start testing right away:

In [14]:
%%robot

*** Settings ***
Library             SeleniumLibrary

*** Variables ***
${LOGIN URL}      http://localhost:8090/
${BROWSER}        Firefox

*** Test Cases ***
Valid Home Page
    Open Browser    ${LOGIN URL}    Firefox
    Page Should Contain    Billing App
       
Valid Form
    Click Link                 //a[@href="getbill"]
    Page Should Contain        Get Billing Data
    
Valid Listing
    Input Text                 //input[@name="cid"]             1
    Click Element              //input[@type="submit"]
    Page Should Contain        Billing Data
    Element Should Contain     //table//tr/td[1]    1
    Element Should Contain     //table//tr/td[2]    2005
    Element Should Contain     //table//tr/td[3]    4 
    # XPath: td after another td with given content
    Element Should Contain     //table//tr/td[preceding::td[text()='3102.00']]    689.00
    [Teardown]      Close Browser
PASS Valid Home Page
PASS Valid Form
PASS Valid Listing

Import Credit Card Dataset into Flask Application

We generate customer records and corresponding records for billing and payment from the credit card default dataset.

In [15]:
!wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls
File ‘default of credit card clients.xls’ already there; not retrieving.

In [16]:
import pandas as pd
import numpy as np
df = pd.read_excel('default of credit card clients.xls', skiprows=(1))
df = df.rename(columns={'ID':'CID', 'default payment next month':'DEF', 'PAY_0':'PAY_1'})
df.head()
Out[16]:
CID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4 ... BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEF
0 1 20000 2 2 1 24 2 2 -1 -1 ... 0 0 0 0 689 0 0 0 0 1
1 2 120000 2 2 2 26 -1 2 0 0 ... 3272 3455 3261 0 1000 1000 1000 0 2000 1
2 3 90000 2 2 2 34 0 0 0 0 ... 14331 14948 15549 1518 1500 1000 1000 1000 5000 0
3 4 50000 2 2 1 37 0 0 0 0 ... 28314 28959 29547 2000 2019 1200 1100 1069 1000 0
4 5 50000 1 2 1 57 -1 0 -1 0 ... 20940 19146 19131 2000 36681 10000 9000 689 679 0

5 rows × 25 columns

Let's check the size of the dataset:

In [17]:
len(df)
Out[17]:
30000

Bulk Import Records via Pandas/Sqlite API

We could now use our robot interface to insert billing, payment, and status records for each customer: however, since we are looking at many thousands of such records the robot interface is not feasible -- the performance is just nowhere near sufficient. Instead, we

  • make a numpy array from the pandas dataframe
  • break the records into monthly entries and write to a CSV file
  • import the CSV file via Pandas into SQLite

This sounds complicated, but it's actually very fast, maybe even the fastest way to achieve this.

In [18]:
import sqlite3

data = np.asarray(df)

conn = sqlite3.connect("credit.db")
cur = conn.cursor()
cur.execute("drop table if exists client")
conn.commit()

fout = open("tmpdat.csv", "w")
fout.write("id,lim,sex,edu,mar,age\n")
for x in data:
    fout.write("%d,%d,%d,%d,%d,%d\n" % tuple(x[:6]))
fout.close()
pd.read_csv("tmpdat.csv").to_sql("client", conn, index = False)
conn.commit()

conn = sqlite3.connect("billing.db")
cur = conn.cursor()
cur.execute('drop table if exists bildat')
conn.commit()

fout = open('tmpdat.csv', 'w')
fout.write('cid,year,month,bill,pay,stat\n')
for x in data:
    for j in range(6): 
        cid = x[0]
        year = 2005
        month = 4 + j # start at April
        stat = x[6 + j] # skip 6 columns 
        bill = x[12 + j]
        pay =  x[18 + j]
        fout.write('%d,%d,%d,%d,%d,%d\n' % (cid, year, month, bill, pay, stat))

fout.close()        
pd.read_csv('tmpdat.csv').to_sql('bildat', conn, index = False)
conn.commit()

Check DB contents: do we have 6 monthly entries per client?

In [19]:
for row in cur.execute('select * from bildat where cid = 2'):
    print(row)
(2, 2005, 4, 2682, 0, -1)
(2, 2005, 5, 1725, 1000, 2)
(2, 2005, 6, 2682, 1000, 0)
(2, 2005, 7, 3272, 1000, 0)
(2, 2005, 8, 3455, 0, 0)
(2, 2005, 9, 3261, 2000, 2)

Optional: Generate Complaint Cases

Here we create cases for complaints from the consumer complaints dataset and connect them to the credit card user records.

Apply Machine Learning Module to Cases

The service application interface uses RPA to query the customer data and integrate the results with the default estimation from the machine learning module.

Given a client ID we want to

  • query the Credit App for the data of the client
  • query the Billing App for the payment and status data
  • apply our machine learning model to predict the default of this client
  • show the results in a reasoably nice layout

Let's add the following to our mytools.py library:

def predict_default(cid, billdat):
    np.set_printoptions(formatter={'all':lambda x: '%6.2f' % x})
    cur = sqlite3.connect("credit.db").cursor()
    credat = cur.execute("select lim, sex, edu, mar, age from client where id = "
                         + cid).fetchone()
    inp = np.asarray([[ float(x) for x in credat ] + [ float(x) for x in billdat ]])
    clf, Xmax = pickle.load(open('ccd.pkl', 'rb'))
    pred = clf.predict_proba(inp[:1] / Xmax)
    fout = open('tmpdat.html', 'w')
    fout.write('<p><b>Client ' + cid + '</b>\n'
        + '<p> Client Data: ' + str(credat) + '\n'
        + '<p> Billing Data: ' + str(billdat) + '\n'
        + '<p> Prediction Pay/Default: %f %f\n' % tuple(pred[0]))
    fout.close()

Now we are ready to use the Robot Framework for accessing the Billing data of a given client.

To make this flexible with the client ID as a parameter we could e.g.

  • dynamically generate Robot code files from a Python script (easy)
  • use this Robot Framework code in a larger test file as a module defined in the Keywords section (tricky)

Accessing the BIlling App Data

In this case the layout of the billing data is well-known and fixed so we can use indexing in the HTML table.

The FOR IN loop makes the code somewhat more elegant. Do NOT write 6 x 3 almost identical statements using Get WebElement.

Note the multiple spaces between each word in the FOR line; if we only leave one blank between FOR and ${i} this would count as single word in the Robot Framework syntax and produce an error. This really takes some getting used to.

Are you wondering about the nested loop? Compare the format of the training data file with the layout of the billing data web site:

  • in the training data we have pay (status) followed by bill amt and pay amt
  • in the billing web site we have bill, pay, status

Another typical case as of a practical situation: the data does not really need to be processed, just re-arranged.

☆ Be very, very careful about these situations: errors do not necessarily show immediately.

Append To List lets us create a list of values. Note the somewhat strange initialisation in the Variables section: no value for empty list.

In [20]:
%%robot

*** Settings ***
Library             SeleniumLibrary
Library             Collections
Library             mytools.py

*** Variables ***
${URL}              http://localhost:8090/
${BROWSER}          Firefox         
@{l}=
${cid}=             100

*** Test Cases ***
Predict Client Default
    Open Browser    ${URL}    Firefox
    Click Link                 //a[@href="getbill"]
    Input Text                 //input[@name="cid"]             ${cid}
    Click Element              //input[@type="submit"]
    FOR  ${j}  IN  6  4  5
        FOR  ${i}  IN  2  3  4  5  6  7
            ${s}=           Get WebElement    //table//tr[${i}]/td[${j}] 
            Append To List  ${l}  ${s.text} 
        END
    END
    Predict Default    ${cid}  ${l}
    [Teardown]      Close Browser
PASS Predict Client Default

The HTML file contains the result of the prediction by the machine learning module.

We could invest in some more elaborate layout, and connect to the rest of the application in various ways, such as providing a link for the result of the analysis in the Credit App.

In [21]:
from IPython.display import HTML

HTML(filename="tmpdat.html")
Out[21]:

Client 100

Client Data: (20000, 1, 2, 1, 38)

Billing Data: ['0', '0', '0', '0', '0', '-1', '17973.00', '19367.00', '19559.00', '18240.00', '17928.00', '150.00', '1699.00', '1460.00', '626.00', '1750.00', '150.00', '0.00']

Prediction Pay/Default: 0.469680 0.530320

Exercises

Now it's your turn to create a stunning and inspiring application of robot process automation using machine learning!

The possibilities are endless. Set yourself a challenge -- make it something to be proud of.

In [ ]: