Generating RTF Reports

The Rich Text Format (RTF) is often used to exchange text documents between different applications. Like HTML, RTF is a markup language. Here is a very small RTF document:

{\rtf1\ansi {\fonttbl {\f0 Times New Roman;}}
\f0\fs30 Hello World!}

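Opening and closing braces must match. Line breaks in the source do not affect the result (just as in HTML). In the example, \f0 selects font 0 from the font table, and \fs30 sets the font size in half-points, i.e. 15 pt.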
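Only the backslash and the two braces are special in RTF body text, so generating RTF from a script mostly comes down to escaping text correctly and keeping the braces balanced. Here is a minimal Python sketch of that idea. The function names are our own, the header reuses the font table from the example, and non-ASCII text is written as \uN? escapes (characters outside the Basic Multilingual Plane are rejected rather than handled).

```python
def rtf_escape(text):
    """Escape plain text for use in an RTF body."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)        # \, { and } must be backslash-escaped
        elif ch == "\n":
            out.append("\\par ")         # source newlines are ignored, so emit \par
        elif code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            # \uN takes a signed 16-bit value, followed by a fallback character
            out.append("\\u%d?" % (code - 65536 if code > 32767 else code))
        else:
            raise ValueError("characters outside the BMP are not handled here")
    return "".join(out)


def braces_balanced(rtf):
    """Check that unescaped { and } pair up."""
    depth, i = 0, 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2                       # skip the escaped char or control-word start
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0


def rtf_document(body_text, font="Times New Roman", size_pt=15):
    """Wrap text in a minimal RTF document like the example above."""
    header = "{\\rtf1\\ansi {\\fonttbl {\\f0 %s;}}\n" % font
    # \fs expects half-points: 15 pt -> \fs30
    doc = header + "\\f0\\fs%d %s}" % (round(size_pt * 2), rtf_escape(body_text))
    if not braces_balanced(doc):
        raise ValueError("unbalanced braces in generated RTF")
    return doc
```

Because every character the sketch emits is ASCII (assuming a plain-ASCII font name), the result can be written with open("hello.rtf", "w", encoding="ascii").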

Some other RTF elements:

Here is an RTF sample with a table.
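The basic RTF table control words can also be generated from Python. The following is our own minimal sketch, not the sample referred to above: \trowd starts a row definition, each \cellxN gives the right edge of a cell in twips, \intbl marks paragraph text as table content, \cell ends a cell and \row ends the row. The function names and the fixed 1-inch column width are assumptions made for illustration.

```python
def rtf_text(value):
    """Escape a cell value: backslash and braces are special in RTF."""
    return str(value).replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_table(rows, col_width=1440):
    """Return an RTF fragment for a simple table.

    col_width is in twips (1440 twips = 1 inch); every column gets the same width.
    """
    lines = []
    for row in rows:
        # \trowd starts a row definition; each \cellxN gives a cell's right edge
        edges = "".join("\\cellx%d" % (col_width * (i + 1)) for i in range(len(row)))
        lines.append("\\trowd" + edges)
        # \intbl marks the paragraph as table text; \cell ends a cell, \row the row
        cells = "".join(rtf_text(v) + "\\cell " for v in row)
        lines.append("\\pard\\intbl " + cells + "\\row")
    lines.append("\\pard")  # return to normal paragraphs after the table
    return "\n".join(lines)


def rtf_table_document(rows):
    """Embed the table in a minimal RTF document."""
    return ("{\\rtf1\\ansi {\\fonttbl {\\f0 Times New Roman;}}\n\\f0\\fs24\n"
            + rtf_table(rows) + "\n}")
```

For example, rtf_table_document([("area", "count"), (10, 3), (20, 1), (30, 20)]) produces a four-row table from the construction counts shown further below.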

Sample R code generates an RTF file, pm.rtf, showing the result of a linear model fit.

Very similar routines can be written to produce HTML reports.
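The structure carries over directly: escape the characters that are special in the target format (in HTML, <, > and &), then wrap rows and cells in the matching markup. A minimal Python sketch of such an HTML routine, using the standard library's html.escape; the function name html_table and the convention that the first row is the header are our own assumptions.

```python
from html import escape


def html_table(rows, caption=None):
    """Render rows as an HTML table; the first row becomes the header row."""
    header, *body = rows
    parts = ["<table>"]
    if caption:
        parts.append("  <caption>%s</caption>" % escape(caption))
    # Header cells use <th>; every value is escaped before insertion
    parts.append("  <tr>" + "".join("<th>%s</th>" % escape(str(h)) for h in header) + "</tr>")
    for row in body:
        parts.append("  <tr>" + "".join("<td>%s</td>" % escape(str(v)) for v in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)
```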

Python

While R has sufficient text processing functions for producing reports in various formats, the language is not particularly elegant in this respect.

With a scripting language like Python linking the database and the statistics package, we can easily generate reports on our data, and the code tends to be more legible than the R equivalent. Suppose we are looking for areas that are dominated by certain industries, and we want to base our assertions on a statistical test, for example R's chisq.test(), used here as a goodness-of-fit test.

After trying several industries with interactive SQL queries, we find a very uneven distribution of construction sales across areas:

select area, count(*) from sales, cust where sales.cid=cust.cid 
and indus = 'Construction' group by area order by area;
 area | count 
------+-------
   10 |     3
   20 |     1
   30 |    20
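To move this query into a script, the same SQL can be run through Python's DB-API. In the sketch below, the function name area_counts is hypothetical, and the sketch assumes only that the column names in the query are unambiguous in the joined tables. The industry is passed as a query parameter rather than pasted into the SQL string. The ? placeholder suits the standard sqlite3 module; the output above looks like PostgreSQL's psql, where a driver such as psycopg2 would use %s instead.

```python
# The interactive query from above, with the industry as a parameter.
AREA_COUNTS_SQL = """
    select area, count(*) from sales, cust
    where sales.cid = cust.cid and indus = ?
    group by area order by area
"""


def area_counts(conn, industry):
    """Return [(area, count), ...] for one industry.

    conn is any DB-API connection whose paramstyle is 'qmark' (e.g. sqlite3).
    """
    cur = conn.cursor()
    try:
        cur.execute(AREA_COUNTS_SQL, (industry,))
        return cur.fetchall()
    finally:
        cur.close()
```

Calling area_counts in a loop over a list of industry names replaces the manual trial and error with interactive queries.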

Next we would test for significance. The test might fail, however, and we do not want to go back to interactive SQL each time. Instead, we want to automate the whole process to save time in the future.
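The test itself is simple enough to sketch in pure Python. The following sketch computes Pearson's goodness-of-fit statistic. By default it assumes all areas are equally likely, as R's chisq.test(x) does for a plain vector; a fairer null hypothesis might weight areas by their overall sales, which the optional expected_probs argument allows. The p-value comes from the regularized upper incomplete gamma function, using the usual series / continued-fraction split (as in Numerical Recipes). The function names are ours, and this is not the code of chisqfit.py.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), i.e. Q(df/2, x/2)."""
    if df < 1:
        raise ValueError("df must be at least 1")
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Series for the lower function P(a, z); return Q = 1 - P
        term = total = 1.0 / a
        n = 1
        while term > total * 1e-15:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - math.exp(log_prefactor) * total
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chisq_gof(observed, expected_probs=None):
    """Pearson chi-square goodness-of-fit test.

    Returns (statistic, degrees of freedom, p-value). With expected_probs=None
    all categories are assumed equally likely.
    """
    n, k = sum(observed), len(observed)
    if expected_probs is None:
        expected = [n / k] * k
    elif len(expected_probs) != k:
        raise ValueError("need one expected probability per category")
    else:
        expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    return stat, df, chi2_sf(stat, df)
```

For the construction counts above, chisq_gof([3, 1, 20]) uses an expected count of 8 per area and gives a statistic of 27.25 with 2 degrees of freedom. With two degrees of freedom the chi-square tail is exactly exp(-x/2), so p = exp(-13.625), about 1.2e-6. This should agree with R's chisq.test(c(3, 1, 20)).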

The chisqfit.py script combines several elements in our analysis:

The example contains the familiar HTML formatting code, plus an equivalent version for RTF.

And here are some reports for Construction sales: