Python Basics

Johann Mitlöhner, 2012

Why use Python?

Python focuses on programmer productivity and low maintenance cost. However, there are situations in which Python is not the best choice:

History and Characteristics

Python is well suited for data preparation, producing output in various file formats, and automating repetitive tasks with other software packages such as statistics systems. Accessing databases is easy, and thanks to its simple and clear syntax the language can be learned in a short time. Therefore, Python is often used as 'glue' between different systems: retrieving data from a database, formatting it in specific ways, and calling other software for further processing, such as statistical analysis.

Some examples (Unix, Python 2)

Python can be used interactively, but it is usually run as a script: use a text editor to create the file hello.py, e.g. pico hello.py:

#!/usr/bin/python

print "Hello!"

Save the file and leave the editor. Then make the script executable and run it:

chmod +x hello.py
./hello.py

There are other options to make this script run:

Indentation is used to mark blocks, such as loops:

for i in range(10):
  print "i:", i
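Blocks can be nested: the indentation level alone decides which statements belong to the loop, which belong to an if inside it, and which run after the loop. A small sketch (it only uses print() with a single argument, which behaves the same in Python 2 and 3):

```python
total = 0
for i in range(10):
    if i % 2 == 0:
        # indented twice: inside the if, inside the loop
        total = total + i
    # indented once: inside the loop, runs for every i
    print("i: %d" % i)
# not indented: runs once, after the loop has finished
print("sum of even i: %d" % total)
```

Mixing tabs and spaces for indentation is a common source of errors; most editors can be configured to insert spaces only.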

Some other operations:

Python supports object-oriented programming, but this text covers it only as far as necessary to use the standard tools, such as the methods of strings and lists:

>>> x = "ab,cd,efg".split(",")
>>> x
['ab', 'cd', 'efg']
>>> len(x)
3
>>> x[2]
'efg'
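A few more string and list operations in the same spirit as split() above; the variable names are only illustrative:

```python
words = "ab,cd,efg".split(",")      # ['ab', 'cd', 'efg']
words.append("hij")                 # lists can grow in place
first_two = words[:2]               # slice: elements 0 and 1
last = words[-1]                    # negative index counts from the end
joined = "-".join(words)            # the inverse of split()
upper = [w.upper() for w in words]  # list comprehension builds a new list
print(joined)
```

Note that split() and join() are string methods, while append() changes the list object itself.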

Reading and Writing Files

The following program reads a file and prints it line by line; if you put this code in a file called fileread.py, it will print itself!

f = open("fileread.py")
for rec in f:
  print rec.strip()
f.close()
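Since Python 2.6, the with statement closes the file automatically, even if an error occurs while reading. A sketch of the same loop as a reusable function; the name read_lines() is just an illustration:

```python
def read_lines(filename):
    """Return the lines of a file with leading/trailing whitespace removed."""
    lines = []
    # the file is closed when the with block ends, also after an exception
    with open(filename) as f:
        for rec in f:
            lines.append(rec.strip())
    return lines
```

For example, read_lines("fileread.py") returns the program text as a list of strings.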

The following program will write to a file called tmp.dat:

from random import randint

f = open("tmp.dat", "w")
f.write("x     y\n")
for i in range(1,11):
  x = randint(15000, 25000)
  y = randint(10, 50)
  f.write(str(x) + " " + str(y) + "\n")
f.close()

After running this program the file tmp.dat will contain something like this:

x     y
20779 23
20163 42
19763 10
17358 11
24389 47
19770 19
21764 48
22383 37
17829 50
17884 12

The format of this file allows it to be used as a data file in the statistics system R.
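In R, such a file can be loaded with read.table("tmp.dat", header=TRUE). It can also be read back in Python; the following sketch assumes, as in tmp.dat, a header line with column names followed by whitespace-separated integer values. The function name read_table() is hypothetical:

```python
def read_table(filename):
    """Read a whitespace-separated file with a header line into a dict of columns."""
    with open(filename) as f:
        header = f.readline().split()              # e.g. ['x', 'y']
        columns = dict((name, []) for name in header)
        for rec in f:
            fields = rec.split()
            if not fields:                         # skip blank lines
                continue
            for name, value in zip(header, fields):
                columns[name].append(int(value))   # assumes integer data
    return columns
```

With cols = read_table("tmp.dat"), the mean of column x is sum(cols["x"]) / float(len(cols["x"])).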

Functions and Modules

Well-structured code is much easier to maintain. Defining functions is a must for every non-trivial program, and for large applications, splitting the code into separate modules usually achieves a better structure.

#!/usr/bin/python

# use modulo % to check even/odd
def odd(x):
  if (x % 2) == 1: return True
  else: return False

def main():
  for i in range(10):
    if odd(i):
      print i, "is odd!"

if __name__ == '__main__': main()

The last line decides whether to start the main() function:

Coding style:

Your code will be easier to read and debug if you stick with a particular coding style, e.g. for

The following module should go into a file called tools.py; it contains its own main() function:

#!/usr/bin/python

def prime(x):
  # 0, 1 and negative numbers are not prime
  if x < 2: return False
  for i in range(2, x):
    if (x % i) == 0: return False
  return True

def main():
  print "main() from tools"

if __name__ == '__main__': main()

However, when tools.py is imported by the following program, its main() function is not executed:

#!/usr/bin/python

import tools

def main():
  for i in range(1,10):
    if tools.prime(i):
      print i, "is prime!"

if __name__ == '__main__': main()

Instead, the main() function in this file is called.
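The prime() function in tools.py tries every divisor up to x - 1. Trial division can stop much earlier: if x = a * b with a <= b, then a * a <= x, so it suffices to test divisors i with i * i <= x. A sketch of this variant with the same name and interface:

```python
def prime(x):
    """Trial division, testing divisors only up to the square root of x."""
    if x < 2:
        return False
    i = 2
    # any factorization x = a * b has a factor a with a * a <= x
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True
```

For x around one million this tests about a thousand divisors instead of a million.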

Accessing Databases

The following program will access a Postgres database, execute an SQL select statement, and print the result. Obviously, this will only work on a computer where

#!/usr/bin/python

import psycopg2

def testpg():
  conn = psycopg2.connect(dbstr())
  cur = conn.cursor()

  cur.execute("select color, count(*) from PROD group by color limit 5;")
  for rec in cur:
    (color, cnt) = rec
    print color, cnt

  cur.close()
  conn.close()

def main():
  testpg()

def dbstr():
  # db.txt holds a connection string such as "dbname=test user=smith" (placeholder values)
  return open("db.txt").read().strip()

if __name__ == '__main__': main()

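When inserting values into database tables, use the following code snippet as a template. The values are passed as a separate tuple, and psycopg2 substitutes them for the %s placeholders with correct quoting, which is safer than building the SQL string yourself. The changes only become permanent after conn.commit():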

...
id = 10
name = "Smith"
salary = 20000
cur.execute("insert into EMP (eid, name, salary) values (%s, %s, %s)", (id, name, salary))
...
cur.close()
conn.commit()
conn.close()
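Both psycopg2 and the sqlite3 module from the standard library follow the Python database API (DB-API 2.0), so the same pattern can be tried without a Postgres server. A sketch using an in-memory SQLite database; the table EMP and its rows are made up for illustration, and note that sqlite3 uses ? instead of %s as placeholder:

```python
import sqlite3

def demo():
    # ":memory:" creates a temporary database that is never written to disk
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table EMP (eid integer, name text, salary integer)")

    # values are passed separately from the SQL text, one tuple per row
    rows = [(10, "Smith", 20000), (11, "Jones", 25000)]
    cur.executemany("insert into EMP (eid, name, salary) values (?, ?, ?)", rows)
    conn.commit()

    # parameters work the same way in select statements
    cur.execute("select name, salary from EMP where salary > ?", (21000,))
    result = cur.fetchall()
    cur.close()
    conn.close()
    return result
```

Here demo() returns the single row for Jones, the only employee with a salary above 21000.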

Various Python-related

Interpreters and Compilers

Performance

Python was designed with programmer productivity in mind, not execution speed. Programmer time is usually much more expensive than computer time. However, some applications demand certain response times that Python may not be able to meet.

Performance benchmarks are notoriously difficult to design and easily misleading. Here, the Quicksort algorithm has been implemented in several languages, using the same approach in each as far as the language permits. The table shows the execution time in seconds for sorting n random integers on a 32-bit system with an Intel Core 2 Duo E8400 (3 GHz, 6 MB cache) and 4 GB RAM.

n            C       Java    Python   PyPy    Psyco
10,000       0.003   0.084   0.056    0.187   0.032
100,000      0.018   0.092   0.545    0.906   0.175
1,000,000    0.123   0.216   6.256    4.066   1.814

Here is the same benchmark on a server: a virtual machine running 64-bit Linux with 6 cores of an Intel Xeon E7310 (1.6 GHz, 2 MB cache) and 8 GB RAM. Psyco supports only 32-bit x86, so there are no Psyco figures.

n            C       Java    Python   PyPy    Psyco
10,000       0.005   0.163   0.103    0.374   -
100,000      0.021   0.256   0.888    1.227   -
1,000,000    0.187   0.437   10.208   1.978   -
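The benchmark code itself is not shown here. For readers who want to run a comparable measurement, this is a minimal sketch of an in-place Quicksort in Python (Hoare-style partitioning around the middle element) with a simple timer; the function names and the value range of the random integers are assumptions, and the timings will differ from the tables above:

```python
import random
import time

def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        # partition: smaller elements to the left, larger ones to the right
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth logarithmic
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j

def benchmark(n):
    """Return the seconds needed to quicksort n random integers."""
    data = [random.randint(0, 10**9) for _ in range(n)]
    start = time.time()
    quicksort(data)
    elapsed = time.time() - start
    assert data == sorted(data)
    return elapsed

if __name__ == '__main__':
    for n in (10000, 100000):
        print("n=%d: %.3f s" % (n, benchmark(n)))
```

In practice, the built-in sorted() and list.sort() are implemented in C and are much faster than any sort written in pure Python.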

Some notes of caution for interpreting and drawing conclusions from these benchmarks:

External Code: Python can call external code, such as functions written in C. If the performance bottleneck can be traced to a small number of simple functions, those pieces can be written in C with acceptable effort and called like normal Python functions. There are several ways to do this; one of them is the ctypes module from the standard library. The interface is not trivial, but its use is feasible for moderately experienced developers.
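A minimal ctypes sketch, assuming a Unix-like system. To avoid compiling anything, it calls strlen() from the C standard library; your own C code, compiled into a shared library (e.g. a hypothetical libfast.so), would be loaded the same way with ctypes.CDLL("./libfast.so"):

```python
import ctypes
import ctypes.util

# locate the C standard library (on Linux typically libc.so.6); if
# find_library() returns None, CDLL(None) uses the symbols already
# loaded into the Python process, which include libc on Unix
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(s):
    """Call the C function strlen() on a byte string."""
    return libc.strlen(s)

print(c_strlen(b"hello"))
```

Declaring argtypes and restype lets ctypes convert arguments and results correctly; without them, ctypes assumes int for the return value, which can silently truncate results on 64-bit systems.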

Other Resources

The primary source for everything concerning Python is python.org.

A good source for solutions to common programming problems is stackoverflow.com.