Basics

Hello world!

  • getting your first Python program to run
  • command line vs Jupyter notebooks
  • comments
  • print()

Python 3 vs Python 2:

Some years ago the Python developers switched to a new major version, Python 3, which introduced a number of tricky and subtle incompatibilities with the older Python 2. Python 2 reached its end of life in January 2020 and no longer receives updates, so we use Python 3 throughout.

However, note that Python 2 may still be installed on older systems, where parts of the operating system (e.g. Linux desktop tools) can depend on it. It is not a good idea to remove a Python 2 that came with your system.

When running Python scripts on the command line it is not immediately clear which version is used. To make sure, you can select the version explicitly by using the command python3 instead of just python.

Start your favorite text editor and enter the following line:

print('Hello World')

Save the file as hello.py and exit the editor. Run the script on the command line by entering

python3 hello.py
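To double-check which interpreter actually runs a script, the version can be printed from within Python. The following is a minimal sketch using the standard sys module; the variable names major and minor are chosen here for illustration only.

```python
import sys

# sys.version_info holds the interpreter version as numbers
major = sys.version_info.major
minor = sys.version_info.minor
print('Running Python', major, minor)

# stop early with a clear message if this is not Python 3
if major < 3:
    sys.exit('This script needs Python 3')
```

Saved to a file and run once with python and once with python3, this shows whether the two commands point to the same interpreter on your system.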

Jupyter Notebooks

The text you are reading was generated from a Jupyter notebook by simply using File/Download as HTML.

Notebooks combine Markdown (a lightweight markup language that is rendered as HTML) and Python code.

Instead of editing scripts and running them on the command line you can put the Python code into a code cell in a Jupyter notebook and press the Run button.

In [7]:
# this is a comment
print('Hello World!')
Hello World!

Most of what we do in the following will work identically in both notebook and command line. If not, we will note the differences.

If you like the notebook approach you can easily install Jupyter on your own computer; see https://jupyter.org/install

Variables and types

Python values come in a number of basic types. Python uses the term class instead of type since it is more consistent with the object-oriented view (which is also supported in Python).

Commonly used types are:

  • integer
  • float
  • string
  • boolean
  • list

When we use variables in Python we note that:

  • there are no declarations
  • the type determines what an operation does
  • e.g. the operator + (plus) works for numbers and strings, but not in the same way: it adds numbers and concatenates strings
  • type() shows the type of a value, which is not always obvious
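A few cases where the type is easy to get wrong can be checked directly. This is a small sketch; the variable names q, n, added and joined are made up for the illustration.

```python
q = 7 / 2        # the division operator / always gives a float
n = 7 // 2       # floor division of two integers gives an integer
print(q, type(q))
print(n, type(n))

added = 2 + 3          # + adds numbers
joined = '2' + '3'     # + concatenates strings
print(added, type(added))
print(joined, type(joined))
```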

Python does not need declarations; just assign your variables as you need them:

In [23]:
i = 42
f = 3.14
s = 'a string'
lst = [2, 3, 5, 7, 11]
  
print(i, type(i))
print(f, type(f))
print(s, type(s))
print(lst, type(lst))
print(lst[0], type(lst[0]))
42 <class 'int'>
3.14 <class 'float'>
a string <class 'str'>
[2, 3, 5, 7, 11] <class 'list'>
2 <class 'int'>

Indexing starts at 0 in Python. The first element of lst is lst[0].
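Since indexing starts at 0, the last valid index is one less than the length of the list. The sketch below shows this, together with negative indices, which count from the end; the names first, last and also_last are illustrative.

```python
lst = [2, 3, 5, 7, 11]

first = lst[0]              # index 0 is the first element
last = lst[len(lst) - 1]    # the last valid index is len(lst) - 1
also_last = lst[-1]         # negative indices count from the end

print(first, last, also_last)
```

Using an index outside the valid range, such as lst[5] here, raises an IndexError.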

Lists can be nested:

In [11]:
lines = [ ['the', 'cat', 'sat'], ['on', 'the', 'mat'] ]
lst = [2, 3, 5, 7, 11]
lst = [ 42, [2,3,7], lines ]
print(lst)
  
[42, [2, 3, 7], [['the', 'cat', 'sat'], ['on', 'the', 'mat']]]

As we can see above, a list does not need to have a single data type; each item can be any value at all, including another list.

Types can be mixed at will. However, this sort of thing can get out of hand quickly and become very hard to read and understand; it is a powerful feature that should be used with prudence.
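Items inside a nested list are reached by chaining index operations, one pair of brackets per level. A short sketch using the same data as above:

```python
lines = [['the', 'cat', 'sat'], ['on', 'the', 'mat']]
lst = [42, [2, 3, 7], lines]

inner = lst[1]        # the inner list [2, 3, 7]
item = lst[1][2]      # third item of that inner list
word = lst[2][1][2]   # lines, then second sentence, then third word

print(inner)
print(item)
print(word)
```

Three chained indices are already hard to follow, which illustrates why deep nesting should be used sparingly.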

Nested lists as matrices; Numpy

Vectors and matrices in Python can be implemented using nested lists, but there is a better way: Numpy.

The package Numpy comes standard with many Python distributions; if not, we need to install it (see below). It deals with numerical arrays efficiently and elegantly:

In [14]:
import numpy as np

# np.asarray(): convert to numpy.ndarray if necessary

a = [ [2, 3, 5], [7, 11, 13] ]
b = np.asarray(a)
print('type of a:', type(a))
print('type of b:', type(b))

# indexing: X[:,0] for first column of matrix X
print('matrix b:')
print(b)
print('first column of b:', b[:,0])
type of a: <class 'list'>
type of b: <class 'numpy.ndarray'>
matrix b:
[[ 2  3  5]
 [ 7 11 13]]
first column of b: [2 7]

Numpy can make use of multiple CPU cores: if it is linked against a multi-threaded linear algebra library (an implementation of BLAS such as OpenBLAS, ATLAS or similar), operations such as multiplying large matrices run in parallel, which can result in significant performance gains. The actual speedup depends on the operation, the size of the matrices and the hardware.

Numpy also offers a large number of utility functions, e.g. summary statistics for each column (here the standard deviation):

In [15]:
print(np.std(b, axis=0))
[2.5 4.  4. ]

Similarly:

  • mean(), sum(), min(), max()
  • dot product
  • matrix-vector multiplication
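The functions in this list can be tried on the small matrix from above. The following sketch assumes Numpy is installed; the vector v and the result names are made up for the example.

```python
import numpy as np

b = np.asarray([[2, 3, 5], [7, 11, 13]])
v = np.asarray([1, 0, 2])

col_means = np.mean(b, axis=0)   # axis=0: one mean per column
row_sums = np.sum(b, axis=1)     # axis=1: one sum per row
smallest = np.min(b)             # without axis: over the whole array
largest = np.max(b)

dot = np.dot(b[0], b[1])         # dot product of the two rows
bv = b @ v                       # matrix-vector multiplication

print('column means:', col_means)
print('row sums:', row_sums)
print('min and max:', smallest, largest)
print('dot product of rows:', dot)
print('b times v:', bv)
```

The operator @ is the matrix multiplication operator; np.dot(b, v) gives the same result for this case.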

Exercise:

Find some more Numpy functions in the documentation https://numpy.org/ and add a few bits of code to see if you can use them confidently, e.g.

  • create a little matrix of some numerical data
  • at least 5 or 6 lines to make it more interesting
  • calculate some useful statistics, such as
    • mean
    • std dev
    • median
  • present this with a bit of text in the print statements

Block structure: loops, conditions

  • indent for block structure
  • keywords for, in, if, elif, else (there is no then keyword in Python)
  • len()
  • range()

Python uses indentation for block structure:

In [16]:
for x in lst:
    print(x)
  
42
[2, 3, 7]
[['the', 'cat', 'sat'], ['on', 'the', 'mat']]
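The functions len() and range() are often combined when the index of each item is needed in addition to its value. A minimal sketch, with a list named primes chosen for the illustration:

```python
primes = [2, 3, 5, 7, 11]

# range(n) yields 0, 1, ..., n-1, which matches the valid indices of the list
for i in range(len(primes)):
    print(i, primes[i])

# range(start, stop) begins at start and ends just before stop
total = 0
for k in range(1, 5):
    total = total + k
print('sum of 1 to 4:', total)
```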

Conditions are written with if and else; the blocks they control are again marked by indentation.

When moving code from one part of a script to another, take care that the proper level of indentation is achieved. This can become confusing when going beyond four or five levels; avoid that.

In [17]:
x = 12

if x == 42:
    print('x is 42!')
else:
    print('x is not 42!')
  
x is not 42!
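When more than two cases need to be distinguished, elif (short for "else if") adds further conditions between if and else. The sketch below combines it with a loop; the variable name label is illustrative.

```python
for x in [12, 42, 99]:
    # exactly one of the three branches runs for each x
    if x < 42:
        label = 'less than 42'
    elif x == 42:
        label = 'exactly 42'
    else:
        label = 'greater than 42'
    print(x, 'is', label)
```

Note the two levels of indentation: one for the body of the loop and one more for each branch.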

Defined functions

  • keyword def
  • divide and conquer
  • no spaghetti code!

Functions are defined with the def keyword:

In [19]:
def myfun(x):
    print('x is', x)

myfun(42)
x is 42

The first two lines define the function, the last line actually calls it.

Note that inside the function the value of x is the value that has been passed as a parameter, not the value of any x that may have been assigned previously.
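This can be checked with a short experiment. In the sketch below, the function double and the variable result are made-up names; the keyword return hands a value back to the caller.

```python
x = 12

def double(x):
    # this x is the parameter, independent of the x assigned above
    return 2 * x

result = double(42)
print('result is', result)
print('x outside the function is still', x)
```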

Command line parameters

Command line parameters are accessed using the package sys:

import sys

This is an area that works differently on the command line and in Jupyter notebooks. On the command line we have the following situation:

  • sys.argv[0] is the name of the script
  • sys.argv[1] is the first parameter on the command line, etc

Within a Jupyter notebook the list sys.argv contains other values. However, it is still a list, and to simulate the situation on the command line we use the following trick:

In [26]:
import sys

# trick for Jupyter notebook: make argv look as on the command line
sys.argv = ['myscript.py', '42', '12']

print('Command line parameters:', sys.argv[1:])
Command line parameters: ['42', '12']

The first parameter on the command line is sys.argv[1] since sys.argv[0] is the name of the script.

The notation argv[1:] is called a slice; it selects everything from index 1 to the end of the list.
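A few more variations of this notation, as a sketch on a plain list that stands in for sys.argv (the names rest, head, one and last are illustrative):

```python
argv = ['myscript.py', '42', '12']   # stand-in for sys.argv

rest = argv[1:]    # from index 1 to the end
head = argv[:1]    # from the start up to, but not including, index 1
one = argv[1:2]    # a slice always gives a list, even with one element
last = argv[-1]    # a plain index gives the element itself

print(rest)
print(head)
print(one)
print(last)
```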

Converting strings to numbers

When reading from the command line, and when reading from files, we need to remember that strings are not numbers. For computation we need conversion:

In [24]:
s = '42'
x = int(s)
print(type(s), type(x))
<class 'str'> <class 'int'>

Read two values from the command line and add them up:

In [27]:
x = int(sys.argv[1])
y = int(sys.argv[2])
print(x + y)
54

Note what happens when we forget about the conversion:

In [28]:
x = sys.argv[1]
y = sys.argv[2]
print(x + y)
4212

The operator + (plus) works on strings as well, concatenating them.
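Conversion fails with a ValueError when the string does not look like a number. One possible way to deal with this is sketched below; it uses try and except, which go slightly beyond what we have covered so far, and the helper name to_number is hypothetical.

```python
def to_number(s):
    # try the stricter conversion first, then fall back to float
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None   # s is neither an integer nor a float

for s in ['42', '3.14', 'XVII']:
    print(s, '->', to_number(s))
```

Returning None for unusable input is a design choice made for this sketch; printing an error message and stopping the script would be just as reasonable.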

Exercise:

Using what we have discussed so far, write a little function that gets a string as a parameter (you can also pass it from the command line to make testing easier), interprets it as a Roman numeral and returns its value as a number, if possible, e.g.

XVII should be 17

Any order of the symbols will do, no need for the more sophisticated forms like IV = 4 (of course take that challenge, if you can).

In [31]:
# convert a Roman numeral (given as a string) to its integer value

def roman(s):
    r = 0
    for c in s:
        if c == 'I': r += 1
        if c == 'V': r += 5
        if c == 'X': r += 10
    return r

print(roman('XVII'))
17

The solution is pretty straightforward once you realize that a string is a sequence of characters and can be treated just like a list in this respect.

The notation x += y is just shorthand for x = x + y.

Refinements:

  • add the other symbols: L, C, M (very easy)
  • support the positional subtraction, such as IV = 4 (a little tricky)
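For comparison after you have tried it yourself, here is one possible sketch covering both refinements. It is not the only way to do it. It assumes a dictionary (a type not covered above) to look up the symbol values, also includes the symbol D = 500, and does not check whether the input is a well-formed numeral; the names VALUES and roman_to_int are made up.

```python
# value of each Roman symbol
VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def roman_to_int(s):
    total = 0
    for i in range(len(s)):
        value = VALUES[s[i]]
        # a smaller symbol directly before a larger one is subtracted
        if i + 1 < len(s) and value < VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int('XVII'))
print(roman_to_int('IV'))
print(roman_to_int('MCMXCIV'))
```

The idea is to look one symbol ahead: in IV the I comes before the larger V, so it counts as -1 instead of +1.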

Constructing lists

There are several ways of making lists. Often the list comprehension is more elegant and also more efficient. However, the append approach is sometimes more suitable:

  • start empty and append:

      lst = []
      for x in someExpression:
          lst.append(..)

  • list comprehension:

    • [ expr(x) for x in lst ]

    • [ expr(x) for x in lst if cond(x) ]

    • can be nested: [ [ ... for ... in ... ] for ... in ... ]
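The two approaches can be compared side by side. The following sketch builds the squares of the even numbers in a list both ways, and then a small nested comprehension; all names are chosen for the illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# approach 1: start empty and append
squares_loop = []
for x in numbers:
    if x % 2 == 0:
        squares_loop.append(x * x)

# approach 2: the same result as a list comprehension with a condition
squares_comp = [x * x for x in numbers if x % 2 == 0]

# nested comprehension: a small multiplication table as a list of rows
table = [[row * col for col in range(1, 4)] for row in range(1, 3)]

print(squares_loop)
print(squares_comp)
print(table)
```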

Example:

We want to split a text into sentences, and then split the individual sentences into words. This is a nice case for list comprehension.

All we need is a function to split a string given a delimiter: the split() function.

This function is a method: following the object-oriented approach, it is called on an existing string:

In [34]:
print('This is a text'.split())
['This', 'is', 'a', 'text']

Without a parameter the function splits on white space, i.e. blanks, tabs, and newlines.
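The difference between splitting with and without an explicit delimiter shows up when delimiters are repeated. A short sketch with made-up example strings:

```python
by_comma = 'a,b,c'.split(',')              # split on a comma
by_space = 'one  two\tthree\n'.split()     # runs of white space count as one delimiter
by_blank = 'one  two'.split(' ')           # an explicit blank does not merge repeats

print(by_comma)
print(by_space)
print(by_blank)
```

With an explicit delimiter, two delimiters in a row produce an empty string between them.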

With this function we can easily split a text into sentences and then, using list comprehension, into words:

In [33]:
txt = 'There are several ways of making lists. List comprehension is elegant. Append is sometimes better.'
sents = txt.split('.')
words = [ s.split() for s in sents ]
print(sents)
print(words)
['There are several ways of making lists', ' List comprehension is elegant', ' Append is sometimes better', '']
[['There', 'are', 'several', 'ways', 'of', 'making', 'lists'], ['List', 'comprehension', 'is', 'elegant'], ['Append', 'is', 'sometimes', 'better'], []]

Exercise:

In the example above there is an empty list element. Get rid of it.

Exercise:

Look at the following piece of code:

In [38]:
import random

alf = 'abcdef'
print(random.choice(alf))
f

When you execute that bit of code it prints a randomly chosen character from alf each time.

Use this for the following:

Exercise:

  • Level 1: Write a little function that creates a short random poem.
  • Level 2: Make the words sound reasonably natural, e.g. not consisting of 10 consonants only.
  • Level 3: Make the sentences adhere to some very simple grammar.
  • Level 4: The poem should make some sort of sense.

Obviously, the levels get progressively tougher, and level 4 is practically a research project. However, levels 1 and 2 are certainly within your abilities at this time.

Other (possibly) useful functions:

  • len(s): the length of string s
  • random.randint(a, b): random integer x with a <= x <= b
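As a possible starting point, the sketch below combines random.randint() and random.choice() to build a single random word. The function name random_word and its parameters are made up, and the result is different on every run.

```python
import random

alf = 'abcdef'

def random_word(min_len, max_len):
    # pick a random length, then that many random characters
    n = random.randint(min_len, max_len)
    word = ''
    for _ in range(n):
        word += random.choice(alf)
    return word

w = random_word(3, 6)
print(w, len(w))
```

The words produced this way will not sound natural yet; that is what level 2 is about.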

Other Resources

DataCamp is a learning platform for data science.

  • In DataCamp you can learn Python, R, and SQL through a combination of videos and exercises.
  • Some lessons are free, e.g., the Intro to Python for Data Science.