Python 3 vs Python 2:
Sadly, some years ago the Python developers switched to a new major version, Python 3, which introduces a number of tricky and subtle incompatibilities with the older Python 2. Since Python 2 is no longer maintained, we have to accept the facts and use Python 3.
However, note that Python 2 may still be installed on your system, and on some (especially older) Linux distributions desktop and system tools depend on it. It is not a good idea to remove Python 2 from your system.
When running Python scripts on the command line, it is not immediately clear which version is used. To be sure, specify the version explicitly by using the command python3 instead of just python.
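A script can also check which interpreter is running it by looking at the version information in the sys module. This is a minimal sketch. The warning text is only an illustration.

```python
import sys

# sys.version is a human-readable string that starts with the version number
print(sys.version)

# sys.version_info can be compared like a tuple
if sys.version_info < (3, 0):
    print('Warning: this script expects Python 3')
else:
    print('Running Python', sys.version_info.major)
```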
Start your favorite text editor and enter the following line:
print('Hello World')
Save the file as hello.py and exit the editor. Run the script on the command line by entering
python3 hello.py
The text you are reading was generated from a Jupyter notebook simply by using File/Download as HTML.
Notebooks combine Markdown (a lightweight markup language that is rendered as HTML) and Python code.
Instead of editing scripts and running them on the command line, you can put the Python code into a code cell in a Jupyter notebook and press the Run button.
# this is a comment
print('Hello World!')
Most of what we do in the following will work identically in both notebook and command line. If not, we will note the differences.
If you like the notebook approach you can easily install Jupyter on your own computer; see https://jupyter.org/install
Python values come in a number of basic types. Python uses the term class instead of type, since this is more consistent with the object-oriented view (which Python also supports).
Commonly used types include int (integers), float (floating-point numbers), str (strings) and list (sequences of values).
When we use variables in Python, note that Python does not need declarations; just assign your variables as you need them:
i = 42
f = 3.14
s = 'a string'
lst = [2, 3, 5, 7, 11]
print(i, type(i))
print(f, type(f))
print(s, type(s))
print(lst, type(lst))
print(lst[0], type(lst[0]))
Indexing starts at 0 in Python. The first element of lst is lst[0].
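Negative indices count from the end. A slice lst[i:j] selects the elements from index i up to, but not including, index j. Here is a short sketch using the same list:

```python
lst = [2, 3, 5, 7, 11]

print(lst[0])     # first element: 2
print(lst[-1])    # last element: 11
print(lst[1:3])   # elements at index 1 and 2: [3, 5]
print(len(lst))   # number of elements: 5
```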
Lists can be nested:
lines = [ ['the', 'cat', 'sat'], ['on', 'the', 'mat'] ]
lst = [2, 3, 5, 7, 11]
lst = [ 42, [2,3,7], lines ]
print(lst)
As we can see above, a list does not need to have a single data type; each item can be any value at all, including another list.
Types can be mixed at will. However, this sort of thing can get out of hand quickly and become very hard to read and understand; it is a powerful feature that should be used with prudence.
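To reach elements of nested lists, apply the index operator repeatedly, working from the outside in. This short sketch uses the same values as above:

```python
lines = [['the', 'cat', 'sat'], ['on', 'the', 'mat']]
lst = [42, [2, 3, 7], lines]

print(lst[1])        # the inner list [2, 3, 7]
print(lst[1][2])     # third element of that inner list: 7
print(lst[2][1][2])  # lines[1][2], i.e. the string 'mat'
```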
Vectors and matrices can be implemented in Python using lists, but there is a better way: Numpy.
The package Numpy comes standard with many Python distributions; if not, we need to install it (see below). It deals with numerical arrays efficiently and elegantly:
import numpy as np
# np.asarray(): convert to numpy.ndarray if necessary
a = [ [2, 3, 5], [7, 11, 13] ]
b = np.asarray(a)
print('type of a:', type(a))
print('type of b:', type(b))
# indexing: X[:,0] for first column of matrix X
print('matrix b:')
print(b)
print('first column of b:', b[:,0])
Numpy can make use of multiple CPU cores. If a suitable optimized linear algebra library is installed and configured (usually an implementation of BLAS such as OpenBLAS or ATLAS), operations such as matrix multiplication run in parallel, resulting in significant performance gains. When multiplying large matrices, the speedup can approach the number of cores.
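The following is a minimal sketch. np.show_config() prints which BLAS/LAPACK libraries your Numpy build is linked against, and the exact output varies between installations. The @ operator multiplies two arrays as matrices, which is one of the operations that benefits from an optimized BLAS.

```python
import numpy as np

# print build information, including the BLAS/LAPACK libraries in use
np.show_config()

b = np.asarray([[2, 3, 5], [7, 11, 13]])

# b.T is the transpose of b; (2x3) @ (3x2) gives a 2x2 matrix
c = b @ b.T
print(c)
```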
Numpy also offers a large number of utility functions, e.g. summary statistics for each column:
print(np.std(b, axis=0))
Similarly:
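Other summary statistics work the same way. With axis=0 you get one value per column, with axis=1 one value per row, and without axis the whole array is summarized. This short sketch recreates the array b from above so that the block runs on its own:

```python
import numpy as np

b = np.asarray([[2, 3, 5], [7, 11, 13]])

print(np.mean(b, axis=0))  # column means: 4.5, 7 and 9
print(np.min(b, axis=0))   # column minima: 2, 3 and 5
print(np.max(b, axis=1))   # row maxima: 5 and 13
print(np.sum(b))           # sum of all elements: 41
```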
Exercise:
Find some more Numpy functions in the documentation https://numpy.org/ and add a few bits of code to see if you can use them confidently, e.g.
Python uses indentation for block structure:
for x in lst:
    print(x)
Conditional statements use indentation in the same way.
When moving code from one part of a script to another, take care that the proper level of indentation is achieved. This can become confusing when going beyond four or five levels; avoid that.
x = 12
if x == 42:
    print('x is 42!')
else:
    print('x is not 42!')
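You can chain more than two cases with elif (short for "else if"). Only the first branch whose condition is true is executed. Here is a minimal sketch along the lines of the example above:

```python
x = 12

if x == 42:
    msg = 'x is 42!'
elif x > 42:
    msg = 'x is greater than 42'
else:
    msg = 'x is less than 42'

print(msg)
```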
Functions are defined with the def keyword:
def myfun(x):
    print('x is', x)
myfun(42)
The first two lines define the function; the last line actually calls it.
Note that inside the function the value of x is the value that has been passed as a parameter, not the value of any x that may have been assigned previously.
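The following sketch illustrates this point. It also shows return, which hands a value back to the caller. The function name double is just an illustrative choice.

```python
x = 'outer value'

def double(x):
    # inside the function, x refers to the parameter, not the outer x
    return 2 * x

result = double(21)
print(result)  # 42
print(x)       # the outer x is unchanged: outer value
```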
Command line parameters are accessed using the module sys:
import sys
This is an area that works differently on the command line and in Jupyter notebooks. On the command line we have the following situation:
Within a Jupyter notebook the list sys.argv contains other values. However, it is still a list, and to simulate the situation on the command line we use the following trick:
import sys
# trick for Jupyter notebook: make argv look as on the command line
sys.argv = ['myscript.py', '42', '12']
print('Command line parameters:', sys.argv[1:])
The first parameter on the command line is sys.argv[1] since sys.argv[0] is the name of the script.
The notation argv[1:] uses a slice; it selects everything from index 1 to the end.
When reading from the command line, and when reading from files, we need to remember that strings are not numbers. For computation we need conversion:
s = '42'
x = int(s)
print(type(s), type(x))
Read two values from the command line and add them up:
x = int(sys.argv[1])
y = int(sys.argv[2])
print(x + y)
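If fewer than two parameters are given, sys.argv[2] raises an IndexError. A parameter that is not an integer makes int() raise a ValueError. Below is a more defensive sketch, written as a function that takes the parameter list as an argument so that it can be tried in a notebook as well. The name add_args and the messages are illustrative choices. Raising SystemExit with a message prints the message to stderr and ends the script.

```python
def add_args(argv):
    """Return the sum of the first two parameters in argv, converted to int."""
    # argv[0] is the script name, so two parameters mean at least three entries
    if len(argv) < 3:
        raise SystemExit('usage: myscript.py X Y')
    try:
        x = int(argv[1])
        y = int(argv[2])
    except ValueError:
        raise SystemExit('both parameters must be integers')
    return x + y

# in a script you would call add_args(sys.argv);
# here a fixed list simulates the command line
print(add_args(['myscript.py', '42', '12']))  # 54
```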
Note what happens when we forget about the conversion:
x = sys.argv[1]
y = sys.argv[2]
print(x + y)
The operator + (plus) works on strings as well, concatenating them.
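This short comparison applies + to strings and to integers, using the values from above:

```python
s = '42' + '12'            # concatenation of two strings
n = int('42') + int('12')  # addition of two integers

print(s)  # 4212
print(n)  # 54
```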
Exercise:
Using what we have discussed so far, write a little function that gets a string as a parameter (you can also pass it from the command line to make testing easier), interprets it as a Roman numeral and converts it into an integer, if possible, e.g.
XVII should be 17
The symbols may appear in any order; there is no need to handle the more sophisticated forms like IV = 4 (but of course take that challenge, if you can).
# convert a Roman numeral (symbols I, V and X only) to an integer
def roman(s):
    r = 0
    for c in s:
        if c == 'I': r += 1
        if c == 'V': r += 5
        if c == 'X': r += 10
    return r
print(roman('XVII'))
The solution is pretty straightforward once you realize that a string is a sequence of characters and can be treated just like a list in this respect.
For numbers, the notation x += y is just shorthand for x = x + y.
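For the challenge mentioned above, one common approach reads the numeral from left to right. It subtracts a symbol's value when a larger symbol follows it, so IV = 5 - 1 = 4. This is a sketch, not the only possible solution. It also extends the symbol table to L, C, D and M. Unknown characters raise a KeyError, and malformed numerals such as IIX are not rejected.

```python
# values of the Roman symbols
VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def roman(s):
    r = 0
    for i, c in enumerate(s):
        v = VALUES[c]
        # subtractive notation: a smaller symbol before a larger one is subtracted
        if i + 1 < len(s) and v < VALUES[s[i + 1]]:
            r -= v
        else:
            r += v
    return r

print(roman('XVII'))     # 17
print(roman('IV'))       # 4
print(roman('MCMXCIV'))  # 1994
```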
Refinements:
There are several ways of making lists. Often the list comprehension is more elegant and also more efficient. However, the append approach is sometimes more suitable:
start empty and append:
lst = []
for x in someExpression:
    lst.append(expr(x))
List comprehension:
[ expr(x) for x in lst ]
[ expr(x) for x in lst if cond(x) ]
can be nested: [ [ ... for ... in ... ] for ... in ... ]
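Here is a concrete instance of these patterns. It computes the squares of the even numbers from 0 to 9 (range(10) yields the integers 0 through 9) and builds a small nested list:

```python
# start empty and append
squares = []
for x in range(10):
    if x % 2 == 0:
        squares.append(x * x)

# the same with a list comprehension
squares2 = [x * x for x in range(10) if x % 2 == 0]

# nested comprehension: a 3x3 multiplication table
table = [[i * j for j in range(1, 4)] for i in range(1, 4)]

print(squares)   # [0, 4, 16, 36, 64]
print(squares2)  # [0, 4, 16, 36, 64]
print(table)     # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```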
Example:
We want to split a text into sentences, and then split the individual sentences into words. This is a nice case for list comprehension.
All we need is a way to split a string at a given delimiter: the split() method.
It follows the object-oriented approach and is called on an existing string:
print('This is a text'.split())
Without an argument, split() splits on whitespace, i.e. blanks, tabs, and newlines.
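With an explicit delimiter argument, the string is split exactly at each occurrence of the delimiter, so consecutive delimiters produce empty strings. The default whitespace splitting behaves differently: it treats a run of whitespace as a single separator. A short sketch:

```python
print('2,3,5'.split(','))   # ['2', '3', '5']
print('a  b'.split())       # ['a', 'b']
print('a  b'.split(' '))    # ['a', '', 'b']
```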
With this function we can easily split a text into sentences and then, using list comprehension, into words:
txt = 'There are several ways of making lists. List comprehension is elegant. Append is sometimes better.'
sents = txt.split('.')
words = [ s.split() for s in sents ]
print(sents)
print(words)
Exercise:
In the example above there is an empty list element. Get rid of it.
Exercise:
Look at the following piece of code:
import random
alf = 'abcdef'
print(random.choice(alf))
Each time you execute that bit of code, it prints a randomly chosen character from alf.
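While testing, it can help to get the same "random" sequence on every run. Seeding the generator with random.seed() does that. In this short sketch the seed value 1 is arbitrary:

```python
import random

alf = 'abcdef'

random.seed(1)
first = [random.choice(alf) for _ in range(5)]

random.seed(1)   # same seed again
second = [random.choice(alf) for _ in range(5)]

print(first == second)  # True: the same seed gives the same sequence
```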
Use this for the following:
Exercise:
Obviously, the levels get progressively tougher, and level 4 is practically a research project. However, levels 1 and 2 are certainly within your abilities at this time.
Other (possibly) useful functions:
DataCamp is a learning platform for data science.