Programming Languages for Linux

Here is a short overview of commonly used languages in the Linux environment, with their typical areas of application. Obviously most if not all of these languages are also available on other operating systems.

Performance measurement is a tricky subject. A first approach can be done using the time command:

time myprog someargs

Sed

The streaming editor is useful for text processing, especially within vi. Here is an example to translate string xx to yy:
sed '1,$s/xx/yy/g' < infile 

The notation 1,$ means from line 1 to the end of the input; s for substitute; g for global i.e. for every occurrence of the string.

Awk

Simple programming language named after the authors Aho, Weinberger, Kernighan. Good for one-liners, such as

awk '{ s += $2 } END { print s }' < infile  

This little program sums the numbers in the second column of the input lines.

Comparing solutions for Insertion Sort

Sed and Awk are capable of general-purpose problem solving, but they are typically used for small tasks. Large programming projects most commonly employ languages like Python, Java, or C/C++ [1]. As one of the typical Linux programming/scripting languages Perl is also thrown into the mix. These languages will be compared for implementing a simple sorting method called Insertion Sort. In pseudo-code it works like this (index origin zero):

insertion_sort(array a, length n):
  for i = 1 to n-1:
    x = a[i]
    for j = i to 0:
      if j = 0 or x >= a[j-1]: 
        a[j] = x
        break
      else:
        a[j] = a[j-1]

For the following comparison a complete little application is developed in each case, including random array generation, and testing the sorted array.

Perl

Originally designed as the Practical Extraction and Reporting Language; now used for many system administration tasks, and as a general purpose programming language. The Syntax takes some getting used to:

#!/usr/bin/perl

sub insertion_sort {
    my $a = shift;
    for ($i = 1; $i < @$a; $i++) {
        my $x = $$a[$i];
        for ($j = $i; $j >= 0; $j--) {
            if ($j == 0 || $x >= $$a[$j-1]) { 
                $$a[$j] = $x;
                last;
            } else {
                $$a[$j] = $$a[$j-1];
            }
        }
    }
}

sub check {
    my $a = shift;
    for ($i = 0; $i < @$a-1; $i++) {
        if ($$a[$i] > $$a[$i+1]) {
            print STDERR "NOT SORTED!\n";
            return 0;
        }
    }
    return 1;
}

$n = $ARGV[0] * 1;
$debug = $ARGV[1] * 1;
@a = ();
for ($i = 0; $i < $n; $i++) { 
    $a[$i] = int(rand($n + 10*$n));
}
if ($debug==1) { print "list:   ", (join " ", @a), "\n"; }
insertion_sort \@a;
if ($debug==1) { print "sorted: ", (join " ", @a), "\n"; }
check \@a;

Python

Python is a very good programming language to learn and use if maximum performance is not required. It supports the programmer rather than the machine.

Here is the insertion sort in Python (2023: outdated, see Notes below):

#!/usr/bin/python

import string
import sys
import random
import psyco
psyco.full()

def insertion_sort(a):
    n = len(a)
    for i in xrange(1,n):
        x = a[i]
        for j in xrange(i,-1,-1):
            if (j == 0 or x >= a[j-1]): 
                a[j] = x
                break
            else:
                a[j] = a[j-1]
    
def check(a):
    n = len(a)
    for i in xrange(0,n-1):
        if (a[i] > a[i+1]):
            print "NOT SORTED!"
  
def main():
    n = string.atoi(sys.argv[1])
    debug = string.atoi(sys.argv[2])
    a = range(n)
    for i in xrange(0,n):
        a[i] = random.randint(n,10*n)
    if (debug==1): print "random: ", a
    insertion_sort(a)
    if (debug==1): print "sorted: ", a
    check(a)

if __name__ == '__main__': main()

C

This is the classic Unix programming language; many parts of the Unix system are written in C, and many new pieces of software are still developed in C. It offers some unique advantages:

Of course, there are disadvantages as well:

Here is the insertion sort in C:

#include <stdio.h>
#include <stdlib.h>

void insertion_sort(int* a, int n) {
  int i,j,x;
  for (i = 1;i < n; i++) {
    x = a[i];
    for (j = i; j >= 0; j--) {
      if (j == 0 || x >= a[j-1]) { 
        a[j] = x;
        break;
      } else {
        a[j] = a[j-1];
      }
    }
  }
}

void print(char* s, int* a, int n) {
  int i;
  printf("%s",s);
  for (i = 0; i < n; i++) {
    printf("%d ",a[i]);
  }
  printf("\n");
}    

int main(int argc, char* argv[]) {
  int i,n,debug,*a;
  n = atoi(argv[1]);
  debug = atoi(argv[2]);
  a = malloc(n*sizeof(int));
  for (i = 0; i < n; i++) { 
    a[i] = rand() % (n + 10*n);
  }
  insertion_sort(a,n);
  if (debug==1) print("ins: ", a, n);
}

Java

The object-oriented language Java promotes the concept write once, run anywhere; however, problems with different version of development kit and runtime can still occur.

The syntax is similar to C and C++, which means about the same length of code, and similarly long development time. Not a very high level language.

The Java compiler generates byte code which is interpreted, and in certain cases compiled to native code, by the runtime. Because of this Just-In-Time compilation, the performance is on a level with C and C++. However, memory requirements tend to be much higher, to the extent that applications sometimes become practically unusable.

The JDK (Java development kit) is needed to develop applications. This is usually not part of the Linux distribution. A JRE (Java runtime environment) is needed to run Java code; this is often, but not always, part of the initial installation from the Live CD.

Here is the insertion sort in Java:

import java.util.*;
import java.text.*;

class InsertionSort {
  
  public static void main(String[] args) {
    int n = Integer.parseInt(args[0]);
    int debug = Integer.parseInt(args[1]);
    Random rand = new Random();
    int a[] = new int[n];
    for (int i = 0; i < n; i++) { 
      a[i] = rand.nextInt(n + 10*n);
    }
    insertionSort(a);
    if (debug==1) print("ins: ", a);
  }
  
  static void insertionSort(int[] a) {
    int n = a.length;
    for (int i = 1; i < n; i++) {
      int x = a[i];
      for (int j = i; j >= 0; j--) {
        if (j == 0 || x >= a[j-1]) { 
          a[j] = x;
          break;
        } else {
          a[j] = a[j-1];
        }
      }
    }
  }
  
  static void print(String s, int[] a) {
    System.out.print(s);
    for (int i = 0; i < a.length; i++) {
      System.out.print(a[i] + " ");
    }
    System.out.println("");
  }
}

Runtime comparison

The following table shows the runtime in seconds for insertion sort of n integers, on various computers:

     n       C     Java   Python     Perl 
               PC 2006, 32 bit
  5000    0.01     0.10     0.33     6.36 
 10000    0.04     0.20     0.71    25.14 
 20000    0.20     0.48     2.74      
 40000    0.74     1.70          
 80000    2.83     6.39          
               PC 2009, 32 bit
  5000    0.01     0.09     0.22     3.62 
 10000    0.03     0.09     0.41    14.13
 20000    0.11     0.19     1.55      
 40000    0.47     0.44          
 80000    1.85     1.54          
               Server 2008, 64 bit
  5000    0.02     0.14     3.27     5.99  
 10000    0.05     0.24    12.81    28.30  
 20000    0.19     0.41    50.63      
 40000    0.76     1.07           
 80000    3.02     3.70           
               PC 2015, 64 bit
  5000    0.01     0.06     0.05     1.46 
 10000    0.01     0.06     0.07     5.92 
 20000    0.06     0.11     0.21    23.76
 40000    0.26     0.27     0.73 
 80000    1.04     0.86     2.74 

Notes

References

[1] TIOBE popularity of programming languages