Linux History, Terms, and Concepts

History

In terms of lines of code: Linux System = Linux Kernel (6%) + GNU software (15%) + other free software (79%)

Terms

File systems

File systems provide a unified interface to physical media for applications, i.e. the program code for reading and writing files is identical for hard disks, USB sticks, and SD cards. File systems differ in terms of (among other things)

Often file names are given and cannot be changed. However, if you have the choice, consider the following:

Here is an overview of the most commonly used file systems:

Buffers and Page Cache

When applications read or write files, the system does not carry out each access as a physical operation on the disk, as this would result in very poor performance. Instead, file contents are copied to RAM, where they stay for some time so that subsequent requests can be served much faster. A large amount of available RAM is therefore beneficial for performance.

Buffers are used for I/O operations that move data between storage devices, such as from RAM to disk.

The page cache is an area of memory where file content is stored for faster access. This area is typically much larger than the buffers and can occupy a significant part of the total RAM, leaving very little free memory. This is not a problem, since cache memory can be quickly re-allocated to programs if needed. Memory usage figures are therefore only meaningful when programs and cache are listed separately.
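As a rough illustration of listing program and cache memory separately, the following Python sketch parses text in the format of /proc/meminfo. The sample values and the helper name summarize_meminfo are invented for this example; on a real Linux system the text could be read from /proc/meminfo instead. The split is a simplification and ignores, for instance, shared memory and kernel-internal caches.

```python
# Sample text in the format of /proc/meminfo (values are invented).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
Buffers:          200000 kB
Cached:          4300000 kB
"""


def summarize_meminfo(text):
    """Split total memory into programs, buffers, page cache, and free (in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    cache = fields["Cached"]
    buffers = fields["Buffers"]
    free = fields["MemFree"]
    # Whatever is neither free nor cache/buffers is attributed to programs.
    programs = fields["MemTotal"] - free - buffers - cache
    return {"programs": programs, "buffers": buffers, "cache": cache, "free": free}


summary = summarize_meminfo(SAMPLE_MEMINFO)
for key, value in summary.items():
    print(f"{key:>8}: {value / 1024:8.0f} MB")
```

In this made-up sample only a small part of the RAM is reported as free, yet more than half of the total is cache that could be handed to programs on demand.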

Under normal conditions the system takes care of buffering and page caching without the user noticing. The exception is removable drives such as USB sticks, which need to be 'unmounted' or 'ejected' before they are physically disconnected, so that any data still held in memory is written to the device before it is unplugged.
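Programs can also ask for their own data to be written out instead of waiting for the system to do so. The following Python sketch shows one way this might look; the helper name write_durably and the temporary file are assumptions made for this example, and it does not replace unmounting a removable drive.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and ask the kernel to push them to the device."""
    with open(path, "wb") as f:
        f.write(data)          # data first lands in Python's own buffer
        f.flush()              # hand it over to the kernel (page cache)
        os.fsync(f.fileno())   # request write-out to the storage device


# Demonstrate on a throw-away file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.bin")
    write_durably(target, b"hello")
    with open(target, "rb") as f:
        content = f.read()

print(content)
```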

Encodings

Character encodings are used to translate byte streams into characters displayed on the screen. The hard drive contains files which in turn contain bytes. Each byte contains 8 bits, therefore a single byte can encode 256 different characters.

The ASCII code (American Standard Code for Information Interchange) is a 7 bit code. Its 128 positions are occupied by the characters used in the English language, and some control characters used in data communication. Each character is encoded in one byte.

The ISO 8859-1 character encoding contains the ASCII characters as the first 128 entries, and other characters used in Western European languages in the remaining 128 positions. Each character is encoded in one byte.

UTF8 is the standard encoding for Unicode. The Unicode project aims at providing support for all characters used by any major community in the world. Currently the Unicode table contains about 110,000 characters.

From the position in the Unicode table the UTF8 encoding of the character can be derived:
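A minimal sketch of this derivation in Python, assuming the standard UTF-8 bit layout (one to four bytes depending on the size of the code point). The function name utf8_encode is chosen for this example only; real programs would simply call str.encode("utf-8"), and the sketch does not reject invalid code points such as surrogates.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes using the standard bit layout."""
    if code_point < 0x80:            # up to 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare the hand-written encoder with Python's built-in one.
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()}")
```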

UTF8 is becoming the standard in information processing. However, many applications still use other encodings, and this continues to cause problems.

Note that the ASCII characters have the same byte values in all three encodings described above. Files containing only ASCII characters have the best chance of being correctly transferred across different types of systems and processed by whatever application software will work on them. For this reason it is still a good idea to only use ASCII characters if at all feasible.
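The following Python sketch illustrates the point, using Python's built-in codecs for the three encodings. The sample strings are arbitrary choices for this example.

```python
ascii_text = "Linux"
accented_text = "caf\u00e9"  # "café", with one non-ASCII character

# Pure ASCII text yields the same bytes in all three encodings.
ascii_variants = {
    enc: ascii_text.encode(enc) for enc in ("ascii", "iso-8859-1", "utf-8")
}

# A non-ASCII character is one byte in ISO 8859-1 but two bytes in UTF8.
latin1_bytes = accented_text.encode("iso-8859-1")
utf8_bytes = accented_text.encode("utf-8")

# Decoding with the wrong encoding silently produces garbled text.
garbled = utf8_bytes.decode("iso-8859-1")

print(ascii_variants)
print(latin1_bytes, utf8_bytes)
print(garbled)
```

The last step shows the typical symptom of an encoding mismatch: the file is read without any error, but the accented character shows up as two unrelated characters.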

Network Basics

Linux and Unix systems are usually connected to the Internet and often function as servers, i.e. they provide a number of services to the outside world.

In order to achieve data communication between networked computers (hosts) a number of protocols have to be established; these form the Internet Protocol Suite which is commonly described in the following layers (in the TCP/IP model):

Ethernet is a family of networking technologies commonly used in LANs (Local Area Networks).

DSL (Digital Subscriber Line) is a family of technologies for transmitting digital data over telephone lines.

PPP (Point-to-Point Protocol) is used to establish a direct connection between two nodes over many types of physical networks, including cellular phone networks.

IP (Internet Protocol) is used for packet construction, addressing and routing along a number of nodes from source to destination.

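UDP (User Datagram Protocol) is a fast and lightweight connection-less protocol: datagrams are sent without setting up a connection, and neither delivery nor ordering is guaranteed. TCP (Transmission Control Protocol) is a slower, heavyweight connection-oriented protocol that provides reliable, ordered delivery.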
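The difference in handling can be sketched with Python's socket module. This example stays on the loopback address 127.0.0.1 and lets the operating system pick free ports; the message contents and variable names are arbitrary choices for this illustration.

```python
import socket

HOST = "127.0.0.1"  # loopback only; port 0 lets the OS pick a free port

# UDP: no connection setup, each sendto() is an independent datagram.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_server, \
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_client:
    udp_server.bind((HOST, 0))
    udp_server.settimeout(2.0)
    udp_client.sendto(b"ping via UDP", udp_server.getsockname())
    udp_message, _ = udp_server.recvfrom(1024)

# TCP: a connection is established first, then bytes flow as a stream.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_server:
    tcp_server.bind((HOST, 0))
    tcp_server.listen(1)
    address = tcp_server.getsockname()
    with socket.create_connection(address, timeout=2.0) as tcp_client:
        connection, _ = tcp_server.accept()
        with connection:
            tcp_client.sendall(b"ping via TCP")
            tcp_message = connection.recv(1024)

print(udp_message, tcp_message)
```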

Both UDP and TCP use port numbers: when an application sends a request to a server host, the corresponding service at the destination is identified by its port number, since that host may provide a number of different services. Some services are identified by their well-known ports, such as

A firewall is commonly used to allow only certain types of network traffic to certain hosts and ports, thereby avoiding a large number of problems associated with malicious requests. A firewall establishes a barrier against attacks and involves one or more computers or specialized hardware.

If a host is meant to answer HTTP requests and allow users to connect via SSH, then ports 80 and 22 have to be open. The following command can be used to find the open ports of a given host and the services listening on those ports. Scanning for open ports can be interpreted as preparing for an attack, so consider carefully before using this command on hosts outside your own control.

nmap localhost

Note that while this command identifies the open ports on your local host, the results do not mean that those ports are actually reachable from outside your LAN or organisation. A number of port scanner web sites are available to test the ports on your host reachable from the Internet.
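For checking a single port, a much simpler test than a full scan is to try to open a TCP connection. The following Python sketch does this for ports 22 and 80 on the local machine; the function name is_port_open is chosen for this example. It is not a replacement for nmap, and the same caution about hosts outside your own control applies.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check the SSH and HTTP ports on the local machine only.
for port in (22, 80):
    state = "open" if is_port_open("localhost", port) else "closed"
    print(f"port {port}: {state}")
```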

Linux Installation

Look at the website distrowatch.com to get an overview of current Linux distributions, their popularity, and their features. The most popular distributions are:

  1. Mint is based on Ubuntu and comes in several 'spins':
  2. Ubuntu is based on Debian. It comes with the new Gnome 3 desktop which is not universally popular with users, especially those coming from Gnome 2, as the interface design has been changed considerably and for no good reason, many would feel, the present author included. By switching to the 'Classic' desktop upon login the traditional Gnome 2 design can be restored to some extent.
  3. Debian is the system of choice for servers. It can be run on the desktop, but this is best left to more experienced users.

32 or 64 bit: Until a few years ago the 64-bit versions of Linux had some problems (Flash animations, to name just one), but today 64-bit is the better choice for most users. Typical current desktop computers and notebooks support 64-bit mode. However, check whether your machine actually does: just try to boot from the live CD or stick. The 64-bit versions of applications tend to show somewhat better performance, but the main difference is memory addressing. A 32-bit pointer can address 2^32 = 4294967296 bytes of memory, i.e. about 4 GB, a serious limitation when many desktop PCs today come with 8 GB of RAM or more.
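The arithmetic can be checked quickly in Python. The last lines report the pointer width of the running Python interpreter, which is only a hint: a 32-bit interpreter can run on a 64-bit machine, so this does not replace the boot test described above.

```python
import struct

BYTES_PER_GIB = 2 ** 30

addressable_32 = 2 ** 32   # bytes reachable with a 32-bit pointer
addressable_64 = 2 ** 64   # bytes reachable with a 64-bit pointer

print(addressable_32)                   # 4294967296
print(addressable_32 // BYTES_PER_GIB)  # 4
print(addressable_64 // BYTES_PER_GIB)  # 17179869184

# Pointer width of this Python build in bits (describes the interpreter,
# not necessarily the CPU).
pointer_bits = struct.calcsize("P") * 8
print(pointer_bits)
```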

Steps for installation:

Partition size: For a basic desktop system you only need one partition mounted as / (root) with a minimum size of about 10 GB. Given current sizes and prices of hard disks about 30-50 GB is a more reasonable choice.

If you already have another operating system installed then that partition will have to be resized to make room for the new Linux partition. Resizing a partition can take from several minutes to half an hour or more. This process must not be interrupted.

Swap space: The installer suggests creating a swap partition, but for a desktop installation it is more flexible to use a swap file, which can be set up later.

Encrypt home directory: This is very much recommended, especially for notebooks. Obviously it only helps if you use a sufficiently strong login password. See section Tools/Keyring for details.

A strong password should be at least 11 characters long and must not be based on dictionary words. Use upper case, lower case, and digits, but do not substitute digits for letters. Good old simple passwords like netw0rk or g0ldf1sh can be cracked almost instantly nowadays.
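These rules can be turned into a simple check, sketched here in Python. The word list is a tiny stand-in and the substitution table covers only a few common digit-for-letter replacements; both are assumptions made for this example, and a real check would need a large dictionary. Passing this check does not prove that a password is strong.

```python
# Tiny stand-in word list; a real check would use a large dictionary.
HYPOTHETICAL_WORDS = {"network", "goldfish", "password"}


def undo_substitutions(password):
    """Return candidate words with digit-for-letter substitutions reversed."""
    base = password.lower().translate(str.maketrans("03457", "oeast"))
    # The digit 1 commonly stands for either "i" or "l", so try both.
    return {base.replace("1", "i"), base.replace("1", "l")}


def password_problems(password):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(password) < 11:
        problems.append("shorter than 11 characters")
    if not any(c.isupper() for c in password):
        problems.append("no upper case letter")
    if not any(c.islower() for c in password):
        problems.append("no lower case letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if undo_substitutions(password) & HYPOTHETICAL_WORDS:
        problems.append("based on a dictionary word")
    return problems


for candidate in ("netw0rk", "g0ldf1sh"):
    print(candidate, password_problems(candidate))
```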

After the installation is finished you will see the boot manager taking over at startup. It shows the boot menu and allows you to choose an operating system for this session.

After installation the boot manager defaults to the new Linux system. Once the system is up and you are logged in the default can be changed with

sudo grub-set-default n

in a terminal window, where n is the number of the entry you see in the boot menu (index origin zero!).

When your system is up and running you may want to start your Software Manager and install some additional packages. Here are some suggestions:

Running software for other operating systems

Within your Linux session you sometimes want to run software from other operating systems.

Participate!

Even if your programming skills are not top-notch, and you don't have the skill or time to write documentation and tutorials, you can still take part in the Open Source movement with moderate effort: