Linux - History, Terms, and Concepts
Motivation - Why Open Source?
Big Tech Giants - short lists of selected acquisitions
- Apple: tends to buy lots of smaller companies
- NeXT, 1997, 400M, hardware and software
- Beats, 2014, 3B, headphones and music streaming
- Shazam, 2018, 400M, music and image recognition
- Intel's smartphone modem business, 2019, 1B
- Microsoft:
- Skype, 2011, 8.5B, video chat and messaging
- Nokia's devices and services business, 2013, 7.2B, mobile phones
- LinkedIn, 2016, 26B, professional social networking
- GitHub, 2018, 7.5B, software development and version control
- Activision Blizzard, 2022, 68.7B, video games
- OpenAI (ChatGPT), 2023, 10B investment for a reported 49% stake (?), artificial intelligence
- Google/Alphabet:
- YouTube, 2006, 1.6B, video sharing
- DoubleClick, 2007, 3.1B, online advertising
- Motorola, 2011, 12.5B, smartphones
- Kaggle, 2017, ?B, data science online community
- FitBit, 2021, 2.1B, wearables
- Amazon:
- Goodreads, 2013, ?M, social book cataloging
- IMDb, 1998, 55M, Internet Movie Database
- Audible, 2008, 300M, audio books
- Twitch, 2014, 970M, live streaming video games and esports
- Whole Foods Market, 2017, 13.7B, multinational supermarket chain
- MGM, 2021, 8.5B, Film and TV media
- Facebook/Meta:
- Instagram, 2012, 1B, photo and video sharing social networking service
- WhatsApp, 2014, 19B, messaging and voice over IP service
- Oculus VR, 2014, 2B, VR and AR hardware and software
History
- 1969: Ken Thompson wants an efficient bare-bones system to run his
favourite Multics game Space Travel on a PDP-7 (4k of 18 bit words)
- 1972: Unix rewritten in C
- source code for both Unix and C compiler made available to many academic institutions
- this resulted in porting to a wide variety of machines
- 1978: Berkeley Software Distribution (BSD)
- Unix derivative developed by University of California, Berkeley until 1995
- initially closed source, but easily licensed, and transition to open source from 1991
- base for many proprietary Unix versions, and open source versions such as FreeBSD
- 1983: GNU Project announced by Richard Stallman at MIT
- free software: free not in terms of price, but in the sense that users are
unrestricted in modifying and redistributing the program as necessary
- mass collaboration project
- develop a sufficient body of free software ... to get along without
any software that is not free
- full access to source code
- requirement to then post any changes made publicly for other
users to benefit from as well
- lots of software, but no stable kernel
- 26 Aug 1991: Linus Torvalds posts to comp.os.minix: "I'm doing a (free)
operating system"
- Sep 1991: Linux version 0.01 released to FTP server ftp.funet.fi of the
Helsinki University of Technology
- Dec 1991: Linux 0.11 - self-hosted (compile Linux on Linux)
- Aug 1993: Debian
- Nov 1994: Red Hat
- 1996: S.u.S.E. Linux 4.2
- 2000: Knoppix Live CD
- 2003: Fedora (free Red Hat)
- 2004: Ubuntu 4.10 The Warty Warthog Release
In terms of lines of code:
Linux System = Linux Kernel (6%) + GNU software (15%) +
other free software (79%)
Terms
- Distributions:
- Debian, Ubuntu, Mint, Suse, Red Hat, Fedora, ...
- software collections with support for easy installation
- package manager installs dependencies automatically
- updates: security patches and other updates can be installed
via the package manager, for all installed applications
- release: depending on the distribution, new releases become
available at certain frequencies, e.g. Ubuntu has a new release every 6
months; regular releases are supported for about 9 months, LTS
(long term support) releases for 5 years.
Going to a new release is often a
difficult and risky process, and is therefore avoided unless absolutely
necessary.
- kernel: central part of the operating system, always in memory,
takes care of the most essential functions. Modular kernels employ
dynamically re-configurable sets of software modules that provide
additional functionality, e.g. encryption.
- swap space: virtual memory extension for parts of data and code
that are not needed at the moment and are only read back at a later time.
When swapping starts the performance is affected strongly, but at least
the system continues to work. Without swap space the system starts
terminating arbitrary processes, often with unfortunate results. There
are two options for swap space:
- swap partition: a contiguous physical area on the disk is reserved
- swap file: a file is created with physically adjacent blocks of data
for fast access. Tools are dd, mkswap, and swapon; see the sketch below.
Performance is only slightly
worse than a swap partition, and the setup is much more flexible.
Desktop computers can work without swap space, although creating some
is still recommended. Server installations always use swap space; about
3 x physical memory is a rule of thumb.
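A minimal sketch of creating a 4 GB swap file with these tools (the path /swapfile and the size are just examples):
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096   # create a 4 GB file of adjacent blocks
sudo chmod 600 /swapfile                             # only root may read or write it
sudo mkswap /swapfile                                # format the file as swap space
sudo swapon /swapfile                                # activate it for the current session
swapon --show                                        # verify that the swap space is in use
# to activate it at every boot, add the line "/swapfile none swap sw 0 0" to /etc/fstab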
- boot manager: a small piece of code presenting a list of
operating systems at startup. The selected system is started and used
for the remainder of the session.
- dual-boot: two operating systems are installed, one of them is
selected in the boot manager for use in the session.
- desktop: graphical framework including window manager, task
panels, and many other user interface elements.
- Some commonly used desktops: Gnome, KDE, Cinnamon, XFCE
- appearance and interface design varies, but functionality within
applications remains unchanged, e.g. Gnome apps can be run in KDE or XFCE
- X window system: implementation of a client-server architecture
for graphical user interfaces in Unix systems. The X server runs on the
local machine and provides a graphics (and mouse/keyboard) interface;
clients run locally or remotely and communicate with the X server
(see the example below).
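As a small illustration of this split, an X client can be started on a remote machine and displayed on the local screen via SSH X11 forwarding (user and remotehost are placeholders; the remote side must allow X11 forwarding):
ssh -X user@remotehost xterm   # xterm runs on the remote host, its window appears on the local X server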
- Live CD: Linux installation on CD or USB stick to be used
during a session, without changes to the hard disk.
Useful for evaluating a distribution; can also be used for installation.
- ram disk: a section of RAM is used as a temporary hard disk,
e.g., for installation of additional applications; these are gone when
the live session is terminated.
- root user: computer account with unlimited permissions. Only
used for administration, such as installation of additional packages.
Should never be used for day-to-day work.
- installation: permanently putting the operating system onto the
hard disk, possibly using dual-boot and a boot manager. Usually involves
setting up partitions.
- apt: Mint, Ubuntu, and Debian package manager. Most versatile in the
shell: apt-get
- apt-get update: get the current list of packages from the repository
- apt-get dist-upgrade: every installed package is upgraded to the
current version. Should be done at regular intervals, especially with
computers that are connected to the Internet.
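A typical update session in a terminal looks like this (a sketch, assuming sudo privileges):
sudo apt-get update         # refresh the package lists from the repositories
sudo apt-get dist-upgrade   # upgrade all installed packages to the current versions
sudo apt-get autoremove     # optional: remove packages that are no longer needed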
- repositories (Ubuntu specific): software is available with different
licenses and levels of support, reflected in the different Ubuntu repositories:
- main: free software, maintained by Canonical (the company
that provides the Ubuntu distribution)
- restricted: proprietary i.e. non-free, but supported by
Canonical, e.g., graphics card drivers developed and provided by the card
manufacturer
- universe: free software maintained by the open source
community
- multiverse: software encumbered by license problems, which
are irrelevant for most users
In most cases all repositories can be used
(System/Administration/Software Sources). This is the default since
Ubuntu 9.04. A typical repository entry is shown below.
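On Ubuntu and its derivatives the enabled repositories can be inspected in /etc/apt/sources.list and /etc/apt/sources.list.d/; a typical entry looks like the following (the release name jammy is just an example):
# the components at the end of the line select the repositories
deb http://archive.ubuntu.com/ubuntu jammy main restricted universe multiverse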
- apt-mirror: with slow or bandwidth-limited Internet connections
the installation of additional packages can be very time consuming or
costly; apt-mirror provides a simple solution. Take an external USB hard
disk with at least about 50 GB of free space to a location with good
Internet access and use apt-mirror to put the complete Ubuntu
distribution on the disk. This disk can then be used at home as an
installation source.
File systems
File systems provide a unified interface to physical media
for applications
i.e. program code for reading and writing files to hard disk, USB stick,
and SD card is identical.
File systems differ in terms of (among other things)
- support across various operating systems and devices
- maximum volume size
- maximum file size
- character set in file names
Often file names are given and
cannot be changed. However, if you have the choice, consider the
following:
- for maximum compatibility only use file names containing a-z, A-Z,
0-9, dot (.), underscore (_), and hyphen (- not at start of name)
- under no circumstances use backslash (\), forward slash (/),
semicolon (;), or colon (:) in your filenames.
- national language characters such as German umlauts are also not
recommended for filenames.
Here is an overview of the most commonly used file systems:
- FAT32
- old Windows file system
- simple and robust
- best support in non-Windows systems, including devices like mp3
players
- many tools e.g. for recovery
- maximum file size of 4 GB
- recommended maximum volume size of 2 TB, otherwise compatibility problems
- file names are not case-sensitive
- NTFS
- newer file system for Windows
- supports large files and volumes
- fully supported in Linux, but not necessarily by other systems or
devices
- ext3/ext4
- standard Linux file system
- very high limits for file and volume sizes; details (such as total number of files on a volume)
depend on configuration at file system creation
- support in Windows via the free driver Ext2Fsd: fine in XP, problems in Vista/7/8 due to
driver signing enforced by Microsoft; a workaround is putting Windows into test mode
(search for "install unsigned drivers Windows")
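To see which file systems are actually in use on a running Linux system, the following commands can be helpful:
lsblk -f    # list block devices with their file system types, labels, and mount points
df -T -h    # show mounted file systems with their types and free space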
Buffers and Page Cache
When applications read or write files the system does not implement each access as a physical operation
on the disk; this would result in very poor performance. Instead, files are copied to RAM
where they stay for some time so subsequent requests can be performed much faster.
A large amount of available RAM is therefore beneficial for performance.
Buffers are used for I/O operations that move data between storage devices, such as
from RAM to disk.
The page cache is an area of memory where file content is stored for faster access. This area is
typically much larger than the buffers and can occupy a significant part of the total RAM, leaving very little
'free' memory. This is not a problem, since cache memory can be quickly re-allocated to programs if needed;
memory usage figures are therefore only meaningful when programs and cache are listed separately.
Under normal conditions the system takes care of the
buffering and page caching without the user noticing, with the exception of removable drives like USB sticks which
need to be 'unmounted' or 'ejected' before physical disconnect in order to make sure that any data in memory can
be written to the device before it is unplugged.
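The effect of the page cache can be observed with the free command; sync forces any remaining buffers to be written out (this also happens automatically when a drive is unmounted):
free -h   # memory usage with buffers/cache listed separately; 'available' includes reclaimable cache
sync      # flush unwritten data from memory to the storage devices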
Encodings
Character encodings are used to translate byte streams into characters
displayed on the screen. The hard drive contains files which in turn
contain bytes. Each byte contains 8 bits, therefore a single byte can
encode 256 different characters.
The ASCII code (American Standard Code for Information Interchange)
is a 7 bit code. Its 128 positions are occupied by the characters used in
the English language, and some control characters used in data
communication. Each character is encoded in one byte.
The ISO 8859-1 character encoding contains the ASCII characters as
the first 128 entries, and other characters used in Western European
languages in the remaining 128 positions. Each character is encoded in one
byte.
UTF8 is the standard encoding for Unicode. The Unicode
project aims at providing support for all characters
used by any major community in the world. The Unicode table currently contains well over
100,000 characters.
From the position in the Unicode table the UTF8 encoding of the character can
be derived:
- The first 128 characters in the Unicode table are the ASCII characters.
These characters are encoded in single bytes, and the byte values are identical to the ASCII code.
- The next 1920 characters are encoded in two bytes. This part of the
Unicode table contains characters frequently used in European languages, such as
German umlauts, French accents, and Cyrillic letters.
- The remaining characters are encoded in three bytes (e.g. Chinese and
Japanese characters, and, oddly enough, the Euro symbol), and four
bytes.
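These byte lengths can be checked in a terminal, assuming a UTF-8 locale (standard on current Linux systems):
echo -n "A" | hexdump -C   # 41        -> one byte, identical to ASCII
echo -n "ä" | hexdump -C   # c3 a4     -> two bytes
echo -n "€" | hexdump -C   # e2 82 ac  -> three bytes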
UTF8 is becoming the standard in information processing. However,
many applications still use other encodings, and this continues to cause
problems.
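When such problems occur, the file and iconv tools help to diagnose and convert encodings (notes.txt is a placeholder file name):
file -i notes.txt                                         # report the (guessed) character encoding
iconv -f ISO-8859-1 -t UTF-8 notes.txt > notes-utf8.txt   # convert from ISO 8859-1 to UTF-8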
Note that the ASCII characters have the same byte values in all three
encodings described above. Files containing only ASCII characters have
the best chance of being correctly transferred across different types
of systems and processed by whatever
application software will work on them. For this reason it is still a
good idea to only use ASCII characters if at all feasible.
Network Basics
Linux and Unix systems are usually connected to the Internet, and often function as servers i.e.
provide a number of services to the outside world.
In order to achieve data communication between networked computers (hosts)
a number of protocols have to be established; these form the Internet Protocol Suite which is commonly
described in the following layers (in the TCP/IP model):
- Link Layer: protocols concerned with the local network segment a host is directly connected to,
such as Ethernet, PPP, DSL
- Internet Layer: protocols for getting individual packets from the source to the destination host
across network boundaries, thereby forming an inter-net by connecting multiple networks through
gateways; IP
- Transport Layer: protocols providing host-to-host communication services, most importantly
TCP and UDP
- Application Layer: protocols and methods covering various specific Internet functionalities, such as HTTP, SSH, DHCP, DNS
Ethernet is a family of networking technologies commonly used in the LAN (Local Area Network).
DSL (Digital Subscriber Line) is a family of technologies for transmitting digital data
over telephone lines.
PPP (Point to Point Protocol) is used to establish a direct connection between two nodes
over many types of physical networks, including cellular phone.
IP (Internet Protocol) is used for packet construction, addressing and routing along a number
of nodes from source to destination.
- A packet contains a header with information such
as destination address, followed by the data.
- Each host in the Internet needs a unique numeric IP address which is associated with the MAC
(media access control) address of the network interface hardware.
- Clients such as notebooks or desktop PCs are usually assigned IP addresses temporarily via DHCP
(Dynamic Host Configuration Protocol).
- Servers
are usually assigned permanent IP addresses.
- Numeric IP addresses are associated with more easily remembered host names
via DNS (Domain Name System).
- Routing involves reading the packet destination address and selecting the next
node along the path through the network in order to
propagate the packet towards the destination. This is usually done by special hardware called routers.
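On a Linux host the addresses involved can be inspected with standard tools; a sketch (www.debian.org is just an example host name):
ip addr show                  # network interfaces with their MAC and IP addresses
ip route show                 # routing table, including the default gateway
getent hosts www.debian.org   # resolve a host name to an IP address using the system resolver (DNS)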
UDP (User Datagram Protocol) is a fast and lightweight connection-less protocol.
- In order to send data across the network the data is split into packets.
- In UDP these packets are sent without guarantee of success
or order of arrival. At the destination packets can arrive in changed order or in duplicate, and packets can be lost.
- UDP is used in applications where speed is paramount, and occasional errors are not
much of a problem, e.g. video streaming; however, DHCP and DNS also use UDP.
TCP (Transmission Control Protocol) is a slower, heavyweight connection-oriented protocol.
- TCP employs mechanisms that guarantee
that packets are assembled at the destination in correct order without any packets missing.
- TCP is used to transfer files and other data
that need to arrive complete and unchanged, e.g. HTTP and Email.
- Note that while checking and correcting mechanisms are applied
there is still a (very) small risk of error. Therefore, checksums are often used when very large files
are transferred.
Both UDP and TCP use port numbers: when an application sends a request to a server host, the corresponding service at the destination
is identified by its port number, since that host may provide a number of different services.
Some services are identified by their well-known ports, such as
- Port 80: HTTP (HyperText Transfer Protocol). A web server (http daemon, such as the Apache httpd)
is a program that runs continuously and listens
for requested URLs on this port. When a request arrives, it spawns an instance
that answers the request by sending back a reply, typically an HTML document.
- Port 22: SSH (Secure Shell). The ssh server allows remote users to connect to the server
and log in to the system, using their credentials.
A firewall is commonly used to allow only certain types of network traffic to certain hosts and
ports, thereby avoiding a
large number of problems associated with malicious requests. A firewall establishes a barrier
against attacks and involves one or more computers or specialized hardware.
If a host is meant
to answer HTTP requests and allow users to connect via ssh then ports 80 and 22 have to be open.
The following command can be used to find the open ports of a given host, and the services
listening on those ports. Scanning
for open ports can be interpreted as preparing for an attack, so consider carefully before using
this command on hosts outside your own control.
nmap localhost
Note that while this command identifies the open ports on your local host, the results do not
mean that those ports are actually reachable from outside your LAN or organisation.
A number of port scanner web sites are available to test the ports on your host reachable from the
Internet.
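To see which services are listening on the local host, without any scanning, the ss tool can be used; a sketch:
sudo ss -tlnp   # TCP sockets in listening state, with port numbers and owning processes
sudo ss -ulnp   # the same for UDP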
Linux Installation
Look at the website distrowatch.com to get
an overview of current Linux distributions, their popularity, and their features.
Popular distributions are:
- Mint is based on Ubuntu and comes in several 'spins' i.e. desktops:
- The Cinnamon desktop follows the traditional Gnome 2 style of interface, sports visual
effects, and tends to offer more of the newest features.
- The XFCE desktop uses a little less memory and processing resources and
is therefore suited for machines with slower CPUs or small RAM, and for
people who prefer a particularly simple and slim design.
- The MATE desktop is another plain desktop close to Gnome 2.
- Ubuntu is based on Debian. It comes with the new Gnome 3 desktop which is not universally popular with
users, especially those coming from Gnome 2, as the interface design has been changed
considerably and for no good reason, many would feel, the present author included.
By switching to the 'Classic' desktop upon login the traditional Gnome 2 design can be restored to some extent.
- Debian is the system of choice for servers. It can be run on the
desktop, but this is best left to more experienced users.
- Fedora is another popular choice; this one is not based on Debian.
The choice of desktop is not critical; you can always install additional desktops later. Switching to
another distribution can be a hassle: the software will be basically the same, but all those personal configurations
need to be migrated, and many things will work slightly differently -- expect some headaches.
Better to choose once and stick with it.
Download the ISO image for the distribution of your choice, e.g. linuxmint-21.2-mate-64bit.iso
32 or 64 bit: today 64-bit is the only sensible choice for practically everyone.
Almost all current desktop computers and notebooks work in 64-bit mode.
The main difference is memory addressing. A 32-bit pointer can address 2^32 = 4294967296 bytes
of memory i.e. 4 GB, a serious limitation when PCs today come with 8 GB of RAM or more.
VirtualBox Installation:
You can install Linux inside a virtual machine such as VirtualBox.
Download the software from
virtualbox.org
(in Windows you probably also need the Visual C++ runtime),
then use the Linux ISO file when you create
the VM for your Linux system; version 7 of VirtualBox makes this quite easy and intuitive.
There will be an impact on performance, but you do not need to worry about disturbing your existing operating
system (the 'host' system in VM terms), and you can
run both systems at the same time. Setup is also much easier, just start the VM with the ISO file mounted
(e.g. as 'optical drive'), and then run the Linux installer.
Especially for trying things out, comparing various distributions, and
getting comfortable with the Linux system the VM is certainly a
sensible option. However, at some point you will probably become unhappy with the slower performance,
and you will want to switch to dual-boot.
Dual-boot Installation:
- If you already have another operating system
installed then that partition will have to be
resized to make room for the new Linux partition. On Windows you can use "Disk Management"
to shrink the C: partition (enter "disk m" in the search bar); free up about 50 GB of space.
- Put the ISO image file on a USB stick with Etcher,
or Unetbootin, or a similar tool.
The stick needs to be at least 4 GB (for Mint).
- Reboot and watch the screen during startup:
You need to press a key like Del, F12, or Esc to get to the BIOS/Startup/Setup menu and
boot from the stick. You want to select the USB stick as a temporary startup device; often there is an option
just for that.
- The system boots from the stick, and you can now try the live system. See if everything works: sound, WiFi,
Bluetooth, ...
- Start the installer. It should
guide you through the whole process.
☢ A little thinking is still needed: if you already
have another operating system on your PC which you want to keep then you do NOT want to use the entire
disk for Linux. Instead, choose 'install alongside' (in the Mint installer), or define the partitions yourself.
If you do not intend a dual-boot setup the installation will still use the boot manager. However, you can
set the grub timeout to something like 2 seconds to speed up the startup. Do not set it to 0: you
want to be able to reach the menu and go into rescue mode (booting a live system from a USB stick is always an option, but the menu is more convenient).
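A sketch of changing the timeout on Mint or Ubuntu, where the setting lives in /etc/default/grub:
sudo nano /etc/default/grub   # set e.g. GRUB_TIMEOUT=2 (any editor will do)
sudo update-grub              # regenerate the boot menu so the change takes effect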
Partitions: For a very basic desktop system you only need one partition mounted as / (root) with
a minimum size of about 20 GB (Mint).
Unless your hard disk or SSD space is severely limited about 50 GB (at least) is a more reasonable choice.
Swap space: The installer suggests creating a swap partition, but for a desktop installation
with a single Linux system it is more flexible to use a swap file, which can be set up later. If you plan on installing
other Linux versions as well then a separate swap partition can be shared among all of them to save some space.
/home partition: by default /home is in the system partition so the disk space can be shared among
system and user data; however, there are advantages to having a separate /home partition, such as
easily keeping your data and user settings when you upgrade to a new major release of your distribution,
and installing and using more than one Linux system. In both cases there is a chance of incompatible user settings.
Encrypt home directory: usually, but not always, a good idea.
Notebooks: very much recommended - they can get lost or stolen easily.
Obviously it means using a sufficiently strong login password. See
section Tools/Keyring for details.
A strong password should be at least 12 characters long
and must not be based on dictionary words. Use upper case, lower case, and digits, but do not
substitute digits for letters: good old simple passwords like netw0rk or g0ldf1sh
can be cracked almost instantly nowadays.
Desktop PC: they tend to be at a much lower risk of theft or loss compared to laptops,
and home directory encryption comes with some downsides:
- If things go wrong recovery can be a nightmare. You cannot simply mount the disk on another Linux system, as you
can with an unencrypted file system.
- ecryptfs limits the number of characters in filenames to about 140 vs 255 in ext4. This can cause
problems for some applications.
For these reasons many people opt for other encryption solutions on their desktop, such as encrypting only a
particular folder or virtual volume. Tools are e.g. gocryptfs (apt-get install gocryptfs) and
VeraCrypt.
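A minimal gocryptfs session might look like this (the directory names are just examples):
mkdir ~/vault.enc ~/vault       # encrypted storage directory and plaintext mount point
gocryptfs -init ~/vault.enc     # one-time initialization, sets the password
gocryptfs ~/vault.enc ~/vault   # mount: files saved in ~/vault are stored encrypted in ~/vault.enc
fusermount -u ~/vault           # unmount when done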
Boot Menu: After the installation is finished you will see the boot manager taking over at startup.
It shows the boot menu and allows you to choose an operating system for this session.
After installation the boot manager defaults to the new Linux system. Once the system is up and you are logged in
the default can be changed with
sudo grub-set-default n
in a terminal window, where n is the number of the entry you see in the boot menu
(index origin zero!). Note that this only takes effect if GRUB_DEFAULT=saved is set in /etc/default/grub, followed by sudo update-grub.
New Release: this depends on your distribution; in Linux Mint there are
new major versions every other year or so.
Within each major version there are also three point releases (minor releases).
A release is identified by major and minor number and a name, e.g. Linux Mint 19.3 Tricia.
Releases are supported for five years, e.g. Mint 19 was released
in 2018 and supported until 2023.
Transitions between minor releases such as from 19.2 to 19.3 tend to cause no problems and can be done
with the Update Manager: Refresh, then look in Edit.
Going to a new major release is usually not
painless.
There are several options, and the choice is tricky.
Whatever you opt for, back up your home directory first; also make a list of your installed packages (see the sketch below).
The Backup Tool (in the Administration menu) helps with that; however, it creates a single (possibly huge)
tar file. A simpler backup of your home
directory is e.g. cp -rp /home/myuser /media/myuser/somedrive. This gives you a copy on your external
drive with all the files and directories, ready to work with.
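A simple way to record the installed packages and re-install them on the new system (a sketch; the file name is arbitrary):
dpkg --get-selections > ~/package-list.txt        # on the old system: save the list of installed packages
sudo dpkg --set-selections < ~/package-list.txt   # on the new system: mark them for installation
sudo apt-get dselect-upgrade                      # install everything on the list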
The following are feasible options, in order of probable usefulness, depending on your situation:
- Make a fresh install and keep your user settings, e.g. in a separate /home partition or backup and restore your whole
home directory, including dot files i.e. configuration files and directories starting with a '.'.
There may be some (or many) incompatible settings.
- Make a fresh install and restore your home directory without the config files. This means you will have to do a lot
of configuration in the new system, but once you are finished everything works as intended for the new release.
- Use the mintupgrade tool. apt install mintupgrade, then sudo mintupgrade.
It may work flawlessly, or not at all, or somewhere in between.
Here there be dragons.
Additional Packages: When your system is up and running you may want to start your Software Manager
and install some additional packages. There are thousands of packages, here are only a few suggestions
(some of them may be part of your system already, depending on your distro):
- Wesnoth: a rather challenging turn-based strategy game
- Gnome-Mahjongg: solitaire version
of the ancient Asian game
- Stellarium: astronomy software showing realistic
and beautiful views of the night sky
- Vim, vi improved, the editor of choice for the serious Linux user
- VLC media player
- Firefox - one browser to rule them all
- LibreOffice: the free office suite
- Evolution: mail, calendar, groupware
- Thunderbird: another popular Email client
- r-base, the GNU R statistical computation and graphics system
- GIMP, the GNU Image Manipulation Program
- Blender 3D modeling software
- Krita painting app - unleash your inner Picasso
Running software for other operating systems
Within your Linux session you sometimes want to run software from other
operating systems.
Participate!
Even if your programming skills are not top-notch, and you don't have the skill or time
to write documentation and tutorials,
you can still take part in the Open Source movement with moderate effort:
- Submit bug reports and help make applications better
- Spread the word and get people away from proprietary systems to free and open
solutions
- Simply set an example just by using Open Source software