Overview and Basic Concepts of Relational Database Systems

Some Definitions

Data: from Latin given; given facts, from which additional facts can be inferred (= response to a user query)
given fact: true proposition in logic

Database: collection of true propositions

Relational approach:

Data is represented by rows in tables, which can be interpreted as true propositions
Operators are provided for operating on rows in tables, providing additional true propositions from the existing ones

Entity: any object about which we wish to record information

Relationship: linking entities (can themselves be interpreted as a type of entity)

Properties: used to record information about entities: numbers, dates, strings

DBMS: Database Management System; software that implements DB concepts, e.g. Postgres, Mysql, SQLite

Advantages of relational Databases

over application-specific files such as spreadsheets:

Centralized control over data
Data can be shared
Redundancy can be reduced
Inconsistencies can be avoided
Integrity constraints
Transaction support
Security constraints
Data independence
immunity of applications to change in physical representation and access technique
Standards: ISO SQL 2016.
Much effort since the late 1980s. However,
- DBMS manufacturers aim to lock-in their customers by providing many non-standard convenience extensions. Once some of these are in use, it is virtually impossible for a customer to switch to another DBMS.
- Even without extensions there are subtle differences in implementation that are very hard to spot, again ensuring lock-in.

Information Principle

The entire information content of the database is represented in only one way, explicit values in column positions in table rows.

The Relational Model

relation = math. for table, NOT from ER-model

A relation has two parts:

heading: set of attributes A:T with name A and type T
denotes a predicate (truth-valued function)
body: set of tuples A:V with attribute A and value V
i.e. a set of tuples that conforms to the heading
each tuple denotes a true proposition, i.e. instantiates the predicate

When the context is clear the attribute names can be ommitted, e.g.
the (unnamed) tuples of relation Age are { (John, 30), (Jane, 25) }

Relations are sets of tuples:

→ there is no preferred order for the tuples, and
→ there is no point in stating a fact more than once.

Relational Algebra: a brief Overview

Operations on one or more relation always result in a relation (not some other kind of object).

Restrict A: all attributes of tuples in A satisfying a condition
Project A: specified attributes of all tuples in A
remove duplicates!
Product A,B: all combinations of tuples in A and B
Union A,B: all tuples appearing in either or both
remove duplicates!
Difference A,B: all tuples in A but not in B
Intersect A,B: all tuples appearing in both
Join: A,B: all combinations with common value(s) for common attribute(s)
Emp = { (Smith, Accounting), (Jones, Accounting), (Brown, Sales) }
Bonus = { (Accounting, 300), (Sales, 400), (Marketing, 0) }
Join Emp, Bonus = { (Smith, Accounting, 300), (Jones, Accounting, 300), (Brown, Sales, 400) }
Divide A,B,C: tuples in A that match via B with all tuples in C
Doctor = { (Semmelweiss), (Billroth), (Paracelsus) }
Treatment = { (Billroth, Smith), (Billroth, Jones), (Billroth, Brown), (Semmelweiss, Smith), (Paracelsus, Luther) }
Sick = { (Smith), (Brown) }
Divide Doctor, Treatment, Sick = { (Billroth) }
Interpretation e.g. doctors who treat all patients
in the supplier-parts example: suppliers (relation S) that supply (SP) all parts (P)

History

The Structured Query Language is the standard for data query and manipulation in relational database systems. It implements many (but not all) concepts of the relational model.

early 1970s: Donald Chamberlin and Raymond Boyce develop SEQUEL as query language, later renamed to SQL (Structured Query Language)
- based on Edgar Codd's Relational Model
- declarative rather than procedural as in earlier navigational DB systems (network and hierarchical)
System R (for relational) as first implementation, IBM Research in early 1970s
- proof of concept that automatic query optimizer can produce decent performance for relational system
1983 DB2 as commercial product
1986 Postgres
1995 Mysql
2000 SQLite

Table vs. Relation

The basic concept of the SQL implementation is the table. When going from the relation model to the SQL implementation relations become tables.

An SQL table is similar to a relation, with some important differences:

Doubles:
- A relation contains a set of tuples, which means that by definition there are no doubles.
- An SQL table and the result of an SQL query can contain several rows with identical values.
Order:
- Since there is no order defined for the elements of a set, any order is valid when listing the tuples of a relation.
- Results of SELECT statements can be ordered at will with the ORDER BY clause.

Data Types

The relational model is not concerned with technical implementation.
SQL DBMS must know in great detail the type of data stored in tables.

Numbers:

precision: number of significant digits
scale: position of the least significant digit to the right of the decimal
e.g. 1234.56 has precision 6, scale 2, defined as NUMERIC(6,2)

SQL92 standard numeric data types:

INT (precision)
- exact numeric type
- scale 0
NUMERIC (precision, scale)
- exact numeric type
- i.e. rounding and truncating dictated by predefined formula
- decimal portion exactly scale
DECIMAL (precision, scale)
- exact numeric type
- decimal portion at least scale
REAL
- approximate numeric type
- binary precision based on manufacturer limit
- default precision must be less than DOUBLE PRECISION
DOUBLE PRECISION
- approximate numeric type
- binary precision based on manufacturer limit
- default precision must be greater than REAL
FLOAT (precision)
- approximate numeric type
- DOUBLE with given precision

Postgres: DECIMAL(p, s) and NUMERIC(p, s) equivalent.

Other Types:

CHAR (size)
- fixed length
- unused space empty
- offered better performance than VARCHAR in early implementations
VARCHAR (size)
- variable length up to size
- no unused space
- in current implementations usually no performance disadvantage compared to CHAR
DATE
- Various input formats, e.g. '2012-04-01'
- Calendar functions such as length of time, weekday
BOOLEAN