Overview and Basic Concepts of Relational Database Systems
Some Definitions
Data: from Latin given; given facts, from which additional facts
can be inferred (= response to a user query)
given fact: true proposition in logic
Database: collection of true propositions
Relational approach:

Data is represented by rows in tables, which can be interpreted as
true propositions

Operators are provided for operating on rows in tables, providing
additional true propositions from the existing ones
Entity: any object about which we wish to record information
Relationship: linking entities (can themselves be interpreted as a type of entity)
Properties: used to record information about entities: numbers, dates, strings
DBMS: Database Management System; software that implements DB concepts, e.g. Postgres, Mysql, SQLite
Advantages of relational Databases
over applicationspecific files such as spreadsheets:
 Centralized control over data
 Data can be shared
 Redundancy can be reduced
 Inconsistencies can be avoided
 Integrity constraints
 Transaction support
 Security constraints
 Data independence
immunity of applications to change in physical representation and access technique
 Standards: ISO SQL 2016.
Much effort since the late 1980s. However,
 DBMS manufacturers aim to lockin their customers by providing many nonstandard
convenience extensions. Once some of these are in use, it is virtually
impossible for a customer to switch to another DBMS.
 Even without extensions there are subtle differences in implementation
that are very hard to spot, again ensuring lockin.
Information Principle
The entire information content of the database is represented in only one way, explicit values in column positions in table rows.
The Relational Model
relation = math. for table, NOT from ERmodel
A relation has two parts:
 heading: set of attributes A:T with name A and type T
denotes a predicate (truthvalued function)
 body: set of tuples A:V with attribute A and value V
i.e. a set of tuples that conforms to the heading
each tuple denotes a true proposition, i.e. instantiates the predicate
When the context is clear the attribute names can be ommitted, e.g.
the (unnamed) tuples of relation Age are { (John, 30), (Jane, 25) }
Relations are sets of tuples:
→ there is no preferred order for the tuples, and
→ there is no point in stating a fact more than once.
Relational Algebra: a brief Overview
Operations on one or more relation always result in a relation (not some other kind of object).

Restrict A: all attributes of tuples in A satisfying a condition

Project A: specified attributes of all tuples in A
remove
duplicates!

Product A,B: all combinations of tuples in A and B

Union A,B: all tuples appearing in either or both
remove
duplicates!

Difference A,B: all tuples in A but not in B

Intersect A,B: all tuples appearing in both

Join: A,B: all combinations with common value(s) for common attribute(s)
Emp = { (Smith, Accounting), (Jones, Accounting), (Brown, Sales) }
Bonus = { (Accounting, 300), (Sales, 400), (Marketing, 0) }
Join Emp, Bonus = { (Smith, Accounting, 300), (Jones, Accounting, 300), (Brown, Sales, 400) }

Divide A,B,C: tuples in A that match via B with all tuples in C
Doctor = { (Semmelweiss), (Billroth), (Paracelsus) }
Treatment = { (Billroth, Smith), (Billroth, Jones), (Billroth, Brown), (Semmelweiss, Smith), (Paracelsus, Luther) }
Sick = { (Smith), (Brown) }
Divide Doctor, Treatment, Sick = { (Billroth) }
Interpretation e.g. doctors who treat all patients
in the supplierparts example: suppliers (relation S) that supply (SP) all parts (P)
History
The Structured Query Language is the standard for data query and manipulation
in relational database systems. It implements many (but not all) concepts
of the relational model.

early 1970s: Donald Chamberlin and Raymond Boyce develop SEQUEL as query
language, later renamed to SQL (Structured Query Language)
 based on Edgar Codd's Relational Model
 declarative rather than procedural as in earlier navigational DB systems
(network and hierarchical)

System R (for relational) as first implementation, IBM Research in early
1970s

proof of concept that automatic query optimizer can produce decent
performance for relational system
 1983 DB2 as commercial product
 1986 Postgres
 1995 Mysql
 2000 SQLite
Table vs. Relation
The basic concept of the SQL implementation is the table. When going from
the relation model to the SQL implementation relations become tables.
An SQL table is similar to a relation, with some important differences:
 Doubles:
 A relation contains a set of tuples, which means that by definition
there are no doubles.
 An SQL table and the result of an SQL query can contain several rows with identical values.
 Order:
 Since there is no order defined for the elements of a set, any order is valid
when listing the tuples of a relation.
 Results of SELECT statements can be ordered at will with the ORDER BY clause.
Data Types
 The relational model is not concerned with technical implementation.
 SQL DBMS must know in great detail the type of data stored in tables.
Numbers:

precision: number of significant digits

scale: position of the least significant digit to the right of the
decimal

e.g. 1234.56 has precision 6, scale 2, defined as NUMERIC(6,2)
SQL92 standard numeric data types:

INT (precision)

exact numeric type

scale 0

NUMERIC (precision, scale)

exact numeric type

i.e. rounding and truncating dictated by predefined formula

decimal portion exactly scale

DECIMAL (precision, scale)

exact numeric type

decimal portion at least scale

REAL

approximate numeric type

binary precision based on manufacturer limit

default precision must be less than DOUBLE PRECISION

DOUBLE PRECISION

approximate numeric type

binary precision based on manufacturer limit

default precision must be greater than REAL

FLOAT (precision)

approximate numeric type

DOUBLE with given precision
Postgres: DECIMAL(p, s) and NUMERIC(p, s) equivalent.
Other Types:

CHAR (size)
 fixed length
 unused space empty
 offered better performance than VARCHAR in early implementations

VARCHAR (size)
 variable length up to size
 no unused space
 in current implementations usually no performance disadvantage compared to CHAR

DATE
 Various input formats, e.g. '20120401'
 Calendar functions such as length of time, weekday

BOOLEAN