Overview and Basic Concepts of Relational Database Systems
Some Definitions
Data: from Latin given; given facts, from which additional facts
can be inferred (= response to a user query)
given fact: true proposition in logic
Database: collection of true propositions
Relational approach:
-
Data is represented by rows in tables, which can be interpreted as
true propositions
-
Operators are provided for operating on rows in tables, providing
additional true propositions from the existing ones
Entity: any object about which we wish to record information
Relationship: linking entities (can themselves be interpreted as a type of entity)
Properties: used to record information about entities: numbers, dates, strings
DBMS: Database Management System; software that implements DB concepts, e.g. Postgres, Mysql, SQLite
Advantages of relational Databases
over application-specific files such as spreadsheets:
- Centralized control over data
- Data can be shared
- Redundancy can be reduced
- Inconsistencies can be avoided
- Integrity constraints
- Transaction support
- Security constraints
- Data independence
immunity of applications to change in physical representation and access technique
- Standards: ISO SQL 2016.
Much effort since the late 1980s. However,
- DBMS manufacturers aim to lock-in their customers by providing many non-standard
convenience extensions. Once some of these are in use, it is virtually
impossible for a customer to switch to another DBMS.
- Even without extensions there are subtle differences in implementation
that are very hard to spot, again ensuring lock-in.
Information Principle
The entire information content of the database is represented in only one way, explicit values in column positions in table rows.
The Relational Model
relation = math. for table, NOT from ER-model
A relation has two parts:
- heading: set of attributes A:T with name A and type T
denotes a predicate (truth-valued function)
- body: set of tuples A:V with attribute A and value V
i.e. a set of tuples that conforms to the heading
each tuple denotes a true proposition, i.e. instantiates the predicate
When the context is clear the attribute names can be ommitted, e.g.
the (unnamed) tuples of relation Age are { (John, 30), (Jane, 25) }
Relations are sets of tuples:
→ there is no preferred order for the tuples, and
→ there is no point in stating a fact more than once.
Relational Algebra: a brief Overview
Operations on one or more relation always result in a relation (not some other kind of object).
-
Restrict A: all attributes of tuples in A satisfying a condition
-
Project A: specified attributes of all tuples in A
remove
duplicates!
-
Product A,B: all combinations of tuples in A and B
-
Union A,B: all tuples appearing in either or both
remove
duplicates!
-
Difference A,B: all tuples in A but not in B
-
Intersect A,B: all tuples appearing in both
-
Join: A,B: all combinations with common value(s) for common attribute(s)
Emp = { (Smith, Accounting), (Jones, Accounting), (Brown, Sales) }
Bonus = { (Accounting, 300), (Sales, 400), (Marketing, 0) }
Join Emp, Bonus = { (Smith, Accounting, 300), (Jones, Accounting, 300), (Brown, Sales, 400) }
-
Divide A,B,C: tuples in A that match via B with all tuples in C
Doctor = { (Semmelweiss), (Billroth), (Paracelsus) }
Treatment = { (Billroth, Smith), (Billroth, Jones), (Billroth, Brown), (Semmelweiss, Smith), (Paracelsus, Luther) }
Sick = { (Smith), (Brown) }
Divide Doctor, Treatment, Sick = { (Billroth) }
Interpretation e.g. doctors who treat all patients
in the supplier-parts example: suppliers (relation S) that supply (SP) all parts (P)
History
The Structured Query Language is the standard for data query and manipulation
in relational database systems. It implements many (but not all) concepts
of the relational model.
-
early 1970s: Donald Chamberlin and Raymond Boyce develop SEQUEL as query
language, later renamed to SQL (Structured Query Language)
- based on Edgar Codd's Relational Model
- declarative rather than procedural as in earlier navigational DB systems
(network and hierarchical)
-
System R (for relational) as first implementation, IBM Research in early
1970s
-
proof of concept that automatic query optimizer can produce decent
performance for relational system
- 1983 DB2 as commercial product
- 1986 Postgres
- 1995 Mysql
- 2000 SQLite
Table vs. Relation
The basic concept of the SQL implementation is the table. When going from
the relation model to the SQL implementation relations become tables.
An SQL table is similar to a relation, with some important differences:
- Doubles:
- A relation contains a set of tuples, which means that by definition
there are no doubles.
- An SQL table and the result of an SQL query can contain several rows with identical values.
- Order:
- Since there is no order defined for the elements of a set, any order is valid
when listing the tuples of a relation.
- Results of SELECT statements can be ordered at will with the ORDER BY clause.
Data Types
- The relational model is not concerned with technical implementation.
- SQL DBMS must know in great detail the type of data stored in tables.
Numbers:
-
precision: number of significant digits
-
scale: position of the least significant digit to the right of the
decimal
-
e.g. 1234.56 has precision 6, scale 2, defined as NUMERIC(6,2)
SQL92 standard numeric data types:
-
INT (precision)
-
exact numeric type
-
scale 0
-
NUMERIC (precision, scale)
-
exact numeric type
-
i.e. rounding and truncating dictated by predefined formula
-
decimal portion exactly scale
-
DECIMAL (precision, scale)
-
exact numeric type
-
decimal portion at least scale
-
REAL
-
approximate numeric type
-
binary precision based on manufacturer limit
-
default precision must be less than DOUBLE PRECISION
-
DOUBLE PRECISION
-
approximate numeric type
-
binary precision based on manufacturer limit
-
default precision must be greater than REAL
-
FLOAT (precision)
-
approximate numeric type
-
DOUBLE with given precision
Postgres: DECIMAL(p, s) and NUMERIC(p, s) equivalent.
Other Types:
-
CHAR (size)
- fixed length
- unused space empty
- offered better performance than VARCHAR in early implementations
-
VARCHAR (size)
- variable length up to size
- no unused space
- in current implementations usually no performance disadvantage compared to CHAR
-
DATE
- Various input formats, e.g. '2012-04-01'
- Calendar functions such as length of time, weekday
-
BOOLEAN