Functional Dependencies

If for any given value of x there is only one value for y, or none, then y is functionally dependent on x.

In other words, a given value for x points to a certain value for y. This is written as

x → y

Note that different values of x can point to the same value y; but a given value of x cannot point to more than one value for y.

In the following table some data about suppliers is stored:

ID Name City Status
1 Smith New York 3
2 Jones New York 3
3 Black Los Angeles 3
4 Brown Denver 2
5 White Denver 2
6 Green Salem 1

For the table above we have ID → NAME, STATUS, CITY and CITY → STATUS

When we look at functional dependencies, we distinguish between:

We are usually interested in (b).

Functional dependencies are important for practical applications. A good table design helps to automatically enforce functional dependencies.

Deriving Functional Dependencies

Give a set of functional dependencies we can derive further FDs that must also hold, e.g. given

we can derive (among others):

Trivial Dependencies

Trivial Dependencies: e.g. ID → ID

A FD is trivial if the right-hand side is a subset of the left-hand side.

Armstrong Axioms

These axioms give us a set of rules to manipulate functional dependencies.

Let A, B, C arbitrary subsets of attributes of relation R, and denote AB the union of A and B.

First rule = definition of trivial FD

Rules are complete: all implied FDs can be derived

Rules are sound: no additional FDs (not implied by S) can be so derived

Several more rules can be derived to simplify the practical task of manipulating FDs:

Example: A → BC, B → E, CD → EF (e.g. A EMPID, B DEPID, C MGR, D PROJID, E DEPNAME, F PERCTIME)

Show: AD → F

In other words, show that in this application the percentage of time worked on a project is determined by employee and project.

The functional dependency AD → F can be derived with a suitable choice of operations.

EXERCISE: bc → a, de → c, a → bcde

Show that bc → d

Integrity

Ensure that the data is correct.

Constraints

These rules restrict the data that can be stored. If an attempt is made to violate such a rule, the DB systems responds with an error message.

State Constraints

These type of constraints are widely supported by relational DB systems.

Relation Predicate: logical AND of all relation constraints

Golden Rule: no update operation may leave any relation in a state violating its predicate

Transition Constraints

These type of constraints are typically not supported by the DB system out of the box, and have to be implemented by application logic i.e. we must program them.

Example: relationship status

Keys

When functional dependencies have been identified the logical next step is to define keys for tables.

Candidate Key

A candidate key K for a given relation is a set of attributes with

Every relation has at least one candidate key: if we cannot find any subset of attributes that form a candidate key we can use the complete set of attributes.

Note that this does not hold for tables, since a table can contain multiple identical tuples.

In case of more than one candidate key: choose one as primary key, the others are alternate keys

Superkey

A superkey is unique, but not necessarily irreducible

Foreign Key

A foreign key is a set of attributes F in relation B, where

The value of F is a reference to the tuple containing C.

Example: value of ORDR.CUST must be equal to an existing value of CUST.ID

Referential Integrity

Ensure valid foreign key values. Default: record cannot be deleted while there are still references.

SQL Facilities

Example: Supplier and Parts

We want to keep track of our suppliers and record which supplier deliver which parts. We put the data about the suppliers in table S and the parts data in table P. Since we get certain parts from certain suppliers, we record that data in another table SP.

Again, to make sure that all the code in this notebook can be run repeatedly from top to bottom we drop the table if they already exist:

Now we create out supplier table:

The table exists and is empty. We fill it with some data.

To make sure that this actually worked we generate a list of the records in the table.

In the same fashion we create a table for some parts. Again the ID identifies the part, therefore we define it as the primary key.

Again, insert some data and check the contents of the table:

In this situation, we get parts from various suppliers. Not all suppliers deliver all parts. The following table allows us to define every possible situation:

Similar for the parts; if a part is delivered by no supplier, it has no entry in the SP table.

We want to ensure referential integrity:

Because of the REFERENCES definitions the DB will enforce these constraints.

Answer the following questions by looking at the data.

Which of these questions could be answered easily by using an operation from relational algebra?

The beauty of Jupyter notebooks is that they support experimentation in a straight-forward fashion. Feel free to create more tables for subjects of your interest, and see what kind of information they can store.

Views

A view is a named relational expression, from a practical point very similar to a defined function without parameters in programming languages.

The view can be used in select statements just like a table:

Advantages of Views

Multi-user DB systems typically let us define permissions for individual tables and views, which allows us to implement various schemes for security and user roles; views come in very handy in such an environment.

Logical data independence:

Immunity of users and programs to changes in the logical structure of the database (conceptual level)

Catalog Tables

Since everything in a relational DB system is stored in tables, the data on which users and tables exist is also stored in tables: the catalog tables.

With the help of these meta-tables the DB system allows for changing the structure of user tables, even when they already contain data, e.g. to add a column:

ALTER TABLE table_name ADD COLUMN column_definition ;

SQLite takes a fairly unusual approach to this topic by storing the SQL statement that created the object, rather than the structural information on columns and their data types. Because of this, the ALTER TABLE command is severly limited in SQLite.

For comparison, Postgres uses a separate catalog table to store information on the columns of user tables. This allows for the full range of ALTER TABLE commands, and makes it easy to e.g. query the names of all numeric columns:

Postgres:

SELECT table_name,column_name FROM information_schema.columns WHERE data_type = 'numeric';

table_name column_name
p weight
s status
sp qty

EXERCISE: