Relational Database Systems: Basic Concepts

Some Definitions

Data: from Latin given; given facts, from which additional facts can be inferred (= response to a user query)

Given fact: true proposition in logic

Database: collection of true propositions

Relational approach:

Advantages of databases

over application-specific files such as spreadsheets:

Standards are important efforts, but

Information Principle

The entire information content of the database is represented in only one way, explicit values in column positions in table rows.

The Relational Model

Relational Databases are based on the relational model. Objects and operations defined in this theoretical approach are found in a similar form in practical implementations i.e. working DB management systems.

relation = math. for table, NOT from ER-model

A relation has two parts:

denotes a predicate (truth-valued function)

each tuple denotes a true proposition, i.e. instantiates the predicate

SQL

The Structured Query Language is the standard for data query and manipulation in relational database systems. It implements many (but not all) concepts of the relational model.

Table vs. Relation

The basic concept of the SQL implementation is the table. When going from the relation model to the SQL implementation relations become tables.

An SQL table is similar to a relation, with some important differences:

Data Types

Numbers:

SQL92 standard numeric data types:

Other Types:

These are standard definitions. Implementations vary considerably, e.g.

Time for studying the documentation of your DBMS is well spent. Otherwise, subtle errors will haunt you in years to come.

SQLite in Jupyter Notebook

SQLite is a simple but powerful DBMS that uses a single file in your current directory to store all data.

We load the SQL extensions into the current notebook:

This allows us to use SQL commands in code cells. We need to start the cell with

First we connect to a database. We choose SQLite for convenience.

We ensure that foreign keys are enabled. This is specific to SQLite.

Now we are ready to create tables. However, since we probably want to run this code multiple times, and a table cannot be created with the same name twice, we must first make sure that existing tables are removed:

EXERCISE: The Birding Club

You local birding club decides to finally put some structure into all those reports from members and enter them into a database. The following things are reported by the birders:

The birders are fairly learned in the avian ways, and the common names they use identify the species. Location and nickname also identify their entities perfectly (e.g., no two birders use the same nickname). Birders always report the same type of terrain for a given location.

Basic SQL is quite intuitive. By looking at the following statement we can easily see what it does:

The statement could be written on a singe line, but it is more readable this way. Layout matters not to the computer, but to the human who may be forced to read and understand our statements at some point in the future.

Why do we first drop the table, if it exists? Due to the nature of Jupyter notebooks it is much more convenient to ensure that all code in a notebook can run from start to end repeatedly. It is possible to hand-pick the order of execution by running cells individually, but this gets out of hand very quickly.

Now that we have the table structure, we can fill it with some data:

To check if everything worked the way we expected, we generate a listing:

Select statements can use conditions on records to be displayed, and state the columns, e.g.

Get the SQL code for the BIRDING table to run in your own Jupyter notebook.

If you want a little more challenge, install Jupyter on your own computer. You will probably need the following packages:

If you are using Linux (and why would you not??) then enter the following on the command line:

EXERCISE:

Your local chess community is organized into several clubs; each club has a name and a home location where practise games are offered, usually once a week, but subject to change. One function of a database solution would be to simply announce the various training evenings for the clubs, and their location. Another very useful feature would be a list of participants for each event, so that furniture and equipment can be prepared in suitable number. For each evenings one of the clubs is hosting; but members from other clubs are also welcome.

Fill in information as you see fit to make this more interesting. Create the tables and use some select statements for useful listings.