Relational Database Systems

Some Definitions

Data: from Latin given; given facts, from which additional facts can be inferred (= response to a user query)

Given fact: true proposition in logic

Database: collection of true propositions

Relational approach:

  • Data is represented by rows in tables, which can be interpreted as true propositions

  • Operators are provided for operating on rows in tables, providing additional true propositions from the existing ones

Entity: any object about which we wish to record information

Relationship: linking entities (can themselves be interpreted as a type of entity)

Properties: used to record information about entities: numbers, dates, strings

Components of Database Systems

  • Data - one of the enterprise's most valuable assets; should be:

    • integrated
    • shared
    • persistent
  • Hardware; should have

    • lots of memory
    • many cores
    • fast external media, such as SSD and high-performance HD
  • Software

    • DBMS database management system
      • Query language processor
    • Application programs
  • People

    • Application programmers
    • Database administrator
      • keep the system operational
      • security policy -- who can perform which operations on what data under which circumstances
      • technical implementation of policy
    • End users
      • The necessary evil
      • Admin heaven: 'System is 100% operational with 0 users.'

Advantages of databases

over application-specific files such as spreadsheets:

  • Centralized control over data
  • Data can be shared
  • Redundancy can be reduced
  • Inconsistencies can be avoided
  • Transaction support
  • Integrity constraints
  • Security constraints
  • Data independence, immunity of applications to change in physical representation and access technique

Standards

ISO SQL 2016 and later

  • Much effort since the late 1980s. However,
  • DBMS manufacturers aim to lock-in their customers by providing many non-standard convenience extensions.
  • Once some of these are in use, it is virtually impossible for a customer to switch to another DBMS.
  • Even without extensions there are subtle differences in implementation that are very hard to spot, again ensuring lock-in.

Standards are important efforts, but

  • the practical reality is that the choice of a particular DBMS is a long-term one
  • switching to a different DBMS is costly and risky, and will be avoided if at all possible

Information Principle

The entire information content of the database is represented in only one way, explicit values in column positions in table rows.

DBMS provides for

  • Data definition, DDL (create table)
  • Data manipulation, DML (select, insert, update, delete)
  • Optimization, esp. query optimizer
  • Checks for integrity constraints and security (access control)
  • Recovery and concurrency controls (transaction manager)
  • Data dictionary (data about data, catalog)

Client/Server

  • Server: DBMS
  • Clients: applications run on top of DBMS
  • allows for distributed processing: run on different machines

Utilities

  • Backup/Restore
  • Reorganization (performance, reclaim unused space,..)
  • Statistics (for query optimisation)

The Relational Model

Relational Databases are based on the relational model. Objects and operations defined in this theoretical approach are found in a similar form in practical implementations i.e. working DB management systems.

relation = math. for table, NOT from ER-model

A relation has two parts:

  • heading: set of attributes A:T with name A and type T

denotes a predicate (truth-valued function)

  • body: set of tuples A:V with attribute A and value V i.e. a set of tuples that conforms to the heading

each tuple denotes a true proposition, i.e. instantiates the predicate

When the context is clear the attribute names can be ommitted, e.g. the (unnamed) tuples of relation Age are { (John, 30), (Jane, 25) }

Relations are sets of tuples:

  • there is no preferred order for the tuples, and
  • there is no point in stating a fact more than once.

Relational Algebra: a brief Overview

Operations on one or more relation always result in a relation (not some other kind of object).

  • Restrict A: tuples in A satisfying a condition

  • Project A: specified attributes of tuples in A

    remove duplicates!

  • Product A,B: all combinations of tuples in A and B

  • Union A,B: all tuples appearing in either or both

    remove duplicates!

  • Difference A,B: all tuples in A but not in B

  • Intersect A,B: all tuples appearing in both

  • Join: A,B: all combinations with common value(s) for common attribute(s)

    Example:

    • Emp = { (Smith, Accounting), (Jones, Accounting), (Brown, Sales) }

    • Bonus = { (Accounting, 300), (Sales, 400), (Marketing, 0) }

    • Join Emp, Bonus = { (Smith, Accounting, 300), (Jones, Accounting, 300), (Brown, Sales, 400) }

  • Divide A,B,C: tuples in A that match via B with all tuples in C

    Example:

    • Doctor = { (Semmelweiss), (Billroth), (Paracelsus) }

    • Treatment = { (Billroth, Smith), (Billroth, Jones), (Billroth, Brown), (Semmelweiss, Smith), (Paracelsus, Luther) }

    • Sick = { (Smith), (Brown) }

    • Divide Doctor, Treatment, Sick = { (Billroth) }

    Interpretation: doctors who treat all patients

    Supplier-parts example: suppliers (relation S) that supply (SP) all parts (P)

SQL

The Structured Query Language is the standard for data query and manipulation in relational database systems. It implements many (but not all) concepts of the relational model.

History

  • early 1970s: Donald Chamberlin and Raymond Boyce develop SEQUEL as query language, later renamed to SQL (Structured Query Language)
  • based on Codd's Relational Model
  • declarative rather than procedural as in earlier navigational DB systems (network and hierarchical)
  • System R (for relational) as first implementation, IBM Research in early 1970s
  • proof of concept that an automatic query optimizer can produce decent performance for a relational system (not at all clear at the time)
  • DB2 in 1983 as commercial product
  • ANSI Standard 1986, ISO 1987
    • Major revision SQL92 (SQL2)
    • SQL99 (SQL3)
    • SQL 2003/2006/2008/...

Table vs. Relation

The basic concept of the SQL implementation is the table. When going from the relation model to the SQL implementation relations become tables.

An SQL table is similar to a relation, with some important differences:

  • Doubles:

    • A relation contains a set of tuples, which means that by definition there are no doubles.
    • An SQL table and the result of an SQL query can contain several rows with identical values.
  • Order:

    • Since there is no order defined for the elements of a set, any order is valid when listing the tuples of a relation.
    • Results of SQL SELECT statements can be ordered at will with the ORDER BY clause.

Data Types

  • The relational model is not concerned with technical implementation.
  • SQL DBMS must know in great detail the type of data stored in tables.

Numbers:

  • precision: number of significant digits
  • scale: position of the least significant digit to the right of the decimal e.g. 1234.56 has precision 6, scale 2, defined as NUMERIC(6,2)

SQL92 standard numeric data types:

  • INT (precision)

    • exact numeric type
    • scale 0
  • NUMERIC (precision, scale)

    • exact numeric type i.e. rounding and truncating dictated by predefined formula
    • decimal portion exactly scale
  • DECIMAL (precision, scale)

    • exact numeric type
    • decimal portion at least scale
  • REAL

    • approximate numeric type
    • binary precision based on manufacturer limit
    • default precision must be less than DOUBLE PRECISION
  • DOUBLE PRECISION

    • approximate numeric type
    • binary precision based on manufacturer limit
    • default precision must be greater than REAL
  • FLOAT (precision)

    • approximate numeric type
    • DOUBLE with given precision

Other Types:

  • CHAR (size)

    • fixed length
    • unused space empty
  • VARCHAR (size)

    • variable length up to size
    • no unused space
    • without size: not in SQL92 standard, but widely supported
  • DATE

    • Various input formats, some manufacturer depdendent, e.g. '2012-04-01'
    • Calendar functions such as length of time, weekday
  • BOOLEAN

    • two values, true or false

These are standard definitions. Implementations vary considerably, e.g.

  • Postgres: DECIMAL(p, s) and NUMERIC(p, s) are equivalent
  • SQLite: CHAR and VARCHAR are both converted to TEXT with default max length of 1,000,000,000

Time for studying the documentation of your DBMS is well spent. Otherwise, subtle errors will haunt you in years to come.

A desirable (but widely unsupported) Feature

define restrictions for valid operations, e.g. certain operations with part weights and quantities make sense, while others do not:

  • Valid: P.WEIGHT * SP.QTY
  • Invalid: P.WEIGHT + SP.QTY

SQL92: named types in terms of primitive types, no operation restrictions

Postgres:

  • create domain ssn char(10);
  • create table emp (id ssn not null, name char(20) not null, primary key(id));
  • insert into emp (id, name) values ('846312041972', 'Jones');
  • ERROR: value too long for type character(10)

When the packages are installed we need to load the SQL extensions into the current notebook:

In [37]:
 %load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

This allows us to use SQL commands in code cells. We need to start the cell with

  • %sql for a single line of code
  • %%sql for multiple lines/statements

First we connect to a database. We choose SQLite for convenience.

  • SQLite is free and Open Source, and very easy to install and use.
  • A file in the current directory stores our tables.
  • The current directory is the one where you started your notebook; probably your home directory.
In [38]:
%sql sqlite:///sp.db
Out[38]:
'Connected: @sp.db'

We ensure that foreign keys are enabled. This is specific to SQLite.

In [39]:
%%sql

PRAGMA foreign_keys = ON;
 * sqlite:///sp.db
Done.
Out[39]:
[]

Now we are ready to create tables. However, since we probably want to run this code multiple times, and a table cannot be created with the same name twice, we must first make sure that existing tables are removed:

EXERCISE: The Birding Club

You local birding club decides to finally put some structure into all those reports from members and enter them into a database. The following things are reported by the birders:

  • common name of the bird
  • date of observation
  • location
  • type of terrain
  • nickname of birder

The birders are fairly learned in the avian ways, and the common names they use identify the species. Location and nickname also identify their entities perfectly (e.g., no two birders use the same nickname). Birders always report the same type of terrain for a given location.

Basic SQL is quite intuitive. By looking at the following statement we can easily see what it does:

  • a table named BIRDING is created
  • for each column we specify a name and a data type
  • the list of columns is enclosed in parenthesis
  • a semicolon terminates the statement

The statement could be written on a singe line, but it is more readable this way. Layout matters not to the computer, but to the human who may be forced to read and understand our statements at some point in the future.

Why do we first drop the table, if it exists? Due to the nature of Jupyter notebooks it is much more convenient to ensure that all code in a notebook can run from start to end repeatedly. It is possible to hand-pick the order of execution by running cells individually, but this gets out of hand very quickly.

In [40]:
%%sql

drop table if exists BIRDING;
create table BIRDING (
  BIRDNAME varchar,
  OBSDATE date,
  LOCATION varchar,
  TERRAIN varchar,
  NICKNAME varchar);
 * sqlite:///sp.db
Done.
Done.
Out[40]:
[]

Now that we have the table structure, we can fill it with some data:

In [41]:
%%sql

insert into BIRDING values ('House sparrow', '2018-06-12', 'Cold Striffen', 'Garden', 'jsmith');
insert into BIRDING values ('Tree sparrow', '2018-06-12', 'Cold Striffen', 'Garden', 'jsmith');
insert into BIRDING values ('Blackcap', '2018-06-14', 'Sherwood Forest', 'Woodland', 'mjones');
insert into BIRDING values ('European robin', '2018-07-04', 'Little Winging', 'Garden', 'gmorgan');
insert into BIRDING values ('Common gull', '2017-06-01', 'Cliffs of Moher', 'Beach', 'pmurphy');
insert into BIRDING values ('Green woodpecker', '2018-06-15', 'Sherwood Forest', 'Woodland', 'mjones');
 * sqlite:///sp.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Out[41]:
[]

To check if everything worked the way we expected, we generate a listing:

In [42]:
%%sql

select * from BIRDING;
 * sqlite:///sp.db
Done.
Out[42]:
BIRDNAME OBSDATE LOCATION TERRAIN NICKNAME
House sparrow 2018-06-12 Cold Striffen Garden jsmith
Tree sparrow 2018-06-12 Cold Striffen Garden jsmith
Blackcap 2018-06-14 Sherwood Forest Woodland mjones
European robin 2018-07-04 Little Winging Garden gmorgan
Common gull 2017-06-01 Cliffs of Moher Beach pmurphy
Green woodpecker 2018-06-15 Sherwood Forest Woodland mjones

Select statements can use conditions on records to be displayed, and state the columns, e.g.

In [43]:
%%sql

select location, nickname from birding where obsdate < '2018-01-01';
 * sqlite:///sp.db
Done.
Out[43]:
LOCATION NICKNAME
Cliffs of Moher pmurphy

Get the SQL code for the BIRDING table to run in your own Jupyter notebook in the lab.

If you want a little more challenge, install the software on your own computer. You will probably need the following packages:

  • sqlite3
  • jupyter-notebook
  • ipython-sql

If you are using Linux (and why would you not??) then enter the following on the command line:

  • sudo apt install sqlite3
  • sudo apt install jupyter-notebook
  • sudo pip3 install ipython-sql

    or, if you are using a Conda distribution of Python then instead of pip3

  • sudo conda install ipython-sql

EXERCISE:

Your local chess community is organized into several clubs; each club has a name and a home location where practise games are offered, usually once a week, but subject to change. One function of a database solution would be to simply announce the various training evenings for the clubs, and their location. Another very useful feature would be a list of participants for each event, so that furniture and equipment can be prepared in suitable number. For each evenings one of the clubs is hosting; but members from other clubs are also welcome.

Fill in information as you see fit to make this more interesting. Create the tables and use some select statements for useful listings.

Functional Dependencies

If for any given value of x there is only one value for y, or none, then y is functionally dependent on x.

In other words, a given value for x points to a certain value for y. This is written as

x → y

Note that different values of x can point to the same value y; but a given value of x cannot point to more than one value for y.

In the following table some data about suppliers is stored:

  • each suppliers is identified by an ID
  • a supplier is located in a city
  • the status is an attribute of the supplier; however, it is actually a size code for the city
SID Name City Status
S1 Smith New York 30
S2 Jones New York 30
S3 Black Los Angeles 30
S4 Brown Denver 20
S5 White Denver 20
S6 Green Salem 10

For the table above we have SID → NAME, STATUS, CITY and CITY → STATUS

When we look at functional dependencies, we distinguish between:

  • (a) the value of a relation or table at some point in time
  • (b) the set of all possible values of a relation

We are usually interested in (b).

Functional dependencies are important for practical applications. A good table design helps to automatically enforce functional dependencies.

Deriving Functional Dependencies

Give a set of functional dependencies we can derive further FDs that must also hold, e.g. given

  • SID → NAME, STATUS, CITY

we can derive (among others):

  • SID → NAME
  • SID → NAME, STATUS
  • SID → STATUS, CITY

Trivial Dependencies

Trivial Dependencies: e.g. SID → SID

A FD is trivial if the right-hand side is a subset of the left-hand side.

Armstrong Axioms

These axioms give us a set of rules to manipulate functional dependencies.

Let A, B, C arbitrary subsets of attributes of relation R, and denote AB the union of A and B.

  • Reflexivity: if B is a subset of A, then A → B
  • Augmentation: if A → B, then AC → BC
  • Transitivity: if A → B and B → C, then A → C

First rule = definition of trivial FD

Rules are complete: all implied FDs can be derived

Rules are sound: no additional FDs (not implied by S) can be so derived

Several more rules can be derived to simplify the practical task of manipulating FDs:

  • Decomposition: if A → BC, then A → B and A → C

  • Union: if A → B and A → C, then A → BC

  • Composition: if A → B and C → D, then AC → BD

  • Self-determination: A → A

Example: A → BC, B → E, CD → EF (e.g. A EMPID, B DEPID, C MGR, D PROJID, E DEPNAME, F PERCTIME)

Show: AD → F

In other words, show that in this application the percentage of time worked on a project is determined by employee and project.

  • A → BC
  • A → C (decomposition)
  • AD → CD (augmentation)
  • CD → EF
  • AD → EF (transitivity)
  • AD → F (decomposition)

The functional dependency AD → F can be derived with a suitable choice of operations.

EXERCISE: bc → a, de → c, a → bcde

Show that bc → d

Integrity

Ensure that the data is correct.

Constraints

These rules restrict the data that can be stored. If an attempt is made to violate such a rule, the DB systems responds with an error message.

State Constraints

These type of constraints are widely supported by relational DB systems.

  • Type and values of individual Attributes, e.g.

    • STATUS >= 1 and STATUS <= 100
    • WEIGHT > 0.0
  • Constraints on a Relation

    • unique SID
    • no supplier with CITY = 'London' and STATUS != 20
  • Constraints on the whole Database, e.g.

    • no supplier with STATUS < 20 and QTY > 500

Relation Predicate: logical AND of all relation constraints

Golden Rule: no update operation may leave any relation in a state violating its predicate

Transition Constraints

These type of constraints are typically not supported by the DB system out of the box, and have to be implemented by application logic i.e. we must program them.

Example: relationship status

  • Valid transitions

    • Single to Married
    • Married to Divorced
    • Married to Widowed
  • Not valid

    • Single to Widowed
    • Single to Divorced
    • Widowed to Divorced

Keys

When functional dependencies have been identified the logical next step is to define keys for tables.

Candidate Key

A candidate key K for a given relation is a set of attributes with

  • Uniqueness: no distinct tuples with same value for K
  • Irreducibility: no subset of K has Uniqueness

Every relation has at least one candidate key: if we cannot find any subset of attributes that form a candidate key we can use the complete set of attributes.

Note that this does not hold for tables, since a table can contain multiple identical tuples.

In case of more than one candidate key: choose one as primary key, the others are alternate keys

Superkey

A superkey is unique, but not necessarily irreducible

Foreign Key

A foreign key is a set of attributes F in relation B, where

  • there is another relation A with candidate key C
  • each value of F in B is identical to the value of C in some tuple in A

The value of F is a reference to the tuple containing C.

Example: value of ORDR.CUST must be equal to an existing value of CUST.ID

Referential Integrity

Ensure valid foreign key values. Default: record cannot be deleted while there are still references.

SQL Facilities

  • PRIMARY KEY implies NOT NULL i.e. must have a value
  • UNIQUE can be NULL i.e. without value, not practical for identifiers

  • REFERENCES A(C)

    • foreign key, this column references column C in table A
    • optional: ON DELETE
      • RESTRICT to cases without matching tuples
      • CASCADE delete matching tuples
      • SET NULL not a contradiction to definition since NULL is not a value

Example: Supplier and Parts

We want to keep track of our suppliers and record which supplier deliver which parts. We put the data about the suppliers in table S and the parts data in table P. Since we get certain parts from certain suppliers, we record that data in another table SP.

Again, to make sure that all the code in this notebook can be run repeatedly from top to bottom we drop the table if they already exist:

In [44]:
%%sql

drop table if exists SP;
drop table if exists S;
drop table if exists P;
 * sqlite:///sp.db
Done.
Done.
Done.
Out[44]:
[]

Now we create out supplier table:

  • The supplier ID is the identifying attribute.
  • Given a value for ID we always find one supplier, or none; never more than one.
  • Because of the PRIMARY KEY definition the DB will ensure this constraint.
  • If we try to enter a supplier with an ID that already exists the DB will respond with an error message.
In [45]:
%%sql

create table S (
  SID    varchar primary key,
  NAME   varchar,
  STATUS numeric(5),
  CITY   varchar);
 * sqlite:///sp.db
Done.
Out[45]:
[]

The table exists and is empty. We fill it with some data.

In [46]:
%%sql

insert into S values ('S1', 'Smith', 30, 'New York');
insert into S values ('S2', 'Jones', 30, 'New York');
insert into S values ('S3', 'Black', 30, 'Los Angeles');
insert into S values ('S4', 'Brown', 20, 'Denver');
insert into S values ('S5', 'White', 20, 'Denver');
insert into S values ('S6', 'Green', 10, 'Salem');
 * sqlite:///sp.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Out[46]:
[]

To make sure that this actually worked we generate a list of the records in the table.

In [47]:
%%sql

select * from S;
 * sqlite:///sp.db
Done.
Out[47]:
SID NAME STATUS CITY
S1 Smith 30 New York
S2 Jones 30 New York
S3 Black 30 Los Angeles
S4 Brown 20 Denver
S5 White 20 Denver
S6 Green 10 Salem

In the same fashion we create a table for some parts. Again the ID identifies the part, therefore we define it as the primary key.

In [48]:
%%sql

create table P (
  PID    varchar primary key,
  NAME   varchar,
  COLOR  varchar,
  WEIGHT numeric(5,1));
 * sqlite:///sp.db
Done.
Out[48]:
[]

Again, insert some data and check the contents of the table:

In [49]:
%%sql

insert into P values ('P1', 'Nail', 'Gray', 12);
insert into P values ('P2', 'Bolt', 'Black', 17);
insert into P values ('P3', 'Screw', 'Gray', 17);
insert into P values ('P4', 'Screw', 'Silver', 14);
insert into P values ('P5', 'Hub', 'Silver', 12);
insert into P values ('P6', 'Cog', 'Gray', 19);

select * from P;
 * sqlite:///sp.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
Out[49]:
PID NAME COLOR WEIGHT
P1 Nail Gray 12
P2 Bolt Black 17
P3 Screw Gray 17
P4 Screw Silver 14
P5 Hub Silver 12
P6 Cog Gray 19

In this situation, we get parts from various suppliers. Not all suppliers deliver all parts. The following table allows us to define every possible situation:

  • a supplier delivers all parts
  • a supplier delivers some parts
  • a supplier delivers no parts (no entry in the table)

Similar for the parts; if a part is delivered by no supplier, it has no entry in the SP table.

We want to ensure referential integrity:

  • Values for supplier ID must be present in table S, and
  • values for part ID must be present in table P.
  • Otherwise, the DB system will respond with an error message.

Because of the REFERENCES definitions the DB will enforce these constraints.

In [50]:
%%sql

create table SP (
  SID     int references S(SID),
  PID     int references P(PID),
  QTY     numeric(6));
 * sqlite:///sp.db
Done.
Out[50]:
[]
In [51]:
%%sql

insert into SP values ('S1', 'P1', 300);
insert into SP values ('S1', 'P2', 200);
insert into SP values ('S1', 'P3', 400);
insert into SP values ('S1', 'P4', 200);
insert into SP values ('S1', 'P5', 100);
insert into SP values ('S1', 'P6', 100);
insert into SP values ('S2', 'P1', 300);
insert into SP values ('S2', 'P2', 400);
insert into SP values ('S3', 'P2', 200);
insert into SP values ('S4', 'P2', 200);
insert into SP values ('S4', 'P4', 300);
insert into SP values ('S4', 'P5', 500);

select * from SP;
 * sqlite:///sp.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
Out[51]:
SID PID QTY
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 500

Answer the following questions by looking at the data.

  • Which suppliers deliver all parts?
  • Which suppliers deliver no parts?
  • Which parts are delivered by all suppliers?
  • Which parts are delivered by no suppliers?

Which of these questions could be answered easily by using an operation from relational algebra?

The beauty of Jupyter notebooks is that they support experimentation in a straight-forward fashion. Feel free to create more tables for subjects of your interest, and see what kind of information they can store.

Views

A view is a named relational expression, from a practical point very similar to a defined function without parameters in programming languages.

In [52]:
%%sql

drop view if exists GS;
create view GS as select sid, status, city from S where status > 1;
 * sqlite:///sp.db
Done.
Done.
Out[52]:
[]

The view can be used in select statements just like a table:

In [53]:
%%sql

select * from GS;
 * sqlite:///sp.db
Done.
Out[53]:
SID STATUS CITY
S1 30 New York
S2 30 New York
S3 30 Los Angeles
S4 20 Denver
S5 20 Denver
S6 10 Salem

Advantages of Views

  • Security: force access through view, hide data
  • Shorthand, macro
  • Different users see same data in different ways
  • Logical data independence

Multi-user DB systems typically let us define permissions for individual tables and views, which allows us to implement various schemes for security and user roles; views come in very handy in such an environment.

Logical data independence:

Immunity of users and programs to changes in the logical structure of the database (conceptual level)

  • Growth
    • new attributes
    • new relations
  • Restructuring
    • E.g. split relation in two and provide view instead
    • Select should work
    • Expect problems with updates

Catalog Tables

Since everything in a relational DB system is stored in tables, the data on which users and tables exist is also stored in tables: the catalog tables.

  • Metadata i.e. data about data
    • information about tables and their columns
    • optimizer information (indexes, physical storage structures, ..)
    • security constraints
  • This is also stored in database tables, the catalog tables
  • Therefore can be queried in the same form as user tables, using SQL

With the help of these meta-tables the DB system allows for changing the structure of user tables, even when they already contain data, e.g. to add a column:

ALTER TABLE table_name ADD COLUMN column_definition ;

SQLite takes a fairly unusual approach to this topic by storing the SQL statement that created the object, rather than the structural information on columns and their data types. Because of this, the ALTER TABLE command is severly limited in SQLite.

In [54]:
%%sql

select * from sqlite_master;
 * sqlite:///sp.db
Done.
Out[54]:
type name tbl_name rootpage sql
table BIRDING BIRDING 2 CREATE TABLE BIRDING (
BIRDNAME varchar,
OBSDATE date,
LOCATION varchar,
TERRAIN varchar,
NICKNAME varchar)
table S S 3 CREATE TABLE S (
SID varchar primary key,
NAME varchar,
STATUS numeric(5),
CITY varchar)
index sqlite_autoindex_S_1 S 4 None
table P P 5 CREATE TABLE P (
PID varchar primary key,
NAME varchar,
COLOR varchar,
WEIGHT numeric(5,1))
index sqlite_autoindex_P_1 P 6 None
table SP SP 7 CREATE TABLE SP (
SID int references S(SID),
PID int references P(PID),
QTY numeric(6))
view GS GS 0 CREATE VIEW GS as select sid, status, city from S where status > 1

For comparison, Postgres uses a separate catalog table to store information on the columns of user tables. This allows for the full range of ALTER TABLE commands, and makes it easy to e.g. query the names of all numeric columns:

Postgres:

SELECT table_name,column_name FROM information_schema.columns WHERE data_type = 'numeric';

table_name column_name
p weight
s status
sp qty

EXERCISE:

  • Create the S, P, and SP tables in your own notebook.
  • Experiment with various INSERT operations and test if the DB actually ensures all constraints.
  • Create some views
  • Create some more tables on a topic of your interest, and fill them with some data, such as
    • chess club: each club has a number of players, clubs meet on certain dates for tournaments, games player have winners and losers, how does this affect the rating of players
    • hunting trips: which animals can be shot when and where, and how much does it cost; which trips provide for certain elements, such as specific types of animals and season
    • classical music: which recordings exist of which pieces, written by which composer, who are the participating interpreters, with roles depending on the work
  • Create some views on your data that help with various aspects, such as user roles