Normalisation

The process of converting a set of tables into a form that avoids certain anomalies (i.e. problems) is called normalisation. It involves decomposing tables into parts. Obviously, we do not want to introduce any errors in this process.

Nonloss Decomposition

E.g. decomposition of S {SID, STATUS, CITY} with SID → STATUS and SID → CITY

Heath's Theorem

Proof:

  1. No tuple lost:
  2. No spurious tuples:

Note that

Experiment with the theorem

Left-irreducible FDs: left-hand side is 'not too big'

E.g. relation SCP {SID, CITY, PID, QTY} with SID → CITY

1NF - First normal form

Every tuple contains exactly one value for each attribute.

Relations are by definition always in 1NF.

If a relation is only in 1NF we get Update anomalies

Example: lump S and SP together into FIRST {SID, STATUS, CITY, PID, QTY}
with {SID, PID} → QTY, SID → CITY, SID → STATUS,
and also assume CITY → STATUS

SQL:

create table FIRST as select sid, status, city, pid, qty from S natural join SP;
select * from FIRST;
  sid  | status |      city       |  pid   | qty 
-------+--------+-----------------+--------+-----
 S1    |     20 | London          | P1     | 300
 S1    |     20 | London          | P2     | 200
 S1    |     20 | London          | P3     | 400
 S1    |     20 | London          | P4     | 200
 S1    |     20 | London          | P5     | 100
 S1    |     20 | London          | P6     | 100
 S2    |     10 | Paris           | P1     | 300
 S2    |     10 | Paris           | P2     | 400
 S3    |     30 | Paris           | P2     | 200
 S4    |     20 | London          | P2     | 200
 S4    |     20 | London          | P4     | 300
 S4    |     20 | London          | P5     | 500
But we want CITY → STATUS:
update FIRST set status = 10 where city = 'Paris';
select * from FIRST;
  sid  | status |      city       |  pid   | qty 
-------+--------+-----------------+--------+-----
 S1    |     20 | London          | P1     | 300
 S1    |     20 | London          | P2     | 200
 S1    |     20 | London          | P3     | 400
 S1    |     20 | London          | P4     | 200
 S1    |     20 | London          | P5     | 100
 S1    |     20 | London          | P6     | 100
 S4    |     20 | London          | P2     | 200
 S4    |     20 | London          | P4     | 300
 S4    |     20 | London          | P5     | 500
 S2    |     10 | Paris           | P1     | 300
 S2    |     10 | Paris           | P2     | 400
 S3    |     10 | Paris           | P2     | 200

Anomalies:

2NF - Second normal form

1NF and every non-key attribute is irreducibly dependent on every candidate key.

Procedure: take (nonloss) projections to eliminate reducible FDs

E.g. R {A, B, C, D} PRIMARY KEY {A, B} and A → D

For relation FIRST: replace with SECOND { SID, STATUS, CITY } and SP { SID, PID, QTY }

But relation SECOND still contains CITY → STATUS and therefore a transitive FD SID → CITY → STATUS

Update anomalies:

SQL:

create table SECOND as select distinct SID, STATUS, CITY from FIRST;
select  * from second;
  sid  | status |      city       
-------+--------+-----------------
 S1    |     20 | London         
 S2    |     10 | Paris          
 S3    |     10 | Paris          
 S4    |     20 | London         

Check reversibility:

select distinct * from SECOND natural join SP;
  sid  | status |      city       |  pid   | qty 
-------+--------+-----------------+--------+-----
 S1    |     20 | London          | P1     | 300
 S1    |     20 | London          | P2     | 200
 S1    |     20 | London          | P3     | 400
 S1    |     20 | London          | P4     | 200
 S1    |     20 | London          | P5     | 100
 S1    |     20 | London          | P6     | 100
 S2    |     10 | Paris           | P1     | 300
 S2    |     10 | Paris           | P2     | 400
 S3    |     10 | Paris           | P2     | 200
 S4    |     20 | London          | P2     | 200
 S4    |     20 | London          | P4     | 300
 S4    |     20 | London          | P5     | 500