Normalization: 3rd and BCNF

Third normal form

with only one candidate key which is the primary key: 2NF and every non-key attribute is non-transitively dependent on the primary key

Procedure: take (nonloss) projections to eliminate transitive FDs

E.g. R { A, B, C } PRIMARY KEY { A } and B → C

For relation SECOND (decomposition A): replace with SC { SID, CITY } and CS { CITY, STATUS }

SQL:

create table SC as select distinct SID, CITY from second;
select * from sc;
  sid  |      city       
-------+-----------------
 S1    | London         
 S2    | Paris          
 S3    | Paris          
 S4    | London         

create table cs as select distinct CITY, STATUS from second;
select * from cs;
      city       | status 
-----------------+--------
 London          |     20
 Paris           |     10

Check reversibility:

select distinct sid, city, status from sc natural join cs;
  sid  |      city       | status 
-------+-----------------+--------
 S1    | London          |     20
 S2    | Paris           |     10
 S3    | Paris           |     10
 S4    | London          |     20

Dependency preservation

E.g. (decomposition B) project SECOND into SC { SID, CITY } and SS { SID, STATUS }

SQL:

create table ss as select distinct sid, status from second;
select * from ss;
  sid  | status 
-------+--------
 S1    |     20
 S2    |     10
 S3    |     10
 S4    |     20

Check reversibility:

select distinct * from sc natural join ss;
  sid  |      city       | status 
-------+-----------------+--------
 S1    | London          |     20
 S2    | Paris           |     10
 S3    | Paris           |     10
 S4    | London          |     20

Theorem of Rissanen

Projections R1 and R2 of relation R are independent iff (if and only if):

Check decomposition A:

Check decomposition B:

Note that decomposing SECOND into {SID, STATUS} and {CITY, STATUS} is not reversible!

Atomic relation: cannot be decomposed into independent projections

E.g. SP is atomic, S and P are not

Boyce/Codd normal form

A relation is in BCNF iff every non-trivial, left-irreducible FD has a candidate key as its determinant (left-hand side).

In a diagram: only FD arrows out of candidate keys.

Codd's original definition of 3NF did not adequately deal with a relation that

  1. has two or more candidate keys such that
  2. candidate keys are composite, and
  3. they overlap

BCNF is a stronger definition, and therefore not equal to 3NF:

3NF and BCNF are equivalent when the three conditions above are not met.

Example: Student-Subject-Teacher

Relation SJT with FDs { S, J } → T and T → J

S     J       T
-------------------
Smith Math    White
Smith Physics Green
Jones Math    White
Jones Physics Brown

We can go to BCNF by projecting to ST { S, T } and TJ { T, J }

Note that the only candidate key for ST is { S, T }.

☞ Find a similar example with patients and specialist doctors!

Decomposition in SQL:

create table SJT (S char(8), J char(8), T char(8), primary key (S, J), unique (S, T));
insert into sjt values ('Smith', 'Math', 'White');
insert into sjt values ('Smith', 'Physics', 'Green');
insert into sjt values ('Jones', 'Math', 'White');
insert into sjt values ('Jones', 'Physics', 'Brown');

select s, j, t from SJT;
    s     |    j     |    t     
----------+----------+----------
 Smith    | Math     | White   
 Smith    | Physics  | Green   
 Jones    | Math     | White   
 Jones    | Physics  | Brown   

Project to ST and TJ:

create table ST as select distinct s, t from sjt;
select * from st;
    s     |    t     
----------+----------
 Jones    | Brown   
 Jones    | White   
 Smith    | Green   
 Smith    | White   

create table TJ as select distinct t, j from sjt;
select * from TJ;
    t     |    j     
----------+----------
 Brown    | Physics 
 Green    | Physics 
 White    | Math    

Check reversibility:

select s, j, t from ST natural join TJ;
    s     |    j     |    t     
----------+----------+----------
 Jones    | Physics  | Brown   
 Smith    | Physics  | Green   
 Jones    | Math     | White   
 Smith    | Math     | White   

We cannot decompose every relation into components that are BCNF and independent.

We also observe that SJT is atomic.

Zaniolo's definition of 3NF

For a definition of BCNF drop condition 3.

Apply Zaniolo's definition to SJT:

  1. Functional dependency { S, J } → T

    OK for 3NF, OK for BCNF

  2. Functional dependency T → J

    OK for 3NF, not OK for BCNF

Therefore, SJT is in 3NF but not in BCNF.

Algorithms for Normalization

Synthesis to 3NF

Go from relation R to set S of relations in 3NF:

S = empty
Find a minimal cover C of FDs in relation R
For every FD X → Y in C:
   Add relation XY to S
If no attribute set of any relation in S contains a key K for R:
   Add relation K to S

The resulting decomposition is always lossless and dependency-preserving.

The minimal cover of a set of functional dependencies is the smallest set of FDs implying all FDs in the original set; not necessarily unique.

Two relations with common keys can be combined, e.g., { A, B } and { A, C } can be combined to { A, B, C }.

In the result of this algorithm a relation can be dropped if all its attributes are contained in those of another relation.

Decomposition to BCNF

Go from relation R to set S of relations in BCNF:

S = { R } 
While S has a relation R' that is not in BCNF do:   
   Choose FD X → Y in R' that violates BCNF
   Add relation XY to S
   R' = R' - Y

Note that dependency preservation is not guaranteed.