# Normalization

Normalization theory is a useful aid in the design of database. This theory is built around the concept of normal form. It is a step method to refine the database design. Finally the database becomes available in the highest normal form.

## Normal form in Normalization

A relation is said to be in a normal form if it satisfies a certain specified set of constraints. The normal forms are:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce/Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

## Non-Normal Form in Normalization

A relation is said to be in non-normal form if and only if the underlying domains do not contain atomic values.

Atomic value means of composite attribute over here. In the example, order-details attribute of the relation ORDERS contains the details about part-no and qty in single attribute. It is called non-atomic value.

**ORDERS**

order-no | order-date | order-details | |

part-no | qty | ||

1 | 10/9/2018 | p1 | 10 |

p2 | 30 | ||

2 | 2/10/2019 | p5 | 10 |

p6 | 16 | ||

p7 | 52 | ||

3 | 15-11-2019 | p6 | 52 |

p3 | 44 | ||

p8 | 49 | ||

p3 | 15 |

## All The Normal forms of Normalization

### First Normal Form (1NF)

A relation R is in first normal form (1NF) if and only if the entire underlying domain contains atomic values.

The above table can be converted into 1NF as follows:

**ORDERS**

order-no | order-date | order-details | |

part-no | qty | ||

1 | 10/9/2018 | p1 | 10 |

1 | 10/9/2018 | p2 | 30 |

2 | 2/10/2019 | p5 | 10 |

2 | 2/10/2019 | p6 | 16 |

2 | 2/10/2019 | p7 | 52 |

3 | 15/11/2019 | p6 | 52 |

3 | 15/11/2019 | p3 | 44 |

3 | 15/11/2019 | p8 | 49 |

3 | 15/11/2019 | p3 | 15 |

### Second Normal Form (2NF)

A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

To show how this constraints applies, let us turn to relation ORDERS as shown in above two tables. Here the relation key is { order-no, part-no}. The key attributes are order-no and part-no. The non key attributes are order-date and qty. The value of order-date, however depends only on order-no. The relation is therefore not in 2NF.

Relations that are not in second normal form can always be decomposed into second normal form relations. To do this we can use a very simple rule that is: remove the offending functional dependency.

This means that we take the functional dependency that violated the 2NF constraint and make a new relation out of all attributes in this functional dependency. For example, the ORDERS is not in 2NF. The offending functional dependency is:

**order-no → order-date**

To make it in 2NF we make a new relation ORDERS from this functional dependency and remove order-date from the original relation. Relation ORDERS now stores facts about orders only. The original relation is now replaced by relation ORDERS-CONTENTS, which stores facts about order lines. The relation key of relation ORDERS is order-no and ORDERS-CONTENTS is {order-no, part-no}. The newly generated relations are as shown in the following tables:

**ORDERS**

order-no | order-date |

1 | 10/9/2018 |

2 | 2/10/2019 |

3 | 15/11/2019 |

**ORDER-CONTENTS**

order-no | part-no | qty |

1 | p1 | 10 |

1 | p2 | 30 |

2 | p5 | 10 |

2 | p6 | 16 |

2 | p7 | 52 |

3 | p6 | 52 |

3 | p3 | 44 |

3 | p8 | 49 |

3 | p3 | 15 |

### Third Normal Form (3NF)

A relation R is in third normal form (3NF) if and only if it is 2NF and every non key attribute is not-transitively dependent on the primary key.

Relations in 2NF still contain redundancies and additional constraint must be satisfied to eliminate redundancies. Relations in third normal form (3NF) must satisfy yet another constraint. In such relations there should be no dependencies between non-key attributes.

Let us consider a relation VEHICLE, which in not in third normal form.

**VEHICLE**

reg-no | owner | model | manufacturer | no-cyl |

01 | Ram | m1 | Ford | 4 |

77 | Mary | m2 | Maruti | 6 |

30 | Nalin | m3 | Tata | 4 |

37 | George | m4 | Ford | 4 |

83 | Ramesh | m1 | Ford | 4 |

99 | Nutan | m6 | Matiz | 6 |

46 | Niraj | m7 | Bajaj | 4 |

96 | Ashish | m6 | Matiz | 6 |

74 | Amit | m2 | Maruti | 6 |

92 | Asha | m1 | Ford | 4 |

The VEHICLE stores information regarding vehicles. Each vehicle is uniquely identified by reg-no and has one owner. The relation key is reg-no. The values for all the non key attributes can be determined easily by reg-no, therefore this relation is in 2NF. But non key attribute are transitive. For example no-cyl is dependent on model and manufacturer. Hence to make it in third normal form we decompose the relation in two relations REGISTRATION AND VEHICLE. These are as follows:

**REGISTRATION (key: reg-no)**

reg-no | owner | model | manufacturer |

1 | Ram | m1 | Ford |

77 | Mary | m2 | Maruti |

30 | Nalin | m3 | Tata |

37 | George | m4 | Ford |

83 | Ramesh | m1 | Ford |

99 | Nutan | m6 | Matiz |

46 | Niraj | m7 | Bajaj |

96 | Ashish | m6 | Matiz |

74 | Amit | m2 | Maruti |

92 | Asha | m1 | Ford |

**VEHICLE (key: mode, manufacturer)**

model | manufacturer | no-cyl |

m1 | Ford | 4 |

m2 | Maruti | 6 |

m3 | Tata | 4 |

m4 | Ford | 4 |

m6 | Matiz | 6 |

m7 | Bajaj | 4 |

### Boyce/Codd Normal Form (BCNF)

A relation R is in BCNF if and only if every determinant is a candidate key.

Following relation WORK is not in BCNF as it contains overlapping keys.

**WORK**

project-id | person-id | manager | time-spent |

1 | j1 | m1 | 30 |

2 | j1 | m2 | 20 |

1 | j2 | m1 | 11 |

2 | j2 | m2 | 50 |

3 | j2 | m3 | 15 |

1 | j3 | m2 | 5 |

This relation can be decomposed into further simpler relations who are now in BCNF.

**PROJECTS (key: project-id)**

project-id | manager |

1 | m1 |

2 | m2 |

3 | m3 |

**WORK (key: project-id, person-id)**

project-id | person-id | time-spent |

1 | j1 | 30 |

2 | j1 | 20 |

1 | j2 | 11 |

2 | j2 | 50 |

3 | j2 | 15 |

1 | j3 | 5 |

### Fourth Normal Form (4NF)

A relation R is in fourth normal form (4NF) if and only if whenever there exists a Multi Valued dependency in R say A → B then all attributes of R are also functionally dependent on A. Hence a relation in 4NF should not contain independent Multi Valued dependency.

Following two relations in context with the previous one are in 4NF.

**KNOWLEDGE**

person-id | skill |

j1 | computing |

j2 | science |

j3 | physics |

j3 | maths |

j2 | computer |

j1 | chemistry |

**ASSIGNMENTS**

person-id | project-id |

j1 | 1 |

j1 | 2 |

j2 | 1 |

j2 | 2 |

j2 | 3 |

j3 | 1 |