What is Database Normalization?
normalization is the process of organizing the fields and
tables of a relational database by dividing larger tables into smaller and more
manageable tables which are connected to each other using relationships.
through normalization we can reduce Data Redundancy which
leads to anomolies and data corruption and reduce the amount of Data
Dependency between the fields of a table which makes it hard to Add, Update
and Delete fields that depend on each other.
Now, imagine that We are given the below datasheet which
contains data about the employees of a company and we are asked to design a
flexible database which can be easily queried and extended over time to
accommodate more data and more tables.
the below table has a bad structural design but we can
improve it by using the process of Normalization. we will start the process
of nomrmalizing the below table by first bringing it to the first normal
form(1NF).
the table above is breaking the these rules required to
achieve 1NF:
1. No repeating
groups of data are permitted in a column.
- the Department column can contain repeated data. the first row shows that an employee by the name of Zak Ruvalcaba is a company Executive who also works in the Marketing department. this also represents a many-to-many relationship, meaning that a department can have many employees and an employee can be a member of many departments.
2. Each column of
data must contain Atomic values.
- The Name and address column contain non-atomic values. meaning that they must be broken down into more columns so that they can be queried easier. for example the name column can be broken down to FirstName and LastName and the Address column can be broken down to Street,City, State and Zip columns. this will make the querying of the data in our table easier and more efficient.
3. The above table
does not have a Primary Key(PK) to uniquely identify each row.
Lets update our table to 1NF
so by analysing our table we have found its flaws and can
now start our normalization process. lets start by first giving the table a
unique key.
the department column contains repeated data. which is in
violation of the 1NF rule which states that there should be no repeating data
in each column. in the department field of the first row we have Executive and
marketing which are both of type department in a single column.
this also indicates a many-to-many relationship on the table
level, meaning each employee can be a member of many departments and each
department can have many employees.
through normalization we break each table into smaller and
more manageable ones and relate them by using relationships which comprise of
primary keys and foreign keys.
the discovery of the many-to-many relationship between
Employees and Departments is a good indicator that the department column should
be taken out and put into its own table. after doing so we are left with the
following table.
we are still not in 1NF as the Name and Address columns
contain non-atomic values. this means that there is too much related data in a
single column for us to be able to write efficient queries. therefore we will
separate these columns so that we can take full advantage of the data contained
within them.
the table above is now in the first normal form, there are
no duplicate data and each column contains atomic values. we also added a
primary ke to uniquely identify each row.




No comments :
Post a Comment