Saturday, April 10, 2010

Database Design, 1NF, Starting Database Normalization


What is Database Normalization?

normalization is the process of organizing the fields and tables of a relational database by dividing larger tables into smaller and more manageable tables which are connected to each other using relationships.

through normalization we can reduce Data Redundancy which leads to anomolies and data corruption and reduce the amount of Data Dependency between the fields of a table which makes it hard to Add, Update and Delete fields that depend on each other.

Now, imagine that We are given the below datasheet which contains data about the employees of a company and we are asked to design a flexible database which can be easily queried and extended over time to accommodate more data and more tables.

the below table has a bad structural design but we can improve it by using the process of Normalization. we will start the process of nomrmalizing the below table by first bringing it to the first normal form(1NF).




the table above is breaking the these rules required to achieve 1NF:


 1. No repeating groups of data are permitted in a column.


  • the Department column can contain repeated data. the first row shows that an employee by the name of Zak Ruvalcaba is a company Executive who also works in the Marketing department. this also represents a many-to-many relationship, meaning that a department can have many employees and an employee can be a member of many departments.

 2. Each column of data must contain Atomic values.

  • The Name and address column contain non-atomic values. meaning that they must be broken down into more columns so that they can be queried easier. for example the name column can be broken down to FirstName and LastName and the Address column can be broken down to Street,City, State and Zip columns. this will make the querying of the data in our table easier and more efficient.

 3. The above table does not have a Primary Key(PK) to uniquely identify each row.


Lets update our table to 1NF



so by analysing our table we have found its flaws and can now start our normalization process. lets start by first giving the table a unique key.



the department column contains repeated data. which is in violation of the 1NF rule which states that there should be no repeating data in each column. in the department field of the first row we have Executive and marketing which are both of type department in a single column.
this also indicates a many-to-many relationship on the table level, meaning each employee can be a member of many departments and each department can have many employees.

through normalization we break each table into smaller and more manageable ones and relate them by using relationships which comprise of primary keys and foreign keys.
the discovery of the many-to-many relationship between Employees and Departments is a good indicator that the department column should be taken out and put into its own table. after doing so we are left with the following table.




we are still not in 1NF as the Name and Address columns contain non-atomic values. this means that there is too much related data in a single column for us to be able to write efficient queries. therefore we will separate these columns so that we can take full advantage of the data contained within them.





the table above is now in the first normal form, there are no duplicate data and each column contains atomic values. we also added a primary ke to uniquely identify each row.




  

No comments :

Post a Comment