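To follow this topic, you should be familiar with functional dependencies, candidate keys, and normal forms.
Steps to find the highest normal form of a relation:
1. Find all candidate keys of the relation.
2. Divide the attributes into prime attributes (those that are part of some candidate key) and non-prime attributes.
3. Check the normal forms in order: 1st, 2nd, 3rd, then BCNF. If the relation fails the condition for one normal form, the highest normal form is the one before it.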
Example 1. Find the highest normal form of a relation R(A,B,C,D,E) with FD set {A->D, B->A, BC->D, AC->BE}.
Step 1. As we can see, (AC)+ = {A,B,C,D,E}, but none of its proper subsets can determine all attributes of the relation, so AC is a candidate key. A can be derived from B, so we can replace A in AC with B: (BC)+ = {A,B,C,D,E}, and neither B nor C alone determines every attribute, so BC is also a candidate key. So there are two candidate keys: {AC, BC}.
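The closure computation in Step 1 can be checked mechanically. The following is a minimal sketch, not part of the original article: the helper names `closure` and `candidate_keys` are illustrative, FDs are written as (LHS, RHS) string pairs, and the key search is a brute-force enumeration that is only practical for small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a key already found is a super key, not a candidate key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

# Example 1: R(A,B,C,D,E) with FDs A->D, B->A, BC->D, AC->BE
fds = [("A", "D"), ("B", "A"), ("BC", "D"), ("AC", "BE")]
keys = candidate_keys("ABCDE", fds)
print(["".join(sorted(k)) for k in keys])  # ['AC', 'BC']
```

Because candidates are tried in order of increasing size, any set that contains an earlier key can be skipped, which is what keeps the result minimal.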
Step 2. The prime attributes are those that are part of a candidate key, {A,B,C} in this example; the others, {D,E}, are non-prime.
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
The relation is not in 2nd normal form because A->D is a partial dependency (A, a proper subset of the candidate key AC, determines the non-prime attribute D), and 2nd normal form does not allow partial dependencies.
So the highest normal form is the 1st normal form.
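The partial-dependency test used above can be sketched as follows. This is an illustrative check under stated assumptions, not the article's own code: the candidate keys AC and BC are taken as given from Step 1, and the function name `partial_dependencies` is hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(keys, fds):
    """List (subset, attribute) pairs where a proper subset of a candidate key
    determines a non-prime attribute."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        key_attrs = sorted(key)
        for size in range(1, len(key_attrs)):  # proper, non-empty subsets only
            for subset in combinations(key_attrs, size):
                for attr in sorted(closure(subset, fds) - prime):
                    found.append(("".join(subset), attr))
    return found

# Example 1: FDs A->D, B->A, BC->D, AC->BE with candidate keys AC and BC
fds = [("A", "D"), ("B", "A"), ("BC", "D"), ("AC", "BE")]
violations = partial_dependencies(["AC", "BC"], fds)
print(violations)  # [('A', 'D'), ('B', 'D')]
```

The second pair, B->D, is not listed in the FD set but follows from B->A and A->D; working with closures rather than the listed FDs is what surfaces it.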
Example 2. Find the highest normal form of a relation R(A,B,C,D,E) with FD set {BC->D, AC->BE, B->E}.
Step 1. As we can see, (AC)+ = {A,B,C,D,E}, but none of its proper subsets can determine all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so there is only one candidate key: {AC}.
Step 2. The prime attributes are those that are part of a candidate key, {A,C} in this example; the others, {B,D,E}, are non-prime.
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because no FD is a partial dependency: in BC->D, BC is not a proper subset of the candidate key AC; in AC->BE, AC is the candidate key itself; and in B->E, B is not a proper subset of the candidate key AC.
The relation is not in 3rd normal form. In BC->D, BC is not a super key and D is not a prime attribute; in B->E, B is not a super key and E is not a prime attribute. To satisfy 3rd normal form, either the LHS of every FD must be a super key or its RHS must be a prime attribute.
So the highest normal form of the relation is the 2nd normal form.
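The 3rd normal form condition applied above can be written as a short check over the FD set. This is a sketch under stated assumptions: the candidate key AC is taken as given from Step 1, and the function name `third_nf_violations` is illustrative.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(attributes, keys, fds):
    """Return the FDs whose LHS is not a super key and whose RHS attribute is non-prime."""
    all_attrs = set(attributes)
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == all_attrs:
            continue  # LHS is a super key, so the FD is allowed
        for attr in rhs:
            # Ignore trivial parts of the FD (attribute already on the LHS).
            if attr not in lhs and attr not in prime:
                violations.append(f"{lhs}->{attr}")
    return violations

# Example 2: R(A,B,C,D,E) with FDs BC->D, AC->BE, B->E and candidate key AC
fds = [("BC", "D"), ("AC", "BE"), ("B", "E")]
bad = third_nf_violations("ABCDE", ["AC"], fds)
print(bad)  # ['BC->D', 'B->E']
```

An empty result would mean every FD passes the 3rd normal form test; here both BC->D and B->E fail it, matching the reasoning in Step 3.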
Example 3. Find the highest normal form of a relation R(A,B,C,D,E) with FD set {B->A, A->C, BC->D, AC->BE}.
Step 1. As we can see, (B)+ = {A,B,C,D,E}, so B is a candidate key. B can be derived from AC using AC->B (decomposing AC->BE into AC->B and AC->E), so AC is a super key. However, (C)+ = {C} and (A)+ = {A,B,C,D,E}, so A, a proper subset of AC, is a candidate key. So there are two candidate keys: {A, B}.
Step 2. The prime attributes are those that are part of a candidate key, {A,B} in this example; the others, {C,D,E}, are non-prime.
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because the LHS of every FD is a super key: B in B->A, A in A->C, BC in BC->D, and AC in AC->BE. Both candidate keys are single attributes, so no partial dependency is possible.
The relation is in 3rd normal form because the LHS of every FD is a super key. For the same reason, the relation is in BCNF. So the highest normal form is BCNF.
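Putting the three steps together, the whole procedure can be sketched as one function and run on all three examples. This is a minimal illustration, not a definitive implementation: the names are hypothetical, 1st normal form is assumed to hold (as in the text), and the brute-force key search only suits small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    all_attrs, keys = set(attributes), []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF', assuming the relation is already in 1NF."""
    all_attrs = set(attributes)
    keys = candidate_keys(attributes, fds)   # Step 1
    prime = set().union(*keys)               # Step 2

    # Step 3a, 2NF: no proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if closure(subset, fds) - prime:
                    return "1NF"

    # Step 3b, 3NF and BCNF: inspect each FD whose LHS is not a super key.
    is_bcnf = True
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs) or closure(lhs, fds) == all_attrs:
            continue  # trivial FD, or LHS is a super key
        is_bcnf = False
        if (set(rhs) - set(lhs)) - prime:
            return "2NF"  # determines a non-prime attribute, so 3NF fails
    return "BCNF" if is_bcnf else "3NF"

example1 = [("A", "D"), ("B", "A"), ("BC", "D"), ("AC", "BE")]
example2 = [("BC", "D"), ("AC", "BE"), ("B", "E")]
example3 = [("B", "A"), ("A", "C"), ("BC", "D"), ("AC", "BE")]
for fds in (example1, example2, example3):
    print(highest_normal_form("ABCDE", fds))  # 1NF, then 2NF, then BCNF
```

The function returns as soon as one normal form fails, mirroring the rule that the highest normal form is the last one whose condition holds.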
Advantages of normalizing a relation to its highest normal form:
Improved Data Integrity: Normalizing a relation to a higher normal form removes redundant copies of the same fact, so an update cannot leave the data inconsistent.
Reduced Data Redundancy: Normalization reduces data redundancy by breaking down a relation into smaller, more focused relations.
Improved Query Performance: Queries and updates that touch only a few attributes can run faster against smaller, more focused relations.
Easier Maintenance and Updates: Smaller, more focused relations are easier to maintain and update than the original relation.
Better Flexibility: Normalization makes the schema easier to modify as requirements change.
Disadvantages of normalizing a relation to its highest normal form:
Increased Complexity: Normalizing a relation to the highest possible normal form increases the number of relations, which can make the database harder to understand and manage.
Costly: Normalizing a relation can be costly, especially if the database is large and complex, because it can require additional resources such as hardware and personnel.
Reduced Performance: Although some queries become faster, others become slower because they need additional join operations.
Limited Scalability: In larger databases, the number of smaller, focused relations can become unwieldy.
Article contributed by Sonal Tuteja.