Database Normalization explained
Thilan Dissanayaka Database Systems January 19, 2020

Database Normalization explained

Database normalization is a systematic approach to organizing data in a relational database.

The primary goal of normalization is to reduce data redundancy and improve data integrity. It involves dividing a database into multiple tables and defining relationships between them.

Why is Database Normalization Important?

  1. Eliminates Data Redundancy: Reduces duplicate data, saving storage space.

  2. Improves Data Integrity: Ensures consistency by storing data in a structured manner.

  3. Simplifies Data Maintenance: Makes it easier to update and manage data.

  4. Enhances Query Efficiency: Optimizes database performance by reducing the need to search large datasets.

Normal Forms in Database Normalization

Normalization typically follows a set of guidelines known as normal forms (NF). Each normal form addresses specific problems related to database design. Here are the most commonly used normal forms:

First Normal Form (1NF)

A table is in 1NF if:

  • Each column contains only atomic (indivisible) values.
  • Each column contains values of the same type.
  • Each row is unique.

To understand 1NF lets take an example. See bellow table.

StudentId Name Subjects
1 Alice Math, Chemistry
2 Bob Physics, English

As we can see the "Subjects" column contains multiple values right?. The values are not atomic.

What we can do is following. Note that this is not the best solution. But only considering the 1NF at the moment.

StudentId Name Subjects
1 Alice Math
1 Alice Chemistry
2 Bob Physics
2 Bob English

Second Normal Form (2NF)

A table is in 2NF if:

  • It is already in 1NF.
  • All non-key columns are fully dependent on the primary key.
OrderId ProductId ProductName Quantity
1 1 Monitor 2
1 2 Mouse 3
2 2 Mouse 4

Problem: ProductName depends only on ProductID, not the entire primary key (OrderID, ProductID).

2NF Solution: Split the table:

OrderId ProductId Quantity
1 1 2
1 2 3
2 2 4
ProductId ProductName  
  1 Monitor
  2 Mouse

Third Normal Form (3NF)

A table is in 3NF if:

  • It is already in 2NF.
  • Non-key columns are not dependent on other non-key columns (No transitive dependency).
EmployeeId Name Department Head
1 Alice IT Trudy
2 Bob Sales Eve

3NF Solution: Split the table:

EmployeeId Name Department
1 Alice IT
2 Bob Sales
Department Head
IT Trudy
Sales Eve

Boyce-Codd Normal Form (BCNF)

A table is in BCNF if:

  • It is already in 3NF.
  • Every determinant is a candidate key (ensures there are no overlapping candidate keys).

Course determines Department`, but it is not a candidate key.

Advantages of Database Normalization

  • Data Consistency Avoids data anomalies.
  • Storage Efficiency Reduces redundant data.
  • Simplified Updates Minimizes the effort to modify records.
  • Improved Data Integrity Ensures accuracy through constraints.

Disadvantages of Database Normalization

  • Complex Queries Joins between multiple tables can slow down retrieval.
  • Performance Overhead Increased time for insert, update, and delete operations.

When to Normalize and Denormalize

  • Normalize For transactional systems where data integrity is critical.
  • Denormalize For analytical or reporting systems where query performance is essential.

Database normalization is an essential technique to maintain data integrity and optimize storage. By understanding and applying normalization forms effectively, you can design robust and scalable databases that efficiently manage large datasets.

ALSO READ
Blockchain 0x000 – Understanding the Fundamentals
May 21, 2020 Web3 Development

Imagine a world where strangers can exchange money, share data, or execute agreements without ever needing to trust a central authority. No banks, no intermediaries, no single point of failure yet...

Identity and Access Management (IAM)
May 11, 2020 Identity & Access Management

Who are you — and what are you allowed to do? That's the fundamental question every secure system must answer. And it's exactly what Identity and Access Management (IAM) is built to solve.

How I built a web based CPU Simulator
May 07, 2020 Pet Projects

As someone passionate about computer engineering, reverse engineering, and system internals, I've always been fascinated by what happens "under the hood" of a computer. This curiosity led me to...

Writing a Shell Code for Linux
Apr 21, 2020 Exploit Development

Shellcode is a small piece of machine code used as the payload in exploit development. In this post, we write Linux shellcode from scratch — starting with a simple exit, building up to spawning a shell, and explaining every decision along the way.

Exploiting a Stack Buffer Overflow on Windows
Apr 12, 2020 Exploit Development

In a previous tutorial we discusses how we can exploit a buffer overflow vulnerability on a Linux machine. I wen through all theories in depth and explained each step. Now today we are going to jump...

Access Control Models
Apr 08, 2020 Identity & Access Management

Access control is one of the most fundamental concepts in security. Every time you set file permissions, assign user roles, or restrict access to a resource, you're implementing some form of access control. But not all access control is created equal...

Exploiting a  Stack Buffer Overflow  on Linux
Apr 01, 2020 Exploit Development

Have you ever wondered how attackers gain control over remote servers? How do they just run some exploit and compromise a computer? If we dive into the actual context, there is no magic happening....

Basic concepts of Cryptography
Mar 01, 2020 Cryptography

Ever notice that little padlock icon in your browser's address bar? That's cryptography working silently in the background, protecting everything you do online. Whether you're sending an email,...

Common Web Application Attacks
Feb 05, 2020 Application Security

Web applications are one of the most targeted surfaces by attackers. This is primarily because they are accessible over the internet, making them exposed and potentially vulnerable. Since these...

Remote Code Execution (RCE)
Jan 02, 2020 Application Security

Remote Code Execution (RCE) is the holy grail of application security vulnerabilities. It allows an attacker to execute arbitrary code on a remote server — and the consequences are as bad as it sounds. In this post, we'll go deep into RCE across multiple languages, including PHP, Java, Python, and Node.js.