Category: Data

Normalised and Denormalised (aka Denormalized)

Normalised and Denormalised (aka Denormalized)

(Picture from The Illusion of Normal: https://everydayaspergers.com/2012/03/19/day-50-the-illusion-of-normal/)

As always, do please correct me if I’ve got anything wrong.

tl;dr: Normalise = make it look normal = separate fields out into separate “normal” entities like Customer / Address, which then link to one another via foreign keys.

In relational databases: A normalised database is one where every possible entity has its own table, typically with an Id column, which other tables will reference using foreign keys. “Normalisation” refers to the fact that each entity has been separated out into its own table, with the links between entities being upheld via Ids and foreign keys. Each separate piece of data exists only once, and can be accessed by navigating a series of relationships.

For instance you might have Address, Customer, Employee, Invoice, etc. A customer may have a Sales Rep which links to the Employee table via EmployeeId. Meanwhile, the invoice links to the customer via CustomerId, and the customer links to an address via AddressId.

Denormalisation is the process of reducing joins in queries, by adding some redundant and duplicate data to tables. A denormalised database has some duplicated or redundant data. But it means that your queries will have fewer joins. Joins are costly.

Two examples of denormalisation:

1. In the above example, in order to get from Invoice to Sales Rep you have to go via customer, to get the employee Id of the sales rep. Instead, you could duplicate the SalesRepId in the Invoice table.

2. You may want a snapshot of the customer name and address, as they were when the invoice was created (they may have changed since then). If the invoice only accesses name and address via a link to the customer table, you can’t recreate the invoice as it was when it was first sent out. The solution is to copy customer name and address into the Invoice table.

Context:

I worked with relational databases via SQL for many years, but as is often the case with binary terms, I frequently forgot which was which out of “normalised” and “denormalised”, and what exactly they meant.

I’m currently learning about graph databases via Neo4J, and the term “denormalise” came up, and I had to remind myself – yet again – what it meant.

This time though, I wrote it down. Therefore I’m hoping it’ll stick.