Concept Behind Database Indexes
Published on
We'll see the fundamental concept behind indexes and how it improves DB query performance. We'll not see any technical detail, not even the command to create an index. We'll focus only on the concept.
If you're an advanced developer, you should definitely check out Markus Winand's SQL Indexing and Tuning e-Book. It's an amazing book with the right level of abstraction for advanced developers to understand and apply DB indexes effectively.
Prerequisites
This post assumes familiarity with the following topics:
- Basic database concepts.
- The concept behind reading and writing data into a database.
Introduction
The more data we read, the slower the DB operation becomes. Why does this happen though? Hasn't hardware advanced to a point where storage device speeds are no longer a concern? Unfortunately, no - and we may never reach that point. To improve DB performance, we must acknowledge that storage devices are inherently slow and reduce the amount of data we access for any operation. Indexes help us do this.
The Problem
You can skip to the next section if you already know why database operations are slow without indexes.
Imagine yourself working in a storage warehouse with thousands of crates of fruits. You're tasked with finding the box with strawberries. How would you find it? Try to find it below (as much as I would like to present thousands of crates to make the problem realistic, I don't want to crash your browser π):
Find the crate with π
How many crates did you open? Regardless of how many you opened, think about the logic you used to decide which crate to open. Spend a few moments thinking before continuing.
There is no logic we can use when all we have is a set of identical crates; we try our luck. As you can imagine, this is not the way to operate a warehouse. We need to know the fruits in each crate so we can effectively cater to requests such as, "Ship 100 crates of strawberries to location A".
Now, think about what information the warehouse could maintain to find the crates easier. Spend a few moments thinking before continuing.
The Solution
Play the following crates game again, but this time you're given a table to help you.
Find the crate with π
Row | Column | Fruit |
---|---|---|
1 | 1 | ... |
1 | 2 | ... |
1 | 3 | ... |
1 | 4 | ... |
2 | 1 | ... |
2 | 2 | ... |
2 | 3 | ... |
2 | 4 | ... |
3 | 1 | ... |
3 | 2 | ... |
3 | 3 | ... |
3 | 4 | ... |
Was it easier? The table you used is the basic structure of an index. We generally see such tables at the end of textbooks as well, which allows us to jump to the topic we're interested in. The only difference here is that we use row and column numbers to identify crates instead of page numbers in a textbook.
An important idea behind indexes is that they have an order. The index behind a book sorts the topics in an ascending order like a dictionary, which helps us find the relevant topic without going through the whole index. The above index doesn't follow any order, we just scan every row for a π. A DB index on the other hand, uses a special data structure to enable quick scans and order is only one aspect of the data structure. I ignored this structure in this demo to keep the concept simple, but you can read more about it in the Anatomy of an SQL Index chapter in Markus Winand's free digital book.
Database Indexes
Suppose we have a table of data in the database:
Memory Address | ID | Name | Phone Number |
---|---|---|---|
0x001 | 1 | John Doe | (123) 456-7890 |
0x002 | 2 | Jane Smith | (987) 654-3210 |
0x003 | 3 | Alice Johnson | (555) 123-4567 |
0x004 | 4 | Bob Brown | (444) 777-8888 |
0x005 | 5 | Eve Davis | (222) 333-4444 |
Each row is stored at a specific location on disk, indicated by the first column. If we ask the DB for the phone number of Bob Brown, for example, it has to look at each row until it finds Bob. This is similar to opening the crates one by one to find the fruit we need. The more data we look at, the slower the operation. You can visualize this in the below demo by stepping through manually or letting it play like a video. Select an option and press Start.
Read row at 0x001
Is it Bob?
No β
Yes β
If we create an index, the DB doesn't have to look at every row when searching for Bob. Instead, it looks up the index and gets the exact location where the row is stored and look only at that row! You can visualize this in the below demo.
Where
is Bob?
At 0x004
Read row at 0x004
Bob found! β
The word look at is doing a lot of work here. When I say the DB looks at a row, it means its reading the row from disk into memory and then checking the row's content to see if that's the row we requested. Moreover, a DB reads data in pages , not one row at a time. I'll cover this in a future post, but you can get into the details in Markus Winand's book.
Conclusion
By using an index, the database can avoid reading the whole table just to find a single row. The indexes uses a special data structure to make the scanning process super fast, which you can read more about in Markus Winand's book.