Like most database developers, you've probably written your fair share of queries that search for that proverbial needle in a haystack of text or binary data. I know I have! Perhaps even more important than the SELECT statements that you write against the database are the indexes that it contains. To that end, an inverted index can go a long way towards making mounds of data accessible in an expeditious manner. In today's blog, we'll learn what inverted indexes are, and how to use them in your databases, using MySQL as an example.
Forward Index versus Inverted Index
Inverted indexes were actually invented decades ago, around the same time that many of the first AI and machine learning algorithms were born. However, it wasn't until recent increases in computing power that it became practical to use inverted indexes in traditional relational databases. Inverted indexes allow information in relational databases to be found much faster, and they allow queries to be far more complex and specific.
Unlike a regular (forward) index, which maps table rows to a list of keywords, an inverted index maps the keywords to their respective rows. Here's a side-by-side comparison:
Forward Index: Row | Forward Index: Keywords | Inverted Index: Word | Inverted Index: Rows |
---|---|---|---|
row1 | hello, sky, morning | hello | row1 |
row2 | tea, coffee, hi | sky | row1, row3 |
row3 | greetings, sky | coffee | row2 |
 | | hi | row2 |
 | | greetings | row3 |
Searching using a forward index is a slower process because the database engine has to scan the entire contents of the index to retrieve all rows related to a word. Searching via an inverted index, by contrast, is very fast because there are no duplicate keywords in the index and each word points directly to the relevant row(s).
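To make the difference concrete, here is a minimal Python sketch that builds both structures from the rows in the comparison table. It is only an illustration of the idea, not how any database engine stores its indexes, and the function names are made up for this example.

```python
from collections import defaultdict

# Forward index from the comparison table: each row maps to its keywords.
forward_index = {
    "row1": ["hello", "sky", "morning"],
    "row2": ["tea", "coffee", "hi"],
    "row3": ["greetings", "sky"],
}


def build_inverted_index(forward):
    """Invert a row -> keywords mapping into keyword -> sorted list of rows."""
    inverted = defaultdict(set)
    for row, keywords in forward.items():
        for word in keywords:
            inverted[word].add(row)
    return {word: sorted(rows) for word, rows in inverted.items()}


def search_forward(forward, word):
    """Scan every row's keyword list; the work grows with the whole index."""
    return sorted(row for row, keywords in forward.items() if word in keywords)


def search_inverted(inverted, word):
    """A single lookup on the keyword returns the matching rows directly."""
    return inverted.get(word, [])


inverted_index = build_inverted_index(forward_index)
print(search_forward(forward_index, "sky"))    # ['row1', 'row3']
print(search_inverted(inverted_index, "sky"))  # ['row1', 'row3']
```

Both searches return the same rows, but the forward search has to visit every row, while the inverted search goes straight to the entry for the keyword.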
Inverted Indexes in MySQL
MySQL's InnoDB engine implements full-text (FULLTEXT) indexes on text-based columns (CHAR, VARCHAR, or TEXT columns) to speed up queries and DML operations on data contained within those columns. Full-text indexes employ an inverted index design so that each keyword in the index points to a list of documents that the word appears in. The index also stores position information for each word, which makes proximity searches possible: two or more words that occur within a certain number of words of each other can be located.
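The role of position information is easier to see in a small example. The following Python sketch stores word positions per document and uses them for a simple proximity check. It is an assumption-laden illustration: the tokenizing, the distance rule, and the sample documents are invented here and are not InnoDB's actual implementation.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def proximity_search(index, word_a, word_b, max_distance):
    """Return ids of documents where both words occur within max_distance words."""
    postings_a = index.get(word_a.lower(), {})
    postings_b = index.get(word_b.lower(), {})
    hits = []
    # Only documents that contain both words can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(a - b) <= max_distance
               for a in postings_a[doc_id]
               for b in postings_b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents, made up for this sketch.
docs = {
    1: "An epic drama of a feminist and a mad scientist",
    2: "A mad cow and a very patient scientist in a lab",
}
index = build_positional_index(docs)
print(proximity_search(index, "mad", "scientist", 2))  # [1]
print(proximity_search(index, "mad", "scientist", 6))  # [1, 2]
```

Because the positions are kept alongside each keyword, the proximity check never has to reread the original text.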
In Navicat's database administration and development tools, such as Navicat for MySQL and Navicat Premium, you can view a table's engine in the General Information panel.
Assuming that your table uses the InnoDB engine, you can assign a FULLTEXT index via the Index Type drop-down on the Indexes tab of the Table Designer. A good example of a column that suits a FULLTEXT index is the Description column on the Sakila Sample Database's Film table.
Text fields such as this are good candidates for an inverted index because they contain so many words and phrases to search on.
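If you prefer to script this rather than use the Table Designer, the same index can be created and queried with plain SQL. Here is a rough Python sketch that wraps the two statements. It assumes a DB-API style connection from a MySQL driver that uses `%s` placeholders, and the index name `idx_film_description` is hypothetical. The `film` table and `description` column come from Sakila.

```python
# Hypothetical index name; film and description are from the Sakila database.
CREATE_INDEX_SQL = (
    "CREATE FULLTEXT INDEX idx_film_description ON film (description)"
)

# %s is the placeholder style assumed for the MySQL driver in use.
SEARCH_SQL = (
    "SELECT film_id, title FROM film "
    "WHERE MATCH(description) AGAINST (%s IN NATURAL LANGUAGE MODE)"
)


def create_fulltext_index(conn):
    """Add the FULLTEXT index to film.description (run once)."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_INDEX_SQL)
    finally:
        cur.close()


def search_descriptions(conn, terms):
    """Return (film_id, title) rows whose description matches the terms."""
    cur = conn.cursor()
    try:
        cur.execute(SEARCH_SQL, (terms,))
        return cur.fetchall()
    finally:
        cur.close()
```

With an open connection `conn`, calling `create_fulltext_index(conn)` once and then `search_descriptions(conn, "mad scientist")` would run a natural language full-text search against the descriptions.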
Conclusion
Inverted indexes are a great way to speed up your queries while allowing them to be far more complex and specific. Just be aware that the indexing process takes longer than it does for forward indexes.
Interested in Navicat for MySQL or Navicat Premium? You can try both for 14 days completely free of charge for evaluation purposes!
Rob Gravelle resides in Ottawa, Canada, and has been an IT Guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.