It is common knowledge that judicious use of indexes can help SELECT queries execute significantly faster. This can tempt some database admins (DBAs) to try to milk as much performance gains as possible by adding indexes to every column that might possibly be included in a query. The downside to adding indexes to a table is that they affect the performance of writes. Moreover, improperly created indexes can even adversely affect SELECT queries! Any table configuration where performance suffers due to excessive, improper, or missing indexes is considered to be poor indexing. In today's blog, we'll learn about the ramifications of poor indexing, as well as cover how to choose which columns to include as part of a clustered index.
The Effects of Poor Indexing
A poor index can be an index created on a column that doesn't provide easier data manipulation or an index created on multiple columns which, rather than speed up queries, slows them down.
If indexes are not created properly, the database has to go through more records in order to retrieve the data requested by a query. Therefore, it uses more hardware resources (processor, memory, disk, and network) and makes fetching the data take longer.
A table without a clustered index can also be considered as a poor indexing practice in some cases. Execution of a SELECT statement, inserting, updating, and deleting records is in most cases slower on a heap table (i.e. a table without a clustered index) than on a clustered one.
Choosing Columns For Clustered Indexes
When you create a table with a primary key (PK) in a relational database, a unique clustered index is automatically created on the primary key column. Although this default action is perfectly acceptable in most cases, this might not be the optimal index for your data.
The columns that make up a clustered index should form a unique, identity, primary key, or any combination where values are increased for each new entry. As clustered indexes sort the records based on the value, using a column already ordered ascending, such as an identity column, is a good choice.
A column whose value changes frequently should not be used for a clustered index. The reason is that each change of the column used for the clustered index requires the records to be reordered. This re-ordering can easily be avoided by using a column that is updated less frequently, or ideally, not updated at all.
Likewise, columns that store large data, such as BLOB columns (text, nvarchar(max), image, etc.), and GUID columns are not ideal for clustered indexes. This is because sorting large values is highly inefficient, and in case of GUID and image columns, doesn't make much sense.
Finally, a clustered index should not be built on a column already used in a unique index.
Conclusion
In today's blog, we learned about the ramifications of poor indexing, as well as how to choose which columns to include as part of a clustered index. In an up-coming article, we'll cover how the same indexes that provide better performance for some operations, can add overhead for others.
Rob Gravelle resides in Ottawa, Canada, and has been an IT Guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.