Strings as Primary Keys
In this third and final installment of this series on choosing a Primary Key for relational databases we'll be examining some of the reasons for employing string data as a Primary Key (PK). Recall that, in Part 1, we covered Natural and Surrogate Primary Keys and considered why one might choose one over the other. Part 2 explored String and Numeric data types as Primary Keys in an effort to ascertain whether one is preferable to the other. Now it's time to set the record straight and conclude whether or not string - or alphabetic - data can make a suitable PK.
Ready-made Keys
There is often a sensible natural primary key for your data which has a universal meaning and may not be an integer. If so, then adding an artificial key just for the sake of an integer type adds nothing but redundancy. Performance may be hampered slightly, but it's slightly less important than correctness, integrity, and appropriate modelling, in the estimation of many database developers.
A non-obvious benefit of alphabetic keys is that a short symbolic string can simplify debugging by being immediately human-readable in data dumps (without additional joins). For example, US states have an alphabetic code which is unique and is meaningful as a key outside the schema. Also, countries have alphabetic ISO codes. And there are an endless number of other examples, such as vehicle VIN numbers, invoice IDs, etc.
Using GUIDs as Primary Keys
If you have more than one database, auto-incrementing keys can cause redundant records. One way around this problem is to use GUIDs. Short for "Globally Unique IDentifier", GUID is a 16 byte binary data type that is guaranteed to be unique across tables, databases, and even servers.
How you create a GUID varies across different databases, but, in SQL Server, the NEWID() function is used as shown below:
SELECT NEWID()
Here's a statement for creating a table with the UNIQUEIDENTIFIER data type. To set a default value for the column we will use the default keyword and set the default value as the value returned by the NEWID() function:
USE EngDB GO CREATE TABLE EnglishStudents1 ( Id UNIQUEIDENTIFIER PRIMARY KEY default NEWID(), StudentName VARCHAR (50) ) GO INSERT INTO EnglishStudents1 VALUES (default,'Shane') INSERT INTO EnglishStudents1 VALUES (default,'Jonny')
This will ensure that whenever a new record is inserted in the EngStudents1 table, by default, the NEWID() function generates a unique value for the Id column. When inserting the records, we simply have to specify "default" as value for the first column. This will insert a default unique value to the Id column.
GUIDs vs. UUIDs
While GUIDs (as used by Microsoft) and UUIDs (as defined by RFC4122) look similar and serve similar purposes, there are subtle-but-occasionally-important differences. First, let's establish what UUIDs are.
UUIDs are 128 bits values, with textual representation in hex digits. It bears repeating, because many people think that UUIDs are stored as text. UUIDv4 values being random, you don't have a guaranteed uniqueness. However, the probability of a collision is rather small. Also, keep in mind that in the extremely unlikely case of colliding UUIDs, it will be caught by the DB thanks to the primary key constraint. You'll also be happy to know that UUIDv4 are perfectly well indexed by most relational DBs.
Some Microsoft GUID docs allow GUIDs to contain any hex digit in any position, while RFC4122 requires certain values for the version and variant fields. Also, GUIDs should be all-upper case, whereas UUIDs should be "output as lower case characters and are case insensitive on input". This can lead to incompatibilities between code libraries.
You could say that GUID is Microsoft's implementation of the UUID standard. Treat them as a 16 byte (128 bits) value that is used as a unique value. In Microsoft-speak they are called GUIDs, but call them UUIDs when not using Microsoft-speak.
The Verdict
In this third and final installment of this series on choosing a Primary Key for relational databases, we set out to decisively conclude whether or not string - or alphabetic - data can make a suitable PK. The answer is YES, but with some caveats. You should expect to take a slight performance hit. If that prospect does not deter you, then you've got some options to work with, from short symbolic strings to GUIDS and UUIDs. Finally, try to use fixed length strings rather than varchars, as fixed length strings perform better.