String vs. Numeric Data Types as Primary Keys
Welcome back to this series on choosing a Primary Key for relational databases. In Part 1, we covered Natural and Surrogate Primary Keys and considered why one might choose one over the other. Today's instalment will explore String and Numeric data types as Primary Keys in an effort to ascertain whether one is preferable to the other.
String and Numeric Data Types in Relational Databases
Both string and numeric nomenclatures are actually umbrella terms that encapsulate several different data types. For starters, the string data type is a generic IT term that traditionally refers a sequence of characters, either as a literal constant or as some kind of variable. With regards to databases, single characters, represented by the CHAR type, are also grouped with Strings. Other DB string data types include VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET. Numeric data types include both exact numeric data types such as INTEGER, SMALLINT, DECIMAL, and NUMERIC, as well as the approximate numeric data types like FLOAT, REAL, and DOUBLE PRECISION.
The Great Debate
Advice on what data type works best for primary keys (PKs) abounds on the Internet. Some sites state outright that numeric keys are almost always superior to character-based ones, while an equal number of sites promote the use of string types. Meanwhile, DB vendors themselves don't suggest one type over the other. What they do offer, are instructions regarding the PRIMARY KEY Constraint. It uniquely identifies each record in a table and posits that:
- Primary keys must contain UNIQUE values, and cannot contain NULL values.
- A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).
- PK values should not be changed over time.
So long as your PK satisfies the above criteria, then you're good to go, as far as DB vendors are concerned. But that doesn't mean that one type may offer some advantages over the other. Let's dive into those now.
Say Aye for Numeric Types
Back when I was first learning about database development, I was instructed that numeric types are best for PKs because they are both faster and memory efficient. This opinion was reinforced by my first employer, the Federal Government, who utilized numeric PKs, even if that meant adding a surrogate key.
There are plenty of reputable reference sites who echo that sentiment. Speaking about MySQL, Mysqltutorial.org states:
Because MySQL works faster with integers, the data type of the primary key column should be the integer e.g., INT, BIGINT. And you should ensure sure that value ranges of the integer type for the primary key are sufficient for storing all possible rows that the table may have.
MySQL is far from unique on its handling of numeric data; another page on Primary Keys in Oracle states that "primary keys typically are numeric because Oracle typically processes numbers faster than any other data types."
They even go so far as to say that PK data should be "meaningless":
Sometimes, you may want use meaningful data, which considers being unique, for the primary keys e.g., social security number (SSN), vehicle identification number (VIN), email, and phone number. However, you don’t know when the email or phone number changes or is reused by another person. In such cases, it will create many data problems. In the database world, the artificial keys are known as surrogate keys which are as opposed to natural primary keys.
Coming Up Next Week...
So far, it would seem that Numeric primary keys are best. However, we have not yet heard from the pro-string side. Perhaps they can offer some very good reasons for using strings instead.