One of the database design practices that I have seen more and more frequently in recent years is the use of GUIDs for primary keys. Unless you are implementing merge replication, this is rarely useful and can be both a minor annoyance and a drag on database performance. (Even if you are using replication, a GUID primary key is still an annoyance, but something of a necessary one.)
In the off chance that you don’t know what a primary key is and you’re still reading, a primary key is a piece of data that uniquely defines a particular row in a table. Typically, primary keys are either an auto-generated number, a combination of a series number and a foreign key (e.g. page 15 of document 1234), or a piece of meaningful data that is guaranteed to be unique (e.g. social security number). In database theory, “primary key” and “unique constraint” mean the same thing, though most of the popular databases treat them differently. In SQL Server, for instance, the primary key will be used to create a clustered index if you don’t select something else. It’s not especially uncommon to have both a primary key and a unique constraint – for example, a web contact system might use email addresses to uniquely identify customers (a unique constraint), but have a separate customer ID as the primary key.
So let’s first consider the reasons that you might want to use a GUID as your primary key, aside from replication:
- A GUID gives you a unique ID across all tables. If someone references record 66746E1D-8C89-4A72-BED2-4A8F5587DB92, you can know definitively whether that record is a user, a transaction, or anything else, whereas if they can only reference record #57, that same ID is likely used across multiple tables.
- You are not reliant on the database to generate the record ID – you could potentially insert related records before inserting the main record. For example, you might be creating a large tree structure that will take several minutes. Inserting the main record and making it immediately available to users could be confusing. But having a GUID for the primary key allows you to generate your own IDs for the higher-level nodes without having to insert them into the database until all of the children have been added.
- GUIDs are not guessable. If you are displaying records to a user (and not using URL rewriting to create SEO-friendly URLs, a user can guess that if they are on document.aspx?id=123 that they could also get to document.aspx?id=122 and document.aspx?id=124.
Now, all of these are reasons you might consider a GUID for a primary key. But let’s look at the drawbacks that, in my estimation make GUIDs just not worth it:
- GUIDs are 16 bytes instead of four bytes. Remember that this size doesn’t just affect the table itself – it affects indexes, foreign keys, and any other time you need to reference the column.
- A GUID used in a clustered index causes pain and misery (or, at least, a loss of efficiency). This can be mitigated by using NEWSEQUENTIALID() to generate your GUIDs, but that takes away most of your reasons for using a GUID in the first place.
- GUIDs do not imply any kind of ordering. This is a big deal for diagnosing problems – if you know that you are looking for a missing document #12345, you know the approximate age of that document (it was created sometime between #12344 and #12346). Just last week, I had a customer who knew that four rows had been added to a table, but had no way of knowing which four they were (for some odd reason, the table did not include a column for the creation date).
- Similarly, if you are performing a function like BETWEEN, an integer primary key lets you use the clustered index to perform this operation (much, much faster – basically the purpose of clustered indexes).
Can we help with your database design? Contact our technology team.
image credit: Simon Speed