One of the database design practices that I have seen more and more frequently in recent years is the use of GUIDs for primary keys.  Unless you are implementing merge replication, this is rarely useful and can be both a minor annoyance and a drag on database performance.  (Even if you are using replication, a GUID primary key is still an annoyance, but something of a necessary one.)

In the off chance that you don’t know what a primary key is and you’re still reading, a primary key is a piece of data that uniquely defines a particular row in a table.  Typically, primary keys are either an auto-generated number, a combination of a series number  and a foreign key (e.g. page 15 of document 1234), or a piece of meaningful data that is guaranteed to be unique (e.g. social security number).  In database theory, “primary key” and “unique constraint” mean the same thing, though most of the popular databases treat them differently. In SQL Server, for instance, the primary key will be used to create a clustered index if you don’t select something else.  It’s not especially uncommon to have both a primary key and a unique constraint – for example, a web contact system might use email addresses to uniquely identify customers (a unique constraint), but have a separate customer ID as the primary key.

So let’s first consider the reasons that you might want to use a GUID as your primary key, aside from replication:

  • A GUID gives you a unique ID across all tables.  If someone references record 66746E1D-8C89-4A72-BED2-4A8F5587DB92, you can know definitively whether that record is a user, a transaction, or anything else, whereas if they can only reference record #57, that same ID is likely used across multiple tables.
  • You are not reliant on the database to generate the record ID – you could potentially insert related records before inserting the main record.  For example, you might be creating a large tree structure that will take several minutes.  Inserting the main record and making it immediately available to users could be confusing.  But having a GUID for the primary key allows you to generate your own IDs for the higher-level nodes without having to insert them into the database until all of the children have been added.
  • GUIDs are not guessable.  If you are displaying records to a user (and not using URL rewriting to create SEO-friendly URLs, a user can guess that if they are on document.aspx?id=123 that they could also get to document.aspx?id=122 and document.aspx?id=124.

Now, all of these are reasons you might consider a GUID for a primary key.  But let’s look at the drawbacks that, in my estimation make GUIDs just not worth it:

  • GUIDs are 16 bytes instead of four bytes.  Remember that this size doesn’t just affect the table itself – it affects indexes, foreign keys, and any other time you need to reference the column.
  • A GUID used in a clustered index causes pain and misery (or, at least, a loss of efficiency).  This can be mitigated by using NEWSEQUENTIALID() to generate your GUIDs, but that takes away most of your reasons for using a GUID in the first place.
  • GUIDs do not imply any kind of ordering.  This is a big deal for diagnosing problems – if you know that you are looking for a missing document #12345, you know the approximate age of that document (it was created sometime between #12344 and #12346).  Just last week, I had a customer who knew that four rows had been added to a table, but had no way of knowing which four they were (for some odd reason, the table did not include a column for the creation date).
  • Similarly, if you are performing a function like BETWEEN, an integer primary key lets you use the clustered index to perform this operation (much, much faster – basically the purpose of clustered indexes).

 

Can we help with your database design?  Contact our technology team.

Crew Keysimage credit: Simon Speed

5 Comments

  1. Hildeberto Mendonça

    Some comments:

    You cannot simply know the table based on the ID. If the ID is generated (most cases) then it contains nothing about database metadata. To include database metadata would require additional processing, which is undesarible.

    GUID is not only useful for replication, but also for interoperability with other systems. If you take a close look, most data provided by Google, Facebook and other public APIs use GUID as id. We can simply reuse these ids to store these data in our database.

    Since sequential and GUID ids are meaningless, there is no point on using operators like ‘between’, and others. Order by date of insertion without cost? Come on! This might be 0,0001% of all use cases. Users actually want to order by several columns.

    This sounds more like a database-driven development, not a domain-driven development. An an isolated application is not useful anymore. Data are more and more coming from several datasources. GUID plays a strategic role on that. 😉

  2. Richard

    Good points.

    One thing I would say is about your final bullet point. If you are performing BETWEEN operations on a primary key that is a surrogate key, be it an integer or a GUID, you should be questioning your design.

  3. Developer Dude

    I agree with Hildeberto. A GUID is useful for interoperability. I’ve seen it used where data was exported/imported as metadata for various schemas/ontologies/taxonomies/etc. between various systems, and the only reliable way of uniquely identifying the metadata and maintaining the integrity of the metadata was to use a GUID – especially where uniqueness was an important constraint.

    This sounds to me like a premature optimization. Also, I’ve seen a LOT of databases that used sequential integers for keys that later clashed when they needed to be merged or cross-referenced in some manner.

    Then there is the problem of bugs; if some server code asks for a customer account with the an integer ID and it is one off (for whatever reason), then you can easily be returning somebody else’s account data.

    I recently read a good programming recommendation; do not use numbers for properties that are not numbers. I won’t go into why – it shouldn’t take most of us too long to see why that is a good idea.

Comments are closed.