One of the questions that gives engineers heartburn is the question of how to store unstructured content for their application. When we speak of “unstructured content”, we are talking about text, HTML, images, Word docs, etc. Unstructured content may be complete files or may be segments of a file, for example, advertising content that gets inserted into HTML files.
There are two basic ways to store an unstructured content object – it may be stored as a file on the file system or in the database as a large object, or “LOB”. Most database management systems distinguish between text-based, or “character large objects”, or CLOBs and “binary large objects”, or BLOBs.
In SQL Server, CLOBs formerly used the “text” or “ntext” datatype (ntext is for Unicode text) and BLOBs formerly used the “image” datatype, though these three types have all been deprecated and will be removed at some point in the future. (As of SQL Server Denali, they are still present, so don’t worry – they aren’t going away tomorrow.) The (relatively) new data types for SQL Server that were introduced in 2005 are varchar(max), nvarchar(max), and binary(max). Each of these structures support data of up to 2147483647 bytes (231-1).
Let’s look at the considerations when choosing between file-based and BLOB-based storage.
- Security. If all of your content is public, then there is no downside to storing it on the file system. But if security is a concern, storing it in the database allows your database system to largely handle security for you. (Obviously, it is still necessary to design your system correctly so that it cannot be exploited.)
- Backups. Depending on the kind of unstructured content you are storing, it can potentially grow to be very large. For example, color images from a digital camera could be 10 megabytes each, depending on quality, and can quickly dwarf everything else. If your application is only storing content that never changes (e.g., a photography blog), then storing images in the database only makes backups pointlessly large. (This can be mitigated somewhat by using multiple filegroups and performing full backups of the LOB filegroup less frequently.)
- Speed. With web applications that serve very large files, storing them in the database causes the pointless extra step of retrieving them from the database when requested by a user.
SQL Server 2008 introduced an alternative to file or BLOB storage called FILESTREAM. With FILESTREAM storage, BLOBs are physically stored as separate files, though they can be accessed through SQL as though they were database objects. This storage method potentially relieves some of the limitations of file-based and BLOB storage. Programmatically, FILESTREAM objects through SQL or they can use the API function OpenSqlFilestream and treat the object as though it were a file.
Microsoft has an outstanding blog entry on FILESTREAM best practices. As might be expected, accessing FILESTREAM objects through SQL is always slower than accessing internally stored BLOB objects in SQL. But as objects start to group in size, FILESTREAM storage becomes the significantly faster option so long as you use the Win32 API (not SQL) to access them. The “break even” point in their study, is approximately 1.5 MB – files below this size are more efficiently stored as varbinary(max) and files larger than this size are more efficiently stored using FILESTREAM.
Keep an eye on our blog for more SQL Server related tips and tricks to help keep your applications performing optimally. Subscribing to the technology blog is easy – click here to ensure you don’t miss the latest.
image credit: arjecahn