Deprecated SQL Server Data Access Technologies

I hope the warning excerpt below from the SQL Server Books Online is not a surprise to you (emphasis mine):

Warning:
SQL Server Native Client (SNAC) is not supported beyond SQL Server 2012. Avoid using SNAC in new development work, and plan to modify applications that currently use it. The Microsoft ODBC Driver for SQL Server provides native connectivity from Windows to Microsoft SQL Server and Microsoft Azure SQL Database.

EDIT:
The above warning was removed from the SQL Server documentation shortly after this article was first published. However, I still think it’s prudent to use the stand-alone ODBC driver instead of OLE DB for new native applications. The OLE DB story continues to evolve and I hope we see a roadmap and guidance from Microsoft soon.

Let me first mention that the Books Online should have made it clear that this warning applies only to SNAC data access by applications. The warning does not apply to Microsoft SQL Server features and products that use SNAC as part of the OLE DB stack, like Linked Servers, SSIS, and SSAS. Microsoft SQL Server 2012 Native Client is still installed and used as a component dependency to support these features in SQL Server 2014 and later versions, although that may change in the future.

More concerning is OLE DB in general. Even before SQL Server 2012 was released, Microsoft announced in August 2011 that ODBC was the preferred technology for relational database access and that OLE DB was deprecated. Below is an excerpt from the Microsoft is Aligning with ODBC for Native Relational Data Access FAQ, which should be alarming to folks who continue to use Microsoft OLE DB providers to access SQL Server 2014 and later databases (note that Denali is the code name for SQL Server 2012 and, again, emphasis mine):

Question 6: If I have an OLE DB application that I write for Denali, will it be supported on a post Denali version of SQL Server that is released during the life of Denali?

Answer: No, in fact we may explicitly block the OLE DB applications on post-Denali versions of SQL Server. It is recommended that you plan your migration soon to ODBC, if you want to start using newer versions of SQL Server as soon as they release.

The explicit block of OLE DB data access has not yet happened (as of this writing) but I think it’s prudent to heed the announcement and warning, at least for applications that currently use SQL Server 2014 and later versions as well as Azure SQL Database, or may need to use newer SQL versions in the future. Microsoft has a history of going out of their way to provide backwards compatibility in the SQL Server space but it’s been 3 SQL Server versions (soon to be 4) since the deprecation announcement.

The bottom line is that native Windows applications (e.g. unmanaged C++, VB6, VBA, Classic ASP, etc.) should generally use ODBC and the latest Microsoft ODBC Driver for SQL Server. This stand-alone ODBC driver is a free download from https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server. Not only should SNAC not be used against SQL Server 2014 and later versions, the deprecated SQL Server OLE DB provider and ODBC driver that ship with Windows should also be avoided. This practice will facilitate more seamless upgrades to new versions of SQL Server and Azure SQL Database.

Preparing for the Future
The message is loud and clear that ODBC is the supported and preferred path for native applications going forward. The Data Access Technologies Road Map provides an overview and history of Microsoft data access technologies. I recommend you peruse it to ensure you are not inadvertently using deprecated or unsupported technologies for new development and, for existing applications, consider moving from legacy data access technologies to current ones when practical.

The current Microsoft ODBC Driver for SQL Server as of this writing is ODBC Driver 13 for SQL Server. Note that both the 13.0 and 13.1 versions of this driver have the same “ODBC Driver 13 for SQL Server” display name listed under installed programs and in the ODBC Data Source Administrator. If installed, the driver will be listed under installed programs along with the corresponding driver version (when viewed in detail mode). The 13.1 version adds support for the Always Encrypted feature. These ODBC drivers are available from the link I mentioned earlier.

The remainder of this article reviews common data access technologies for native (unmanaged) Windows applications that are explicitly identified as unsupported, deprecated, or may have a limited future along with remediation considerations. This isn’t to say these technologies won’t work, just that using them may block upgrades to new versions of SQL Server and Azure SQL Database as well as prohibit using new SQL Server features. Applications that are aligned with current technologies are much better positioned for seamless SQL Server upgrades, both on-prem and in the cloud.

Deprecated Windows Data Access Components
Windows Data Access Components (WDAC), formerly known as Microsoft Data Access Components (MDAC) in older Windows versions, is included with Windows to provide data access infrastructure for ODBC, OLE DB, ADO Classic, and managed ADO.NET out of the box. WDAC includes a mix of deprecated, mature, and modern components. WDAC is part of the operating system and maintained by Windows Update.

Both the SQL Server ODBC driver and OLE DB provider included with WDAC are deprecated; these are provided only for legacy application backwards compatibility and should not be used for new application development. The WDAC ODBC driver named “SQL Server” and the OLE DB provider named “Microsoft OLE DB Provider for SQL Server” (SQLOLEDB) were both deprecated in favor of SQL Server Native Client when SQL Server 2005 was released over 10 years ago. Not only do these old components lack direct support for data types introduced after SQL Server 2000, they also lack support for newer features like Availability Groups, MARS, Always Encrypted, and idle connection resiliency. Native applications should install and use the stand-alone Microsoft ODBC Driver for SQL Server instead of the WDAC “SQL Server” ODBC driver, the SQLOLEDB provider, or the SNAC ODBC driver/OLE DB provider.

WDAC also includes the Microsoft OLE DB Provider for ODBC Drivers (MSDASQL), which acts as a bridge to allow OLE DB applications to use an ODBC driver instead of an OLE DB provider for low-level data access. The main purpose of MSDASQL was to allow OLE DB programs to access RDBMS products other than SQL Server that had an ODBC driver but no OLE DB provider, with the expectation that vendors would eventually jump on the OLE DB bandwagon. However, that didn’t happen widely because OLE DB relies on Windows-only COM interfaces and many vendors preferred to support only ODBC interfaces. Microsoft recognized the value of ODBC for cross-platform SQL Server data access, which is one of the reasons called out for OLE DB deprecation. Although MSDASQL is an OLE DB provider, it is not deprecated for SQL Server relational data access because it uses ODBC to access the data store. MSDASQL can facilitate transitioning to ODBC in existing OLE DB applications. I’ll discuss this in more detail shortly.

ADO (ActiveX Data Objects, not to be confused with ADO.NET) components are included with WDAC. Although not deprecated, ADO is a mature OLE DB technology that hasn’t been enhanced since ADO 6.0 was released with WDAC in Windows Vista over 10 years ago. ADO types do not directly support data types introduced after SQL Server 2000, regardless of the underlying provider/driver used. I’ll leave speculation on the future of ADO as an exercise for the reader.

Changing Existing ODBC Applications
Changing an existing ODBC application (ADO or direct ODBC function calls) to use the Microsoft ODBC Driver for SQL Server is often simply a matter of installing the driver and changing the connection string or DSN to use it. Unmanaged C++ applications that use the ODBC call level interface directly may need to be recompiled using the header files included with the SDK install of the ODBC driver. It’s often trivial to switch from SNAC or the WDAC SQL Server ODBC driver. Application testing should be done to ensure compatibility but ODBC driver upgrades are usually transparent.

Changing from OLE DB to ODBC in ADO Classic
There is still quite a bit of OLE DB SQL Server access in the wild in my experience, ranging from C++, VBA, and Classic ASP to VB.NET apps migrated from VB6 that were never retrofitted to use SqlClient (which should be done, as it provides high-performance managed data access for .NET applications). From unmanaged code, ADO provides an easy-to-use object-oriented COM interface on top of the 100+ complex OLE DB interfaces. Low-level data access is performed by the OLE DB provider specified in the connection string or connection object. MSDASQL (Microsoft OLE DB Provider for ODBC Drivers) is the default provider in ADO so ODBC is used when no OLE DB provider is specified. When MSDASQL is used explicitly or by default, the connection string must specify either a DSN or an ODBC driver.

Below are DSN-less ADO ODBC connection string examples that use the Microsoft ODBC Driver for SQL Server with and without a trusted connection. “Provider=MSDASQL” could have been specified in the connection string but is not technically required since it’s the default provider:

Server=YourServer;Driver={ODBC Driver 13 for SQL Server};Database=YourDatabase;Trusted_Connection=Yes
Server=YourServer;Driver={ODBC Driver 13 for SQL Server};Database=YourDatabase;UID=YourUser;PWD=YourPassword

The examples below use an ODBC DSN with the default database specified in the DSN configuration. The first uses Integrated Windows authentication (specified in the DSN configuration) and the second example is functionally identical except using SQL Server authentication. Be aware that the Microsoft ODBC Driver for SQL Server does not store user credentials in the DSN configuration so those must be specified by the application:

DSN=YourOdbcDataSource
DSN=YourOdbcDataSource;UID=YourUser;PWD=YourPassword

ADO applications use a handful of objects to interact with SQL Server (mostly connection, command, recordset, record, transaction, parameter, and field objects). These objects abstract the implementation details such that it is possible to use ADO with any DBMS product, OLE DB provider, or ODBC driver as long as the low-level driver/provider supports the requested functionality. One can theoretically transition from OLE DB to ODBC with only a connection string change. However, the devil is in the details when switching from OLE DB to ODBC via MSDASQL.

MSDASQL converts OLE DB method calls into their equivalent ODBC function calls. Commonly used fast-forward read-only client-side cursors (a.k.a. firehose cursors, the same type ADO.NET uses exclusively) tend to work well in my experience. However, ADO provides a plethora of options for cursor location, execution, cursor types, and locking modes that ADO and MSDASQL may or may not translate well to ODBC equivalents. If you run into problems during testing with ODBC, make sure the requested ADO options are appropriate for the task at hand. It is common for ADO applications to use advanced options and pessimistic locking inappropriately; a firehose cursor with optimistic locking is often the better choice and will perform better too.

Like ADO, MSDASQL is a mature technology that hasn’t gotten much love lately. I could be wrong but I don’t expect patches that are not security related. If you run into issues with MSDASQL, you may be better off developing a work-around rather than wait for a fix.

Changing OLE DB Interface Calls to ODBC
C++ applications that use OLE DB interfaces directly rather than the higher-level ADO API can be converted to ODBC by using the MSDASQL provider, much as discussed in the ADO topic above. The same considerations apply.

Note that MSDASQL does add another layer, which may be an issue for applications that are especially performance-sensitive. The best approach for C++ applications that are expected to be around for a while might be to move from OLE DB to direct ODBC function calls (or the MFC ODBC classes). This will provide the highest level of performance and alignment with ODBC.

Remediation Summary
Inasmuch as I recommend staying aligned with product technology lifecycles, whether or not one should modify an existing application to use current data access technologies is ultimately a business decision. Business benefits include not only supportability but also agility in making functional enhancements that leverage newer features, as well as non-functional ones such as support for Availability Groups. The costs of the development and testing needed to use the Microsoft ODBC Driver for SQL Server will vary considerably depending on application design. Those costs need to be weighed against the benefits of the effort.

That being said, I always recommend one avoid deprecated technologies in new development work even if that requires moving outside one’s comfort zone. Technical debt is a hidden cost that often needs to be paid back with interest.

p.s. For history buffs, this post by Hal Berenson, retired Microsoft Distinguished Engineer and General Manager, provides additional background on OLE DB and data access technologies.

SQL Server 2016 and Azure SQL Database V12 Breaking Change

This post is to get the word out about a breaking change to datetime conversion and comparison behavior in SQL Server 2016 and Azure SQL Database V12. This change hasn’t been documented as of this writing in the Breaking Changes to Database Engine Features in SQL Server 2016 topic in the SQL Server Books Online.

In short, conversion from datetime to a higher precision temporal data type (datetime2, datetimeoffset, or time) may yield a different, but more accurate, time value than in prior versions. Also, predicates involving datetime consider the full precision of the raw datetime internal value instead of the time value rounded to the nearest millisecond. These changes in conversion and comparison behavior may affect existing applications and are not intuitive unless one understands the underlying datetime data type implementation.

Background
You may be aware that the accuracy of datetime is limited to 1/300 of a second. This is because values are internally an 8-byte structure consisting of 2 separate 32-bit integers, one with the number of day units since 1900-01-01 and the other with the number of 1/300 second interval units since midnight. The 1/300 second unit interval limits the time accuracy to 3.33333… milliseconds and the milliseconds value will be a repeating decimal when time interval units are not evenly divisible by 3. The raw decimal value is rounded to a scale of 3 in accordance with the fixed datetime precision of 3, resulting in a millisecond value of 0, 3, or 7 for all datetime values.
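
One can peek at this internal representation directly by casting a datetime value to binary(8). The quick illustrative queries below are my own addition; the values in the comments are what I’d expect based on the storage format described above (the first 4 bytes are the day number, the last 4 bytes are the 1/300-second ticks since midnight):

--first 4 bytes: days since 1900-01-01; last 4 bytes: 1/300-second ticks since midnight
SELECT CAST(CAST('2016-01-01T00:00:00.003' AS datetime) AS binary(8)); --expect 0x0000A58000000001 (42368 days, 1 tick)
SELECT CAST(CAST('2016-01-01T00:00:00.007' AS datetime) AS binary(8)); --expect 0x0000A58000000002 (42368 days, 2 ticks)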

Pre-SQL Server 2016 Behavior
Before SQL Server 2016, conversion from datetime to another temporal type used the source datetime value after it was rounded to the nearest millisecond, which truncated repeating decimal fractional milliseconds. The rounded value was then rounded again according to the target type precision. When the target type precision was greater than 3, the time was extended to the target type precision with insignificant trailing zeros, resulting in zero for the sub-millisecond value.

DECLARE @DateTime datetime = '2016-01-01T00:00:00.007';
SELECT CAST(@DateTime AS datetime2(0)); --2016-01-01 00:00:00
SELECT CAST(@DateTime AS datetime2(1)); --2016-01-01 00:00:00.0
SELECT CAST(@DateTime AS datetime2(2)); --2016-01-01 00:00:00.01
SELECT CAST(@DateTime AS datetime2(3)); --2016-01-01 00:00:00.007
SELECT CAST(@DateTime AS datetime2(4)); --2016-01-01 00:00:00.0070
SELECT CAST(@DateTime AS datetime2(5)); --2016-01-01 00:00:00.00700
SELECT CAST(@DateTime AS datetime2(6)); --2016-01-01 00:00:00.007000
SELECT CAST(@DateTime AS datetime2(7)); --2016-01-01 00:00:00.0070000

Also, when datetime was compared to another temporal type, the rounded value was used. The scripts below show that the equality predicate is true after the datetime value is converted to datetime2.

--This script prints EQUAL predicate is true
DECLARE @DateTime datetime = '2016-01-01T00:00:00.003';
DECLARE @DateTime2 datetime2(7) = @DateTime;
IF @DateTime = @DateTime2 PRINT 'EQUAL predicate is true' ELSE PRINT 'EQUAL predicate is not true';
IF @DateTime < @DateTime2 PRINT 'LESS THAN predicate is true' ELSE PRINT 'LESS THAN predicate is not true';
IF @DateTime > @DateTime2 PRINT 'GREATER THAN predicate is true' ELSE PRINT 'GREATER THAN predicate is not true';
GO
--This script prints EQUAL predicate is true
DECLARE @DateTime datetime = '2016-01-01T00:00:00.007';
DECLARE @DateTime2 datetime2(7) = @DateTime;
IF @DateTime = @DateTime2 PRINT 'EQUAL predicate is true' ELSE PRINT 'EQUAL predicate is not true';
IF @DateTime < @DateTime2 PRINT 'LESS THAN predicate is true' ELSE PRINT 'LESS THAN predicate is not true';
IF @DateTime > @DateTime2 PRINT 'GREATER THAN predicate is true' ELSE PRINT 'GREATER THAN predicate is not true';
GO

SQL Server 2016 Behavior Change
SQL Server 2016 and Azure SQL Database V12 use the raw datetime internal value without rounding during conversion to another temporal type. The value is rounded only once during conversion, to the target type precision. The end result will be the same as before SQL Server 2016 when the target type precision is 3 or less. However, the converted value will be different when the target type precision is greater than 3 and the internal time unit interval is not evenly divisible by 3 (i.e. rounded source datetime millisecond value is 3 or 7). Note the non-zero microseconds and nanoseconds in the script results below and that rounding is based on the target type precision rather than the source.

DECLARE @DateTime datetime = '2016-01-01T00:00:00.003';
SELECT CAST(@DateTime AS datetime2(0)); --2016-01-01 00:00:00
SELECT CAST(@DateTime AS datetime2(1)); --2016-01-01 00:00:00.0
SELECT CAST(@DateTime AS datetime2(2)); --2016-01-01 00:00:00.00
SELECT CAST(@DateTime AS datetime2(3)); --2016-01-01 00:00:00.003
SELECT CAST(@DateTime AS datetime2(4)); --2016-01-01 00:00:00.0033
SELECT CAST(@DateTime AS datetime2(5)); --2016-01-01 00:00:00.00333
SELECT CAST(@DateTime AS datetime2(6)); --2016-01-01 00:00:00.003333
SELECT CAST(@DateTime AS datetime2(7)); --2016-01-01 00:00:00.0033333
GO
DECLARE @DateTime datetime = '2016-01-01T00:00:00.007';
SELECT CAST(@DateTime AS datetime2(0)); --2016-01-01 00:00:00
SELECT CAST(@DateTime AS datetime2(1)); --2016-01-01 00:00:00.0
SELECT CAST(@DateTime AS datetime2(2)); --2016-01-01 00:00:00.01
SELECT CAST(@DateTime AS datetime2(3)); --2016-01-01 00:00:00.007
SELECT CAST(@DateTime AS datetime2(4)); --2016-01-01 00:00:00.0067
SELECT CAST(@DateTime AS datetime2(5)); --2016-01-01 00:00:00.00667
SELECT CAST(@DateTime AS datetime2(6)); --2016-01-01 00:00:00.006667
SELECT CAST(@DateTime AS datetime2(7)); --2016-01-01 00:00:00.0066667
GO

This behavior change provides a more accurate converted value but may break applications that expect the converted value to be the same as the rounded datetime value as was the case before SQL Server 2016.

Be aware that the full raw datetime precision (instead of the rounded value) is also used when evaluating predicates involving a datetime type. The full precision of both arguments is used, causing the equality predicate to evaluate to false in both scripts below. The greater than predicate is true in the first script and the less than predicate is true in the second:

--This script prints GREATER THAN predicate is true
DECLARE @DateTime datetime = '2016-01-01T00:00:00.003';
DECLARE @DateTime2 datetime2(7) = @DateTime;
IF @DateTime = @DateTime2 PRINT 'EQUAL predicate is true' ELSE PRINT 'EQUAL predicate is not true';
IF @DateTime < @DateTime2 PRINT 'LESS THAN predicate is true' ELSE PRINT 'LESS THAN predicate is not true';
IF @DateTime > @DateTime2 PRINT 'GREATER THAN predicate is true' ELSE PRINT 'GREATER THAN predicate is not true';
GO
--This script prints LESS THAN predicate is true
DECLARE @DateTime datetime = '2016-01-01T00:00:00.007';
DECLARE @DateTime2 datetime2(7) = @DateTime;
IF @DateTime = @DateTime2 PRINT 'EQUAL predicate is true' ELSE PRINT 'EQUAL predicate is not true';
IF @DateTime < @DateTime2 PRINT 'LESS THAN predicate is true' ELSE PRINT 'LESS THAN predicate is not true';
IF @DateTime > @DateTime2 PRINT 'GREATER THAN predicate is true' ELSE PRINT 'GREATER THAN predicate is not true';
GO

To provide insight into why the comparisons result in greater than and less than respectively, the script below shows the nanoseconds value of the compared data types:

DECLARE @DateTime datetime = '2016-01-01T00:00:00.003';
DECLARE @DateTime2 datetime2(7) = @DateTime;
SELECT DATEPART(nanosecond, @DateTime); --3333333
SELECT DATEPART(nanosecond, @DateTime2); --3333300
GO
DECLARE @DateTime datetime = '2016-01-01T00:00:00.007';
DECLARE @DateTime2 datetime2(7) = @DateTime;
SELECT DATEPART(nanosecond, @DateTime); --6666666
SELECT DATEPART(nanosecond, @DateTime2); --6666700
GO

The datetime2 type is accurate only to 100 nanoseconds, whereas the raw datetime value extends to the nanosecond (and beyond) because the repeating decimal is theoretically unlimited in precision. The implication is that a datetime value with a repeating decimal fraction will never compare equal to any temporal type except datetime.

Datetime conversion and comparison behavior is controlled by the database compatibility level. Databases at the SQL Server 2016 compatibility level (130) use the new behavior, while the legacy behavior is used at lower compatibility levels.
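
A database’s current compatibility level can be checked in sys.databases and, if necessary, set back to a pre-2016 level to retain the legacy behavior. The snippet below is a simple sketch; YourDatabase is a placeholder name:

--check current compatibility levels
SELECT name, compatibility_level
FROM sys.databases;

--revert to the pre-SQL Server 2016 datetime conversion/comparison behavior
ALTER DATABASE YourDatabase
    SET COMPATIBILITY_LEVEL = 120;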

Summary
These datetime behavior changes have the benefit of improved accuracy and performance of datetime conversion/comparison. Affected applications can use a pre-SQL Server 2016 database compatibility level until they can be remediated.

I recommend one avoid comparing datetime directly to other temporal types going forward. Instead convert the datetime value to the type being compared and use the converted value in the predicate. It’s generally best practice to match data types whenever possible for performance and to avoid ambiguity.
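
Below is a sketch of what I mean, reworking the earlier comparison script so that both sides of the predicate are the same datetime2(7) type; with matching types, the equality predicate is again true:

DECLARE @DateTime datetime = '2016-01-01T00:00:00.003';
DECLARE @DateTime2 datetime2(7) = @DateTime;
--convert the datetime value to the compared type so both sides round identically
IF CAST(@DateTime AS datetime2(7)) = @DateTime2
    PRINT 'EQUAL predicate is true'
ELSE
    PRINT 'EQUAL predicate is not true';
GO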

SQL Server and Hyper-threading

I had a client ask whether or not SQL Server could take advantage of all 32 logical processors if they enabled hyper-threading on their server. They were running SQL Server 2012 Standard Edition on a server with 4 sockets, each with 4 cores. As in nearly all things related to SQL Server, the answer is “it depends”.

The compute capacity of SQL Server 2012 Standard Edition (and later versions as of this writing) is the lesser of 4 sockets or 16 cores regardless of the number of logical processors. With hyper-threading enabled and running SQL Server on bare metal, the number of logical processors doubles from 16 to 32. SQL Server 2012 SE can use all 32 of these logical processors because the compute limit is based on physical sockets/cores, not logical processors.

However, SQL Server 2012 SE can use only 16 of the 32 logical processors when running virtualized on the same physical server. This is because the physical processor architecture isn’t exposed to the guest OS. Logical processors available to the virtualization host are mapped to guest VMs and these appear as virtual sockets and cores to both the OS and SQL Server with no notion of hyper-threading. The end result is that SQL Server 2012 SE will use no more than 16 logical processors when virtualized with or without hyper-threading enabled.
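
You can see what SQL Server itself reports with a couple of DMV queries. The quick sketch below shows the logical processor count, the logical-processors-per-socket ratio, and how many schedulers SQL Server actually has online for query work:

--logical processors visible to SQL Server and logical processors per physical socket
SELECT cpu_count AS logical_processors
    , hyperthread_ratio AS logical_processors_per_socket
    , cpu_count / hyperthread_ratio AS sockets
FROM sys.dm_os_sys_info;

--schedulers SQL Server can actually use for query work
SELECT COUNT(*) AS visible_online_schedulers
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE';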

Other Hyper-threading Considerations
Regardless of the SQL Server edition, one should enable hyper-threading only if the OS and SQL Server can utilize the additional logical processors; otherwise, leaving hyper-threading off guarantees that logical processors and physical cores are one and the same, providing the best performance possible. When the additional logical processors can be used, the SQL Server performance benefits, or even detriments, are very workload dependent. The only way to accurately ascertain HT performance is with your actual production workload.

My past experience is that OLTP workloads benefit the most from hyper-threading. In the early days of HT technology (before Windows and SQL Server were HT-aware), I observed a 15-20% performance improvement for an OLTP workload. Unfortunately, reporting workload performance could worsen with HT enabled. Slava Oks’s blog post may explain why.

The world has changed a lot since then. SQL Server is now both HT and NUMA aware, with improved locking primitives and memory-optimized structures available that avoid locking entirely. With current SQL Server versions running bare-metal on modern hardware, I suggest you enable HT unless your actual workload experience dictates otherwise. Premature optimization is evil; don’t disable HT preemptively without justification from your own workload or you may be wasting free CPU cycles. Importantly, limit MAXDOP per this support article.

See Linchi’s Shea’s post and also this one for reporting workload tests with and without HT enabled. Also, see Joe Chang’s post. The first graph is especially interesting in that it shows an actual production OLTP workload running with and without HT. The remainder of the post shows results of the TPC-H benchmark tests, a synthetic decision support (reporting) workload.

Again, your workload is unique so only you can ascertain if HT is actually beneficial.

Table Partitioning Best Practices

SQL Server table partitioning has a number of gotchas without proper planning.  This article demonstrates those that commonly cause grief and recommends best practices to avoid them.

Implications of the Partition Function Range Specification

One needs a good understanding of how the RANGE LEFT/RIGHT specification affects partition setup and management.  The RANGE specification determines:

  • The partition created by SPLIT
  • The partition removed with MERGE
  • The permanent partition that can never be removed from partition schemes

Below are nuances of the RANGE specification that commonly surprise people.

The Partition Created by SPLIT

New partitions are created by splitting a partition function.  A partition function SPLIT splits an existing partition into 2 separate ones, changing all of the underlying partition schemes, tables, and indexes.  Below are the actions performed when a RANGE LEFT or RANGE RIGHT partition function is split, with the important differences in bold:

The actions performed by a SPLIT of a RANGE LEFT partition function:

  • Identify existing partition to be split, which is the one that contains the new boundary (or the last partition if no existing boundaries are higher than the one being added)
  • Add the new boundary to the partition function, maintaining boundary order and incrementing subsequent partition numbers
  • Create a new partition to the left of the existing one on the NEXT USED filegroup of each partition scheme that uses the function
  • For each table/index using the affected partition scheme(s), move rows from the existing split partition that are less than or equal to the new boundary into the newly created partition on the left

The actions performed by a SPLIT of a RANGE RIGHT partition function:

  • Identify existing partition to be split, which is the one that contains the new boundary (or the first partition if no existing boundaries are less than the one being added)
  • Add the new boundary to the partition function, maintaining boundary order and incrementing subsequent partition numbers
  • Create a new partition to the right of the existing one on the NEXT USED filegroup for each partition scheme that uses the function
  • For each table/index using the affected partition scheme(s), move rows from the existing split partition that are greater than or equal to the new boundary into the newly created partition on the right

I generally suggest one use RANGE RIGHT because it is more natural and helps avoid common pitfalls when adding incremental partition boundaries.  RANGE LEFT behavior is not intuitive and trips up many DBAs, but I will discuss it for completeness.  Consider this example of a RANGE LEFT partition function on a datetime column, where the initial setup is for 2 years of data (2013 and 2014):

CREATE DATABASE PartitioningDemo ON PRIMARY
        ( NAME = N'Primary', FILENAME = N'C:\SqlDataFiles\PartitioningDemo.mdf' , SIZE = 100MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB ),
    FILEGROUP FG_2013
        ( NAME = N'FG_2013_1', FILENAME = N'C:\SqlDataFiles\PartitioningDemo_FG_2013_1.ndf' , SIZE = 10MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB ),
    FILEGROUP FG_2014
        ( NAME = N'FG_2014_1', FILENAME = N'C:\SqlDataFiles\PartitioningDemo_FG_2014_1.ndf' , SIZE = 10MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB )
    LOG ON
        ( NAME = N'Log_1', FILENAME = N'C:\SqlLogFiles\PartitioningDemo_Log_1.ldf' , SIZE = 100MB , MAXSIZE = 10GB , FILEGROWTH = 10MB);
GO

USE PartitioningDemo;
GO

CREATE PARTITION FUNCTION PF_DateTimeLeft(datetime) AS
    RANGE LEFT FOR VALUES(
        N'2013-12-31T23:59:59.997'
    );

CREATE PARTITION SCHEME PS_DateTimeLeft AS
    PARTITION PF_DateTimeLeft TO(
        FG_2013
        , FG_2014
    );

CREATE TABLE dbo.PartitionedTableLeft(
    DateTimeColumn datetime
    ) ON PS_DateTimeLeft(DateTimeColumn);

INSERT INTO dbo.PartitionedTableLeft VALUES
      (N'2013-01-01T01:02:03.340')
    , (N'2013-02-03T04:05:06.780')
    , (N'2014-01-01T01:02:03.340')
    , (N'2014-02-03T04:05:06.780')
    , (N'2014-03-04T05:06:07.890');
GO

This initial setup results in 2 partitions and data properly mapped to the 2 yearly filegroups of the scheme.  Now, we need to prepare for year 2015 so we add a new filegroup, set the NEXT USED filegroup, SPLIT the function for year 2015, and insert data for 2015:

ALTER DATABASE PartitioningDemo
    ADD FILEGROUP FG_2015;

ALTER DATABASE PartitioningDemo
    ADD FILE
        ( NAME = N'FG_2015_1', FILENAME = N'C:\SqlDataFiles\PartitioningDemo_FG_2015_1.ndf' , SIZE = 10MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB )
    TO FILEGROUP FG_2015;

ALTER PARTITION SCHEME PS_DateTimeLeft
    NEXT USED FG_2015;

ALTER PARTITION FUNCTION PF_DateTimeLeft()
    SPLIT RANGE(N'2014-12-31T23:59:59.997');

INSERT INTO dbo.PartitionedTableLeft VALUES
    (N'2015-01-01T01:02:03.346');
GO

This SPLIT results in adding the new boundary to the function as expected:

CREATE PARTITION FUNCTION PF_DateTimeLeft(datetime) AS
    RANGE LEFT FOR VALUES(
          N'2013-12-31T23:59:59.997'
        , N'2014-12-31T23:59:59.997'
);

However, the partition scheme is not as desired.  The resultant partition scheme is:

CREATE PARTITION SCHEME PS_DateTimeLeft AS
    PARTITION PF_DateTimeLeft TO(
          FG_2013
        , FG_2015
        , FG_2014
    );

As you can see, the filegroup for 2015 was inserted into the scheme before the 2014 one, resulting in 2014 data in the FG_2015 filegroup and 2015 data in the FG_2014 filegroup.  Furthermore, not only is the filegroup mapping now wrong, data movement was required to move the entire year of 2014 data into the new partition.  No big deal here since only 3 rows were moved by this demo script but, in a production table, this movement could be a show stopper.  Logging during SPLIT/MERGE data movement is about 4 times that of normal DML, which is especially costly when working with the large tables (millions or billions of rows) commonly used in table partitioning.

Remember that the new partition by a SPLIT is the one that includes the specified boundary, which is to the left of the existing split partition with a RANGE LEFT function (the 2014 partition here).  Data are moved from the existing SPLIT partition into the newly created partition according to the boundaries of the new partition (greater than ‘2013-12-31T23:59:59.997’ and less than or equal to ‘2014-12-31T23:59:59.997’).
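
You can verify the resulting partition-to-filegroup mapping and row placement for yourself with a metadata query.  The sketch below joins the partition catalog views for the demo table (for a RANGE LEFT function, the boundary shown is the upper boundary of each partition):

--show each partition's boundary, filegroup, and row count for the demo table
SELECT p.partition_number
    , prv.value AS boundary_value
    , fg.name AS filegroup_name
    , p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces AS dds ON dds.partition_scheme_id = ps.data_space_id
    AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values AS prv ON prv.function_id = ps.function_id
    AND prv.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID(N'dbo.PartitionedTableLeft')
    AND i.index_id IN (0, 1)
ORDER BY p.partition_number;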

Again, I recommend using a RANGE RIGHT function to avoid this non-intuitive behavior.  Below is the equivalent RANGE RIGHT script for yearly partitions, which results in the desired filegroup mappings as well as no data movement.  Also, note that the datetime boundaries are exact date specifications with RANGE RIGHT, which is also more intuitive when working with temporal datetime, datetime2, and datetimeoffset data types that include a time component.

CREATE PARTITION FUNCTION PF_DateTimeRight(datetime) AS
    RANGE RIGHT FOR VALUES(
        N'2014-01-01T00:00:00.000'
    );

CREATE PARTITION SCHEME PS_DateTimeRight AS
    PARTITION PF_DateTimeRight TO(
          FG_2013
        , FG_2014
    );

CREATE TABLE dbo.PartitionedTableRight(
    DateTimeColumn datetime
    ) ON PS_DateTimeRight(DateTimeColumn);

INSERT INTO dbo.PartitionedTableRight VALUES
      (N'2013-01-01T01:02:03.340')
    , (N'2013-02-03T04:05:06.780')
    , (N'2014-01-01T01:02:03.340')
    , (N'2014-02-03T04:05:06.780')
    , (N'2014-03-04T05:06:07.890');
GO

ALTER PARTITION SCHEME PS_DateTimeRight
    NEXT USED FG_2015;

ALTER PARTITION FUNCTION PF_DateTimeRight()
    SPLIT RANGE(N'2015-01-01T00:00:00.000');

INSERT INTO dbo.PartitionedTableRight VALUES
    (N'2015-01-01T01:02:03.346');

The resultant partition function and scheme definitions are as desired with this RANGE RIGHT partition function:

CREATE PARTITION FUNCTION PF_DateTimeRight (datetime) AS
    RANGE RIGHT FOR VALUES(
          N'2014-01-01T00:00:00.000'
        , N'2015-01-01T00:00:00.000'
    );

CREATE PARTITION SCHEME PS_DateTimeRight AS
    PARTITION PF_DateTimeRight TO(
          FG_2013
        , FG_2014
        , FG_2015
    );

The Partition Removed by MERGE

When a partition is removed with MERGE, the dropped partition is the one that includes the specified boundary.  If the dropped partition is not empty, all data will be moved into the adjacent remaining partition.  Like SPLIT, costly data movement during partition maintenance should be avoided, so it is best to plan such that only empty partitions are removed.  A MERGE should typically be done after a purge/archive of data using SWITCH.

Below is a summary of a RANGE LEFT partition function MERGE:

  • Identify existing partition to be removed, which is the one that includes (to the left of) the specified existing boundary
  • For each table/index using the affected partition scheme(s), move rows from this partition into the adjacent right partition
  • Remove boundary from the partition function, maintaining boundary order and decrementing subsequent partition numbers
  • Remove the filegroup from each partition scheme that uses the function, unless the filegroup is mapped to another partition or is the NEXT USED filegroup

Below is a summary of a RANGE RIGHT partition function MERGE:

  • Identify existing partition to be removed, which is the one that includes (to the right of) the specified existing boundary
  • For each table/index using the affected partition scheme(s), move rows from this partition into the adjacent left partition
  • Remove boundary from the partition function, maintaining boundary order and decrementing subsequent partition numbers
  • Remove the filegroup from each partition scheme that uses the function, unless the filegroup is mapped to another partition or is the NEXT USED filegroup

I suggest explicit partition boundaries for expected data ranges to facilitate using both MERGE and SPLIT, and an additional one for the permanent partition (which I’ll detail shortly).  This practice helps ensure data are both logically and physically aligned, providing more natural partition management.

The Permanent Partition

You might not be aware that each partition scheme has a permanent partition that can never be removed.  This is the first partition of a RANGE RIGHT function and the last partition of a RANGE LEFT one.  Be mindful of this permanent partition when creating a new partition scheme with multiple filegroups, because the filegroup on which the permanent partition is placed is determined when the partition scheme is created and can never be removed from the scheme.

My recommendation is that one create explicit partition boundaries for all expected data ranges plus a lower and upper boundary for data outside the expected range, and map these partitions to appropriately named filegroups.  This practice will make the purpose of each partition/filegroup clear and help avoid accidentally placing data on the wrong filegroup.

Consider mapping partitions containing data outside the expected range to a dummy filegroup with no underlying files.  This will guarantee data integrity much like a check constraint because data outside the allowable range cannot be inserted.  If you must accommodate errant data rather than rejecting it outright, instead map these partitions to a generalized filegroup like DEFAULT or one designated specifically for that purpose.

I suggest specifying a NULL value for the first boundary of a RANGE RIGHT partition function.  This NULL boundary serves as the upper boundary of the permanent first partition as well as the lower boundary for the second partition containing data outside the expected range.  No rows are less than NULL so the first partition will always be empty.  It is therefore safe to map the first partition to the previously mentioned dummy filegroup even if you need to house data outside the expected range.  That being said, there is no harm in mapping the first partition to another filegroup other than lack of clarity.

For the last boundary of a RANGE RIGHT function, I suggest specifying the lowest value outside the expected range and also mapping the partition to either the dummy filegroup, or one designated to contain unexpected data.  The boundaries between the first boundary (NULL) and this one designate partitions for expected data.

Summary

In summary, I recommend a RANGE RIGHT function with the following setup:

  • First boundary value NULL
  • Subsequent boundary values for expected data partitions
  • A final boundary value of greater than the expected range
  • Map first, second, and last partitions to either a dummy filegroup or one designated for unexpected data
  • Map remaining expected data partitions to appropriately named filegroups

A similar RANGE LEFT function can be set up as follows.  I’m including this only for completeness, as the RANGE RIGHT setup above is a best practice, in my humble opinion.

  • First boundary for data less than the expected range
  • Subsequent boundaries for expected data partitions
  • A final boundary value of the maximum allowable value for the partitioning data type (which is another kludge that bolsters the case for RANGE RIGHT)
  • Map first, second from last, and last partitions to either a dummy filegroup or one designated for unexpected data
  • Map remaining expected data partitions to appropriately named filegroups

Below is an example script of applying these techniques with a RANGE RIGHT function, including adding an incremental partition for a new year.  Partitions for data outside the expected data range are mapped to the FG_NoData filegroup (which contains no files) so any attempt to insert data outside the expected range will fail.

--*********************************
--*** initial setup for 2 years ***
--*********************************

--add a dummy filegroup with no files
ALTER DATABASE PartitioningDemo
    ADD FILEGROUP FG_NoData;
GO

CREATE PARTITION FUNCTION PF_DateTimeRightWithExplictBoundaries(datetime) AS
    RANGE RIGHT FOR VALUES(
          NULL --prior to year 2013 (outside expected range)
        , N'2013-01-01T00:00:00.000' --year 2013 (within expected range)
        , N'2014-01-01T00:00:00.000' --year 2014 (within expected range)
        , N'2015-01-01T00:00:00.000' --after year 2014 (outside expected range)
    );

CREATE PARTITION SCHEME PS_DateTimeRightWithExplictBoundaries AS
    PARTITION PF_DateTimeRightWithExplictBoundaries TO(
          FG_NoData --first partition is the permanent partition and can never be removed
        , FG_NoData --second partition contains data prior to year 2013 and is outside expected range
        , FG_2013 --year 2013 (within expected range)
        , FG_2014 --year 2014 (within expected range)
        , FG_NoData --after year 2014 (outside expected range)
    );

CREATE TABLE dbo.PartitionedTableRightWithExplictBoundaries(
        DateTimeColumn datetime
    ) ON PS_DateTimeRightWithExplictBoundaries(DateTimeColumn);

INSERT INTO dbo.PartitionedTableRightWithExplictBoundaries VALUES
      (N'2013-01-01T01:02:03.346')
    , (N'2013-02-03T04:05:06.780')
    , (N'2014-01-01T01:02:03.346')
    , (N'2014-02-03T04:05:06.780')
    , (N'2014-03-04T05:06:07.890');
GO

--**********************
--*** add a new year ***
--**********************

--specify FG_NoData filegroup to scheme as next used
ALTER PARTITION SCHEME PS_DateTimeRightWithExplictBoundaries
    NEXT USED FG_NoData;

--add 2016 boundary as outside expected range, mapped to FG_NoData
ALTER PARTITION FUNCTION PF_DateTimeRightWithExplictBoundaries()
    SPLIT RANGE(N'2016-01-01T00:00:00.000');

--remove 2015 boundary, currently mapped to FG_NoData
ALTER PARTITION FUNCTION PF_DateTimeRightWithExplictBoundaries()
    MERGE RANGE(N'2015-01-01T00:00:00.000');

--specify FG_2015 filegroup to scheme as next used
ALTER PARTITION SCHEME PS_DateTimeRightWithExplictBoundaries
    NEXT USED FG_2015;

--re-add 2015 boundary as within expected range, now mapped to FG_2015
ALTER PARTITION FUNCTION PF_DateTimeRightWithExplictBoundaries()
    SPLIT RANGE(N'2015-01-01T00:00:00.000');

INSERT INTO dbo.PartitionedTableRightWithExplictBoundaries VALUES
    (N'2015-01-01T01:02:03.346');
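
To tie this back to the earlier advice that a MERGE should follow a SWITCH-based purge, below is a sketch of removing the oldest expected year (2013) without any data movement.  The staging table name is hypothetical; it must match the partitioned table’s structure and reside on the same filegroup as the partition being switched out:

--*****************************************
--*** purge the oldest year (2013) data ***
--*****************************************

--staging table on the same filegroup as the 2013 partition
CREATE TABLE dbo.PartitionedTableRightWithExplictBoundaries_Staging(
        DateTimeColumn datetime
    ) ON FG_2013;
GO

--switch the 2013 partition out; this is a metadata-only operation
DECLARE @PartitionNumber int =
    $PARTITION.PF_DateTimeRightWithExplictBoundaries(N'2013-01-01T00:00:00.000');
ALTER TABLE dbo.PartitionedTableRightWithExplictBoundaries
    SWITCH PARTITION @PartitionNumber TO dbo.PartitionedTableRightWithExplictBoundaries_Staging;
GO

--archive or drop the staged rows as needed, then remove the now-empty 2013 partition;
--no rows are moved because the merged partition is empty
ALTER PARTITION FUNCTION PF_DateTimeRightWithExplictBoundaries()
    MERGE RANGE(N'2013-01-01T00:00:00.000');
GO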

 

Always Use Semicolon Statement Terminators

ANSI-standard semicolon statement terminators are often omitted in T-SQL queries and many developers are unaware that omitting the terminator is deprecated syntax.  Omitting statement terminators is a dangerous practice because, even if the batch compiles, you may get unexpected results.  Consider the insidious examples below pointed out by SQL Server MVP Erland Sommarskog:

BEGIN TRY
	BEGIN TRAN
	SELECT 1/0 AS CauseAnException
	COMMIT
END TRY
BEGIN CATCH
	SELECT ERROR_MESSAGE()
	THROW
END CATCH

Few of us will catch (no pun intended) the bug in the above script.  What results do you expect after running the above script under SQL Server 2012 or later versions?  Rather than leave this as an exercise for the reader, I’ll spoil the fun and mention that no run-time error is raised at all.  Instead, the THROW statement is interpreted as a column alias for the ERROR_MESSAGE() column.  This sort of coding error is especially nasty because catch blocks are rarely unit tested and this catch block coding mistake hides the run-time error entirely without raising an exception.

Similarly, the absence of statement terminators in the script below causes another problem.  Can you spot it?

BEGIN TRY
	BEGIN TRAN
	SELECT 1/0 AS CauseAnException
	COMMIT
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN
    THROW
END CATCH

At least an error is raised in this case, albeit not the one you might expect.  The resultant error is “Cannot roll back THROW. No transaction or savepoint of that name was found”.  This coding bug obfuscates the preceding divide by zero error and prevents the THROW statement from being executed.
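
For reference, below is the second script with the statement terminators added.  With the ROLLBACK TRAN statement properly terminated, THROW is no longer misinterpreted and the original divide-by-zero error is re-thrown as expected:

BEGIN TRY
	BEGIN TRAN;
	SELECT 1/0 AS CauseAnException;
	COMMIT;
END TRY
BEGIN CATCH
	IF @@TRANCOUNT > 0 ROLLBACK TRAN;
	THROW;
END CATCH;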

Semicolons Will Become Mandatory
Microsoft announced with the SQL Server 2008 release that semicolon statement terminators will become mandatory in a future version, so statement terminators other than semicolons (whitespace) are currently deprecated.  This deprecation announcement means that you should always use semicolon terminators in new development.  I honestly don’t expect SQL Server to strictly enforce mandatory semicolons in the near future but it is still a best practice to use semicolon statement terminators to avoid issues like those mentioned earlier as well as to facilitate code maintainability.  I suggest specifying statement terminators in all new development and perhaps adding terminators to existing code as you perform maintenance.

Transact-SQL does not currently enforce the ANSI semicolon statement terminator requirement.  Instead, semicolon statement terminators are optional and any whitespace (spaces, tabs, newline) may be used instead.  The exception to this rule is that many of the statements introduced in SQL Server 2005 and later require the preceding statement to be properly terminated in order for the batch to compile.

Below are some guidelines I suggest on when to, and when not to, use semicolon statement terminators.

Suggested Guidelines
The Transact-SQL parser is quite lax, allowing any whitespace (e.g. space, tab, newline) to be used in place of a terminator.  This laxness results in ambiguity like the examples at the beginning of this article demonstrate.  Similarly, statement terminators may not only be omitted, they may also be used inappropriately.  I strongly suggest you adhere to the T-SQL syntax documented in the Books Online even if the parser allows otherwise.  This practice will help future-proof your code since relying on undocumented behavior is inherently risky.

Don’t precede a statement with a semicolon
Remember that the purpose of semicolons is to terminate SQL statements, not begin them.  A common mistake I see is throwing a semicolon in front of statements in order to get a batch of statements to compile, especially with newer statements like WITH (CTE expression) that require previous statement termination.  Although the T-SQL parser currently ignores extraneous and misplaced semicolons, I suggest they be specified in the appropriate place according to the statement syntax documented in the SQL Server Books Online.

Specify semicolons at the end of each stand-alone SQL statement
Not only will this conform to the ANSI standard, your intent will be clearer and the code easier to read.

Terminate control-of-flow statement blocks at the end of the control-of-flow scope
Control-of-flow statements are not covered by the ANSI SQL standard because these are proprietary SQL extensions.  The SQL Server Books Online is sketchy on the subject and many of the examples (as of this writing) are inconsistent and do not always include statement terminators.  Furthermore, control-of-flow statement blocks are confusing due to the many variations, nesting, and optional BEGIN/END specifications.

Below are examples illustrating what I believe to be proper use of statement terminators and control-of-flow block terminators using IF statements in SQL 2008 and later versions.  The same concepts apply to other control-of-flow constructs like WHILE and TRY/CATCH blocks.  I should add that this batch example will not compile under SQL 2005 because an explicit BEGIN/END block is required to execute a common table expression conditionally in that version.  T-SQL parser enhancements eliminated that requirement in SQL 2008 and later.

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
ELSE
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	IF 1 = 1
		WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	ELSE
	BEGIN
		WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	END; --terminate inner nested IF
	WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and outer IF

Summary
Consistent use of semicolons helps avoid bugs in code that might otherwise go undetected.  Code with statement terminators can also be more easily modified without introducing compile errors, and it is easier to maintain because the end of each statement is readily apparent to subsequent developers.  Importantly, you’ll be better positioned for future SQL Server versions by consistently using semicolon statement terminators.

Help Stop SQL Injection Madness

SQL injection continues to be the most exploited security vulnerability on the internet. A SQL injection vulnerability can put companies out of business and expose innocent people to sensitive data disclosure and damage, all due to developer ignorance or carelessness. Sadly, although there is much information available on SQL injection exploits and mitigation, SQL injection exploits continue because developers use flawed database access techniques. This article details the best, and simplest, method to prevent SQL injection in SQL Server: parameterized SQL statements.

Parameterized SQL queries and stored procedure calls (with CommandType.StoredProcedure and no dynamic SQL within proc code) prevent unintended SQL from being executed. I’ll discuss exactly why shortly. Parameterized SQL also provides many benefits besides security. Notably:

  • allows quotes in data without programmatic escaping
  • avoids the need to format date strings according to DATEFORMAT session settings
  • handles decimal separators automatically
  • improves performance due to plan reuse
  • provides cleaner code

Most articles on SQL injection focus on validating user input rather than parameterized queries.  Input validation certainly has benefits related to data integrity and user-friendly validation messages but it is at best a secondary defense against injection compared to parameterized queries.  Input validation using a blacklist of prohibited keywords and characters is especially weak.

SQL Injection Overview
I am puzzled by the number of examples in articles and forum posts that show building SQL statements with literals using a string concatenation technique.  I suspect the main reason for this practice is that developers write code that generates the exact same ad-hoc SQL statement they would write using a query tool like SQL Server Management Studio or Visual Studio.  This leads to the nasty habit of not using parameters.  For example, a developer might develop and test a query like this in SSMS:

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = 12345;

After testing the query, the C# developer includes the SQL statement in the application code but, instead of a hard-coded value for CustomerID, the desired value is concatenated with the remainder of the SQL statement as a literal.

var command = new SqlCommand(
             "SELECT FirstName, LastName, EmailAddress" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = '" + txtBoxCustomerID.Text + "';"
            ,connection);

The query runs perfectly well and returns the expected result in testing.  It is not until after production deployment that a script kiddie or hacker finds the SQL injection vulnerability.  A malicious person can easily manipulate this SQL statement in ways unintended by the developer.  For example, a hacker could enter the value “1' OR '1' = '1” in the text box to break out of the quoted literal and harvest the names and email addresses of all customers in the database.  This vulnerability can also be exploited to execute additional SQL statements in the same batch, which is often used to update data with malicious html script that is subsequently returned and executed by unsuspecting client browsers.  There are many ways to exploit a SQL injection vulnerability that result in sensitive data disclosure and/or database manipulation.  I won’t detail those here but rather recommend a very simple defense; use a parameterized statement instead of literals.
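
To make the exploit concrete, the statement that actually reaches SQL Server with such a value entered in the text box looks roughly like the reconstruction below; the tautology makes the WHERE clause true for every row, so every customer’s name and email address is returned:

--reconstruction of the concatenated statement after injection
SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = '1' OR '1' = '1';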

Mitigating SQL Injection
Below is a functionally identical parameterized query.

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = @CustomerID;

This query can be incorporated into the C# application code with the parameter and corresponding value added to the parameterized SQL command as a strongly-typed parameter:

var command = new SqlCommand(
             "SELECT LastName" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = @CustomerID;"
            ,connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = int.Parse(textBoxCustomerID.Text);

As you can see, the effort needed to do the same job with a parameterized command is about the same as the string concatenation method, yet the security benefits are immeasurable.

Why Parameters Prevent SQL Injection
Parameterized SQL is more secure for a couple of reasons.  In the case of an ad-hoc query, the SQL statement with parameter markers is hard-coded in the application (or generated via an ORM framework) and passed to SQL Server as-is.  Parameter values are passed separately rather than inside the SQL statement itself.  This ensures the SQL statement cannot be modified by the values provided.

Similarly, a stored procedure call (using CommandType.StoredProcedure) sends only the stored procedure name to SQL Server with separately passed parameter values.  Using stored procedures also allows one to more strictly adhere to the security principle of least privilege.  By granting only stored procedure execute permissions, permissions on indirectly referenced objects are not needed as long as the ownership chain is unbroken.  This practice limits ad-hoc access to underlying data should an attacker find a back door into the database under the application security context.  Personally, I’m a staunch advocate of stored procedures for this and other reasons too.  Stored procedures provide a well-defined database interface, allowing queries and database schema to be refactored without breaking changes.

I should add that it is perfectly acceptable to build a SQL statement dynamically in either application code or stored procedures, but only if parameterized and actually needed for the task at hand.  A common use case is a dynamic search query that is constructed based on user criteria. Instead of coding a separate query for each possible search permutation, one can add parameterized WHERE clauses as needed.  Under no circumstances should a SQL statement string be built with concatenation of untrusted values.
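
Below is a sketch of that dynamic search pattern in T-SQL using sp_executesql against the same dbo.Customer table (the parameter data types here are assumed for illustration).  The statement text varies with the criteria supplied, but the criteria values themselves are always passed as parameters, never concatenated into the statement:

CREATE PROC dbo.SearchCustomer
      @LastName nvarchar(50) = NULL
    , @EmailAddress nvarchar(128) = NULL
AS
SET NOCOUNT ON;
DECLARE @Sql nvarchar(MAX) = N'SELECT FirstName, LastName, EmailAddress FROM dbo.Customer WHERE 1 = 1';

--append a parameterized predicate only for each criterion actually supplied
IF @LastName IS NOT NULL SET @Sql += N' AND LastName = @LastName';
IF @EmailAddress IS NOT NULL SET @Sql += N' AND EmailAddress = @EmailAddress';

--values are passed separately from the statement text
EXEC sp_executesql
      @Sql
    , N'@LastName nvarchar(50), @EmailAddress nvarchar(128)'
    , @LastName = @LastName
    , @EmailAddress = @EmailAddress;
GO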

Additional Precautions
Use Windows Authentication for SQL Server database access.  This practice eliminates the need to store security credentials in application configuration files.  If SQL authentication must be used, protect the credentials with encryption.  This is easily done in .NET by encrypting the connectionStrings section of the app.config file.  See http://msdn.microsoft.com/en-us/library/ms254494.aspx.

Adhere to the principle of least privilege.  Never use a privileged account, such as a sysadmin role member, for routine application access.  Grant permissions only on database objects which are used directly by applications.  Use stored procedures so that one can leverage ownership chaining, eliminating the need to grant permissions on tables directly.
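
As a sketch of what this looks like in practice (the procedure, role, and user names below are hypothetical), the application’s database user gets execute permission on a stored procedure and nothing else; because dbo owns both the procedure and the tables it references, the unbroken ownership chain means no table permissions are needed for the procedure’s static SQL:

--application role with no direct table permissions
CREATE ROLE AppRole;

--grant execute on the stored procedure only
GRANT EXECUTE ON OBJECT::dbo.GetCustomerByID TO AppRole;

--add the application's database user to the role
ALTER ROLE AppRole ADD MEMBER AppUser;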

Specify strongly-typed parameters.  The parameter data type can be inferred (e.g. using SqlCommand.Parameters.AddWithValue method) but a best practice is to explicitly specify a parameter data type that matches the underlying column data type and maximum length.  These practices will minimize the size of the SQL Server procedure cache and avoid implicit conversions that can negatively affect performance.

Improving Uniqueidentifier Performance

A common anti-pattern I run into is the random primary key, commonly a GUID. This design is insidious because the performance implications of random access aren’t immediately obvious and are exacerbated when the primary key index is clustered. It is often only after the table grows to a larger size that the performance problems become apparent. Symptoms include slowly degrading performance over time, with increased blocking and deadlocking as a side effect.

Figure 1 shows the performance profile of random inserts with a random GUID (SQL Server uniqueidentifier data type) clustered primary key. The red line indicates the rate of batch requests per second (inserts) while the blue line shows the total number of rows in the table, scaled such that the top of the graph represents 3M rows. Only about 700,000 rows could be inserted during this 15-minute single-threaded random key insert test, even though the insert rate was fast initially.

Figure 1: Random key insert performance

Incremental Primary Keys

As you might guess, the cure for the random primary key anti-pattern is an incremental key pattern. With a uniqueidentifier data type, a sequential value can be assigned by SQL Server using the NEWSEQUENTIALID function (in a default constraint expression) or in application code using the UuidCreateSequential Win32 API call along with some byte swapping (code example below). Alternatively, one can use an integral data type (int, bigint, etc.) along with a value generated by an IDENTITY property or a SEQUENCE object. The advantage of an integral type is the reduced space requirements compared to a 16-byte uniqueidentifier. The advantage of a uniqueidentifier is that it can easily be generated in application code before database persistence without a database round trip, which is desirable for distributed applications and when keys of related tables are assigned in application code before writing to the database.
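For reference, here is a minimal sketch of server-side sequential key assignment with a default constraint (the dbo.Widget table is hypothetical); NEWSEQUENTIALID can only be used in a default constraint:

--hypothetical table with a server-assigned sequential uniqueidentifier key
CREATE TABLE dbo.Widget(
	WidgetID uniqueidentifier NOT NULL
		CONSTRAINT DF_Widget_WidgetID DEFAULT NEWSEQUENTIALID()
		CONSTRAINT PK_Widget PRIMARY KEY CLUSTERED
	,WidgetName varchar(50) NOT NULL
);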

Figure 2 shows the same test using a sequential key value. Over 2.2M rows were inserted in 15 minutes. As you can see, significant performance improvement is achieved with this trivial application change.

Figure 2: Incremental key insert performance

Listing 1 shows the T-SQL code I used for these performance tests and Listing 2 contains the C# code (with the random GUID commented out). I generated the uniqueidentifier value in application code for these tests, but performance with NEWID() is comparable to the first test and NEWSEQUENTIALID() is comparable to the second.

Listing 1: T-SQL scripts for test table and stored procedure

CREATE TABLE dbo.TestTable(
	TestKey uniqueidentifier NOT NULL
		CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED
	,TestData char(8000) NOT NULL
);
GO

CREATE PROC dbo.InsertTestTable
	@TestKey uniqueidentifier
	,@TestData char(8000)
AS
SET NOCOUNT ON;
DECLARE @TotalRows int;

--insert row
INSERT INTO dbo.TestTable (TestKey, TestData) 
	VALUES(@TestKey, @TestData);

--update perfmon counter with the current row count
SELECT @TotalRows = rows 
FROM sys.partitions 
WHERE object_id = OBJECT_ID(N'dbo.TestTable') AND index_id = 1;
EXEC sys.sp_user_counter1 @TotalRows; --user counter drives the row count line in the graphs

RETURN @@ERROR;
GO

Listing 2: C# insert test console application

using System;
using System.Data;
using System.Data.SqlClient;
using System.Runtime.InteropServices;

namespace UniqueIdentifierPerformanceTest
{
    class Program
    {
        [DllImport("rpcrt4.dll", CharSet = CharSet.Auto)]
        public static extern int UuidCreateSequential(ref Guid guid);

        static string connectionString = @"Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI";
        static int rowsToInsert = 10000000;
        static SqlConnection connection;
        static SqlCommand command;

        static void Main(string[] args)
        {

            int rowsInserted = 0;

            using (connection = new SqlConnection(connectionString))
            {
                using (command = new SqlCommand("dbo.InsertTestTable", connection))
                {
                    command.Connection = connection;
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.Add("@TestKey", SqlDbType.UniqueIdentifier);
                    command.Parameters.Add("@TestData", SqlDbType.Char, 8000);

                    connection.Open();

                    while (rowsInserted < rowsToInsert)
                    {
                        //random guid
                        //command.Parameters["@TestKey"].Value = Guid.NewGuid();

                        //sequential guid
                        command.Parameters["@TestKey"].Value = NewSequentialGuid();
                        command.Parameters["@TestData"].Value = "Test";
                        command.ExecuteNonQuery();
                        ++rowsInserted;

                        //display progress every 1000 rows
                        if (rowsInserted % 1000 == 0)
                        {
                            Console.WriteLine(string.Format(
                                "{0} of {1} rows inserted"
                                , rowsInserted.ToString("#,##0")
                                , rowsToInsert.ToString("#,##0")));
                        }
                    }
                }

                connection.Close();
            }

        }

        /// <summary>
        /// call UuidCreateSequential and swap bytes for SQL Server format
        /// </summary>
        /// <returns>sequential guid for SQL Server</returns>
        private static Guid NewSequentialGuid()
        {
            const int S_OK = 0;
            const int RPC_S_UUID_LOCAL_ONLY = 1824;

            Guid oldGuid = Guid.Empty;

            int result = UuidCreateSequential(ref oldGuid);
            if (result != S_OK && result != RPC_S_UUID_LOCAL_ONLY)
            {
                throw new ExternalException("UuidCreateSequential call failed", result);
            }

            byte[] oldGuidBytes = oldGuid.ToByteArray();
            byte[] newGuidBytes = new byte[16];
            oldGuidBytes.CopyTo(newGuidBytes, 0);

            // swap low timestamp bytes (0-3)
            newGuidBytes[0] = oldGuidBytes[3];
            newGuidBytes[1] = oldGuidBytes[2];
            newGuidBytes[2] = oldGuidBytes[1];
            newGuidBytes[3] = oldGuidBytes[0];

            // swap middle timestamp bytes (4-5)
            newGuidBytes[4] = oldGuidBytes[5];
            newGuidBytes[5] = oldGuidBytes[4];

            // swap high timestamp bytes (6-7)
            newGuidBytes[6] = oldGuidBytes[7];
            newGuidBytes[7] = oldGuidBytes[6];

            //remaining 8 bytes are unchanged (8-15) 

            return new Guid(newGuidBytes);

        }

    }
}

Why Random Keys Are Bad

I think it’s important to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true that random inserts cause fragmentation and page splits, the primary cause of bad performance with random keys is poor temporal reference locality (http://en.wikipedia.org/wiki/Locality_of_reference), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, it didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses an LRU-2 algorithm to manage the buffer cache; the most recently touched data remain in memory while older, less frequently referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there isn’t enough space in the existing page for the new row. With a random key value, the new key is unlikely to be adjacent to the last one inserted (which is probably still in memory), so the needed page often must be read from storage.

All things being equal, single-row performance will be roughly the same with both sequential and random keys as long as data are memory resident. This is why the random and sequential key insert tests show the same good performance initially. But once the table size exceeded the size of the buffer pool, the random key test showed a precipitous drop in throughput and steady degradation thereafter. In short, random keys diminish temporal reference locality because there is no correlation between time (most recently accessed data) and the key value.

Why Incremental Keys Are Good

An incremental key value naturally improves temporal reference locality; the next key value is adjacent to the last one inserted and is likely still in memory. An incremental key provides excellent insert performance regardless of table size, as the insert performance test shows. Also, applications typically use recently inserted data more often than older data, which allows the same amount of work to be done with much less physical I/O than with a random key value.

Random Notes about GUIDs

According to the Globally unique identifier Wikipedia article (http://en.wikipedia.org/wiki/Globally_unique_identifier), the 122 random bits of a GUID can generate 2^122 unique values. That’s an incomprehensibly large number: approximately 5.3 x 10^36 (or 5,300,000,000,000,000,000,000,000,000,000,000,000) unique values.

The value returned by NEWSEQUENTIALID and UuidCreateSequential is guaranteed to be unique on a given computer. Furthermore, it is globally unique if the computer has a network card because the MAC address is used as part of the GUID generation algorithm.

Calendar Table and Date/Time Functions

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation.  Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable.  I’ve used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers.  I’ve included a sample calendar table (and numbers table) later in this post and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value.  This is easy in SQL 2008 and later since you can simply “CAST(SomeDateTimeValue AS date)”.  But the date data type is not available in older SQL Server versions, so you need an alternate method.  In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below with an arbitrary base date value specified in a format that is independent of the session DATEFORMAT setting:

SELECT CAST(GETDATE() AS date); --SQL 2008 and later

SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101'); --SQL 2005 and earlier

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date.  Although this may provide the expected results (I’ve done it myself), I caution against it because it relies on implicit integer conversion based on the internal SQL Server date/time storage format.  If you want to be concise, a better approach is to specify an empty string for the base date value, since an empty string converts to the default value of ‘1900-01-01 00:00:00’.  In my opinion, an explicit date value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

 

I also sometimes see code that extracts the year, month and day date parts and concatenates with separators.  However, that method is dependent on session DATEFORMAT settings and slower than other methods.  See Tibor Karaszi’s The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period.  The script below shows how to accomplish this if you don’t have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], DATEADD(year, DATEDIFF(year,'19000101',@Date), '19000101') AS [CalendarDate]
UNION ALL
SELECT 'Last day of year', DATEADD(day,-1,DATEADD(year,0,DATEADD(year,DATEDIFF(year,'19000101',@Date)+1,'19000101')))
UNION ALL
SELECT 'First day of month', DATEADD(month, DATEDIFF(month,'19000101',@Date), '19000101')
UNION ALL
SELECT 'Last day of month', DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'19000101',@Date)+1,'19000101')))
UNION ALL
SELECT 'First day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday,@Date)-1),DATEADD(day,DATEDIFF(day,'19000101',@Date),'19000101'))
UNION ALL
SELECT 'Last day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday,@Date)-1)+6,DATEADD(day,DATEDIFF(day,'19000101',@Date),'19000101'));

With a calendar table like the one later in this post:

DECLARE @Date date = GETDATE();
SELECT 'First day of year' [DateDescription],(SELECT FirstDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of year',(SELECT LastDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'First day of month',(SELECT FirstDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of month',(SELECT LastDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'First day of week (based on DATEFIRST setting)',(SELECT FirstDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of week (based on DATEFIRST setting)',(SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

Calendar and Numbers Table

I think auxiliary calendar and numbers tables are a must-have on every database server.  These objects allow you to easily perform set-based processing in a number of scenarios.  In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second.  This is much more efficient than a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly.  In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles.  Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table.  For example, here is a function that returns the date that is a specified number of business days from the date provided:

CREATE FUNCTION dbo.udf_AddBusinessDays
(@Date date, @BusinessDays int)
RETURNS date
AS
BEGIN
    RETURN (
        SELECT TOP (1) CalendarDate AS BusinessDate
        FROM (SELECT TOP (@BusinessDays) CalendarDate
        FROM dbo.Calendar
        WHERE
            CalendarDate > @Date
            AND BusinessDay = 1
        ORDER BY CalendarDate) AS BusinessDays
        ORDER BY CalendarDate DESC
    );
END;
GO

Script 1: Example calendar table utility function
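For example, once the calendar table below is populated, the function can be called like this (the result depends on the holidays you load):

--date that is 10 business days after today
SELECT dbo.udf_AddBusinessDays(CAST(GETDATE() AS date), 10) AS BusinessDate;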

--auxiliary number table
CREATE TABLE dbo.Numbers(
    Number int NOT NULL
        CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED
);

--load Numbers table with 1,000,000 numbers
WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
    ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
    ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)
    ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO dbo.Numbers WITH (TABLOCKX) (Number)
SELECT number
FROM numbers
WHERE number < 1000000;

Script 2: Create and populate numbers table.

CREATE TABLE dbo.Calendar(
	CalendarDate date NOT NULL
		CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
	,CalendarYear int NOT NULL
	,CalendarMonth int NOT NULL
	,CalendarDay int NOT NULL
	,DayOfWeekName varchar(10) NOT NULL
	,FirstDateOfWeek date NOT NULL
	,LastDateOfWeek date NOT NULL	
	,FirstDateOfMonth date NOT NULL
	,LastDateOfMonth date NOT NULL
	,FirstDateOfQuarter date NOT NULL
	,LastDateOfQuarter date NOT NULL
	,FirstDateOfYear date NOT NULL
	,LastDateOfYear date NOT NULL
	,BusinessDay bit NOT NULL
	,NonBusinessDay bit NOT NULL
	,Weekend bit NOT NULL
	,Holiday bit NOT NULL
	,Weekday bit NOT NULL
	,CalendarDateDescription varchar(50) NULL
);
GO

--load dates from 2000-01-01 through 2099-12-31
WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
	,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
	,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b)
	,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO dbo.Calendar WITH (TABLOCKX) (
	CalendarDate
	,CalendarYear
	,CalendarMonth
	,CalendarDay
	,DayOfWeekName
	,FirstDateOfWeek
	,LastDateOfWeek
	,FirstDateOfMonth
	,LastDateOfMonth
	,FirstDateOfQuarter
	,LastDateOfQuarter
	,FirstDateOfYear
	,LastDateOfYear
	,BusinessDay
	,NonBusinessDay
	,Weekend
	,Holiday
	,Weekday
	,CalendarDateDescription
	) 
SELECT
	CalendarDate = DATEADD(day, number, '20000101')
	,CalendarYear = DATEPART(year, DATEADD(day, number, '20000101'))
	,CalendarMonth = DATEPART(month, DATEADD(day, number, '20000101'))
	,CalendarDay = DATEPART(day, DATEADD(day, number, '20000101'))
	,DayOfWeekName = DATENAME(weekday, DATEADD(day, number, '20000101'))
	,FirstDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1),DATEADD(day, number, '20000101'))
	,LastDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1)+6,DATEADD(day, number, '20000101'))
	,FirstDateOfMonth = DATEADD(month, DATEDIFF(month,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfMonth = DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'20000101',DATEADD(day, number, '20000101'))+1,'20000101')))
	,FirstDateOfQuarter = DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfQuarter = DATEADD(day, -1, DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
	,FirstDateOfYear = DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfYear = DATEADD(day,-1,DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
	--initially set all weekdays as business days
	,BusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
	--all weekends are non-business days
	,NonBusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
	,Weekend = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
	,Holiday = 0 --initially no holidays
	,Weekday = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
	,CalendarDateDescription = NULL
FROM numbers
WHERE number < DATEDIFF(day, '20000101', '20991231') + 1;

--New Year's Day
UPDATE dbo.calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'New Year''s Day'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 1;

--New Year's Day celebrated on Friday, December 31 when January 1 falls on Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 31
    AND DayOfWeekName = 'Friday';

--New Year's Day celebrated on Monday, January 2 when January 1 falls on Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 2
    AND DayOfWeekName = 'Monday';    

--Martin Luther King Day - 3rd Monday in January
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Martin Luther King Day'
WHERE
    CalendarMonth = 1
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;

--President's Day - 3rd Monday in February
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'President''s Day'
WHERE
    CalendarMonth = 2
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;

--Easter - first Sunday after the full moon following the vernal (March 21) equinox 
WITH
	t4 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
    ,t256 AS (SELECT 0 AS n FROM t4 t4a, t4 t4b, t4 t4c, t4 t4d)
    ,years AS (SELECT ROW_NUMBER() OVER(ORDER BY n) + 1999 AS year FROM t256)
	,n AS (SELECT years.year, years.year - (19 * (years.year / 19)) AS n FROM years)
	,century AS (SELECT years.year, years.year / 100 AS century FROM years)
	,i AS (SELECT century.year, century.century - (century.century / 4) - ((century.century - ((century.century - 17) / 25)) / 3) + (19 * n.n) + 15  AS i
		FROM century 
		JOIN n ON n.year = century.year)
	,i2 AS (SELECT i.year, i.i - (30 * (i.i / 30 ) ) AS i2
		FROM i)
	,i3 AS (SELECT i2.year, i2.i2 - ((i2.i2 / 28) * (1 - (i2.i2 / 28) * (29 / (i2.i2 + 1)) * ((21 - n.n) / 11)) ) AS i3 
		FROM i2
		JOIN n ON n.year = i2.year)
	,j AS (SELECT i3.year, i3.year + (i3.year / 4) + i3.i3 + 2 - century.century + (century.century / 4 ) AS j 
		FROM i3
		JOIN century ON century.year = i3.year)
	,j2 AS (SELECT j.year, j.j - (7 * (j.j / 7) ) AS j2 
		FROM j)
	,month AS (SELECT j2.year, 3 + (((i3.i3 - j2.j2) + 40) / 44 ) AS month 
		FROM j2
		JOIN i3 ON i3.year = j2.year)
	,day AS (SELECT month.year, month.month, i3.i3 - j2.j2 + 28 - (31 * ( month.month / 4 ) ) AS day 
		FROM i3
		JOIN j2 ON j2.year = i3.year
		JOIN month ON month.year = j2.year)
	,easter AS (SELECT CAST(DATEADD(year, month.year-1900, DATEADD(day, day.day-1, DATEADD(month, month.month-1, ''))) AS date) AS easter
		FROM month
		JOIN day ON day.month = month.month AND day.year = month.year)
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Easter'
WHERE
    CalendarDate IN(
		SELECT easter
		FROM easter
		);

--Good Friday - 2 days before Easter Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Good Friday'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, -2, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Easter'
        );

--Memorial Day - last Monday in May
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Memorial Day'
WHERE
    CalendarMonth = 5
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MAX(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = 5
            AND c2.DayOfWeekName = 'Monday'
        );

--Independence Day - July 4th
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Independence Day'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 4;

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 3
    AND DayOfWeekName = 'Friday';

--Independence Day celebrated on Monday, July 5 when July 4 falls on a Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 5
    AND DayOfWeekName = 'Monday';

--Labor Day - first Monday in September
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Labor Day'
WHERE
    CalendarMonth = 9
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MIN(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = calendar.CalendarYear
            AND c2.CalendarMonth = 9
            AND c2.DayOfWeekName = 'Monday'
        );

--Columbus Day - second Monday in October
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Columbus Day'
WHERE
    CalendarMonth = 10
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 2;

--Veteran's Day - November 11
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Veteran''s Day'
WHERE
    CalendarMonth = 11
    AND CalendarDay = 11;

--Thanksgiving - fourth Thursday in November
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Thanksgiving'
WHERE
    CalendarMonth = 11
    AND DayOfWeekName = 'Thursday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Thursday'
        ) = 4;

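--Day after Thanksgiving - Friday after Thanksgiving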
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Day after Thanksgiving'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, 1, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Thanksgiving'
        );

--Christmas Day - December 25th
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Christmas Day'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 25;

--Christmas day celebrated on Friday, December 24 when December 25 falls on a Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 24
    AND DayOfWeekName = 'Friday';

--Christmas day celebrated on Monday, December 26 when December 25 falls on a Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 26
    AND DayOfWeekName = 'Monday';

Script 3: Create and populate calendar table and update with holidays