Table Partitioning Best Practices

SQL Server table partitioning has a number of gotchas without proper planning.  This article demonstrates those that commonly cause grief and recommends best practices to avoid them.

Implications of the Partition Function Range Specification

One needs a good understanding of how the RANGE LEFT/RIGHT specification affects partition setup and management.  The RANGE specification determines:

  • The partition created by SPLIT
  • The partition removed with MERGE
  • The permanent partition that can never be removed from partition schemes

Below are nuances of the RANGE specification that commonly surprise people.

The Partition Created by SPLIT

New partitions are created by splitting a partition function.  A partition function SPLIT splits an existing partition into 2 separate ones, changing all of the underlying partition schemes, tables, and indexes.  Below are the actions performed when a LEFT or RIGHT partition is SPLIT, with important the differences in bold:

The actions performed by a SPLIT of a RANGE LEFT partition function:

  • Identify existing partition to be split, which is the one that contains the new boundary (or the last partition if no existing boundaries are higher than the one being added)
  • Add the new boundary to the partition function, maintaining boundary order and incrementing subsequent partition numbers
  • Create a new partition to the left of the existing one on the NEXT USED filegroup of each partition scheme that uses the function
  • For each table/index using the affected partition scheme(s), move rows from the existing split partition that are less than or equal to the new boundary into the newly created partition on the left

The actions performed by a SPLIT of a RANGE RIGHT partition function:

  • Identify existing partition to be split, which is the one that contains the new boundary (or the first partition if no existing boundaries are less than the one being added)
  • Add the new boundary to the partition function, maintaining boundary order and incrementing subsequent partition numbers
  • Create a new partition to the right of the existing one on the NEXT USED filegroup for each partition scheme that uses the function
  • For each table/index using the affected partition scheme(s), move rows from the existing split partition that are greater than or equal to the new boundary into the newly create partition on the right

I generally suggest one use RANGE RIGHT instead because it is more natural, and helps avoid common pitfalls when adding incremental partition boundaries.  RANGE LEFT behavior is not intuitive and trips up many DBAs but I will discuss for completeness.  Consider this example of a RANGE LEFT partition function on a datetime column, where the initial setup is for 2 years of data (2013 and 2014):

This initial setup results in 2 partitions and data properly mapped to the 2 yearly filegroups of the scheme.  Now, we need to prepare for year 2015 so we add a new filegroup, set the NEXT USED filegroup, SPLIT the function for year 2015, and insert data for 2015:

This SPLIT results in adding the new boundary to the function as expected:

However, the partition scheme is not as desired.  The resultant partition scheme is:

As you can see, the filegroup for 2015 was inserted into the scheme before the 2014 one, resulting in 2014 data in the FG_2015 filegroup and 2015 data in the FG_2014 filegroup.  Furthermore, not only is the filegroup mapping now wrong, data movement was required to move the entire year of 2014 data into the new partition.  No big deal here since only 3 rows were moved by this demo script but in a production table, this movement could be a show stopper.  Logging during SWITCH/MERGE data movement during is about 4 times that of normal DML, which is especially costly when working with large tables containing millions or billions of rows as is commony used in table partitioning.

Remember that the new partition by a SPLIT is the one that includes the specified boundary, which is to the left of the existing split partition with a RANGE LEFT function (the 2014 partition here).  Data are moved from the existing SPLIT partition into the newly created partition according to the boundaries of the new partition (greater than ‘2013-12-31T23:59:59.997’ and less than or equal to ‘2014-12-31T23:59:59.997’).

Again, I recommend using a RANGE RIGHT function to avoid this non-intuitive behavior.  Below is the equivalent RANGE RIGHT script for yearly partitions, which results in the desired filegroup mappings as well as no data movement.  Also, note that the datetime boundaries are exact date specifications with RANGE RIGHT, which is also more intuitive when working with temporal datetime, datetime2, and datetimeoffset data types that include a time component.

The resultant partition function and scheme definitions are as desired with this RANGE RIGHT partition function:

The Partition Removed by MERGE

When a partition is removed with MERGE, the dropped partition is the one that includes the specified boundary.  If the dropped partition is not empty, all data will be moved into the adjacent remaining partition.  Like SPLIT, costly data movement during partition maintenance should be avoided so it is best to plan such than only empty partitions are removed.  A MERGE should typically done after a purge/archive of data using SWITCH.

Below is a summary of a RANGE LEFT partition function MERGE:

  • Identify existing partition to be removed, which is the one that includes (to the left of) the specified existing boundary
  • For each table/index using the affected partition scheme(s), move rows from this partition into the adjacent right partition
  • Remove boundary from the partition function, maintaining boundary order and decrementing subsequent partition numbers
  • Remove the filegroup from each partition scheme that uses the function, unless the filegroup is mapped to another partition or is the NEXT USED filegroup

Below is a summary of a RANGE RIGHT partition function MERGE:

  • Identify existing partition to be removed, which is the one that includes (to the right of) the specified existing boundary
  • For each table/index using the affected partition scheme(s), move rows from this partition into the adjacent left partition
  • Remove boundary from the partition function, maintaining boundary order and decrementing subsequent partition numbers
  • Remove the filegroup from each partition scheme that uses the function, unless the filegroup is mapped to another partition or is the NEXT USED filegroup

I suggest explicit partition boundaries for expected data ranges to facilitate using both MERGE and SPLIT, and an additional one for the permanent partition (which I’ll detail shortly).  This practice helps ensure data are both logically and physically aligned, providing more natural partition management.

The Permanent Partition

You might not be aware that each partition scheme has a permanent partition that can never be removed.  This is the first partition of a RANGE RIGHT function and the last partition of a RANGE LEFT one.  Be mindful of this permanent partition when creating a new partition scheme when multiple filegroups are involved because the filegroup on which this permanent partition is created is determined when the partition scheme is created and cannot be removed from the scheme.

My recommendation is that one create explicit partition boundaries for all expected data ranges plus a lower and upper boundary for data outside the expected range, and map these partitions to appropriately named filegroups.  This practice will make the purpose of each partition/filegroup clear and help avoid accidentally placing data on the wrong filegroup.

Consider mapping partitions containing data outside the expected range to a dummy filegroup with no underlying files.  This will guarantee data integrity much like a check constraint because data outside the allowable range cannot be inserted.  If you must accommodate errant data rather than rejecting it outright, instead map these partitions to a generalized filegroup like DEFAULT or one designated specifically for that purpose.

I suggest specifying a NULL value for the first boundary of a RANGE RIGHT partition function.  This NULL boundary serves as the upper boundary of the permanent first partition as well as the lower boundary for the second partition containing data outside the expected range.  No rows are less than NULL so the first partition will always be empty.  It is therefore safe to map the first partition to the previously mentioned dummy filegroup even if you need to house data outside the expected range.  That being said, there is no harm in mapping the first partition to another filegroup other than lack of clarity.

For the last boundary of a RANGE RIGHT function, I suggest specifying the lowest value outside the expected range and also mapping the partition to either the dummy filegroup, or one designated to contain unexpected data.  The boundaries between the first boundary (NULL) and this one designate partitions for expected data.


In summary, I recommend a RANGE RIGHT function with the following setup:

  • First boundary value NULL
  • Subsequent boundary values for expected data partitions
  • A final boundary value of greater than the expected range
  • Map first, second, and last partitions to either a dummy filegroup or one designated for unexpected data
  • Map remaining expected data partitions to appropriately named filegroups

A similar RANGE LEFT function can be setup as follows.  I’m including this for only for completeness as the RANGE RIGHT setup above is a best practice, in my humble opinion.

  • First boundary for data less than the expected range
  • Subsequent boundaries for expected data partitions
  • A final boundary value of the maximum allowable value for the partitioning data type (which is another kludge that bolsters the case for RANGE RIGHT)
  • Map first, second from last, and last partitions to either a dummy filegroup or one designated for unexpected data
  • Map remaining expected data partitions to appropriately named filegroups

Below is an example script of applying these techniques with a RANGE RIGHT function, including adding an incremental partition for a new year.  Partitions for data outside the expected data range are mapped to the FG_NoData filegroup (which contains no files) so any attempt to insert data outside the expected range will fail.


Always Use Semicolon Statement Terminators

ANSI-standard semicolon statement terminators are often omitted in T-SQL queries and many developers are unaware that this is syntax is deprecated.  Omitting statement terminators is a dangerous practice because, even if the batch compiles, you may get unexpected results.  Consider the insidious examples below pointed out by SQL Server MVP Erland Sommarskog:

Few of us will catch (no pun intended) the bug in the above script.  What results do you expect after running the above script under SQL Server 2012 or later versions?  Rather than leave this as an exercise for the reader, I’ll spoil the fun and mention that no run-time error is raised at all.  Instead, the THROW statement is interpreted as a column alias for the ERROR_MESSAGE() column.  This sort of coding error is especially nasty because catch blocks are rarely unit tested and this catch block coding mistake hides the run-time error entirely without raising an exception.

Similarly, the absence of statement terminators in the script below causes another problem.  Can you spot it?

At least an error is raised in this case, albeit not the one you might expect.  The resultant error is “Cannot roll back THROW. No transaction or savepoint of that name was found”.  This coding bug obfuscates the preceding divide by zero error and prevents the THROW statement from being executed.

Below is another example where the absence of the semi-colon terminator obfuscates the root cause of the error. As you may know, GO is not a T-SQL statement but a batch terminator command recognized by SSMS and other SQL Server tools and utilities. This script executes as expected from an SSMS query window because SSMS parses the script and executes each batch individually when GO commands are encountered:

However, running the same script with PowerShell (or any other client application) fails with the error “CREATE VIEW must be the first statement in a query batch”:

In this case, SQL Server interprets the GO as a column alias in the first SELECT query and the batch errs on the CREATE VIEW statement during compilation. If you a semi-colon is added to the end of the first SELECT statement, the correct error message results: “Incorrect syntax near ‘GO'”.

As a side note, one can execute scripts containing GO terminators programmatically using the SMO API, which is also used by some SQL Server tools. See this Stackoverflow answer. Another approach I’ve used is to parse scripts in code using the Transact-SQL script DOM and execute each batch individually. I’ll follow up with a separate article detailing that method and add the link here.

Semicolons Will Become Mandatory
Microsoft announced with the SQL Server 2008 release that semicolon statement terminators will become mandatory in a future version so statement terminators other than semicolons (whitespace) are currently deprecated.  This deprecation announcement means that you should always use semicolon terminators in new development.  I honestly don’t expect SQL Server to strictly enforce mandatory semicolons in the near future but it is still a best practice to use semicolon statement to avoid issues like those mentioned earlier as well as facilitate code maintainability.  I suggest specifying statement terminators in all new development and perhaps adding terminators to existing code as you perform maintenance.

Transact-SQL does not currently enforce the ANSI semicolon statement terminator requirement.  Instead, semicolon statement terminators are optional and any whitespace (spaces, tabs, newline) may be used instead.  The exception to this rule is that many of the statements introduced in SQL Server 2005 and later require the preceding statement to be properly terminated in order for the batch to compile.

Below are some guidelines I suggest on when to, and when not to, use semicolon statement terminators.

Suggested Guidelines
The Transact-SQL parser is quite lax, allowing any whitespace (e.g. space, tab, newline) to be used.  This laxness results in ambiguity like the examples at the beginning of this article demonstrate.  Similarly, statement terminators may not only be omitted, they may also be used in inappropriately.  I strongly suggest you adhere to the T-SQL syntax documented in the Books Online even if the parser allows otherwise.  This practice will help future-proof your code since relying on undocumented behavior is inherently risky.

Don’t precede a statement with a semicolon
Remember that the purpose of semicolons is to terminate SQL statements, not begin them.  A common mistake I see is throwing a semicolon in front of statements in order to get a batch of statements to compile, especially with newer statements like WITH (CTE expression) that require previous statement termination.  Although the T-SQL parser currently ignores extraneous and misplaced semi-colons, I suggest they be specified in the appropriate place according statement syntax documented in the SQL Server Books Online.

Specify semicolons at the end of each stand-alone SQL statement
Not only will this conform to the ANSI standard, your intent will be clearer and the code easier to read.

Terminate control-of-flow statement blocks at the end of the control-of-flow scope
Control-of-flow statements are not covered by the ANSI SQL standard because these are proprietary SQL extensions.  The SQL Server Books Online is sketchy on the subject and many of the examples (as of this writing) are inconsistent and do not always include statement terminators.  Furthermore, control-of-flow statement blocks are confusing due to the many variations, nesting, and optional BEGIN/END specifications.

Below are examples illustrating what I believe to be proper use of statement terminators control-of-flow block terminators using IF statements in SQL 2008 and later versions.  The same concepts apply to other control-of-flow constructs like WHILE and TRY/CATCH blocks.  I should add that this batch example will not compile under SQL 2005 because an explicit BEGIN/END block is required to execute a common table expression conditionally in that version.  T-SQL parser enhancements eliminated that requirement in SQL 2008 and later.

Consistent user of semicolons helps avoid bugs in code that might otherwise go undetected.  Code with statement terminators can also be more easily modified without introducing compile errors and make code easier to maintain because the end of each statement is readily apparent to subsequent developers.  Importantly, you’ll be better positioned for future SQL Server versions by consistently using semicolon statement terminators.

Help Stop SQL Injection Madness

SQL injection continues to be the most exploited security vulnerability on the internet. A SQL injection vulnerability can put companies out of business and expose innocent people to sensitive data disclosure and damage, all due to developer ignorance or carelessness. Sadly, although there is much information available on SQL injection exploits and mitigation, SQL injection exploits continue because developers use flawed database access techniques. This article details the best, and simplest, method to prevent SQL injection in SQL Server: parameterized SQL statements.

Parameterized SQL queries and stored procedure calls (with CommandType.StoredProcedure and no dynamic SQL within proc code) prevent unintended SQL from being executed. I’ll discuss exactly why shortly. Parameterized SQL also provides many benefits besides security. Notably:

  • allows quotes in data without programmatic escaping
  •  avoids the need to format date strings according to DATEFORMAT session settings
  • handles decimal separators automatically
  •  improves performance due to plan reuse
  •  provides cleaner code

Most articles on SQL injection focus on validating user input rather than parameterized queries.  Input validation certainly has benefits related to data integrity and user-friendly validation messages but it is at best a secondary defense against injection compared to parameterized queries.  Input validation using a blacklist of prohibited keywords and characters is especially weak.

SQL Injection Overview
I am puzzled by the number of examples in articles and forum posts that show building SQL statements with literals using a string concatenation technique.  I suspect the main reason for this practice is that developers write code that generates the exact same ad-hoc SQL statement they would write using a query tool like SQL Server Management Studio or Visual Studio.  This leads to the nasty habit of not using parameters.  For example, a developer might develop and test a query like this in SSMS:

After testing the query, the C# developer includes the SQL statement in the application code but, instead of a hard-coded value for CustomerID, the desired value is concatenated with the remainder of the SQL statement as a literal.

The query runs perfectly well and returns the expected result in testing.  It is not until after production deployment that a script kiddie or hacker finds the SQL injection vulnerability.  A malicious person can easily manipulate this SQL statement in ways unintended by the developer.  For example, a hacker could enter CustomerID value “1 OR 1 = 1” in the text box to harvest the names and email addresses of all customers in the database.  This vulnerability can also be exploited to execute additional SQL statements in the same batch, which is often used to update data with malicious html script that is subsequently returned and executed by unsuspecting client browsers.  There are many ways to exploit a SQL injection vulnerability that result in sensitive data disclosure and/or database manipulation.  I won’t detail those here but rather recommend a very simple defense; use a parameterized statement instead of literals.

Mitigating SQL Injection
Below is a functionally identical parameterized query.

This query can be incorporated into the C# application code with the parameter and corresponding value added to the parameterized SQL command as a strongly-typed parameter:

As you can see, the effort needed to do the same job with a parameterized command is about the same as the string concatenation method, yet the security benefits are immeasurable.

Why Parameters Prevent SQL Injection
Parameterized SQL is more secure for a couple of reasons.  In the case of an ad-hoc query, the SQL statement with parameter markers is hard-coded in the application (or generated via an ORM framework) and passed to SQL Server as-is.  Parameter values are passed separately rather than inside the SQL statement itself.  This ensures the SQL statement cannot be modified by the values provided.

Similarly, a stored procedure call (using CommandType.StoredProcedure) sends only the stored procedure name to SQL Server with separately passed parameter values.  Using stored procedures also allow one to more strictly adhere to the security principal of least privilege.  By granting only stored procedure execute permissions, permissions on indirectly referenced objects are not needed as long as the ownership chain is unbroken.  This practices limits ad-hoc access to underlying data should an attacker find a back door into the database under the application security context.  Personally, I’m a staunch advocate of stored procedures for this and other reasons too.  Stored procedures provide a well-defined database interface, allowing queries and database schema to be refactored without breaking changes.

I should add that it is perfectly acceptable to build a SQL statement dynamically in either application code or stored procedures, but only if parameterized and actually needed for the task at hand.  A common use case is a dynamic search query that is constructed based on user criteria. Instead of coding a separate query for each possible search permutation, one can add parameterized WHERE clauses as needed.  Under no circumstances should a SQL statement string be built with concatenation of untrusted values.

Additional Precautions
Use Windows Authentication for SQL Server database access.  This practice eliminates the need to store security credentials in application configuration files.  If SQL authentication must be used, protect the credentials with encryption.  This is easily done in .NET by encrypting the connectionStrings section of the app.config file.  See

Adhere to Principal of least privilege.  Never use a privileged account, such as a sysadmin role member, for routine application access.  Grant permissions only on database objects which are used directly by applications.  Use stored procedures so that one can leverage ownership chaining, eliminating the need to grant permissions on tables directly.

Specify strongly-typed parameters.  The parameter data type can be inferred (e.g. using SqlCommand.Parameters.AddWithValue method) but a best practice is to explicitly specify a parameter data type that matches the underlying column data type and maximum length.  These practices will minimize the size of the SQL Server procedure cache and avoid implicit conversions that can negatively affect performance.

Maximizing Performance with Table-Valued Parameters

I’ve seen TVPs improve performance by orders of magnitude when used appropriately.  This feature isn’t used as often as it should be so I’ll show how to implement a TVP using a C# application example and discuss TVP considerations.

Why Table-Valued Parameters Improve Performance
TVPs allow one to pass multiple rows or values at once to a query or stored procedure.  By doing so, the query can leverage set-based operations to greatly improve performance compared to executing the query/proc many times.  TVPs decrease network latency by reducing network round trips.  Prior to TVPs, one had to employ workarounds like passing a string containing a delimited list of records (or XML) and parsing on the SQL side.  Those techniques are at best a kludge.  Instead TVPs are the right tool for the job in SQL Server 2008 and later.

TVPs (and table variables) are temp tables.  TVPs and table variables are persisted in tempdb much the same way as regular temp tables (# prefix).  The big differences are that TVPs and table variables are limited in scope to the current batch and do not have statistics.  Also, table parameters/variables allow only indexes declared via primary key nor unique constraints; non-unique indexes are not supported.

When a TVP Is passed to SQL Server from a client application, the data is bulk-inserted into tempdb where it is made available to the SQL statement or stored procedure using the declared TVP.  This bulk insert is handled by the client API transparently to the application and allows a large number of rows to be passed to SQL Server efficiently.  In T-SQL, one can declare and load a table variable, where it can be used directly in a SQL statement or passed as a TVP to a stored procedure or query (invoked with sp_executesql).

The exact threshold where TVPs outperform single-row operations is much lower than you might expect.  Your mileage may vary but, at least in my experience, it’s only a few rows.  The performance tests I ran here show that the threshold where a TVP outperforms individual requests with a simple list of customers is only 4 rows.  That said, I would not recommend TVPs for single-row operations due to the start-up overhead but it’s a good choice when the most common use case is multiple rows.

How to Use TVPs
In order to pass a TVP to a SQL statement or stored procedure, one must first create a user-defined table type in SQL Server.  The table type defines the schema of the table SQL Server expects for the TVP.  Listing 1 shows the CREATE DDL for the table type I use in the AdventureWorks2012 database to pass a list of customers to the stored procedure in Listing 2 that returns all orders for these customers.  Note that the stored procedure TVP must be declared as READONLY.

Listing 1: Table type DDL

Listing 2: Stored procedure to return orders for multiple customers

TVPs are passed from a .NET application using parameter data type DbType.Structured.  The actual parameter value can be an object of type DataTable, DbDataReader or IEnumberable<SqlDataRecord>.  A DataTable is appropriate when the required TVP data is already in a DataTable object needed for other purposes.  A DbDataReader is a good choice when the source data is the result of a query (note that any DbDataReader may be used, not just a SqlDataReader).   For other cases, I recommend an IEnumerable<SqlDataRecord>.  It is very easy to implement IEnumerable<SqlDataRecord>, especially if you already have an enumerable object like an array or collection containing the values you need to pass.  See the code at the end of this article for an example class that implements IEnumerable<SqlDataRecord> for a TVP.

A benefit with DbDataReader and IEnumerable<SqlDataRecord> is that the source data can be streamed to SQL server without first loading all the TVP data into memory in either the client application or in SQL Server.  This is a big consideration when a lot of data needs to be passed because you are not constrained by memory on ether the client or SQL Server side.

Listing 3 shows the stored procedure I use to compare performance with the TVP method.  This proc takes only a single @CustomerID parameter so it must be called once for each customer.

Listing 3: Stored procedure to return orders for a single customer

The complete C# console application I used for the performance tests is in Listing 4.  The code demonstrates the 3 ways to pass a TVP from C# and logs timings for each invocation.  As you can see in Figure 1, the TVP outperformed individual calls once more than 3 customers were passed.  Note that I had to use a logarithmic y-axis scale because of the huge performance difference.  For example, 10,000 individual calls averaged about 1.5 seconds whereas passing 10,000 customers via a TVP took less to 100 milliseconds.

Figure 1:  TVP versus individual call performance
TVP performance graph

Listing 4: Passing TVPs in C#

Improving Uniqueidentifier Performance

A common anti-pattern I run into is the random primary key, commonly a GUID. This design is insidious because the performance implications of random access aren’t immediately obvious and exacerbated when the primary key index is clustered. It is often only after the table grows to a larger size that the performance problems become apparent. Symptoms include slowly degrading performance over time, with increased blocking and deadlocking as a side effect.

Figure 1 shows the performance profile of a random inserts with a random GUID (SQL Server uniqueidentifier data type) clustered primary key. The red line indicates the rate of batch requests per second (inserts) while the blue line shows the total number of rows in the table, scaled such that the top of the graph represents 3M rows. Only about 700, 000 rows could be inserted during this 15 minute single-threaded random key insert test, even though the insert rate was fast initially.

Figure 1: Random key insert performance
Random insert performance graph

Incremental Primary Keys

As you might guess, the cure for the random primary key anti-pattern is an incremental key pattern. With a uniqueidentifier data type, a sequential value can be assigned by SQL Server using the NEWSEQUENTIALID function (in a default constraint expression) or in application code using the UuidCreateSequential Win32 API call along with some byte swapping (code example below). Alternatively, one can use an integral data type (int, bigint, etc.) along with a value generated by an IDENTITY property or a SEQUENCE object. The advantage of an integral type is the reduced space requirements compared to a 16-byte uniqueidentifier. The advantage of a uniqueidentifier is that it can easily be generated in application code before database persistence without a database round trip, which is desirable for distributed applications and when keys of related tables are assigned in application code before writing to the database.

Figure 2 shows the same test using a sequential key value. Over 2.2M rows were inserted in 15 minutes. As you can see, significant performance improvement is achieved with this trivial application change.

Figure 2: Incremental key insert performance
Random insert performance graph

Listing 1 shows the T-SQL code I used for these performance tests and listing 2 contains the C# code (with the random GUID commented out). I generated the uniqueidentifier value via application code in the tests but performance with NEWID() is comparable to the first test and NEWSEQUENTIALID() is similar to the second test.

Listing 1: T-SQL scripts for test table and stored procedure

Listing 2: C# insert test console application

Why Random Keys Are Bad

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality (, which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.

All things being equal, single-row performance will be roughly the same with both sequential and random keys as long as data are memory resident. This is why the random and sequential key insert tests show the same good performance initially. But once the table size exceeded the size of the buffer pool, the random key test showed a precipitous drop in throughput and steady degradation thereafter. In short, random keys diminish temporal reference locality because there is no correlation between time (most recently accessed data) and the key value.

Why Incremental Keys Good

An incremental key value naturally improves temporal reference locality; the next key value is adjacent to the last one inserted and is likely still in memory. An incremental key provides excellent insert performance regardless of table size as the insert performance test shows. Also, applications typically use recently inserted data more often than older data. This allows the same amount of work to done with much less physical I/O than a random key value.

Random Notes about GUIDs

According to the Globally unique identifier Wiki (, the random 122 bits of a GUID can generate 2122 unique values. That’s an incomprehensibly large 5.3 x 1036 (or 5,300,000,000,000,000,000,000,000,000,000,000,000) number unique values.

The value returned by NEWSEQUENTIALID and UuidCreateSequential is guaranteed to be unique on a given computer. Furthermore, it is globally unique if the computer has a network card because the MAC address is used as part of the GUID generation algorithm.

Remediating a Table Partitioning Mess

Partitioning often involves exceptionally large tables, which are quite unforgiving when it comes to remediating a mess due to improper partition design or maintenance.  In this post, I’ll discuss the common design mistake of using a RANGE LEFT partition function containing exact date boundaries with a date type that may include a time component (datetime, datetime2 or datetimeoffset) and show how to change it to RANGE RIGHT with minimal data movement.

The most common use case for table partitioning is an incremental value.  Partitioning on a date, datetime, datetime2 or datetimeoffset value that occurs naturally in the data allows incremental load using SPLIT and SWITCH operations and purge/archive using SWITCH and MERGE.  SWITCH is nearly instantaneous regardless of the amount of underlying data involved because only meta-data changes are needed to move partitions from one table to another.  SPLIT and MERGE are also very fast meta-data only operations as long as no data movement is required.  For this reason, it is critical to setup boundaries correctly and plan as to perform SPLIT and MERGE against empty partitions whenever possible.

Note that RANGE LEFT or RANGE RIGHT specification determines:

  • The partition that contains exact boundary value matches
  • The partition is created by SPLIT
  • The partition is removed by MERGE

A common design mistake is using exact date boundaries in a RANGE LEFT function with datetime, datetime2 or datetimeoffset data type.  This error can result in rows in the wrong partition because these data types include a time component and the implications are not obvious, especially to partitioning newbies.  Rows that exactly match the date boundary (i.e. midnight on boundary date) are inserted into the lower partition instead of the higher one as desired.  For example, consider a RANGE LEFT function is a monthly sliding window scenario and boundaries of ‘2013-01-01T00:00:00’ and ‘2013-02-01T00:00:00’:

This setup looks reasonable at first glance but consider the actual partitions when a table is created on the partition scheme:

  • Partition 1 (December2012 filegroup):  <= ‘2013-01-01T00:00:00’
  • Partition 2 (January2013 filegroup):  > ‘2013-01-01T00:00:00’ AND <= ‘2013-01-02T00:00:00’
  • Partition 3 (February2013 filegroup):  > ‘2013-02-01T00:00:00’

A row with the January datetime of ‘2013-01-01T00:00:00’ will end up in the first partition (December2012 filegroup).  Sliding window maintenance, which purges the oldest month (December2012), will include the ‘2013-01-01T00:00:00’ row(s) inadvertently.  Similarly, a row with value ‘2013-02-01T00:00:00’ will end up in the second partition (January2013 fielgroup) and require both the second and third partitions to be accessed when data from all of February are queried.

There are two ways to address this problem.  One is to change the boundary values of the existing RANGE LEFT function to the maximum inclusive value instead of an exact date. For example, instead of ‘2013-01-01T00:00:00’ for January, 2013, change the boundary values to the maximum value according to the partition function data type as follows:

  • *datetime: ‘2013-01-31T23:59:59.997’
  • datetime2(0): ‘2013-01-31T23:59:59’
  • datetime2(1): ‘2013-01-31T23:59:59.9’
  • datetime2(2): ‘2013-01-31T23:59:59.99’
  • datetime2(3): ‘2013-01-31T23:59:59.999’
  • datetime2(4): ‘2013-01-31T23:59:59.9999’
  • datetime2(5): ‘2013-01-31T23:59:59.99999’
  • datetime2(6): ‘2013-01-31T23:59:59.999999’
  • datetime2(7): ‘2013-01-31T23:59:59.9999999’
  • **datetimeoffset (0): ‘2013-01-31T23:59:59-05:00’’
  • **datetimeoffset (1): ‘2013-01-31T23:59:59.9-05:00’’
  • **datetimeoffset (2): ‘2013-01-31T23:59:59.99-05:00’’
  • **datetimeoffset (3): ‘2013-01-31T23:59:59.999-05:00’’
  • **datetimeoffset (4): ‘2013-01-31T23:59:59.9999-05:00’’
  • **datetimeoffset (5): ‘2013-01-31T23:59:59.99999-05:00’’
  • **datetimeoffset (6): ‘2013-01-31T23:59:59.999999-05:00’’
  • **datetimeoffset(7): ‘2013-01-31T23:59:59.9999999-05:00’’

*Importantly, one needs to specify 997 as the milliseconds value for datetime because a higher millisecond value will be rounded to the next exact date (e.g. ‘2013-02-01T00:00:00.000’)

**The actual offset for datetimeoffset partitioning depends on your environment but is typically the same for each boundary.

Rather than changing the boundary specification, I suggest instead changing the function from LEFT to RIGHT and continue using an exact date boundary.  The inclusive date boundary is more intuitive (in my opinion) and allows the same date boundaries to be used regardless of data type.  The same amount of work is required for a mass change of existing boundaries as is required to change from RANGE LEFT to RANGE RIGHT.

The script below converts a table partitioned using a RANGE LEFT function with exact date boundaries to one partitioned RANGE RIGHT as follows:

  1. create a staging table exactly like the source table and partitioned using the same scheme
  2. create a non-partitioned holding table with the same schema source but without constraints and indexes
  3. create a new RANGE RIGHT function with the same exact date boundaries as the existing RANGE LEFT scheme
  4. create a new partition scheme with the same filegroups as the source table partition scheme but based on the new RANGE RIGHT function
  5. create a new table exactly like the source but partitioned using the new RANGE RIGHT scheme
  6. for each existing partition
    1. switch source partition to staging table
    2. move rows with exact boundary matches to holding table
    3. create check constraint on staging table matching the target right boundaries
    4. switch staging partition to new table
    5. drop check constraint
    6. after all partitions are switched to the new table, insert exact boundary matching rows to new table
    7. drop the old table and staging tables
    8. rename the new table, constraints and indexes to the old names

Since this script uses SWITCH to accomplish the bulk of the work, the table must be aligned; data and indexes must be partitioned using the same (or functionally identical) partition function.  The SWITCH operations are nearly instantaneous and the only data movement required is moving rows that exactly match the boundary values to/from the holding table, which shouldn’t take long unless you have a lot of matching rows.  The long pole in the tent from a performance perspective is likely creating the needed check constraint on the staging table because all rows must be scanned.  However, this method is still considerably faster and requires less space than repartitioning the table by recreating indexes using the new partition scheme.

Here’s the setup script for the problem table and sample data:

And the remediation script to change to RANGE RIGHT:

Service Broker External Activator Example

The Service Broker External Activator (SBEA) allows one to launch any command-line program asynchronously using T-SQL.  I haven’t found many complete end-to-end examples on how to create an SBEA application so this article will demonstrate with a common use-case; launch a SSIS package on demand with the DTEXEC command-line utility.  The package can be run on any machine with SBEA and SSIS installed, which allows you to run the package remotely on a different server if you so desire.

Service Broker External Activator Overview

The latest version of the tool, officially named Microsoft® SQL Server® Service Broker External Activator for Microsoft® SQL Server® 2012, is a free download (  Service Broker External Activator is also included in the SQL Server 2008 and SQL Server 2008 R2 Feature Packs.

SBEA runs as a Windows service and leverages the SQL Server Service Broker event notification infrastructure to run a command-line application upon receipt of an event notification message.  When an event notification message is received, it launches the command-line program associated with the queue identified in the notification event message.  There is some learning curve if you are not already familiar with Service Broker and event notification concepts but it is a worthwhile investment, not just for SBEA, but for any application that needs asynchronous or message-based processing.

SBEA is an alternative to over other methods that can launch an external program via T-SQL, such as xp_cmdshell, a SQL Agent Job (sp_start_job) or a custom SQLCLR proc.  The xp_cmdshell extended stored procedure allows any ad-hoc command to be run on the database server, limited only by SQL Server service or proxy account permissions.  You need to be mindful of the security implications and take precautions to ensure it can be used only in the way intended.  Also, xp_cmdshell runs the launched program synchronously so the calling application must wait for the application to finish.  You could take special measures to emulate asynchronous behavior with xp_cmdshell but that’s a kludge at best.

A SQL Agent job allows you to run a command asynchronously but the limitation is only one instance of a job to run at a time.  In contrast, SBEA allows one to run multiple application instances concurrently and does so automatically.  SBEA allows a configurable maximum number of application instances so that you can scale automatically to meet peak demands.

SQL Server 2012 introduces the SSISDB catalog, which provides stored procedures to run SSIS packages asynchronously via T-SQL.  However, packages must be stored in the SSISDB database in order to use this feature.  SBEA is an alternative if you are pre SQL Server 2012, need to run packages not stored in the SSIS catalog, or need to run command-line programs besides DTEXEC.EXE.

SBEA Database Objects

As with all Service Broker applications, the minimal database objects needed are an initiator service and queue as well as a target service and queue.  These are used to pass messages to and from the target application (SSIS package in this case).  Additionally, the SBEA needs a notification service, notification queue and event notification so that it is notified when messages are waiting in the target queue for processing.

Stored procedures are a great way to encapsulate Service Broker functionality and implementation details so I use procs here as the application database interface.  Stored procs also simply security since the only database permissions necessary are execute permissions on the procs.  Within the procs, I also log to a table to facilitate monitoring progress, troubleshooting and auditing.  The complete T-SQL script is included at the end of this article.

Below is a summary of the database objects I created for this example:

Service Broker Objects

  • BatchProcessInitiatorService:  SB service that requests package execution (requesting application)
  • BatchProcessInitiatorQueue:  SB queue for success or failure result messages (returned by package after execution)
  • BatchProcessTargetService:  SB service that executes the request (SSIS package)
  • BatchProcessTargetQueue:  SB queue for execution requests, with message containing parameters needed by SSIS package (or an empty message if no parameters are needed)
  • BatchProcessNotificationService:  SB service that is the target of the event notifications (SBEA service)
  • BatchProcessNotificationQueue:  SB queue for event notification messages
  • BatchProcessEventNotification:  SB event notification that monitors the BatchProcessTargetQueue

Stored Procedures

  • dbo.usp_LaunchBatchProcess (called by invoking application): begins conversation to target service and sends message with parameters needed by package
  • dbo.usp_GetBatchProcessParemeters (called by SSIS package at start): receives next message from target queue and logs to table
  • dbo.usp_CompleteBatchProcess (called by SSIS package at completion): returns result message to initiator using END CONVERSATION (or END CONVERSATION WITH ERROR)
  • dbo.usp_LogBatchProcessResult (initiator queue activated proc): receives result messages from initiator queue, logs results to table and ends the conversation


  • dbo.BatchProcessLog: Each received message is logged to this table

SBEA Configuration File

Service Broker External Activator is configured using the xml configuration file (EAService.config) in the application Config folder (e.g. C:\Program Files\Service Broker\External Activator\Config\).  This configuration file specifies:

  • The name of the notification service
  • Connection string used to connect to the database
  • A list of one or more Service Broker services along with the command-line application and arguments to launch for each

Below is the example I use for this sample with the relevant values in bold.  As a side note, SBEA currently supports only one notification service even though the parent element name (NotificationServiceList) implies one can have many.  However, I haven’t personally run into a case where this is an issue since, in a distributed environment, one can designate a single database for the SBEA messaging infrastructure.

Service Broker External Activator Processing

When the SBEA service starts, it validates the application service list against Service Broker meta-data to ensure the specified services and queues exist and are properly associated.  Then SBEA starts monitoring the notification queue associated with the specified notification service.  When a queue notification event message is received, it matches the server, database, schema and queue contained in the message payload against the application services listed.  When matched, it launches the command-line application associated with that service.  A warning message () is logged if a queue notification message is received that doesn’t have a corresponding application.

SBEA logs trace messages to a file named EATrace.log in the Log folder (C:\Program Files\Service Broker\External Activator\Log).  Logging levels are configurable.  I suggest you start with verbose logging initially (as in the same config) and adjust to less logging once your application is stable.  I included the config schema reference and various logging options as comments in the sampe config for your convenience.  Be aware that trace messages are appended to the file so you’ll want to periodically delete (or archive) the file.

SBEA Application Design and Message Flow

There are a many different patterns one can generally employ with Service Broker bi-directional message processing.  Here, I use a single-message-per-conversation pattern to pass parameters needed by the package as well-formed XML.  The target (SSIS package) does its work and ends its side of the conversation upon completion, indicating success or failure on the END CONVERSATION statement, which generates an EndDialog or Error message that is returned via the initiator queue.

Service Broker automatically executes the initiator queue activated proc when a message is waiting on the queue.  Acting on behalf of the initiating application, which might not be running due to asynchronous execution of the package, the activated proc ends the other side of the conversation upon receipt of the EndDialog or Error message.  Both sides of the conversation are ended, a best practice with this single-message-per-conversation pattern.

The following sequence describes the interaction and flow of messages between components:

  1. Initiating application executes dbo.usp_LaunchBatchProcess to launch console application.  Proc executes BEGIN DIALOG and SEND to insert the parameter message into the target queue.
  2. SBEA executes RECEIVE of the event notification message, indicating a message is waiting in the target queue
  3. SBEA launches the console application associated with the event notification
  4. The console application executes dbo.usp_GetBatchProcessParameters to get parameter values.  Proc executes RECEIVE of message from target queue and logs the message.
  5. The console application executes dbo.usp_CompleteBatchProcess at completion.  Proc executes END CONVERSATION to signal successful completion (or END CONVERSATION WITH ERROR to indicate an error occurred).
  6. Service Broker launches activated stored procedure dbo.usp_LogBatchProcessResult in response to the EndDialog message.  Proc execute RECEIVE from initiator queue and logs the result to a table.

The diagram below shows this message flow.SBEAApplicationMessageFlow


Example Script


The Service Broker External Activator is a powerful way to launch command-line programs asynchronously.  There is a learning curve, which can be steep if you know nothing about the SQL Server Service Broker feature.  But I hope this doesn’t discourage you from exploring and using SBEA.  It may be the right tool for the job in cases where you need asynchronous batch processing, avoiding custom programming or kludge design.  It’s always a good thing to have another tool in your toolbox.




Secrets of Foreign Key Index Binding

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.


Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow the index bound to a foreign key to be dropped since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:

–create parent table

CREATE TABLE dbo.ParentTable(

       ParentTableID int NOT NULL IDENTITY


       ,Column1 int NOT NULL

       ,Column2 varchar(100) NOT NULL




–create a non-clustered covering index


       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);



INSERT INTO dbo.ParentTable VALUES(1, ‘some data’);

INSERT INTO dbo.ParentTable VALUES(2, ‘some data’);

INSERT INTO dbo.ParentTable VALUES(3, ‘some data’);



–create child table

CREATE TABLE dbo.ChildTable(

       ChildTableID int NOT NULL IDENTITY


       ,ParentTableID int NOT NULL

              CONSTRAINT FK_ChildTable_ParentTable

                     FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)




INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);







–show the foreign key index binding


FROM sys.foreign_keys AS f

JOIN sys.indexes AS fki ON

      fki.object_id = f.referenced_object_id     

      AND fki.index_id = f.key_index_id


      f.object_id = OBJECT_ID(N’dbo.FK_ChildTable_ParentTable’);



–this query uses the covering index instead of clustered PK index

SELECT p.ParentTableID, p.Column1

FROM dbo.ParentTable AS p

WHERE p.ParentTableID IN(1,2,3);




The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control which index is bound to the foreign key. 

SQL Server chooses the index binding based on rules that vary by version so you will get different binding depending on your version of SQLServer. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, the first (lowest index_id) unique non-clustered index on the referenced column(s) is used. The sample script above binds the foreign key to the PK_WideTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL 2008, SQL 2008R2 and SQL 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL 2008 and later versions instead of the primary key index as one might expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index ‘dbo.ParentTable.idx_ParentTable_ParentTableID ‘. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate after dropping the index.

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future. 



Microsoft decided to undeprecate OLE DB 6 years after the deprecation announcement mentioned in this post. A new OLE DB provider, Microsoft OLE DB Driver for SQL Server (MSOLEDBSQL), is now available. Keep in mind the legacy Microsoft OLE DB Provider for SQL Server (SQLOLEDB) that ships with Windows remains deprecated. If you continue down the OLE DB path for data access in native code, download and use MSOLEDBSQL going forward.


I was very surprised when Microsoft announced deprecation of OLE DB provider for SQL Server data access last week on the Data Access Blog and MSDN Forums Announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB driver will continue to be supported for 7 years after the Denali release so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use Sysem.Data SqlClient. SqlClient is part of the .NET framework and is the preferred way to access SQL Server from managed code (C#, VB.NET, managed C++). The only reason I can think why not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case accessing different database products Sysem.Data.Odbc is an alternative.

·         Native code (e.g. unmanaged C++): Use ODBC with the Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download. 

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: “Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI”

New ODBC connection string: “Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes”


The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Was Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go ( APIs are largely driven by changes in development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing. 

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)


I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.


Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.

T-SQL Error Handling Ugliness

Unlike compiled application code that halts code execution upon an unhandled exception, a T-SQL might continue code execution afterward. T-SQL developers must include error checking/handling is to ensure code doesn’t continue down the “happy” path oblivious to an error, report the error to the caller, perform any necessary cleanup operations (typically ROLLBACK) and continue/halt execution as desired. The script below shows how one might accomplish this without structured error handling:

–Unstructured error handling example


SELECT 1/0 AS CauseAnError –report error caller

IF @@ERROR<> 0 GOTO ErrorHandler –detect error



IF @@TRANCOUNT> 0 ROLLBACK–cleanup after error

RETURN –stop further code execution


PRINT ‘Done’–not executed after error

This script results in the error:

Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.
Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include repetitive “IF @@ERROR” check to detect errors after each statement and error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to raise the error; SQL Server automatically reports errors to the calling application without any T-SQL code to do so (unless TRY/CATCH is used). This guarantees the calling application is notified of errors during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL 2005 is a both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:

–Structured error handling example


@ErrorNumber int

,@ErrorMessage nvarchar(2048)

,@ErrorSeverity int

,@ErrorState int

,@ErrorLine int;

BEGIN TRY–detect errors


SELECT 1/0 AS CauseAnError;





@ErrorNumber =ERROR_NUMBER()

,@ErrorMessage =ERROR_MESSAGE()

,@ErrorSeverity = ERROR_SEVERITY()

,@ErrorState =ERROR_STATE()

,@ErrorLine =ERROR_LINE();

IF @@TRANCOUNT> 0 ROLLBACK; –cleanup after error

RAISERROR(‘Error %d caught at line %d: %s’–report error to caller






RETURN;–stop further code execution


PRINT ‘Done’; –not executed after error


Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered
I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.

The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:

–Structured error handling example in Denali CTP3

BEGIN TRY–detect errors


SELECT 1/0 AS CauseAnError;




IF @@TRANCOUNT> 0 ROLLBACK; –cleanup after error

THROW; –report error to caller and stop further code execution


PRINT ‘Done’; –not executed after error


Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.
There are only a couple of scenarios I can think of not to use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.


THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.