Always Use Semicolon Statement Terminators

ANSI-standard semicolon statement terminators are often omitted in T-SQL queries and many developers are unaware that omitting them is deprecated syntax.  Omitting statement terminators is a dangerous practice because, even if the batch compiles, you may get unexpected results.  Consider the insidious examples below pointed out by SQL Server MVP Erland Sommarskog:

BEGIN TRY
	BEGIN TRAN
	SELECT 1/0 AS CauseAnException
	COMMIT
END TRY
BEGIN CATCH
	SELECT ERROR_MESSAGE()
	THROW
END CATCH

Few of us will catch (no pun intended) the bug in the above script.  What results do you expect after running the above script under SQL Server 2012 or later versions?  Rather than leave this as an exercise for the reader, I’ll spoil the fun and mention that no run-time error is raised at all.  Instead, the THROW statement is interpreted as a column alias for the ERROR_MESSAGE() column.  This sort of coding error is especially nasty because catch blocks are rarely unit tested and this catch block coding mistake hides the run-time error entirely without raising an exception.
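Adding the missing terminators makes the intent unambiguous.  With a semicolon after the ERROR_MESSAGE() query, THROW is parsed as a statement and the divide-by-zero error is re-raised as intended:

BEGIN TRY
	BEGIN TRAN;
	SELECT 1/0 AS CauseAnException;
	COMMIT;
END TRY
BEGIN CATCH
	SELECT ERROR_MESSAGE();
	THROW;
END CATCH;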

Similarly, the absence of statement terminators in the script below causes another problem.  Can you spot it?

BEGIN TRY
	BEGIN TRAN
	SELECT 1/0 AS CauseAnException
	COMMIT
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN
    THROW
END CATCH

At least an error is raised in this case, albeit not the one you might expect.  The resultant error is “Cannot roll back THROW. No transaction or savepoint of that name was found”.  This coding bug obfuscates the preceding divide by zero error and prevents the THROW statement from being executed.

Semicolons Will Become Mandatory
Microsoft announced with the SQL Server 2008 release that semicolon statement terminators will become mandatory in a future version, so statement terminators other than semicolons (whitespace) are currently deprecated.  This deprecation announcement means that you should always use semicolon terminators in new development.  I honestly don’t expect SQL Server to strictly enforce mandatory semicolons in the near future, but it is still a best practice to use semicolon statement terminators to avoid issues like those mentioned earlier as well as to facilitate code maintainability.  I suggest specifying statement terminators in all new development and perhaps adding terminators to existing code as you perform maintenance.

Transact-SQL does not currently enforce the ANSI semicolon statement terminator requirement.  Instead, semicolon statement terminators are optional and any whitespace (spaces, tabs, newlines) may separate statements.  The exception to this rule is that many of the statements introduced in SQL Server 2005 and later require the preceding statement to be properly terminated in order for the batch to compile.

Below are some guidelines I suggest on when to, and when not to, use semicolon statement terminators.

Suggested Guidelines
The Transact-SQL parser is quite lax, allowing any whitespace (e.g. space, tab, newline) to be used between statements.  This laxness results in ambiguity like that demonstrated by the examples at the beginning of this article.  Similarly, statement terminators may not only be omitted, they may also be used inappropriately.  I strongly suggest you adhere to the T-SQL syntax documented in the Books Online even if the parser allows otherwise.  This practice will help future-proof your code since relying on undocumented behavior is inherently risky.

Don’t precede a statement with a semicolon
Remember that the purpose of semicolons is to terminate SQL statements, not begin them.  A common mistake I see is throwing a semicolon in front of a statement in order to get a batch of statements to compile, especially with newer constructs like WITH (CTE expression) that require the previous statement to be terminated.  Although the T-SQL parser currently ignores extraneous and misplaced semicolons, I suggest they be specified in the appropriate place according to the statement syntax documented in the SQL Server Books Online.
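To illustrate, the first batch below leans on a leading semicolon to satisfy the CTE’s requirement that the previous statement be terminated; the second terminates the preceding statement where it actually ends:

--misplaced terminator used to begin the statement
DECLARE @i int
;WITH a AS (SELECT 1 AS a) SELECT a FROM a

--terminator specified where the preceding statement ends
DECLARE @i int;
WITH a AS (SELECT 1 AS a) SELECT a FROM a;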

Specify semicolons at the end of each stand-alone SQL statement
Not only does this conform to the ANSI standard, it also makes your intent clearer and the code easier to read.

Terminate control-of-flow statement blocks at the end of the control-of-flow scope
Control-of-flow statements are not covered by the ANSI SQL standard because these are proprietary SQL extensions.  The SQL Server Books Online is sketchy on the subject and many of the examples (as of this writing) are inconsistent and do not always include statement terminators.  Furthermore, control-of-flow statement blocks are confusing due to the many variations, nesting, and optional BEGIN/END specifications.

Below are examples illustrating what I believe to be proper use of statement terminators and control-of-flow block terminators using IF statements in SQL 2008 and later versions.  The same concepts apply to other control-of-flow constructs like WHILE and TRY/CATCH blocks.  I should add that this batch example will not compile under SQL 2005 because an explicit BEGIN/END block is required to execute a common table expression conditionally in that version.  T-SQL parser enhancements eliminated that requirement in SQL 2008 and later.

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
ELSE
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END; --terminate IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and IF

IF 1 = 1
  BEGIN
    WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	IF 1 = 1
		WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	ELSE
	BEGIN
		WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
	END; --terminate inner nested IF
	WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement
  END
ELSE
  WITH a AS (SELECT 1 a) SELECT a FROM a; --terminate statement and outer IF

Summary
Consistent use of semicolons helps avoid bugs in code that might otherwise go undetected.  Code with statement terminators can also be modified more easily without introducing compile errors, and it is easier to maintain because the end of each statement is readily apparent to subsequent developers.  Importantly, you’ll be better positioned for future SQL Server versions by consistently using semicolon statement terminators.

Help Stop SQL Injection Madness

SQL injection continues to be the most exploited security vulnerability on the internet. A SQL injection vulnerability can put companies out of business and expose innocent people to sensitive data disclosure and damage, all due to developer ignorance or carelessness. Sadly, although there is much information available on SQL injection exploits and mitigation, SQL injection exploits continue because developers use flawed database access techniques. This article details the best, and simplest, method to prevent SQL injection in SQL Server: parameterized SQL statements.

Parameterized SQL queries and stored procedure calls (with CommandType.StoredProcedure and no dynamic SQL within proc code) prevent unintended SQL from being executed. I’ll discuss exactly why shortly. Parameterized SQL also provides many benefits besides security. Notably:

  • allows quotes in data without programmatic escaping
  • avoids the need to format date strings according to DATEFORMAT session settings
  • handles decimal separators automatically
  • improves performance due to plan reuse
  • provides cleaner code

Most articles on SQL injection focus on validating user input rather than parameterized queries.  Input validation certainly has benefits related to data integrity and user-friendly validation messages but it is at best a secondary defense against injection compared to parameterized queries.  Input validation using a blacklist of prohibited keywords and characters is especially weak.

SQL Injection Overview
I am puzzled by the number of examples in articles and forum posts that show building SQL statements with literals using a string concatenation technique.  I suspect the main reason for this practice is that developers write code that generates the exact same ad-hoc SQL statement they would write using a query tool like SQL Server Management Studio or Visual Studio.  This leads to the nasty habit of not using parameters.  For example, a developer might develop and test a query like this in SSMS:

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = 12345;

After testing the query, the C# developer includes the SQL statement in the application code but, instead of a hard-coded value for CustomerID, the desired value is concatenated with the remainder of the SQL statement as a literal.

var command = new SqlCommand(
             "SELECT FirstName, LastName, EmailAddress" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = '" + txtBoxCustomerID.Text + "';"
            ,connection);

The query runs perfectly well and returns the expected result in testing.  It is not until after production deployment that a script kiddie or hacker finds the SQL injection vulnerability.  A malicious person can easily manipulate this SQL statement in ways unintended by the developer.  For example, a hacker could enter a CustomerID value like “1' OR '1' = '1” in the text box to break out of the quoted literal and harvest the names and email addresses of all customers in the database.  This vulnerability can also be exploited to execute additional SQL statements in the same batch, a technique often used to update data with malicious HTML script that is subsequently returned and executed by unsuspecting client browsers.  There are many ways to exploit a SQL injection vulnerability that result in sensitive data disclosure and/or database manipulation.  I won’t detail those here but rather recommend a very simple defense: use a parameterized statement instead of concatenated literals.
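To make the exploit concrete, that input (one illustrative payload among many) turns the concatenated command above into the statement below; the OR predicate matches every row, so the entire customer list is returned:

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = '1' OR '1' = '1';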

Mitigating SQL Injection
Below is a functionally identical parameterized query.

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = @CustomerID;

This query can be incorporated into the C# application code with the parameter and corresponding value added to the parameterized SQL command as a strongly-typed parameter:

var command = new SqlCommand(
             "SELECT LastName" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = @CustomerID;"
            ,connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = int.Parse(textBoxCustomerID.Text);

As you can see, the effort needed to do the same job with a parameterized command is about the same as the string concatenation method, yet the security benefits are immeasurable.

Why Parameters Prevent SQL Injection
Parameterized SQL is more secure for a couple of reasons.  In the case of an ad-hoc query, the SQL statement with parameter markers is hard-coded in the application (or generated via an ORM framework) and passed to SQL Server as-is.  Parameter values are passed separately rather than inside the SQL statement itself.  This ensures the SQL statement cannot be modified by the values provided.

Similarly, a stored procedure call (using CommandType.StoredProcedure) sends only the stored procedure name to SQL Server with separately passed parameter values.  Using stored procedures also allows one to more strictly adhere to the security principle of least privilege.  By granting only stored procedure execute permissions, permissions on indirectly referenced objects are not needed as long as the ownership chain is unbroken.  This practice limits ad-hoc access to underlying data should an attacker find a back door into the database under the application security context.  Personally, I’m a staunch advocate of stored procedures for this and other reasons too.  Stored procedures provide a well-defined database interface, allowing queries and database schema to be refactored without breaking changes.

I should add that it is perfectly acceptable to build a SQL statement dynamically in either application code or stored procedures, but only if parameterized and actually needed for the task at hand.  A common use case is a dynamic search query that is constructed based on user criteria. Instead of coding a separate query for each possible search permutation, one can add parameterized WHERE clauses as needed.  Under no circumstances should a SQL statement string be built with concatenation of untrusted values.
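Below is a minimal sketch of that dynamic search pattern in T-SQL, using illustrative search parameters against the dbo.Customer table from the earlier examples (the LastName and EmailAddress parameter types are assumptions).  Only trusted string fragments are concatenated; user-supplied values are always passed to sp_executesql as parameters:

CREATE PROC dbo.usp_SearchCustomers
	@LastName nvarchar(50) = NULL
	,@EmailAddress nvarchar(100) = NULL
AS
DECLARE @Sql nvarchar(MAX) = N'SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE 1 = 1';
--append only trusted SQL fragments; values are never concatenated into the string
IF @LastName IS NOT NULL SET @Sql += N' AND LastName = @LastName';
IF @EmailAddress IS NOT NULL SET @Sql += N' AND EmailAddress = @EmailAddress';
EXEC sp_executesql
	@Sql
	,N'@LastName nvarchar(50), @EmailAddress nvarchar(100)'
	,@LastName = @LastName
	,@EmailAddress = @EmailAddress;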

Additional Precautions
Use Windows Authentication for SQL Server database access.  This practice eliminates the need to store security credentials in application configuration files.  If SQL authentication must be used, protect the credentials with encryption.  This is easily done in .NET by encrypting the connectionStrings section of the app.config file.  See http://msdn.microsoft.com/en-us/library/ms254494.aspx.

Adhere to the principle of least privilege.  Never use a privileged account, such as a sysadmin role member, for routine application access.  Grant permissions only on database objects which are used directly by applications.  Use stored procedures so that one can leverage ownership chaining, eliminating the need to grant permissions on tables directly.

Specify strongly-typed parameters.  The parameter data type can be inferred (e.g. using SqlCommand.Parameters.AddWithValue method) but a best practice is to explicitly specify a parameter data type that matches the underlying column data type and maximum length.  These practices will minimize the size of the SQL Server procedure cache and avoid implicit conversions that can negatively affect performance.
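The same principle is easy to see from the T-SQL side.  In the hedged sketch below (which assumes dbo.Customer has a varchar(30) LastName column with an index on it), the first call declares the parameter as nvarchar, forcing an implicit conversion of the column that can prevent an index seek; the second call matches the column’s type and length:

--mismatched parameter type: the varchar column is implicitly converted to nvarchar
EXEC sp_executesql
	N'SELECT CustomerID FROM dbo.Customer WHERE LastName = @LastName;'
	,N'@LastName nvarchar(4000)'
	,@LastName = N'Smith';

--matching type and length: no implicit conversion of the column
EXEC sp_executesql
	N'SELECT CustomerID FROM dbo.Customer WHERE LastName = @LastName;'
	,N'@LastName varchar(30)'
	,@LastName = 'Smith';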

Maximizing Performance with Table-Valued Parameters

I’ve seen TVPs improve performance by orders of magnitude when used appropriately.  This feature isn’t used as often as it should be so I’ll show how to implement a TVP using a C# application example and discuss TVP considerations.

Why Table-Valued Parameters Improve Performance
TVPs allow one to pass multiple rows or values at once to a query or stored procedure.  By doing so, the query can leverage set-based operations to greatly improve performance compared to executing the query/proc many times.  TVPs decrease network latency by reducing network round trips.  Prior to TVPs, one had to employ workarounds like passing a string containing a delimited list of records (or XML) and parsing on the SQL side.  Those techniques are at best a kludge.  Instead TVPs are the right tool for the job in SQL Server 2008 and later.

TVPs (and table variables) are temp tables.  TVPs and table variables are persisted in tempdb much the same way as regular temp tables (# prefix).  The big differences are that TVPs and table variables are limited in scope to the current batch and do not have statistics.  Also, table parameters/variables allow only indexes declared via primary key or unique constraints; non-unique indexes are not supported.

When a TVP is passed to SQL Server from a client application, the data is bulk-inserted into tempdb where it is made available to the SQL statement or stored procedure using the declared TVP.  This bulk insert is handled by the client API transparently to the application and allows a large number of rows to be passed to SQL Server efficiently.  In T-SQL, one can declare and load a table variable, which can be used directly in a SQL statement or passed as a TVP to a stored procedure or query (invoked with sp_executesql).

The exact threshold where TVPs outperform single-row operations is much lower than you might expect.  Your mileage may vary but, at least in my experience, it’s only a few rows.  The performance tests I ran here show that the threshold where a TVP outperforms individual requests with a simple list of customers is only 4 rows.  That said, I would not recommend TVPs for single-row operations due to the start-up overhead but it’s a good choice when the most common use case is multiple rows.

How to Use TVPs
In order to pass a TVP to a SQL statement or stored procedure, one must first create a user-defined table type in SQL Server.  The table type defines the schema of the table SQL Server expects for the TVP.  Listing 1 shows the CREATE DDL for the table type I use in the AdventureWorks2012 database to pass a list of customers to the stored procedure in Listing 2 that returns all orders for these customers.  Note that the stored procedure TVP must be declared as READONLY.

Listing 1: Table type DDL

CREATE TYPE dbo.CustomerList AS TABLE(
CustomerID int NOT NULL
PRIMARY KEY
);

Listing 2: Stored procedure to return orders for multiple customers

CREATE PROC dbo.usp_select_orders_for_multiple_customers
@CustomerList dbo.CustomerList READONLY
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
JOIN @CustomerList AS cl ON
cl.CustomerID = soh.CustomerID;
RETURN @@ERROR;
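Although the examples that follow pass the TVP from C#, the same procedure can be invoked entirely in T-SQL by loading a table variable of the table type, as mentioned earlier.  A minimal sketch using the type and procedure above (the CustomerID values are illustrative):

DECLARE @CustomerList dbo.CustomerList;
INSERT INTO @CustomerList (CustomerID) VALUES (11000), (11001), (11002);
EXEC dbo.usp_select_orders_for_multiple_customers @CustomerList = @CustomerList;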

TVPs are passed from a .NET application using parameter data type SqlDbType.Structured.  The actual parameter value can be an object of type DataTable, DbDataReader or IEnumerable<SqlDataRecord>.  A DataTable is appropriate when the required TVP data is already in a DataTable object needed for other purposes.  A DbDataReader is a good choice when the source data is the result of a query (note that any DbDataReader may be used, not just a SqlDataReader).  For other cases, I recommend an IEnumerable<SqlDataRecord>.  It is very easy to implement IEnumerable<SqlDataRecord>, especially if you already have an enumerable object like an array or collection containing the values you need to pass.  See the code at the end of this article for an example class that implements IEnumerable<SqlDataRecord> for a TVP.

A benefit of DbDataReader and IEnumerable<SqlDataRecord> is that the source data can be streamed to SQL Server without first loading all the TVP data into memory in either the client application or SQL Server.  This is a big consideration when a lot of data needs to be passed because you are not constrained by memory on either the client or SQL Server side.

Listing 3 shows the stored procedure I use to compare performance with the TVP method.  This proc takes only a single @CustomerID parameter so it must be called once for each customer.

Listing 3: Stored procedure to return orders for a single customer

CREATE PROC dbo.usp_select_orders_for_single_customer
@CustomerID int
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE
soh.CustomerID = @CustomerID;
RETURN @@ERROR;

The complete C# console application I used for the performance tests is in Listing 4.  The code demonstrates the 3 ways to pass a TVP from C# and logs timings for each invocation.  As you can see in Figure 1, the TVP outperformed individual calls once more than 3 customers were passed.  Note that I had to use a logarithmic y-axis scale because of the huge performance difference.  For example, 10,000 individual calls averaged about 1.5 seconds whereas passing 10,000 customers via a TVP took less than 100 milliseconds.

Figure 1:  TVP versus individual call performance
TVP performance graph

Listing 4: Passing TVPs in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using Microsoft.SqlServer.Server;
using System.IO;

namespace TVPTest
{
    class Program
    {

        private static string connectionString = @"Data Source=MyServer;Initial Catalog=AdventureWorks2012;Integrated Security=SSPI";
        private static StreamWriter logFile = new StreamWriter(@"TvpTest.log", false);

        static void Main(string[] args)
        {

            // write column headers to log file
            logFile.WriteLine("{0},{1},{2}"
                , "Test Name"
                , "Customers"
                , "Durations");

            // run each performance test 100 times
            for (int i = 0; i < 100; ++i)
            {
                runPerformanceTest(1);
                runPerformanceTest(2);
                runPerformanceTest(3);
                runPerformanceTest(4);
                runPerformanceTest(5);
                runPerformanceTest(10);
                runPerformanceTest(25);
                runPerformanceTest(50);
                runPerformanceTest(100);
                runPerformanceTest(500);
                runPerformanceTest(1000);
                runPerformanceTest(2500);
                runPerformanceTest(5000);
                runPerformanceTest(10000);
                logFile.Flush();
            }

            logFile.Close();

            Console.WriteLine("Press any key to close.");
            Console.ReadKey();
        }

        private static void runPerformanceTest(int customerCount)
        {

            Console.WriteLine("{0} customers:", customerCount);

            DataTable orderDataTable;
            Stopwatch sw;
            DateTime duration;

            //individual call baseline
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersIndividually(customerCount);
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tIndividual call: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "Individual call"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataTable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataTable(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataTable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataTable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataReader
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataReader(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataReader: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataReader"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from IEnumerable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListSqlDataRecords(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP IEnumerable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP IEnumerable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            Console.WriteLine();

        }

        /// <summary>
        /// Call proc individually for each customer
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataTable of customer orders</returns>
        private static DataTable getOrdersForCustomersIndividually(int customerCount)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_single_customer", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                var customerIdParameter = command.Parameters.Add("@CustomerID", System.Data.SqlDbType.Int);
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
                {
                    customerIdParameter.Value = customerId;
                    orderListDataAdapter.Fill(orderDataTable);
                }
                return orderDataTable;
            }
        }

        /// <summary>
        /// Call proc with TVP of CustomerIDs
        /// </summary>
        /// <param name="customerList">List of customers</param>
        /// <returns>DataTable of customer orders</returns>
        private static DataTable getOrdersForCustomersUsingTvp(Object customerList)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_multiple_customers", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@CustomerList", System.Data.SqlDbType.Structured).Value = customerList;
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                orderListDataAdapter.Fill(orderDataTable);
                return orderDataTable;
            }
        }

        /// <summary>
        /// Create a DataTable containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataTable with single CustomerID column</returns>
        private static DataTable getCustomerListDataTable(int customerCount)
        {
            var customerListDataTable = new DataTable();
            customerListDataTable.Columns.Add("CustomerID", typeof(int));

            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                var row = customerListDataTable.NewRow();
                customerListDataTable.Rows.Add(row);
                row[0] = customerId;
            }
            return customerListDataTable;
        }

        /// <summary>
        /// Create a DataReader containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataReader with single CustomerID column</returns>
        private static IDataReader getCustomerListDataReader(int customerCount)
        {
            var connection = new SqlConnection(connectionString);
            var command = new SqlCommand(@"SELECT CustomerID FROM Sales.Customer WHERE CustomerID BETWEEN 11000 AND 11000 + @CustomerCount - 1;", connection);
            command.Parameters.Add("@CustomerCount", System.Data.SqlDbType.Int).Value = customerCount;
            connection.Open();
            command.CommandType = CommandType.Text;
            return command.ExecuteReader();

        }

        /// <summary>
        /// Create an IEnumerable<SqlDataRecord> object containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>IEnumerable<SqlDataRecord> with single CustomerID column</returns>
        private static IEnumerable<SqlDataRecord> getCustomerListSqlDataRecords(int customerCount)
        {
            var customerList = new List<int>();
            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                customerList.Add(customerId);
            }
            return new CustomerListSqlDataRecords(customerList);
        }

        /// <summary>
        /// Implement IEnumerable<SqlDataRecord> to return a SqlDataRecord for each CustomerID
        /// in the provided IEnumerable<int>
        /// </summary>
        private class CustomerListSqlDataRecords : IEnumerable<SqlDataRecord>
        {
            private SqlMetaData metaData = new SqlMetaData("CustomerID", SqlDbType.Int);

            private IEnumerable<int> customerList;

            public CustomerListSqlDataRecords(IEnumerable<int> customerList)
            {
                this.customerList = customerList;
            }

            public IEnumerator<SqlDataRecord> GetEnumerator()
            {
                foreach (var customerId in customerList)
                {
                    var record = new SqlDataRecord(metaData);
                    record.SetInt32(0, customerId);
                    yield return record;
                }
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                throw new NotImplementedException();
            }
        }
    }
}

Improving Uniqueidentifier Performance

A common anti-pattern I run into is the random primary key, commonly a GUID. This design is insidious because the performance implications of random access aren’t immediately obvious and are exacerbated when the primary key index is clustered. It is often only after the table grows to a larger size that the performance problems become apparent. Symptoms include slowly degrading performance over time, with increased blocking and deadlocking as a side effect.

Figure 1 shows the performance profile of random inserts with a random GUID (SQL Server uniqueidentifier data type) clustered primary key. The red line indicates the rate of batch requests per second (inserts) while the blue line shows the total number of rows in the table, scaled such that the top of the graph represents 3M rows. Only about 700,000 rows could be inserted during this 15-minute single-threaded random key insert test, even though the insert rate was fast initially.

Figure 1: Random key insert performance
Random insert performance graph

Incremental Primary Keys

As you might guess, the cure for the random primary key anti-pattern is an incremental key pattern. With a uniqueidentifier data type, a sequential value can be assigned by SQL Server using the NEWSEQUENTIALID function (in a default constraint expression) or in application code using the UuidCreateSequential Win32 API call along with some byte swapping (code example below). Alternatively, one can use an integral data type (int, bigint, etc.) along with a value generated by an IDENTITY property or a SEQUENCE object. The advantage of an integral type is the reduced space requirements compared to a 16-byte uniqueidentifier. The advantage of a uniqueidentifier is that it can easily be generated in application code before database persistence without a database round trip, which is desirable for distributed applications and when keys of related tables are assigned in application code before writing to the database.
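For reference, a sketch of the server-side approach using a NEWSEQUENTIALID() default constraint is below.  The table mirrors the test table used later in Listing 1 but the names here are illustrative; the tests themselves generate the key in application code as noted later:

CREATE TABLE dbo.TestTableSequential(
	TestKey uniqueidentifier NOT NULL
		CONSTRAINT DF_TestTableSequential_TestKey DEFAULT NEWSEQUENTIALID()
		CONSTRAINT PK_TestTableSequential PRIMARY KEY CLUSTERED
	,TestData char(8000) NOT NULL
);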

Figure 2 shows the same test using a sequential key value. Over 2.2M rows were inserted in 15 minutes. As you can see, significant performance improvement is achieved with this trivial application change.

Figure 2: Incremental key insert performance
Incremental key insert performance graph

Listing 1 shows the T-SQL code I used for these performance tests and Listing 2 contains the C# code (with the random GUID commented out). I generated the uniqueidentifier value via application code in the tests but performance with NEWID() is comparable to the first test and NEWSEQUENTIALID() is similar to the second test.

Listing 1: T-SQL scripts for test table and stored procedure

CREATE TABLE dbo.TestTable(
	TestKey uniqueidentifier NOT NULL
		CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED
	,TestData char(8000) NOT NULL
);
GO

CREATE PROC dbo.InsertTestTable
	@TestKey uniqueidentifier
	,@TestData char(8000)
AS
SET NOCOUNT ON;
DECLARE @TotalRows int;

--insert row
INSERT INTO dbo.TestTable (TestKey, TestData) 
	VALUES(@TestKey, @TestData);

--update pmon counter for rowcount
SELECT @TotalRows = rows 
FROM sys.partitions 
WHERE object_id = OBJECT_ID(N'TestTable') AND index_id = 1;
EXEC sys.sp_user_counter1 @TotalRows; --for pmon row count

RETURN @@ERROR;
GO

Listing 2: C# insert test console application

using System;
using System.Data;
using System.Data.SqlClient;
using System.Runtime.InteropServices;

namespace UniqueIdentifierPerformanceTest
{
    class Program
    {
        [DllImport("rpcrt4.dll", CharSet = CharSet.Auto)]
        public static extern int UuidCreateSequential(ref Guid guid);

        static string connectionString = @"Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI";
        static int rowsToInsert = 10000000;
        static SqlConnection connection;
        static SqlCommand command;

        static void Main(string[] args)
        {

            int rowsInserted = 0;

            using (connection = new SqlConnection(connectionString))
            {
                using (command = new SqlCommand("dbo.InsertTestTable", connection))
                {
                    command.Connection = connection;
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.Add("@TestKey", SqlDbType.UniqueIdentifier);
                    command.Parameters.Add("@TestData", SqlDbType.Char, 8000);

                    connection.Open();

                    while (rowsInserted < rowsToInsert)
                    {
                        //random guid
                        //command.Parameters["@TestKey"].Value = Guid.NewGuid();

                        //sequential guid
                        command.Parameters["@TestKey"].Value = NewSequentialGuid();
                        command.Parameters["@TestData"].Value = "Test";
                        command.ExecuteNonQuery();
                        ++rowsInserted;

                        //display progress every 1000 rows
                        if (rowsInserted % 1000 == 0)
                        {
                            Console.WriteLine(string.Format(
                                "{0} of {1} rows inserted"
                                , rowsInserted.ToString("#,##0")
                                , rowsToInsert.ToString("#,##0")));
                        }
                    }
                }

                connection.Close();
            }

        }

        /// <summary>
        /// Call UuidCreateSequential and swap bytes for SQL Server format
        /// </summary>
        /// <returns>Sequential guid for SQL Server</returns>
        private static Guid NewSequentialGuid()
        {
            const int S_OK = 0;
            const int RPC_S_UUID_LOCAL_ONLY = 1824;

            Guid oldGuid = Guid.Empty;

            int result = UuidCreateSequential(ref oldGuid);
            if (result != S_OK && result != RPC_S_UUID_LOCAL_ONLY)
            {
                throw new ExternalException("UuidCreateSequential call failed", result);
            }

            byte[] oldGuidBytes = oldGuid.ToByteArray();
            byte[] newGuidBytes = new byte[16];
            oldGuidBytes.CopyTo(newGuidBytes, 0);

            // swap low timestamp bytes (0-3)
            newGuidBytes[0] = oldGuidBytes[3];
            newGuidBytes[1] = oldGuidBytes[2];
            newGuidBytes[2] = oldGuidBytes[1];
            newGuidBytes[3] = oldGuidBytes[0];

            // swap middle timestamp bytes (4-5)
            newGuidBytes[4] = oldGuidBytes[5];
            newGuidBytes[5] = oldGuidBytes[4];

            // swap high timestamp bytes (6-7)
            newGuidBytes[6] = oldGuidBytes[7];
            newGuidBytes[7] = oldGuidBytes[6];

            //remaining 8 bytes are unchanged (8-15) 

            return new Guid(newGuidBytes);

        }

    }
}

Why Random Keys Are Bad

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true that random inserts cause fragmentation and page splits, the primary cause of bad performance with random keys is poor temporal reference locality (http://en.wikipedia.org/wiki/Locality_of_reference), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses an LRU-2 algorithm to manage the buffer cache; the most recently touched data will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.

All things being equal, single-row performance will be roughly the same with both sequential and random keys as long as data are memory resident. This is why the random and sequential key insert tests show the same good performance initially. But once the table size exceeded the size of the buffer pool, the random key test showed a precipitous drop in throughput and steady degradation thereafter. In short, random keys diminish temporal reference locality because there is no correlation between time (most recently accessed data) and the key value.

Why Incremental Keys Are Good

An incremental key value naturally improves temporal reference locality; the next key value is adjacent to the last one inserted and is likely still in memory. An incremental key provides excellent insert performance regardless of table size, as the insert performance test shows. Also, applications typically use recently inserted data more often than older data. This allows the same amount of work to be done with much less physical I/O than with a random key value.

Random Notes about GUIDs

According to the Globally unique identifier Wikipedia article (http://en.wikipedia.org/wiki/Globally_unique_identifier), the random 122 bits of a GUID can generate 2^122 unique values. That’s an incomprehensibly large 5.3 x 10^36 (or 5,300,000,000,000,000,000,000,000,000,000,000,000) unique values.

The value returned by NEWSEQUENTIALID and UuidCreateSequential is guaranteed to be unique on a given computer. Furthermore, it is globally unique if the computer has a network card because the MAC address is used as part of the GUID generation algorithm.

Remediating a Table Partitioning Mess

Partitioning often involves exceptionally large tables, which are quite unforgiving when it comes to remediating a mess due to improper partition design or maintenance.  In this post, I’ll discuss the common design mistake of using a RANGE LEFT partition function containing exact date boundaries with a date type that may include a time component (datetime, datetime2 or datetimeoffset) and show how to change it to RANGE RIGHT with minimal data movement.

Background
The most common use case for table partitioning is an incremental value.  Partitioning on a date, datetime, datetime2 or datetimeoffset value that occurs naturally in the data allows incremental load using SPLIT and SWITCH operations and purge/archive using SWITCH and MERGE.  SWITCH is nearly instantaneous regardless of the amount of underlying data involved because only meta-data changes are needed to move partitions from one table to another.  SPLIT and MERGE are also very fast meta-data only operations as long as no data movement is required.  For this reason, it is critical to set up boundaries correctly and plan so that SPLIT and MERGE are performed against empty partitions whenever possible.
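As a hedged illustration of the monthly sliding window described above, the sketch below uses hypothetical object names and assumes a RANGE RIGHT function with exact date boundaries, an identically partitioned staging table, and empty partitions on either side of the affected boundaries so that MERGE and SPLIT remain metadata-only:

--purge the oldest month: switch it out to a staging table, then remove its boundary
ALTER TABLE dbo.FactSales SWITCH PARTITION 2 TO dbo.FactSalesStaging PARTITION 2;
TRUNCATE TABLE dbo.FactSalesStaging;
ALTER PARTITION FUNCTION PF_FactSalesMonthly() MERGE RANGE ('20130101');

--prepare for the next month: designate the next filegroup and split an empty partition
ALTER PARTITION SCHEME PS_FactSalesMonthly NEXT USED March2013;
ALTER PARTITION FUNCTION PF_FactSalesMonthly() SPLIT RANGE ('20130401');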

Note that RANGE LEFT or RANGE RIGHT specification determines:

  • The partition that contains exact boundary value matches
  • The partition that is created by SPLIT
  • The partition that is removed by MERGE

A common design mistake is using exact date boundaries in a RANGE LEFT function with a datetime, datetime2 or datetimeoffset data type.  This error can result in rows in the wrong partition because these data types include a time component and the implications are not obvious, especially to partitioning newbies.  Rows that exactly match a date boundary (i.e. midnight on the boundary date) are inserted into the lower partition instead of the higher one as desired.  For example, consider a RANGE LEFT function in a monthly sliding window scenario with boundaries of ‘2013-01-01T00:00:00’ and ‘2013-02-01T00:00:00’:

CREATE PARTITION FUNCTION PF_Monthly(datetime2(0))
    AS RANGE LEFT FOR VALUES('2013-01-01T00:00:00', '2013-02-01T00:00:00');

CREATE PARTITION SCHEME PS_Monthly AS
    PARTITION PF_Monthly TO (December2012, January2013, February2013);

This setup looks reasonable at first glance but consider the actual partitions when a table is created on the partition scheme:

  • Partition 1 (December2012 filegroup):  <= ‘2013-01-01T00:00:00’
  • Partition 2 (January2013 filegroup):  > ‘2013-01-01T00:00:00’ AND <= ‘2013-02-01T00:00:00’
  • Partition 3 (February2013 filegroup):  > ‘2013-02-01T00:00:00’

A row with the January datetime of ‘2013-01-01T00:00:00’ will end up in the first partition (December2012 filegroup).  Sliding window maintenance, which purges the oldest month (December2012), will include the ‘2013-01-01T00:00:00’ row(s) inadvertently.  Similarly, a row with value ‘2013-02-01T00:00:00’ will end up in the second partition (January2013 filegroup) and require both the second and third partitions to be accessed when data from all of February are queried.

Remediation
There are two ways to address this problem.  One is to change the boundary values of the existing RANGE LEFT function to the maximum inclusive value instead of an exact date. For example, instead of ‘2013-01-01T00:00:00’ for January, 2013, change the boundary values to the maximum value according to the partition function data type as follows:

  • *datetime: ‘2013-01-31T23:59:59.997’
  • datetime2(0): ‘2013-01-31T23:59:59’
  • datetime2(1): ‘2013-01-31T23:59:59.9’
  • datetime2(2): ‘2013-01-31T23:59:59.99’
  • datetime2(3): ‘2013-01-31T23:59:59.999’
  • datetime2(4): ‘2013-01-31T23:59:59.9999’
  • datetime2(5): ‘2013-01-31T23:59:59.99999’
  • datetime2(6): ‘2013-01-31T23:59:59.999999’
  • datetime2(7): ‘2013-01-31T23:59:59.9999999’
  • **datetimeoffset(0): ‘2013-01-31T23:59:59-05:00’
  • **datetimeoffset(1): ‘2013-01-31T23:59:59.9-05:00’
  • **datetimeoffset(2): ‘2013-01-31T23:59:59.99-05:00’
  • **datetimeoffset(3): ‘2013-01-31T23:59:59.999-05:00’
  • **datetimeoffset(4): ‘2013-01-31T23:59:59.9999-05:00’
  • **datetimeoffset(5): ‘2013-01-31T23:59:59.99999-05:00’
  • **datetimeoffset(6): ‘2013-01-31T23:59:59.999999-05:00’
  • **datetimeoffset(7): ‘2013-01-31T23:59:59.9999999-05:00’

*Importantly, one needs to specify 997 as the milliseconds value for datetime because a higher millisecond value will be rounded to the next exact date (e.g. ‘2013-02-01T00:00:00.000’)

**The actual offset for datetimeoffset partitioning depends on your environment but is typically the same for each boundary.
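The datetime rounding behavior behind the 997 rule is easy to demonstrate, since datetime values are rounded to .000, .003 or .007 second increments:

SELECT
	CAST('2013-01-31T23:59:59.997' AS datetime) AS LastValueInJanuary   --2013-01-31 23:59:59.997
	,CAST('2013-01-31T23:59:59.999' AS datetime) AS RoundsToFebruary;   --2013-02-01 00:00:00.000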

Rather than changing the boundary specification, I suggest instead changing the function from LEFT to RIGHT and continue using an exact date boundary.  The inclusive date boundary is more intuitive (in my opinion) and allows the same date boundaries to be used regardless of data type.  The same amount of work is required for a mass change of existing boundaries as is required to change from RANGE LEFT to RANGE RIGHT.

The script below converts a table partitioned using a RANGE LEFT function with exact date boundaries to one partitioned RANGE RIGHT as follows:

  1. create a staging table exactly like the source table, partitioned using the same scheme
  2. create a non-partitioned holding table with the same schema as the source but without constraints and indexes
  3. create a new RANGE RIGHT function with the same exact date boundaries as the existing RANGE LEFT function
  4. create a new partition scheme with the same filegroups as the source table partition scheme but based on the new RANGE RIGHT function
  5. create a new table exactly like the source but partitioned using the new RANGE RIGHT scheme
  6. for each existing partition:
    1. switch the source partition to the staging table
    2. move rows that exactly match a boundary value to the holding table
    3. create a check constraint on the staging table matching the target RANGE RIGHT boundaries
    4. switch the staging partition to the new table
    5. drop the check constraint
  7. after all partitions are switched to the new table, insert the exact boundary matching rows into the new table
  8. drop the old table and staging tables
  9. rename the new table, constraints and indexes to the old names

Since this script uses SWITCH to accomplish the bulk of the work, the table must be aligned; data and indexes must be partitioned using the same (or functionally identical) partition function.  The SWITCH operations are nearly instantaneous and the only data movement required is moving rows that exactly match the boundary values to/from the holding table, which shouldn’t take long unless you have a lot of matching rows.  The long pole in the tent from a performance perspective is likely creating the needed check constraint on the staging table because all rows must be scanned.  However, this method is still considerably faster and requires less space than repartitioning the table by recreating indexes using the new partition scheme.

Here’s the setup script for the problem table and sample data:

--existing RANGE LEFT function with exact date boundaries
CREATE PARTITION FUNCTION PF_Left(datetime)
       AS RANGE LEFT FOR VALUES('20130101','20130201');
--existing scheme based on RANGE LEFT
CREATE PARTITION SCHEME PS_Left
       AS PARTITION PF_Left ALL TO ([DEFAULT]);
--existing table, constraint and index partitioned with RANGE LEFT scheme
CREATE TABLE dbo.OldTable(
       PartitioningColumn datetime NOT NULL
       ,DataColumn int NOT NULL
       ) ON PS_Left(PartitioningColumn);
ALTER TABLE dbo.OldTable
	ADD CONSTRAINT PK_OldTable PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Left(PartitioningColumn);
CREATE INDEX idx_OldTable_DataColumn ON dbo.OldTable(DataColumn) 
	ON PS_Left(PartitioningColumn);
--sample data
INSERT INTO dbo.OldTable VALUES
       ('20121231',1)
       ,('20130101',1)
       ,('20130102',2)
       ,('20130103',5)
       ,('20130131',2)
       ,('20130201',4)
       ,('20130202',3);
GO

And the remediation script to change to RANGE RIGHT:

--new RANGE RIGHT function with same boundaries as existing RANGE LEFT function
CREATE PARTITION FUNCTION PF_Right(datetime)
       AS RANGE RIGHT FOR VALUES('20130101','20130201');

--new scheme based on RANGE RIGHT with same filegroups as RANGE LEFT scheme
CREATE PARTITION SCHEME PS_Right
       AS PARTITION PF_Right ALL TO ([DEFAULT]);

--new staging table exactly like existing source table
CREATE TABLE dbo.OldTableStaging(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON PS_Left(PartitioningColumn);
ALTER TABLE dbo.OldTableStaging
	ADD CONSTRAINT PK_OldTableStaging PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Left(PartitioningColumn);
CREATE INDEX idx_OldTableStaging_DataColumn ON dbo.OldTableStaging(DataColumn) 
	ON PS_Left(PartitioningColumn);

--new holding table for exact boundary matches (no constraints or indexes needed)
CREATE TABLE dbo.OldTableStagingExactBoundaryMatches(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON [DEFAULT];

--new table exactly like existing table but partitioned with RANGE RIGHT scheme
CREATE TABLE dbo.NewTable(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON PS_Right(PartitioningColumn);
ALTER TABLE dbo.NewTable
	ADD CONSTRAINT PK_NewTable PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Right(PartitioningColumn);
CREATE INDEX idx_NewTable_DataColumn ON dbo.NewTable(DataColumn) 
	ON PS_Right(PartitioningColumn);
GO

--script to convert RANGE LEFT to RANGE RIGHT
SET NOCOUNT ON;
DECLARE 
       @LowerBoundary datetime
       ,@UpperBoundary datetime
       ,@PartitionNumber int
       ,@Sql nvarchar(MAX)
       ,@PartitionFunctionName sysname = N'PF_Right';

DECLARE PartitionBoundaries CURSOR LOCAL FAST_FORWARD FOR
WITH 
       PartitionNumbers AS (
              SELECT 1 AS PartitionNumber
              UNION ALL
              SELECT boundary.boundary_id + 1
              FROM sys.partition_functions AS pf
              JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
              WHERE pf.name = @PartitionFunctionName
              )
       ,PartitionBoundaries AS (
              SELECT
                     (SELECT CAST(boundary.value AS datetime)
                     FROM sys.partition_functions AS pf
                     JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
                     WHERE
                           pf.name = @PartitionFunctionName
                           AND boundary.boundary_id = PartitionNumbers.PartitionNumber - 1
                     ) AS LowerBoundary
                     ,(SELECT CAST(boundary.value AS datetime)
                     FROM sys.partition_functions AS pf
                     JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
                     WHERE
                           pf.name = @PartitionFunctionName
                           AND boundary.boundary_id = PartitionNumbers.PartitionNumber
                     ) AS UpperBoundary
                     ,PartitionNumbers.PartitionNumber
              FROM PartitionNumbers
              )
SELECT LowerBoundary, UpperBoundary, PartitionNumber
FROM PartitionBoundaries
ORDER BY PartitionNumber;

OPEN PartitionBoundaries;
WHILE 1 = 1
BEGIN
       --get next partition boundary
       FETCH NEXT FROM PartitionBoundaries INTO @LowerBoundary, @UpperBoundary, @PartitionNumber;
       IF @@FETCH_STATUS = -1 BREAK;

       --move partition from partitioned left table to staging
       RAISERROR('Switching out partition %d', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTable
              SWITCH PARTITION @PartitionNumber TO dbo.OldTableStaging PARTITION @PartitionNumber;

       --move exact boundary matches to holding table
       DELETE FROM dbo.OldTableStaging WITH(TABLOCKX)
       OUTPUT deleted.* INTO dbo.OldTableStagingExactBoundaryMatches
       WHERE PartitioningColumn = @UpperBoundary;
       RAISERROR('Moved %d exact boundary matches to holding table', 0, 1, @@ROWCOUNT) WITH NOWAIT;

       --create check constraint to match RANGE RIGHT boundary
       SET @Sql = N'ALTER TABLE dbo.OldTableStaging
              ADD CONSTRAINT CK_TestStaging CHECK (';
       IF @LowerBoundary IS NOT NULL
       BEGIN
              SET @Sql = @Sql + N'PartitioningColumn >= ''' + CONVERT(char(8), @LowerBoundary,112) + N'''';
              IF @UpperBoundary IS NOT NULL
              BEGIN
                     SET @Sql = @Sql + N' AND ';
              END;
       END;
       IF @UpperBoundary IS NOT NULL
       BEGIN
              SET @Sql = @Sql + N'PartitioningColumn < ''' + CONVERT(char(8), @UpperBoundary,112) + N'''';
       END;
       SET @Sql = @Sql + ');';
       RAISERROR('Creating check constraint: %s', 0, 1, @Sql) WITH NOWAIT;
       EXEC sp_executesql @Sql;

       --move staging table partition to partitioned right table
       RAISERROR('Switching in partition %d', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTableStaging
              SWITCH PARTITION @PartitionNumber TO dbo.NewTable PARTITION @PartitionNumber;

       --remove staging table constraint
       RAISERROR('Dropping check constraint', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTableStaging
              DROP CONSTRAINT CK_TestStaging;

END;
CLOSE PartitionBoundaries;
DEALLOCATE PartitionBoundaries;

--insert exact boundary matches
INSERT INTO dbo.NewTable WITH (TABLOCKX) (PartitioningColumn, DataColumn) 
       SELECT PartitioningColumn, DataColumn FROM dbo.OldTableStagingExactBoundaryMatches WITH (TABLOCKX);
RAISERROR('Inserted %d exact boundary matches from holding table', 0, 1, @@ROWCOUNT) WITH NOWAIT;

IF NOT EXISTS(SELECT * FROM OldTable)
BEGIN
	--drop old table
	DROP TABLE dbo.OldTable;
	--rename table, constraints and indexes to same names as original table
	EXEC sp_rename 'dbo.NewTable.idx_NewTable_DataColumn', 'idx_OldTable_DataColumn', 'INDEX';
	EXEC sp_rename 'dbo.PK_NewTable', 'PK_OldTable';
	EXEC sp_rename 'dbo.NewTable', 'OldTable';
END;
--drop staging tables
DROP TABLE dbo.OldTableStaging;
DROP TABLE dbo.OldTableStagingExactBoundaryMatches;
GO

Service Broker External Activator Example

The Service Broker External Activator (SBEA) allows one to launch any command-line program asynchronously using T-SQL.  I haven’t found many complete end-to-end examples on how to create an SBEA application, so this article will demonstrate with a common use case: launching an SSIS package on demand with the DTEXEC command-line utility.  The package can be run on any machine with SBEA and SSIS installed, which allows you to run the package remotely on a different server if you so desire.

Service Broker External Activator Overview

The latest version of the tool, officially named Microsoft® SQL Server® Service Broker External Activator for Microsoft® SQL Server® 2012, is a free download (http://www.microsoft.com/en-us/download/details.aspx?id=29065).  Service Broker External Activator is also included in the SQL Server 2008 and SQL Server 2008 R2 Feature Packs.

SBEA runs as a Windows service and leverages the SQL Server Service Broker event notification infrastructure to run a command-line application upon receipt of an event notification message.  When an event notification message is received, it launches the command-line program associated with the queue identified in the notification event message.  There is some learning curve if you are not already familiar with Service Broker and event notification concepts but it is a worthwhile investment, not just for SBEA, but for any application that needs asynchronous or message-based processing.

SBEA is an alternative to other methods that can launch an external program via T-SQL, such as xp_cmdshell, a SQL Agent job (sp_start_job) or a custom SQLCLR proc.  The xp_cmdshell extended stored procedure allows any ad-hoc command to be run on the database server, limited only by SQL Server service or proxy account permissions.  You need to be mindful of the security implications and take precautions to ensure it can be used only in the way intended.  Also, xp_cmdshell runs the launched program synchronously so the calling application must wait for the application to finish.  You could take special measures to emulate asynchronous behavior with xp_cmdshell but that’s a kludge at best.

A SQL Agent job allows you to run a command asynchronously, but only one instance of a given job can run at a time.  In contrast, SBEA allows one to run multiple application instances concurrently and does so automatically.  SBEA allows a configurable maximum number of application instances so that you can scale automatically to meet peak demand.

SQL Server 2012 introduces the SSISDB catalog, which provides stored procedures to run SSIS packages asynchronously via T-SQL.  However, packages must be stored in the SSISDB database in order to use this feature.  SBEA is an alternative if you are on a version earlier than SQL Server 2012, need to run packages not stored in the SSIS catalog, or need to run command-line programs other than DTEXEC.EXE.

SBEA Database Objects

As with all Service Broker applications, the minimal database objects needed are an initiator service and queue as well as a target service and queue.  These are used to pass messages to and from the target application (SSIS package in this case).  Additionally, the SBEA needs a notification service, notification queue and event notification so that it is notified when messages are waiting in the target queue for processing.
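For orientation, a minimal sketch of that messaging DDL follows.  It assumes the built-in [DEFAULT] contract for the application conversation and uses the object names summarized below; the complete, authoritative script (including activation and the application message types, if any) is at the end of this article:

--target and initiator services/queues for the application conversation
CREATE QUEUE dbo.BatchProcessTargetQueue;
CREATE SERVICE BatchProcessTargetService ON QUEUE dbo.BatchProcessTargetQueue ([DEFAULT]);
CREATE QUEUE dbo.BatchProcessInitiatorQueue;
CREATE SERVICE BatchProcessInitiatorService ON QUEUE dbo.BatchProcessInitiatorQueue;

--notification service/queue monitored by SBEA, plus the event notification itself
CREATE QUEUE dbo.BatchProcessNotificationQueue;
CREATE SERVICE BatchProcessNotificationService ON QUEUE dbo.BatchProcessNotificationQueue
	([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
CREATE EVENT NOTIFICATION BatchProcessEventNotification
	ON QUEUE dbo.BatchProcessTargetQueue
	FOR QUEUE_ACTIVATION
	TO SERVICE 'BatchProcessNotificationService', 'current database';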

Stored procedures are a great way to encapsulate Service Broker functionality and implementation details, so I use procs here as the application database interface.  Stored procs also simplify security since the only database permissions necessary are execute permissions on the procs.  Within the procs, I also log to a table to facilitate monitoring progress, troubleshooting and auditing.  The complete T-SQL script is included at the end of this article.
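
As a rough sketch, security for this example can be as simple as granting execute permission on the procs created later in this article to a role (the role name is hypothetical):

--only EXECUTE on the interface procs is needed; no direct queue or table permissions
CREATE ROLE BatchProcessRole;
GRANT EXECUTE ON dbo.usp_LaunchBatchProcess TO BatchProcessRole;
GRANT EXECUTE ON dbo.usp_GetBatchProcessParameters TO BatchProcessRole;
GRANT EXECUTE ON dbo.usp_CompleteBatchProcess TO BatchProcessRole;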

Below is a summary of the database objects I created for this example:

Service Broker Objects

  • BatchProcessInitiatorService:  SB service that requests package execution (requesting application)
  • BatchProcessInitiatorQueue:  SB queue for success or failure result messages (returned by package after execution)
  • BatchProcessTargetService:  SB service that executes the request (SSIS package)
  • BatchProcessTargetQueue:  SB queue for execution requests, with message containing parameters needed by SSIS package (or an empty message if no parameters are needed)
  • BatchProcessNotificationService:  SB service that is the target of the event notifications (SBEA service)
  • BatchProcessNotificationQueue:  SB queue for event notification messages
  • BatchProcessTargetNotification:  SB event notification that monitors the BatchProcessTargetQueue

Stored Procedures

  • dbo.usp_LaunchBatchProcess (called by invoking application): begins conversation to target service and sends message with parameters needed by package
  • dbo.usp_GetBatchProcessParameters (called by SSIS package at start): receives next message from target queue and logs to table
  • dbo.usp_CompleteBatchProcess (called by SSIS package at completion): returns result message to initiator using END CONVERSATION (or END CONVERSATION WITH ERROR)
  • dbo.usp_LogBatchProcessResult (initiator queue activated proc): receives result messages from initiator queue, logs results to table and ends the conversation

Table

  • dbo.BatchProcessLog: Each received message is logged to this table

SBEA Configuration File

Service Broker External Activator is configured using the xml configuration file (EAService.config) in the application Config folder (e.g. C:\Program Files\Service Broker\External Activator\Config\).  This configuration file specifies:

  • The name of the notification service
  • Connection string used to connect to the database
  • A list of one or more Service Broker services along with the command-line application and arguments to launch for each

Below is the configuration file I use for this example; the relevant values are the notification service name, the connection string, and the application service server/database/queue and launch settings.  As a side note, SBEA currently supports only one notification service even though the parent element name (NotificationServiceList) implies one can have many.  However, I haven’t personally run into a case where this is an issue since, in a distributed environment, one can designate a single database for the SBEA messaging infrastructure.

<?xml version="1.0" encoding="utf-8"?>
<Activator xmlns="http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator EAServiceConfig.xsd">
<NotificationServiceList>
<NotificationService name="BatchProcessNotificationService" id="100" enabled="true">
<Description>My test notification service</Description>
<ConnectionString>
<!-- All connection string parameters except User Id and Password should be specified here -->
<Unencrypted>server=myserver;database=SBEA_Example;Application Name=External Activator;Integrated Security=true;</Unencrypted>
</ConnectionString>
</NotificationService>
</NotificationServiceList>
<ApplicationServiceList>
<ApplicationService name="BatchProcessTargetService" enabled="true">
<OnNotification>
<ServerName>MyServer</ServerName>
<DatabaseName>SBEA_Example</DatabaseName>
<SchemaName>dbo</SchemaName>
<QueueName>BatchProcessTargetQueue</QueueName>
</OnNotification>
<LaunchInfo>
<ImagePath>C:\Program Files\Microsoft SQL Server\110\DTS\Binn\DTEXEC.EXE</ImagePath>
<CmdLineArgs>/F C:\Packages\Package.dtsx</CmdLineArgs>
<WorkDir>c:\temp</WorkDir>
</LaunchInfo>
<Concurrency min="1" max="1" />
</ApplicationService>
</ApplicationServiceList>
<LogSettings>
<LogFilter>
<TraceFlag>All Levels</TraceFlag>
 <TraceFlag>All Modules</TraceFlag>
 <TraceFlag>All Entities</TraceFlag>
 <TraceFlag>Verbose</TraceFlag>
<!--
http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator/EAServiceConfig.xsd
<xs:enumeration value="Error"/>
<xs:enumeration value="Warning"/>
<xs:enumeration value="Information"/>
<xs:enumeration value="Verbose"/>
<xs:enumeration value="All Levels"/>
<xs:enumeration value="Application Monitor"/>
<xs:enumeration value="Configuration Manager"/>
<xs:enumeration value="Global"/>
<xs:enumeration value="Recovery Log"/>
<xs:enumeration value="Imported Methods"/>
<xs:enumeration value="Notification Service"/>
<xs:enumeration value="Service Broker"/>
<xs:enumeration value="All Modules"/>
<xs:enumeration value="Asynchronous Updates"/>
<xs:enumeration value="Threads"/>
<xs:enumeration value="All Entities"/>
-->
</LogFilter>
</LogSettings>
</Activator>

Service Broker External Activator Processing

When the SBEA service starts, it validates the application service list against Service Broker metadata to ensure the specified services and queues exist and are properly associated.  Then SBEA starts monitoring the notification queue associated with the specified notification service.  When a queue notification event message is received, it matches the server, database, schema and queue contained in the message payload against the configured application services.  When matched, it launches the command-line application associated with that service.  A warning message is logged if a queue notification message is received that doesn’t have a corresponding application.

SBEA logs trace messages to a file named EATrace.log in the Log folder (C:\Program Files\Service Broker\External Activator\Log).  Logging levels are configurable.  I suggest you start with verbose logging initially (as in the sample config) and adjust to less logging once your application is stable.  I included the config schema reference and the various logging options as comments in the sample config for your convenience.  Be aware that trace messages are appended to the file, so you’ll want to periodically delete (or archive) it.

SBEA Application Design and Message Flow

There are many different patterns one can employ with Service Broker bi-directional message processing.  Here, I use a single-message-per-conversation pattern to pass the parameters needed by the package as well-formed XML.  The target (SSIS package) does its work and ends its side of the conversation upon completion, indicating success or failure on the END CONVERSATION statement, which generates an EndDialog or Error message that is returned via the initiator queue.

Service Broker automatically executes the initiator queue activated proc when a message is waiting on the queue.  Acting on behalf of the initiating application, which might not be running due to the asynchronous execution of the package, the activated proc ends the other side of the conversation upon receipt of the EndDialog or Error message.  Both sides of the conversation are thus ended, which is a best practice with this single-message-per-conversation pattern.

The following sequence describes the interaction and flow of messages between components:

  1. Initiating application executes dbo.usp_LaunchBatchProcess to launch console application.  Proc executes BEGIN DIALOG and SEND to insert the parameter message into the target queue.
  2. SBEA executes RECEIVE of the event notification message, indicating a message is waiting in the target queue
  3. SBEA launches the console application associated with the event notification
  4. The console application executes dbo.usp_GetBatchProcessParameters to get parameter values.  Proc executes RECEIVE of message from target queue and logs the message.
  5. The console application executes dbo.usp_CompleteBatchProcess at completion.  Proc executes END CONVERSATION to signal successful completion (or END CONVERSATION WITH ERROR to indicate an error occurred).
  6. Service Broker launches activated stored procedure dbo.usp_LogBatchProcessResult in response to the EndDialog message.  Proc executes RECEIVE from the initiator queue and logs the result to a table.

The diagram below shows this message flow.

[Diagram: SBEA application message flow]

Example Script

USE master;
GO

---------------------------------------------------
--- create database with Service Broker enabled ---
---------------------------------------------------
ALTER DATABASE SBEA_Example
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

DROP DATABASE SBEA_Example;
GO

CREATE DATABASE SBEA_Example;
ALTER AUTHORIZATION ON DATABASE::SBEA_Example TO sa;
GO

ALTER DATABASE SBEA_Example
SET ENABLE_BROKER;
GO

-------------------------------
--- create database objects ---
-------------------------------
USE SBEA_Example;
GO

--log table
CREATE TABLE dbo.BatchProcessLog(
ConversationHandle uniqueidentifier NOT NULL
,MessageTypeName sysname NOT NULL
,MessageBody varbinary(MAX) NULL
,LogTime datetime2(3) NOT NULL
CONSTRAINT DF_ServiceBrokerLog_LogTime
DEFAULT (SYSDATETIME())
);
CREATE CLUSTERED INDEX cdx_BatchProcessLog ON dbo.BatchProcessLog(LogTime);
GO

CREATE PROC dbo.usp_LogBatchProcessResult
---------------------------------------------
--initiator queue activated proc to process messages
---------------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_type_name sysname
,@message_body varbinary(MAX);
WHILE 1 = 1
BEGIN
WAITFOR (
RECEIVE TOP (1)
@conversation_handle = conversation_handle
,@message_type_name = message_type_name
,@message_body = message_body
FROM dbo.BatchProcessInitiatorQueue
), TIMEOUT 1000;
IF @@ROWCOUNT = 0
BEGIN
--exit when no more messages
RETURN;
END;

--log message
INSERT INTO dbo.BatchProcessLog(
ConversationHandle
,MessageTypeName
,MessageBody
)
VALUES(
@conversation_handle
,@message_type_name
,@message_body
);
END CONVERSATION @conversation_handle;
END;
GO

CREATE PROC dbo.usp_LaunchBatchProcess
@Parameter1 int
---------------------------------------------
--called by application to trigger batch process
--Sample Usage:
--
-- EXEC dbo.usp_LaunchBatchProcess @Parameter1 = 1;
---------------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_body varbinary(MAX);

BEGIN TRY

BEGIN TRAN;

BEGIN DIALOG CONVERSATION @conversation_handle
FROM SERVICE BatchProcessInitiatorService
TO SERVICE 'BatchProcessTargetService'
ON CONTRACT [DEFAULT]
WITH
ENCRYPTION = OFF,
LIFETIME = 6000;

SET @message_body = CAST(
N'<Parameters><Parameter1>'
+ CAST(@Parameter1 AS nvarchar(10))
+ N'</Parameter1></Parameters>' AS varbinary(MAX));

SEND ON CONVERSATION @conversation_handle (@message_body);

COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK; --cleanup before re-raising the error
THROW;
END CATCH;

RETURN @@ERROR;
GO

CREATE PROC dbo.usp_GetBatchProcessParameters
--------------------------------------
--called by batch package at start ---
--------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_body xml
,@message_type_name sysname
,@parameter1 int;

BEGIN TRY

BEGIN TRAN;

RECEIVE TOP(1)
@conversation_handle = conversation_handle
,@message_type_name = message_type_name
,@message_body = message_body
FROM dbo.BatchProcessTargetQueue;

IF @@ROWCOUNT = 0
BEGIN
RAISERROR ('No messages received from dbo.BatchProcessTargetQueue', 16, 1);
RETURN 1;
END;

INSERT INTO dbo.BatchProcessLog(
ConversationHandle
,MessageTypeName
,MessageBody
)
VALUES(
@conversation_handle
,@message_type_name
,CAST(@message_body AS varbinary(MAX))
);

SET @parameter1 = @message_body.query('/Parameters/Parameter1').value('.', 'int');

COMMIT;

SELECT
@conversation_handle AS ConversationHandle
,@parameter1 AS Parameter1;

END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK; --return the message to the queue and re-raise the error
THROW;
END CATCH;

RETURN @@ERROR;
GO

CREATE PROC dbo.usp_CompleteBatchProcess
@ConversationHandle uniqueidentifier
,@ErrorMessage nvarchar(3000) = NULL
------------------------------------------
-- called by SSIS package at completion
-- Sample Usage:

-- normal completion:
-- EXEC dbo.usp_CompleteBatchProcess
-- @ConversationHandle = '00000000-0000-0000-0000-000000000000';

-- completed with error:
-- EXEC dbo.usp_CompleteBatchProcess
-- @ConversationHandle = '00000000-0000-0000-0000-000000000000',
-- @ErrorMessage = 'an error occurred';
------------------------------------------
AS

IF @ErrorMessage IS NULL
BEGIN
END CONVERSATION @ConversationHandle;
END
ELSE
BEGIN
END CONVERSATION @ConversationHandle
WITH ERROR = 1
DESCRIPTION = @ErrorMessage;
END;

RETURN @@ERROR;
GO

--initiator queue with activated proc to process batch completed messages
CREATE QUEUE dbo.BatchProcessInitiatorQueue
WITH STATUS = ON,
ACTIVATION (
PROCEDURE_NAME = dbo.usp_LogBatchProcessResult,
MAX_QUEUE_READERS = 1,
EXECUTE AS SELF );
GO

--initiator service that triggers batch process
CREATE SERVICE BatchProcessInitiatorService
ON QUEUE dbo.BatchProcessInitiatorQueue
([DEFAULT]);
GO

--queue for event notifications
CREATE QUEUE dbo.BatchProcessNotificationQueue;
GO

--service for event notifications
CREATE SERVICE BatchProcessNotificationService
ON QUEUE dbo.BatchProcessNotificationQueue
(
[http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]
);
GO

--target queue for batch process parameters
CREATE QUEUE dbo.BatchProcessTargetQueue;
GO

--target service for batch process parameters
CREATE SERVICE BatchProcessTargetService
ON QUEUE dbo.BatchProcessTargetQueue
([DEFAULT]);
GO

--event notification for target queue
CREATE EVENT NOTIFICATION BatchProcessTargetNotification
ON QUEUE dbo.BatchProcessTargetQueue
FOR QUEUE_ACTIVATION
TO SERVICE 'BatchProcessNotificationService' , 'current database';
GO
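
Once the objects above are created and the SBEA service is running, a quick way to smoke-test the setup is to launch a request and then review the log table (a minimal sketch):

--request a batch process run; SBEA should launch DTEXEC shortly afterward
EXEC dbo.usp_LaunchBatchProcess @Parameter1 = 1;
GO

--review logged messages after the package completes
SELECT
ConversationHandle
,MessageTypeName
,CAST(MessageBody AS nvarchar(MAX)) AS MessageBodyText --parameter message was sent as nvarchar XML
,LogTime
FROM dbo.BatchProcessLog
ORDER BY LogTime;
GO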

Summary

The Service Broker External Activator is a powerful way to launch command-line programs asynchronously.  There is a learning curve, which can be steep if you know nothing about the SQL Server Service Broker feature.  But I hope this doesn’t discourage you from exploring and using SBEA.  It may be the right tool for the job in cases where you need asynchronous batch processing, avoiding custom programming or kludge design.  It’s always a good thing to have another tool in your toolbox.

 

 

 

Secrets of Foreign Key Index Binding

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.

Background

Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.
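
For illustration, a minimal sketch of such a relationship, using hypothetical table and column names, might look like this:

--foreign key referencing an alternate key (unique constraint) instead of the primary key
CREATE TABLE dbo.Product(
       ProductID int NOT NULL CONSTRAINT PK_Product PRIMARY KEY
       ,ProductCode char(10) NOT NULL CONSTRAINT UQ_Product_ProductCode UNIQUE
       );
CREATE TABLE dbo.ProductHistory(
       ProductHistoryID int NOT NULL CONSTRAINT PK_ProductHistory PRIMARY KEY
       ,ProductCode char(10) NOT NULL
              CONSTRAINT FK_ProductHistory_Product
                     FOREIGN KEY REFERENCES dbo.Product(ProductCode)
       );
GO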

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding has performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow the index bound to a foreign key to be dropped, since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.
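
Before running such scripts, a query along these lines (a sketch using the same catalog views as the example later in this article) can list which unique indexes are bound to foreign keys:

--list foreign keys and the unique indexes they are bound to
SELECT
f.name AS ForeignKeyName
,OBJECT_NAME(f.parent_object_id) AS ChildTable
,OBJECT_NAME(f.referenced_object_id) AS ReferencedTable
,fki.name AS BoundIndexName
FROM sys.foreign_keys AS f
JOIN sys.indexes AS fki ON
      fki.object_id = f.referenced_object_id
      AND fki.index_id = f.key_index_id;
GO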

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:

--create parent table
CREATE TABLE dbo.ParentTable(
       ParentTableID int NOT NULL IDENTITY
              CONSTRAINT PK_ParentTable PRIMARY KEY CLUSTERED
       ,Column1 int NOT NULL
       ,Column2 varchar(100) NOT NULL
       );
GO

--create a non-clustered covering index
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);
GO

INSERT INTO dbo.ParentTable VALUES(1, 'some data');
INSERT INTO dbo.ParentTable VALUES(2, 'some data');
INSERT INTO dbo.ParentTable VALUES(3, 'some data');
GO

--create child table
CREATE TABLE dbo.ChildTable(
       ChildTableID int NOT NULL IDENTITY
            CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED
       ,ParentTableID int NOT NULL
              CONSTRAINT FK_ChildTable_ParentTable
                     FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)
       );
GO

INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(1);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(2);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
INSERT INTO dbo.ChildTable VALUES(3);
GO

UPDATE STATISTICS dbo.ParentTable;
UPDATE STATISTICS dbo.ChildTable;
GO

--show the foreign key index binding
SELECT
    fki.name
FROM sys.foreign_keys AS f
JOIN sys.indexes AS fki ON
      fki.object_id = f.referenced_object_id
      AND fki.index_id = f.key_index_id
WHERE
      f.object_id = OBJECT_ID(N'dbo.FK_ChildTable_ParentTable');
GO

--this query uses the covering index instead of clustered PK index
SELECT p.ParentTableID, p.Column1
FROM dbo.ParentTable AS p
WHERE p.ParentTableID IN(1,2,3);
GO

The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or unique index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control over which index is bound to the foreign key.

SQL Server chooses the index binding based on rules that vary by version, so you will get different bindings depending on your version of SQL Server. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, the first (lowest index_id) unique non-clustered index on the referenced column(s) is used. The sample script above binds the foreign key to the PK_ParentTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL 2008, SQL 2008R2 and SQL 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL 2008 and later versions instead of the primary key index as one might expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index ‘dbo.ParentTable.idx_ParentTable_ParentTableID’. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate it after the index is dropped.

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future. 
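
Using the tables from the example above, here is a sketch of that approach: recreate the foreign key while the primary key is the only unique index, then add the covering index afterward.

--drop the covering index (the dependent foreign key must be dropped first)
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_ParentTable;
DROP INDEX idx_ParentTable_ParentTableID ON dbo.ParentTable;
GO

--recreate the foreign key while PK_ParentTable is the only unique index, so it binds to the PK
ALTER TABLE dbo.ChildTable ADD CONSTRAINT FK_ChildTable_ParentTable
       FOREIGN KEY (ParentTableID) REFERENCES dbo.ParentTable(ParentTableID);
GO

--now add the covering index; the foreign key remains bound to PK_ParentTable
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);
GO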

 

RIP OLE DB

I was very surprised when Microsoft announced deprecation of the OLE DB provider for SQL Server data access last week on the Data Access Blog and MSDN Forums Announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB driver will continue to be supported for 7 years after the Denali release, so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use System.Data.SqlClient. SqlClient is part of the .NET Framework and is the preferred way to access SQL Server from managed code. The only reason I can think of not to use SqlClient from managed code is if an application also needs to support other DBMS products through the same interface without coding an additional abstraction layer. In that case, System.Data.Odbc is an alternative for accessing different database products.

·         Native code (e.g. unmanaged C++): Use ODBC with the SQL Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and is also available as a separate download. 

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: “Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI”

New ODBC connection string: “Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes”

 

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Were Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely, but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing. 

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)

Summary

I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.

 

Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.

T-SQL Error Handling Ugliness

Unlike compiled application code, which halts execution upon an unhandled exception, a T-SQL script might continue executing after an error. T-SQL developers must include error checking/handling to ensure code doesn’t continue down the “happy” path oblivious to an error, to report the error to the caller, to perform any necessary cleanup operations (typically ROLLBACK) and to continue or halt execution as desired. The script below shows how one might accomplish this without structured error handling:

--Unstructured error handling example
BEGIN TRAN
SELECT 1/0 AS CauseAnError --report error to caller
IF @@ERROR <> 0 GOTO ErrorHandler --detect error
COMMIT
GOTO Done

ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK --cleanup after error
RETURN --stop further code execution

Done:
PRINT 'Done' --not executed after error
GO

This script results in the error:

Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.

Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include a repetitive “IF @@ERROR” check to detect errors after each statement along with error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to raise the error; unless TRY/CATCH is used, SQL Server automatically reports errors to the calling application. This guarantees the calling application is notified of errors during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL Server 2005 is both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error-prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is also more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:

--Structured error handling example
DECLARE
@ErrorNumber int
,@ErrorMessage nvarchar(2048)
,@ErrorSeverity int
,@ErrorState int
,@ErrorLine int;

BEGIN TRY --detect errors
BEGIN TRAN;
SELECT 1/0 AS CauseAnError;
COMMIT;
END TRY
BEGIN CATCH
SELECT
@ErrorNumber = ERROR_NUMBER()
,@ErrorMessage = ERROR_MESSAGE()
,@ErrorSeverity = ERROR_SEVERITY()
,@ErrorState = ERROR_STATE()
,@ErrorLine = ERROR_LINE();
IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
RAISERROR('Error %d caught at line %d: %s' --report error to caller
,@ErrorSeverity
,@ErrorState
,@ErrorNumber
,@ErrorLine
,@ErrorMessage);
RETURN; --stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered

I really like the way structured error handling catches errors declaratively and centralizes error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller: a caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility for notifying the application that an error occurred. Failure to do so results in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.

The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:

--Structured error handling example in Denali CTP3
BEGIN TRY --detect errors
BEGIN TRAN;
SELECT 1/0 AS CauseAnError;
COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
THROW; --report error to caller and stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.

There are only a couple of scenarios I can think of where you would not use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.
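
For example, here is a minimal sketch of an outermost catch block that logs the error and deliberately does not re-throw, so the client never sees it (the dbo.ErrorLog table is hypothetical):

--outermost catch: log and suppress the error
BEGIN TRY
SELECT 1/0 AS CauseAnError;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
INSERT INTO dbo.ErrorLog(ErrorNumber, ErrorMessage)
VALUES(ERROR_NUMBER(), ERROR_MESSAGE());
--no THROW here, so the error is not returned to the client
END CATCH;
GO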

Summary

THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

Internal SQL Server Database Version Numbers

A database created by a more recent version of SQL Server cannot be attached or restored to an earlier version. This restriction is simply because an older version cannot know about file format changes that were introduced in the newer release. 
If you attempt to attach a database to an earlier version, you will get SQL Server error 948 with the internal version numbers listed in the error message text. For example, the following error occurs if you try to attach a SQL Server 2008 R2 database to a SQL Server 2008 server:

The database ‘MyDatabase’ cannot be opened because it is version 665. This server supports version 661 and earlier. A downgrade path is not supported.

Sample text from SQL Server error 948

The cryptic version numbers in the error message refer to the internal database version. These internal version numbers are undocumented but are (at least currently) the same value reported by the DATABASEPROPERTYEX function’s ‘Version’ property for the source database. If you are unsure of the source database version, the table below maps the internal version numbers to SQL Server versions so you can determine the minimum version you need for the attach to succeed:
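
If you have access to the source server, a quick way to check the internal version before attempting an attach or restore is shown below (the database name is hypothetical):

--returns the internal database version number (e.g. 661 for a SQL Server 2008 database)
SELECT DATABASEPROPERTYEX(N'MyDatabase', 'Version') AS InternalDatabaseVersion;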

SQL Server Version                             Internal Database Version
SQL Server 2008 R2                             665
SQL Server 2008                                661
SQL Server 2005 SP2+ with vardecimal enabled   612
SQL Server 2005                                611
SQL Server 2000                                539
SQL Server 7                                   515

SQL Server versions and internal database versions

Below are the allowable SQL Server upgrade paths for a database attach or restore. The internal database version will be as above after a successful attach or restore.

Target SQL Server Version   Source SQL Server Version                    Internal Database Version
SQL Server 2008 R2          SQL Server 2008 R2                           665
                            SQL Server 2008                              661
                            SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
SQL Server 2008             SQL Server 2008                              661
                            SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
SQL Server 2005 SP2+        SQL Server 2005 with vardecimal enabled      612
                            SQL Server 2005                              611
                            SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 2005             SQL Server 2005                              611
                            SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 2000             SQL Server 2000                              539
                            SQL Server 7                                 515
SQL Server 7                SQL Server 7                                 515

Database File Versions and Upgrade Paths

As I mentioned earlier, downgrades are not supported. You’ll need to copy objects and data from the newer source database to the older target if you need to downgrade; attach or restore is not an option to copy a database to an earlier version.