Maximizing Performance with Table-Valued Parameters

I’ve seen TVPs improve performance by orders of magnitude when used appropriately.  This feature isn’t used as often as it should be so I’ll show how to implement a TVP using a C# application example and discuss TVP considerations.

Why Table-Valued Parameters Improve Performance
TVPs allow one to pass multiple rows or values at once to a query or stored procedure.  By doing so, the query can leverage set-based operations to greatly improve performance compared to executing the query/proc many times.  TVPs decrease network latency by reducing network round trips.  Prior to TVPs, one had to employ workarounds like passing a string containing a delimited list of records (or XML) and parsing on the SQL side.  Those techniques are at best a kludge.  Instead TVPs are the right tool for the job in SQL Server 2008 and later.

TVPs (and table variables) are temp tables.  TVPs and table variables are persisted in tempdb much the same way as regular temp tables (# prefix).  The big differences are that TVPs and table variables are limited in scope to the current batch and do not have statistics.  Also, table parameters/variables allow only indexes declared via primary key nor unique constraints; non-unique indexes are not supported.

When a TVP Is passed to SQL Server from a client application, the data is bulk-inserted into tempdb where it is made available to the SQL statement or stored procedure using the declared TVP.  This bulk insert is handled by the client API transparently to the application and allows a large number of rows to be passed to SQL Server efficiently.  In T-SQL, one can declare and load a table variable, where it can be used directly in a SQL statement or passed as a TVP to a stored procedure or query (invoked with sp_executesql).

The exact threshold where TVPs outperform single-row operations is much lower than you might expect.  Your mileage may vary but, at least in my experience, it’s only a few rows.  The performance tests I ran here show that the threshold where a TVP outperforms individual requests with a simple list of customers is only 4 rows.  That said, I would not recommend TVPs for single-row operations due to the start-up overhead but it’s a good choice when the most common use case is multiple rows.

How to Use TVPs
In order to pass a TVP to a SQL statement or stored procedure, one must first create a user-defined table type in SQL Server.  The table type defines the schema of the table SQL Server expects for the TVP.  Listing 1 shows the CREATE DDL for the table type I use in the AdventureWorks2012 database to pass a list of customers to the stored procedure in Listing 2 that returns all orders for these customers.  Note that the stored procedure TVP must be declared as READONLY.

Listing 1: Table type DDL

CREATE TYPE dbo.CustomerList AS TABLE(
CustomerID int NOT NULL
PRIMARY KEY
);

Listing 2: Stored procedure to return orders for multiple customers

CREATE PROC dbo.usp_select_orders_for_multiple_customers
@CustomerList dbo.CustomerList READONLY
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
JOIN @CustomerList AS cl ON
cl.CustomerID = soh.CustomerID;
RETURN @@ERROR;

TVPs are passed from a .NET application using parameter data type DbType.Structured.  The actual parameter value can be an object of type DataTable, DbDataReader or IEnumberable<SqlDataRecord>.  A DataTable is appropriate when the required TVP data is already in a DataTable object needed for other purposes.  A DbDataReader is a good choice when the source data is the result of a query (note that any DbDataReader may be used, not just a SqlDataReader).   For other cases, I recommend an IEnumerable<SqlDataRecord>.  It is very easy to implement IEnumerable<SqlDataRecord>, especially if you already have an enumerable object like an array or collection containing the values you need to pass.  See the code at the end of this article for an example class that implements IEnumerable<SqlDataRecord> for a TVP.

A benefit with DbDataReader and IEnumerable<SqlDataRecord> is that the source data can be streamed to SQL server without first loading all the TVP data into memory in either the client application or in SQL Server.  This is a big consideration when a lot of data needs to be passed because you are not constrained by memory on ether the client or SQL Server side.

Listing 3 shows the stored procedure I use to compare performance with the TVP method.  This proc takes only a single @CustomerID parameter so it must be called once for each customer.

Listing 3: Stored procedure to return orders for a single customer

CREATE PROC dbo.usp_select_orders_for_single_customer
@CustomerID int
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE
soh.CustomerID = @CustomerID;
RETURN @@ERROR;

The complete C# console application I used for the performance tests is in Listing 4.  The code demonstrates the 3 ways to pass a TVP from C# and logs timings for each invocation.  As you can see in Figure 1, the TVP outperformed individual calls once more than 3 customers were passed.  Note that I had to use a logarithmic y-axis scale because of the huge performance difference.  For example, 10,000 individual calls averaged about 1.5 seconds whereas passing 10,000 customers via a TVP took less to 100 milliseconds.

Figure 1:  TVP versus individual call performance
TVP performance graph

Listing 4: Passing TVPs in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using Microsoft.SqlServer.Server;
using System.IO;

namespace TVPTest
{
    class Program
    {

        private static string connectionString = @"Data Source=MyServer;Initial Catalog=AdventureWorks2012;Integrated Security=SSPI";
        private static StreamWriter logFile = new StreamWriter(@"TvpTest.log", false);

        static void Main(string[] args)
        {

            // write column headers to log file
            logFile.WriteLine("{0},{1},{2}"
                , "Test Name"
                , "Customers"
                , "Durations");

            // run each performance test 100 times
            for (int i = 0; i < 100; ++i)
            {
                runPerformanceTest(1);
                runPerformanceTest(2);
                runPerformanceTest(3);
                runPerformanceTest(4);
                runPerformanceTest(5);
                runPerformanceTest(10);
                runPerformanceTest(25);
                runPerformanceTest(50);
                runPerformanceTest(100);
                runPerformanceTest(500);
                runPerformanceTest(1000);
                runPerformanceTest(2500);
                runPerformanceTest(5000);
                runPerformanceTest(10000);
                logFile.Flush();
            }

            logFile.Close();

            Console.WriteLine("Press any key to close.");
            Console.ReadKey();
        }

        private static void runPerformanceTest(int customerCount)
        {

            Console.WriteLine("{0} customers:", customerCount);

            DataTable orderDataTable;
            Stopwatch sw;
            DateTime duration;

            //individual call baseline
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersIndividually(customerCount);
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tIndividual call: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "Individual call"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataTable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataTable(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataTable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataTable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataReader
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataReader(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataReader: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataReader"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from IEnumerable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListSqlDataRecords(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP IEnumerable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP IEnumerable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            Console.WriteLine();

        }

        /// 
        /// Call proc individually for each customer
        /// 
        /// Number of customers to return
        /// DataTable of customer orders
        private static DataTable getOrdersForCustomersIndividually(int customerCount)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_single_customer", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                var customerIdParameter = command.Parameters.Add("@CustomerID", System.Data.SqlDbType.Int);
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
                {
                    customerIdParameter.Value = customerId;
                    orderListDataAdapter.Fill(orderDataTable);
                }
                return orderDataTable;
            }
        }

        /// 
        /// Call proc with TVP of CustomersIDs
        /// 
        /// List of customers
        /// DataTable of customer orders
        private static DataTable getOrdersForCustomersUsingTvp(Object customerList)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_multiple_customers", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@CustomerList", System.Data.SqlDbType.Structured).Value = customerList;
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                orderListDataAdapter.Fill(orderDataTable);
                return orderDataTable;
            }
        }

        /// 
        /// Create a DataTable containing list of customers for TVP
        /// 
        /// Number of customers to return
        /// DataTable with single CustomerID column
        private static DataTable getCustomerListDataTable(int customerCount)
        {
            var customerListDataTable = new DataTable();
            customerListDataTable.Columns.Add("CustomerID", typeof(int));

            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                var row = customerListDataTable.NewRow();
                customerListDataTable.Rows.Add(row);
                row[0] = customerId;
            }
            return customerListDataTable;
        }

        /// 
        /// Create a DataReader containing list of customers for TVP
        /// 
        /// Number of customers to return
        /// DataReader with single CustomerID column        
        private static IDataReader getCustomerListDataReader(int customerCount)
        {
            var connection = new SqlConnection(connectionString);
            var command = new SqlCommand(@"SELECT CustomerID FROM Sales.Customer WHERE CustomerID BETWEEN 11000 AND 11000 + @CustomerCount - 1;", connection);
            command.Parameters.Add("@CustomerCount", System.Data.SqlDbType.Int).Value = customerCount;
            connection.Open();
            command.CommandType = CommandType.Text;
            return command.ExecuteReader();

        }

        /// 
        /// Create an IEnumerable<SqlDataRecord> object containing list of customers for TVP
        /// 
        /// Number of customers to return
        /// IEnumerable<SqlDataRecord> with single CustomerID column
        private static IEnumerable getCustomerListSqlDataRecords(int customerCount)
        {
            var customerList = new List();
            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                customerList.Add(customerId);
            }
            return new CustomerListSqlDataRecords(customerList);
        }

        /// 
        /// Implement IEnumerable<SqlDataRecord> to return a SqlDataRecord for each CustomerID
        /// in the provided IEnumerable<int>
        /// 
        private class CustomerListSqlDataRecords : IEnumerable
        {
            private SqlMetaData metaData = new SqlMetaData("CustomerID", SqlDbType.Int);

            private IEnumerable customerList;

            public CustomerListSqlDataRecords(IEnumerable customerList)
            {
                this.customerList = customerList;
            }

            public IEnumerator GetEnumerator()
            {
                foreach (var customerId in customerList)
                {
                    var record = new SqlDataRecord(metaData);
                    record.SetInt32(0, customerId);
                    yield return record;
                }
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                throw new NotImplementedException();
            }
        }
    }
}

Improving Uniqueidentifier Performance

A common anti-pattern I run into is the random primary key, commonly a GUID. This design is insidious because the performance implications of random access aren’t immediately obvious and exacerbated when the primary key index is clustered. It is often only after the table grows to a larger size that the performance problems become apparent. Symptoms include slowly degrading performance over time, with increased blocking and deadlocking as a side effect.

Figure 1 shows the performance profile of a random inserts with a random GUID (SQL Server uniqueidentifier data type) clustered primary key. The red line indicates the rate of batch requests per second (inserts) while the blue line shows the total number of rows in the table, scaled such that the top of the graph represents 3M rows. Only about 700, 000 rows could be inserted during this 15 minute single-threaded random key insert test, even though the insert rate was fast initially.

Figure 1: Random key insert performance
Random insert performance graph

Incremental Primary Keys

As you might guess, the cure for the random primary key anti-pattern is an incremental key pattern. With a uniqueidentifier data type, a sequential value can be assigned by SQL Server using the NEWSEQUENTIALID function (in a default constraint expression) or in application code using the UuidCreateSequential Win32 API call along with some byte swapping (code example below). Alternatively, one can use an integral data type (int, bigint, etc.) along with a value generated by an IDENTITY property or a SEQUENCE object. The advantage of an integral type is the reduced space requirements compared to a 16-byte uniqueidentifier. The advantage of a uniqueidentifier is that it can easily be generated in application code before database persistence without a database round trip, which is desirable for distributed applications and when keys of related tables are assigned in application code before writing to the database.

Figure 2 shows the same test using a sequential key value. Over 2.2M rows were inserted in 15 minutes. As you can see, significant performance improvement is achieved with this trivial application change.

Figure 2: Incremental key insert performance
Random insert performance graph

Listing 1 shows the T-SQL code I used for these performance tests and listing 2 contains the C# code (with the random GUID commented out). I generated the uniqueidentifier value via application code in the tests but performance with NEWID() is comparable to the first test and NEWSEQUENTIALID() is similar to the second test.

Listing 1: T-SQL scripts for test table and stored procedure

CREATE TABLE dbo.TestTable(
	TestKey uniqueidentifier NOT NULL
		CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED
	,TestData char(8000) NOT NULL
);
GO

CREATE PROC dbo.InsertTestTable
	@TestKey uniqueidentifier
	,@TestData char(8000)
AS
SET NOCOUNT ON;
DECLARE @TotalRows int;

--insert row
INSERT INTO dbo.TestTable (TestKey, TestData) 
	VALUES(@TestKey, @TestData);

--update pmon counter for rowcount
SELECT @TotalRows = rows 
FROM sys.partitions 
WHERE object_id = OBJECT_ID(N'TestTable') AND index_id = 1;
EXEC sys.sp_user_counter1 @TotalRows; --for pmon row count

RETURN @@ERROR;
GO

Listing 2: C# insert test console application

using System;
using System.Data;
using System.Data.SqlClient;
using System.Runtime.InteropServices;

namespace UniqueIdentifierPerformanceTest
{
    class Program
    {
        [DllImport("rpcrt4.dll", CharSet = CharSet.Auto)]
        public static extern int UuidCreateSequential(ref Guid guid);

        static string connectionString = @"Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI";
        static int rowsToInsert = 10000000;
        static SqlConnection connection;
        static SqlCommand command;

        static void Main(string[] args)
        {

            int rowsInserted = 0;

            using (connection = new SqlConnection(connectionString))
            {
                using (command = new SqlCommand("dbo.InsertTestTable", connection))
                {
                    command.Connection = connection;
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.Add("@TestKey", SqlDbType.UniqueIdentifier);
                    command.Parameters.Add("@TestData", SqlDbType.Char, 8000);

                    connection.Open();

                    while (rowsInserted < rowsToInsert)
                    {
                        //random guid
                        //command.Parameters["@TestKey"].Value = Guid.NewGuid();

                        //sequential guid
                        command.Parameters["@TestKey"].Value = NewSequentialGuid();
                        command.Parameters["@TestData"].Value = "Test";
                        command.ExecuteNonQuery();
                        ++rowsInserted;

                        //display progress every 1000 rows
                        if (rowsInserted % 1000 == 0)
                        {
                            Console.WriteLine(string.Format(
                                "{0} of {1} rows inserted"
                                , rowsInserted.ToString("#,##0")
                                , rowsToInsert.ToString("#,##0")));
                        }
                    }
                }

                connection.Close();
            }

        }

        /// 
        /// call UuidCreateSequential and swap bytes for SQL Server format
        /// 
        /// sequential guid for SQL Server
        private static Guid NewSequentialGuid()
        {
            const int S_OK = 0;
            const int RPC_S_UUID_LOCAL_ONLY = 1824;

            Guid oldGuid = Guid.Empty;

            int result = UuidCreateSequential(ref oldGuid);
            if (result != S_OK && result != RPC_S_UUID_LOCAL_ONLY)
            {
                throw new ExternalException("UuidCreateSequential call failed", result);
            }

            byte[] oldGuidBytes = oldGuid.ToByteArray();
            byte[] newGuidBytes = new byte[16];
            oldGuidBytes.CopyTo(newGuidBytes, 0);

            // swap low timestamp bytes (0-3)
            newGuidBytes[0] = oldGuidBytes[3];
            newGuidBytes[1] = oldGuidBytes[2];
            newGuidBytes[2] = oldGuidBytes[1];
            newGuidBytes[3] = oldGuidBytes[0];

            // swap middle timestamp bytes (4-5)
            newGuidBytes[4] = oldGuidBytes[5];
            newGuidBytes[5] = oldGuidBytes[4];

            // swap high timestamp bytes (6-7)
            newGuidBytes[6] = oldGuidBytes[7];
            newGuidBytes[7] = oldGuidBytes[6];

            //remaining 8 bytes are unchanged (8-15) 

            return new Guid(newGuidBytes);

        }

    }
}

Why Random Keys Are Bad

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality (http://en.wikipedia.org/wiki/Locality_of_reference), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.

All things being equal, single-row performance will be roughly the same with both sequential and random keys as long as data are memory resident. This is why the random and sequential key insert tests show the same good performance initially. But once the table size exceeded the size of the buffer pool, the random key test showed a precipitous drop in throughput and steady degradation thereafter. In short, random keys diminish temporal reference locality because there is no correlation between time (most recently accessed data) and the key value.

Why Incremental Keys Good

An incremental key value naturally improves temporal reference locality; the next key value is adjacent to the last one inserted and is likely still in memory. An incremental key provides excellent insert performance regardless of table size as the insert performance test shows. Also, applications typically use recently inserted data more often than older data. This allows the same amount of work to done with much less physical I/O than a random key value.

Random Notes about GUIDs

According to the Globally unique identifier Wiki (http://en.wikipedia.org/wiki/Globally_unique_identifier), the random 122 bits of a GUID can generate 2122 unique values. That’s an incomprehensibly large 5.3 x 1036 (or 5,300,000,000,000,000,000,000,000,000,000,000,000) number unique values.

The value returned by NEWSEQUENTIALID and UuidCreateSequential is guaranteed to be unique on a given computer. Furthermore, it is globally unique if the computer has a network card because the MAC address is used as part of the GUID generation algorithm.

Remediating a Table Partitioning Mess

Partitioning often involves exceptionally large tables, which are quite unforgiving when it comes to remediating a mess due to improper partition design or maintenance.  In this post, I’ll discuss the common design mistake of using a RANGE LEFT partition function containing exact date boundaries with a date type that may include a time component (datetime, datetime2 or datetimeoffset) and show how to change it to RANGE RIGHT with minimal data movement.

Background
The most common use case for table partitioning is an incremental value.  Partitioning on a date, datetime, datetime2 or datetimeoffset value that occurs naturally in the data allows incremental load using SPLIT and SWITCH operations and purge/archive using SWITCH and MERGE.  SWITCH is nearly instantaneous regardless of the amount of underlying data involved because only meta-data changes are needed to move partitions from one table to another.  SPLIT and MERGE are also very fast meta-data only operations as long as no data movement is required.  For this reason, it is critical to setup boundaries correctly and plan as to perform SPLIT and MERGE against empty partitions whenever possible.

Note that RANGE LEFT or RANGE RIGHT specification determines:

  • The partition that contains exact boundary value matches
  • The partition is created by SPLIT
  • The partition is removed by MERGE

A common design mistake is using exact date boundaries in a RANGE LEFT function with datetime, datetime2 or datetimeoffset data type.  This error can result in rows in the wrong partition because these data types include a time component and the implications are not obvious, especially to partitioning newbies.  Rows that exactly match the date boundary (i.e. midnight on boundary date) are inserted into the lower partition instead of the higher one as desired.  For example, consider a RANGE LEFT function is a monthly sliding window scenario and boundaries of ‘2013-01-01T00:00:00’ and ‘2013-02-01T00:00:00’:

CREATE PARTITION FUNCTION PF_Monthly(datetime2(0))
    AS RANGE LEFT FOR VALUES('2013-01-01T00:00:00', '2013-02-01T00:00:00');

CREATE PARTITION SCHEME PS_Monthly AS
    PARTITION PF_Monthly TO (December2012, January2013, February2013);

This setup looks reasonable at first glance but consider the actual partitions when a table is created on the partition scheme:

  • Partition 1 (December2012 filegroup):  <= ‘2013-01-01T00:00:00’
  • Partition 2 (January2013 filegroup):  > ‘2013-01-01T00:00:00’ AND <= ‘2013-01-02T00:00:00’
  • Partition 3 (February2013 filegroup):  > ‘2013-02-01T00:00:00’

A row with the January datetime of ‘2013-01-01T00:00:00’ will end up in the first partition (December2012 filegroup).  Sliding window maintenance, which purges the oldest month (December2012), will include the ‘2013-01-01T00:00:00’ row(s) inadvertently.  Similarly, a row with value ‘2013-02-01T00:00:00’ will end up in the second partition (January2013 fielgroup) and require both the second and third partitions to be accessed when data from all of February are queried.

Remediation
There are two ways to address this problem.  One is to change the boundary values of the existing RANGE LEFT function to the maximum inclusive value instead of an exact date. For example, instead of ‘2013-01-01T00:00:00’ for January, 2013, change the boundary values to the maximum value according to the partition function data type as follows:

  • *datetime: ‘2013-01-31T23:59:59.997’
  • datetime2(0): ‘2013-01-31T23:59:59’
  • datetime2(1): ‘2013-01-31T23:59:59.9’
  • datetime2(2): ‘2013-01-31T23:59:59.99’
  • datetime2(3): ‘2013-01-31T23:59:59.999’
  • datetime2(4): ‘2013-01-31T23:59:59.9999’
  • datetime2(5): ‘2013-01-31T23:59:59.99999’
  • datetime2(6): ‘2013-01-31T23:59:59.999999’
  • datetime2(7): ‘2013-01-31T23:59:59.9999999’
  • **datetimeoffset (0): ‘2013-01-31T23:59:59-05:00’’
  • **datetimeoffset (1): ‘2013-01-31T23:59:59.9-05:00’’
  • **datetimeoffset (2): ‘2013-01-31T23:59:59.99-05:00’’
  • **datetimeoffset (3): ‘2013-01-31T23:59:59.999-05:00’’
  • **datetimeoffset (4): ‘2013-01-31T23:59:59.9999-05:00’’
  • **datetimeoffset (5): ‘2013-01-31T23:59:59.99999-05:00’’
  • **datetimeoffset (6): ‘2013-01-31T23:59:59.999999-05:00’’
  • **datetimeoffset(7): ‘2013-01-31T23:59:59.9999999-05:00’’

*Importantly, one needs to specify 997 as the milliseconds value for datetime because a higher millisecond value will be rounded to the next exact date (e.g. ‘2013-02-01T00:00:00.000’)

**The actual offset for datetimeoffset partitioning depends on your environment but is typically the same for each boundary.

Rather than changing the boundary specification, I suggest instead changing the function from LEFT to RIGHT and continue using an exact date boundary.  The inclusive date boundary is more intuitive (in my opinion) and allows the same date boundaries to be used regardless of data type.  The same amount of work is required for a mass change of existing boundaries as is required to change from RANGE LEFT to RANGE RIGHT.

The script below converts a table partitioned using a RANGE LEFT function with exact date boundaries to one partitioned RANGE RIGHT as follows:

  1. create a staging table exactly like the source table and partitioned using the same scheme
  2. create a non-partitioned holding table with the same schema source but without constraints and indexes
  3. create a new RANGE RIGHT function with the same exact date boundaries as the existing RANGE LEFT scheme
  4. create a new partition scheme with the same filegroups as the source table partition scheme but based on the new RANGE RIGHT function
  5. create a new table exactly like the source but partitioned using the new RANGE RIGHT scheme
  6. for each existing partition
    1. switch source partition to staging table
    2. move rows with exact boundary matches to holding table
    3. create check constraint on staging table matching the target right boundaries
    4. switch staging partition to new table
    5. drop check constraint
    6. after all partitions are switched to the new table, insert exact boundary matching rows to new table
    7. drop the old table and staging tables
    8. rename the new table, constraints and indexes to the old names

Since this script uses SWITCH to accomplish the bulk of the work, the table must be aligned; data and indexes must be partitioned using the same (or functionally identical) partition function.  The SWITCH operations are nearly instantaneous and the only data movement required is moving rows that exactly match the boundary values to/from the holding table, which shouldn’t take long unless you have a lot of matching rows.  The long pole in the tent from a performance perspective is likely creating the needed check constraint on the staging table because all rows must be scanned.  However, this method is still considerably faster and requires less space than repartitioning the table by recreating indexes using the new partition scheme.

Here’s the setup script for the problem table and sample data:

--existing RANGE LEFT function with exact date boundaries
CREATE PARTITION FUNCTION PF_Left(datetime)
       AS RANGE LEFT FOR VALUES('20130101','20130201');
--existing scheme based on RANGE LEFT
CREATE PARTITION SCHEME PS_Left
       AS PARTITION PF_Left ALL TO ([DEFAULT]);
--existing table, constraint and index partitioned with RANGE LEFT scheme
CREATE TABLE dbo.OldTable(
       PartitioningColumn datetime NOT NULL
       ,DataColumn int NOT NULL
       ) ON PS_Left(PartitioningColumn);
ALTER TABLE dbo.OldTable
	ADD CONSTRAINT PK_OldTable PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Left(PartitioningColumn);
CREATE INDEX idx_OldTable_DataColumn ON dbo.OldTable(DataColumn) 
	ON PS_Left(PartitioningColumn);
--sample data
INSERT INTO dbo.OldTable VALUES
       ('20121231',1)
       ,('20130101',1)
       ,('20130102',2)
       ,('20130103',5)
       ,('20130131',2)
       ,('20130201',4)
       ,('20130202',3);
GO

And the remediation script to change to RANGE RIGHT:

--new RANGE RIGHT function with same boundaries as existing RANGE LEFT function
CREATE PARTITION FUNCTION PF_Right(datetime)
       AS RANGE RIGHT FOR VALUES('20130101','20130201');

--new scheme based on RANGE RIGHT with same filegroups as RANGE LEFT scheme
CREATE PARTITION SCHEME PS_Right
       AS PARTITION PF_Right ALL TO ([DEFAULT]);

--new staging table exactly line existing source table
CREATE TABLE dbo.OldTableStaging(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON PS_Left(PartitioningColumn);
ALTER TABLE dbo.OldTableStaging
	ADD CONSTRAINT PK_OldTableStaging PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Left(PartitioningColumn);
CREATE INDEX idx_OldTableStaging_DataColumn ON dbo.OldTableStaging(DataColumn) 
	ON PS_Left(PartitioningColumn);

--new holding table for exact boundary matches (no constraints or indexes needed)
CREATE TABLE dbo.OldTableStagingExactBoundaryMatches(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON [DEFAULT];

--new table exactly like existing table but partitioned with RANGE RIGHT scheme
CREATE TABLE dbo.NewTable(
       PartitioningColumn datetime NOT NULL
	   ,DataColumn int NOT NULL
       ) ON PS_Right(PartitioningColumn);
ALTER TABLE dbo.NewTable
	ADD CONSTRAINT PK_NewTable PRIMARY KEY CLUSTERED (PartitioningColumn, DataColumn) 
	ON PS_Right(PartitioningColumn);
CREATE INDEX idx_NewTable_DataColumn ON dbo.NewTable(DataColumn) 
	ON PS_Right(PartitioningColumn);
GO

--script to convert RANGE LEFT to RANGE RIGHT
SET NOCOUNT ON;
DECLARE 
       @LowerBoundary datetime
       ,@UpperBoundary datetime
       ,@PartitionNumber int
       ,@Sql nvarchar(MAX)
       ,@PartitionFunctionName sysname = N'PF_Right';

DECLARE PartitionBoundaries CURSOR LOCAL FAST_FORWARD FOR
WITH 
       PartitionNumbers AS (
              SELECT 1 AS PartitionNumber
              UNION ALL
              SELECT boundary.boundary_id + 1
              FROM sys.partition_functions AS pf
              JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
              WHERE pf.name = @PartitionFunctionName
              )
       ,PartitionBoundaries AS (
              SELECT
                     (SELECT CAST(boundary.value AS datetime)
                     FROM sys.partition_functions AS pf
                     JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
                     WHERE
                           pf.name = @PartitionFunctionName
                           AND boundary.boundary_id = PartitionNumbers.PartitionNumber - 1
                     ) AS LowerBoundary
                     ,(SELECT CAST(boundary.value AS datetime)
                     FROM sys.partition_functions AS pf
                     JOIN sys.partition_range_values AS boundary ON pf.function_id = boundary.function_id
                     WHERE
                           pf.name = @PartitionFunctionName
                           AND boundary.boundary_id = PartitionNumbers.PartitionNumber
                     ) AS UpperBoundary
                     ,PartitionNumbers.PartitionNumber
              FROM PartitionNumbers
              )
SELECT LowerBoundary, UpperBoundary, PartitionNumber
FROM PartitionBoundaries
ORDER BY PartitionNumber;

OPEN PartitionBoundaries;
WHILE 1 = 1
BEGIN
       --get next partition boundary
       FETCH NEXT FROM PartitionBoundaries INTO @LowerBoundary, @UpperBoundary, @PartitionNumber;
       IF @@FETCH_STATUS = -1 BREAK;

       --move partition from partitioned left table to staging
       RAISERROR('Switching out partition %d', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTable
              SWITCH PARTITION @PartitionNumber TO dbo.OldTableStaging PARTITION @PartitionNumber;

       --move exact boundary matches to holding table
       DELETE FROM dbo.OldTableStaging WITH(TABLOCKX)
       OUTPUT deleted.* INTO dbo.OldTableStagingExactBoundaryMatches
       WHERE PartitioningColumn = @UpperBoundary;
       RAISERROR('Moved %d exact boundary matches to holding table', 0, 1, @@ROWCOUNT) WITH NOWAIT;

       --create check constraint to match RANGE RIGHT boundary
       SET @Sql = N'ALTER TABLE dbo.OldTableStaging
              ADD CONSTRAINT CK_TestStaging CHECK (';
       IF @LowerBoundary IS NOT NULL
       BEGIN
              SET @Sql = @Sql + N'PartitioningColumn >= ''' + CONVERT(char(8), @LowerBoundary,112) + N'''';
              IF @UpperBoundary IS NOT NULL
              BEGIN
                     SET @Sql = @Sql + N' AND ';
              END;
       END;
       IF @UpperBoundary IS NOT NULL
       BEGIN
              SET @Sql = @Sql + N'PartitioningColumn < ''' + CONVERT(char(8), @UpperBoundary,112) + N'''';
       END;
       SET @Sql = @Sql + ');';
       RAISERROR('Creating check constraint: %s', 0, 1, @Sql) WITH NOWAIT;
       EXEC sp_executesql @Sql;

       --move staging table partition to partitioned right table
       RAISERROR('Switching in partition %d', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTableStaging
              SWITCH PARTITION @PartitionNumber TO dbo.NewTable PARTITION @PartitionNumber;

       --remove staging table constraint
       RAISERROR('Dropping check constraint', 0, 1, @PartitionNumber) WITH NOWAIT;
       ALTER TABLE dbo.OldTableStaging
              DROP CONSTRAINT CK_TestStaging;

END;
CLOSE PartitionBoundaries;
DEALLOCATE PartitionBoundaries;

--insert exact boundary matches
INSERT INTO dbo.NewTable WITH (TABLOCKX) (PartitioningColumn, DataColumn) 
       SELECT PartitioningColumn, DataColumn FROM dbo.OldTableStagingExactBoundaryMatches WITH (TABLOCKX);
RAISERROR('Inserted %d exact boundary matches from holding table', 0, 1, @@ROWCOUNT) WITH NOWAIT;

IF NOT EXISTS(SELECT * FROM OldTable)
BEGIN
	--drop old table
	DROP TABLE dbo.OldTable;
	--rename table, constraints and indexes to same names as original table
	EXEC sp_rename 'dbo.NewTable.idx_NewTable_DataColumn', 'idx_OldTable_DataColumn', 'INDEX';
	EXEC sp_rename 'dbo.PK_NewTable', 'PK_OldTable';
	EXEC sp_rename 'dbo.NewTable', 'OldTable';
END;
--drop staging tables
DROP TABLE dbo.OldTableStaging;
DROP TABLE dbo.OldTableStagingExactBoundaryMatches;
GO

Service Broker External Activator Example

The Service Broker External Activator (SBEA) allows one to launch any command-line program asynchronously using T-SQL.  I haven’t found many complete end-to-end examples on how to create an SBEA application so this article will demonstrate with a common use-case; launch a SSIS package on demand with the DTEXEC command-line utility.  The package can be run on any machine with SBEA and SSIS installed, which allows you to run the package remotely on a different server if you so desire.

Service Broker External Activator Overview

The latest version of the tool, officially named Microsoft® SQL Server® Service Broker External Activator for Microsoft® SQL Server® 2012, is a free download (http://www.microsoft.com/en-us/download/details.aspx?id=29065).  Service Broker External Activator is also included in the SQL Server 2008 and SQL Server 2008 R2 Feature Packs.

SBEA runs as a Windows service and leverages the SQL Server Service Broker event notification infrastructure to run a command-line application upon receipt of an event notification message.  When an event notification message is received, it launches the command-line program associated with the queue identified in the notification event message.  There is some learning curve if you are not already familiar with Service Broker and event notification concepts but it is a worthwhile investment, not just for SBEA, but for any application that needs asynchronous or message-based processing.

SBEA is an alternative to over other methods that can launch an external program via T-SQL, such as xp_cmdshell, a SQL Agent Job (sp_start_job) or a custom SQLCLR proc.  The xp_cmdshell extended stored procedure allows any ad-hoc command to be run on the database server, limited only by SQL Server service or proxy account permissions.  You need to be mindful of the security implications and take precautions to ensure it can be used only in the way intended.  Also, xp_cmdshell runs the launched program synchronously so the calling application must wait for the application to finish.  You could take special measures to emulate asynchronous behavior with xp_cmdshell but that’s a kludge at best.

A SQL Agent job allows you to run a command asynchronously but the limitation is only one instance of a job to run at a time.  In contrast, SBEA allows one to run multiple application instances concurrently and does so automatically.  SBEA allows a configurable maximum number of application instances so that you can scale automatically to meet peak demands.

SQL Server 2012 introduces the SSISDB catalog, which provides stored procedures to run SSIS packages asynchronously via T-SQL.  However, packages must be stored in the SSISDB database in order to use this feature.  SBEA is an alternative if you are pre SQL Server 2012, need to run packages not stored in the SSIS catalog, or need to run command-line programs besides DTEXEC.EXE.

SBEA Database Objects

As with all Service Broker applications, the minimal database objects needed are an initiator service and queue as well as a target service and queue.  These are used to pass messages to and from the target application (SSIS package in this case).  Additionally, the SBEA needs a notification service, notification queue and event notification so that it is notified when messages are waiting in the target queue for processing.

Stored procedures are a great way to encapsulate Service Broker functionality and implementation details so I use procs here as the application database interface.  Stored procs also simply security since the only database permissions necessary are execute permissions on the procs.  Within the procs, I also log to a table to facilitate monitoring progress, troubleshooting and auditing.  The complete T-SQL script is included at the end of this article.

Below is a summary of the database objects I created for this example:

Service Broker Objects

  • BatchProcessInitiatorService:  SB service that requests package execution (requesting application)
  • BatchProcessInitiatorQueue:  SB queue for success or failure result messages (returned by package after execution)
  • BatchProcessTargetService:  SB service that executes the request (SSIS package)
  • BatchProcessTargetQueue:  SB queue for execution requests, with message containing parameters needed by SSIS package (or an empty message if no parameters are needed)
  • BatchProcessNotificationService:  SB service that is the target of the event notifications (SBEA service)
  • BatchProcessNotificationQueue:  SB queue for event notification messages
  • BatchProcessEventNotification:  SB event notification that monitors the BatchProcessTargetQueue

Stored Procedures

  • dbo.usp_LaunchBatchProcess (called by invoking application): begins conversation to target service and sends message with parameters needed by package
  • dbo.usp_GetBatchProcessParemeters (called by SSIS package at start): receives next message from target queue and logs to table
  • dbo.usp_CompleteBatchProcess (called by SSIS package at completion): returns result message to initiator using END CONVERSATION (or END CONVERSATION WITH ERROR)
  • dbo.usp_LogBatchProcessResult (initiator queue activated proc): receives result messages from initiator queue, logs results to table and ends the conversation

Table

  • dbo.BatchProcessLog: Each received message is logged to this table

SBEA Configuration File

Service Broker External Activator is configured using the xml configuration file (EAService.config) in the application Config folder (e.g. C:\Program Files\Service Broker\External Activator\Config\).  This configuration file specifies:

  • The name of the notification service
  • Connection string used to connect to the database
  • A list of one or more Service Broker services along with the command-line application and arguments to launch for each

Below is the example I use for this sample with the relevant values in bold.  As a side note, SBEA currently supports only one notification service even though the parent element name (NotificationServiceList) implies one can have many.  However, I haven’t personally run into a case where this is an issue since, in a distributed environment, one can designate a single database for the SBEA messaging infrastructure.

<?xml version="1.0" encoding="utf-8"?>
<Activator xmlns="http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator EAServiceConfig.xsd">
<NotificationServiceList>
<NotificationService name="BatchProcessNotificationService" id="100" enabled="true">
<Description>My test notification service</Description>
<ConnectionString>
<!-- All connection string parameters except User Id and Password should be specificed here -->
<Unencrypted>server=myserver;database=SBEA_Example;Application Name=External Activator;Integrated Security=true;</Unencrypted>
</ConnectionString>
</NotificationService>
</NotificationServiceList>
<ApplicationServiceList>
<ApplicationService name="BatchProcessTargetService" enabled="true">
<OnNotification>
<ServerName>MyServer</ServerName>
<DatabaseName>SBEA_Example</DatabaseName>
<SchemaName>dbo</SchemaName>
<QueueName>BatchProcessTargetQueue</QueueName>
</OnNotification>
<LaunchInfo>
<ImagePath>C:\Program Files\Microsoft SQL Server\110\DTS\Binn\DTEXEC.EXE</ImagePath>
<CmdLineArgs>/F C:\Packages\Package.dtsx</CmdLineArgs>
<WorkDir>c:\temp</WorkDir>
</LaunchInfo>
<Concurrency min="1" max="1" />
</ApplicationService>
</ApplicationServiceList>
<LogSettings>
<LogFilter>
<TraceFlag>All Levels</TraceFlag>
 <TraceFlag>All Modules</TraceFlag>
 <TraceFlag>All Entities</TraceFlag>
 <TraceFlag>Verbose</TraceFlag>
<!--
http://schemas.microsoft.com/sqlserver/2008/10/servicebroker/externalactivator/EAServiceConfig.xsd
<xs:enumeration value="Error"/>
<xs:enumeration value="Warning"/>
<xs:enumeration value="Information"/>
<xs:enumeration value="Verbose"/>
<xs:enumeration value="All Levels"/>
<xs:enumeration value="Application Monitor"/>
<xs:enumeration value="Configuration Manager"/>
<xs:enumeration value="Global"/>
<xs:enumeration value="Recovery Log"/>
<xs:enumeration value="Imported Methods"/>
<xs:enumeration value="Notification Service"/>
<xs:enumeration value="Service Broker"/>
<xs:enumeration value="All Modules"/>
<xs:enumeration value="Asynchronous Updates"/>
<xs:enumeration value="Threads"/>
<xs:enumeration value="All Entities"/>
-->
</LogFilter>
</LogSettings>
</Activator>

Service Broker External Activator Processing

When the SBEA service starts, it validates the application service list against Service Broker meta-data to ensure the specified services and queues exist and are properly associated.  Then SBEA starts monitoring the notification queue associated with the specified notification service.  When a queue notification event message is received, it matches the server, database, schema and queue contained in the message payload against the application services listed.  When matched, it launches the command-line application associated with that service.  A warning message () is logged if a queue notification message is received that doesn’t have a corresponding application.

SBEA logs trace messages to a file named EATrace.log in the Log folder (C:\Program Files\Service Broker\External Activator\Log).  Logging levels are configurable.  I suggest you start with verbose logging initially (as in the same config) and adjust to less logging once your application is stable.  I included the config schema reference and various logging options as comments in the sampe config for your convenience.  Be aware that trace messages are appended to the file so you’ll want to periodically delete (or archive) the file.

SBEA Application Design and Message Flow

There are a many different patterns one can generally employ with Service Broker bi-directional message processing.  Here, I use a single-message-per-conversation pattern to pass parameters needed by the package as well-formed XML.  The target (SSIS package) does its work and ends its side of the conversation upon completion, indicating success or failure on the END CONVERSATION statement, which generates an EndDialog or Error message that is returned via the initiator queue.

Service Broker automatically executes the initiator queue activated proc when a message is waiting on the queue.  Acting on behalf of the initiating application, which might not be running due to asynchronous execution of the package, the activated proc ends the other side of the conversation upon receipt of the EndDialog or Error message.  Both sides of the conversation are ended, a best practice with this single-message-per-conversation pattern.

The following sequence describes the interaction and flow of messages between components:

  1. Initiating application executes dbo.usp_LaunchBatchProcess to launch console application.  Proc executes BEGIN DIALOG and SEND to insert the parameter message into the target queue.
  2. SBEA executes RECEIVE of the event notification message, indicating a message is waiting in the target queue
  3. SBEA launches the console application associated with the event notification
  4. The console application executes dbo.usp_GetBatchProcessParameters to get parameter values.  Proc executes RECEIVE of message from target queue and logs the message.
  5. The console application executes dbo.usp_CompleteBatchProcess at completion.  Proc executes END CONVERSATION to signal successful completion (or END CONVERSATION WITH ERROR to indicate an error occurred).
  6. Service Broker launches activated stored procedure dbo.usp_LogBatchProcessResult in response to the EndDialog message.  Proc execute RECEIVE from initiator queue and logs the result to a table.

The diagram below shows this message flow.SBEAApplicationMessageFlow

SBEAApplicationMessageFlow

Example Script

USE master;
GO

---------------------------------------------------
--- create database with Service Broker enabled ---
---------------------------------------------------
ALTER DATABASE SBEA_Example
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

DROP DATABASE SBEA_Example;
GO

CREATE DATABASE SBEA_Example;
ALTER AUTHORIZATION ON DATABASE::SBEA_Example TO sa;
GO

ALTER DATABASE SBEA_Example
SET ENABLE_BROKER;
GO

-------------------------------
--- create database objects ---
-------------------------------
USE SBEA_Example;
GO

--log table
CREATE TABLE dbo.BatchProcessLog(
ConversationHandle uniqueidentifier NOT NULL
,MessageTypeName sysname NOT NULL
,MessageBody varbinary(MAX) NULL
,LogTime datetime2(3) NOT NULL
CONSTRAINT DF_ServiceBrokerLog_LogTime
DEFAULT (SYSDATETIME())
);
CREATE CLUSTERED INDEX cdx_BatchProcessLog ON dbo.BatchProcessLog(LogTime);
GO

CREATE PROC dbo.usp_LogBatchProcessResult
---------------------------------------------
--initiator queue activated proc to process messages
---------------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_type_name sysname
,@message_body varbinary(MAX);
WHILE 1 = 1
BEGIN
WAITFOR (
RECEIVE TOP (1)
@conversation_handle = conversation_handle
,@message_type_name = message_type_name
,@message_body = message_body
FROM dbo.BatchProcessInitiatorQueue
), TIMEOUT 1000;
IF @@ROWCOUNT = 0
BEGIN
--exit when no more messages
RETURN;
END;

--log message
INSERT INTO dbo.BatchProcessLog(
ConversationHandle
,MessageTypeName
,MessageBody
)
VALUES(
@conversation_handle
,@message_type_name
,@message_body
);
END CONVERSATION @conversation_handle;
END;
GO

CREATE PROC dbo.usp_LaunchBatchProcess
@Parameter1 int
---------------------------------------------
--called by application to trigger batch process
--Sample Usage:
--
-- EXEC dbo.usp_LaunchBatchProcess @@Parameter1 = 1;
---------------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_body varbinary(MAX);

BEGIN TRY

BEGIN TRAN;

BEGIN DIALOG CONVERSATION @conversation_handle
FROM SERVICE BatchProcessInitiatorService
TO SERVICE 'BatchProcessTargetService'
ON CONTRACT [DEFAULT]
WITH
ENCRYPTION = OFF,
LIFETIME = 6000;

SET @message_body = CAST(
N''
+ CAST(@Parameter1 AS nvarchar(10))
+ N'' AS varbinary(MAX));

SEND ON CONVERSATION @conversation_handle (@message_body);

COMMIT;
END TRY
BEGIN CATCH
THROW;
END CATCH;

RETURN @@ERROR;
GO

CREATE PROC dbo.usp_GetBatchProcessParameters
--------------------------------------
--called by batch package at start ---
--------------------------------------
AS
DECLARE
@conversation_handle uniqueidentifier
,@message_body xml
,@message_type_name sysname
,@parameter1 int;

BEGIN TRY

BEGIN TRAN;

RECEIVE TOP(1)
@conversation_handle = conversation_handle
,@message_type_name = message_type_name
,@message_body = message_body
FROM dbo.BatchProcessTargetQueue;

IF @@ROWCOUNT = 0
BEGIN
RAISERROR ('No messages received from dbo.BatchProcessTargetQueue', 16, 1);
RETURN 1;
END;

INSERT INTO dbo.BatchProcessLog(
ConversationHandle
,MessageTypeName
,MessageBody
)
VALUES(
@conversation_handle
,@message_type_name
,CAST(@message_body AS varbinary(MAX))
);

SET @parameter1 = @message_body.query('/Parameters/Parameter1').value('.', 'int');

COMMIT;

SELECT
@conversation_handle AS ConversationHandle
,@parameter1 AS Parameter1;

END TRY
BEGIN CATCH
THROW;
END CATCH;

RETURN @@ERROR;
GO

CREATE PROC dbo.usp_CompleteBatchProcess
@ConversationHandle uniqueidentifier
,@ErrorMessage nvarchar(3000) = NULL
------------------------------------------
-- called by SSIS package at completion
-- Sample Usage:

-- normal completion:
-- EXEC dbo.usp_CompleteBatchProcess
-- @ConversationHandle = '00000000-0000-0000-0000-000000000000';

-- completed with error:
-- EXEC dbo.usp_CompleteBatchProcess
-- @ConversationHandle = '00000000-0000-0000-0000-000000000000'
-- @ErrorMessage = 'an error occurred;
------------------------------------------
AS

IF @ErrorMessage IS NULL
BEGIN
END CONVERSATION @ConversationHandle;
END
ELSE
BEGIN
END CONVERSATION @ConversationHandle
WITH ERROR = 1
DESCRIPTION = @ErrorMessage;
END;

RETURN @@ERROR;
GO

--initiator queue with activated proc to process batch completed messages
CREATE QUEUE dbo.BatchProcessInitiatorQueue
WITH STATUS = ON,
ACTIVATION (
PROCEDURE_NAME = dbo.usp_LogBatchProcessResult,
MAX_QUEUE_READERS = 1,
EXECUTE AS SELF );
GO

--initiator service that triggers batch process
CREATE SERVICE BatchProcessInitiatorService
ON QUEUE dbo.BatchProcessInitiatorQueue
([DEFAULT]);
GO

--queue for event notifications
CREATE QUEUE dbo.BatchProcessNotificationQueue;
GO

--service for event notifications
CREATE SERVICE BatchProcessNotificationService
ON QUEUE dbo.BatchProcessNotificationQueue
(
[http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]
);
GO

--target queue for batch process parameters
CREATE QUEUE dbo.BatchProcessTargetQueue;
GO

--target service for batch process parameters
CREATE SERVICE BatchProcessTargetService
ON QUEUE dbo.BatchProcessTargetQueue
([DEFAULT]);
GO

--event notification for target queue
CREATE EVENT NOTIFICATION BatchProcessTargetNotification
ON QUEUE dbo.BatchProcessTargetQueue
FOR QUEUE_ACTIVATION
TO SERVICE 'BatchProcessNotificationService' , 'current database';
GO

Summary

The Service Broker External Activator is a powerful way to launch command-line programs asynchronously.  There is a learning curve, which can be steep if you know nothing about the SQL Server Service Broker feature.  But I hope this doesn’t discourage you from exploring and using SBEA.  It may be the right tool for the job in cases where you need asynchronous batch processing, avoiding custom programming or kludge design.  It’s always a good thing to have another tool in your toolbox.

 

 

 

Secrets of Foreign Key Index Binding

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.

Background

Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow the index bound to a foreign key to be dropped since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:

–create parent table

CREATE TABLE dbo.ParentTable(

       ParentTableID int NOT NULL IDENTITY

              CONSTRAINT PK_ParentTable PRIMARY KEY CLUSTERED

       ,Column1 int NOT NULL

       ,Column2 varchar(100) NOT NULL

       );

GO

 

–create a non-clustered covering index

CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID

       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);

GO

 

INSERT INTO dbo.ParentTable VALUES(1, ‘some data’);

INSERT INTO dbo.ParentTable VALUES(2, ‘some data’);

INSERT INTO dbo.ParentTable VALUES(3, ‘some data’);

GO

 

–create child table

CREATE TABLE dbo.ChildTable(

       ChildTableID int NOT NULL IDENTITY

            CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED

       ,ParentTableID int NOT NULL

              CONSTRAINT FK_ChildTable_ParentTable

                     FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)

       );

GO

 

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

GO

 

UPDATE STATISTICS dbo.ParentTable;

UPDATE STATISTICS dbo.ChildTable;

GO

 

–show the foreign key index binding

SELECT

    fki.name

FROM sys.foreign_keys AS f

JOIN sys.indexes AS fki ON

      fki.object_id = f.referenced_object_id     

      AND fki.index_id = f.key_index_id

WHERE

      f.object_id = OBJECT_ID(N’dbo.FK_ChildTable_ParentTable’);

GO

 

–this query uses the covering index instead of clustered PK index

SELECT p.ParentTableID, p.Column1

FROM dbo.ParentTable AS p

WHERE p.ParentTableID IN(1,2,3);

GO

 

 

The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control which index is bound to the foreign key. 

SQL Server chooses the index binding based on rules that vary by version so you will get different binding depending on your version of SQLServer. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, the first (lowest index_id) unique non-clustered index on the referenced column(s) is used. The sample script above binds the foreign key to the PK_WideTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL 2008, SQL 2008R2 and SQL 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL 2008 and later versions instead of the primary key index as one might expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index ‘dbo.ParentTable.idx_ParentTable_ParentTableID ‘. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate after dropping the index.

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future. 

 

RIP OLE DB

I was very surprised when Microsoft announced deprecation of OLE DB provider for SQL Server data access last week on the Data Access Blog and MSDN Forums Announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB driver will continue to be supported for 7 years after the Denali release so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use Sysem.Data SqlClient. SqlClient is part of the .NET framework and is the preferred way to access SQL Server from managed code (C#, VB.NET, managed C++). The only reason I can think why not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case accessing different database products Sysem.Data.Odbc is an alternative.

·         Native code (e.g. unmanaged C++): Use ODBC with the Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download. 

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: “Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI”

New ODBC connection string: “Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes”

 

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Was Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by changes in development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing. 

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)

Summary

I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.

 

Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.

T-SQL Error Handling Ugliness

Unlike compiled application code that halts code execution upon an unhandled exception, a T-SQL might continue code execution afterward. T-SQL developers must include error checking/handling is to ensure code doesn’t continue down the “happy” path oblivious to an error, report the error to the caller, perform any necessary cleanup operations (typically ROLLBACK) and continue/halt execution as desired. The script below shows how one might accomplish this without structured error handling:

–Unstructured error handling example

BEGIN TRAN

SELECT 1/0 AS CauseAnError –report error caller

IF @@ERROR<> 0 GOTO ErrorHandler –detect error

COMMIT

GOTO Done
ErrorHandler:

IF @@TRANCOUNT> 0 ROLLBACK–cleanup after error

RETURN –stop further code execution

Done:

PRINT ‘Done’–not executed after error

GO
This script results in the error:

Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.
Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include repetitive “IF @@ERROR” check to detect errors after each statement and error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to raise the error; SQL Server automatically reports errors to the calling application without any T-SQL code to do so (unless TRY/CATCH is used). This guarantees the calling application is notified of errors during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL 2005 is a both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:

–Structured error handling example

DECLARE

@ErrorNumber int

,@ErrorMessage nvarchar(2048)

,@ErrorSeverity int

,@ErrorState int

,@ErrorLine int;

BEGIN TRY–detect errors

BEGIN TRAN;

SELECT 1/0 AS CauseAnError;

COMMIT;

END TRY

BEGIN CATCH

SELECT

@ErrorNumber =ERROR_NUMBER()

,@ErrorMessage =ERROR_MESSAGE()

,@ErrorSeverity = ERROR_SEVERITY()

,@ErrorState =ERROR_STATE()

,@ErrorLine =ERROR_LINE();

IF @@TRANCOUNT> 0 ROLLBACK; –cleanup after error

RAISERROR(‘Error %d caught at line %d: %s’–report error to caller

,@ErrorSeverity

,@ErrorState

,@ErrorNumber

,@ErrorLine

,@ErrorMessage);

RETURN;–stop further code execution

END CATCH

PRINT ‘Done’; –not executed after error

GO

Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered
I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.

The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:

–Structured error handling example in Denali CTP3

BEGIN TRY–detect errors

BEGIN TRAN;

SELECT 1/0 AS CauseAnError;

COMMIT;

END TRY

BEGIN CATCH

IF @@TRANCOUNT> 0 ROLLBACK; –cleanup after error

THROW; –report error to caller and stop further code execution

END CATCH

PRINT ‘Done’; –not executed after error

GO

Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.
There are only a couple of scenarios I can think of not to use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.

Summary

THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

Internal SQL Server Database Version Numbers

A database created by a more recent version of SQL Server cannot be attached or restored to an earlier version. This restriction is simply because an older version cannot know about file format changes that were introduced in the newer release. 
If you attempt to attach a database to an earlier version, you will get SQL Server error 948 with the internal version numbers listed in the error message text. For example, the following error occurs if you try to attach a SQL Server 2008 R2 database to a SQL Server 2008 server:

The database ‘MyDatabase’ cannot be opened because it is version 665. This server supports version 661 and earlier. A downgrade path is not supported.

Sample text from SQL Server error 948
The cryptic version numbers in the error message refer to the internal database version. These internal version numbers are undocumented but are (at least currently) the same value reported by the DATABASEPROPERTYEX function ‘Version’ property of the source database. If you are unsure of the source database version, the table below maps the internal version numbers to SQL Server versions so you can determine the minimum version you need for the attach to succeed:

SQL Server Version

Internal Database Version

SQL Server 2008 R2

665

SQL Server 2008

661

SQL Server 2005 SP2+ with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server versions and internal database versions
Below are the allowable SQL Server upgrade paths for a database attach or restore. The internal database version will be as above after a successful attach or restore.

Target SQL Server Version

Source SQL Server Version

Internal Database Version

SQL Server 2008 R2

SQL Server 2008 R2

665

SQL Server 2008

661

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 2008

SQL Server 2008

661

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 2005 SP2+

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server 2005

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server 2000

SQL Server 2000

539

SQL Server 7

515
SQL Server 7

SQL Server 7

515

Database File Versions and Upgrade Paths
As I mentioned earlier, downgrades are not supported. You’ll need to copy objects and data from the newer source database to the older target if you need to downgrade; attach or restore is not an option to copy a database to an earlier version.

SQL Server Connection Strings

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose. 
In this post, I’ll provide a connection string overview and discuss SqlClient connection strings and examples. I’ll discuss OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail the future articles.
Overview
SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific so one first needs to decide on a client API before formulating the proper string can be created. 
All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also vary by provider and providers often share the same keywords (or provide synonyms) to minimize the connection string changes when switching between different providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password=”a&==b=;1@23″)
The purpose of a connection string is to supply a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance and may also be used to specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:
·         SQL Server name (or address)
·         Authentication method (Windows or SQL Server)
·         Login credentials (login and password for SQL Server authentication)
SqlClient
One typically uses the .Net Framework Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.
The authoritative reference for SqlClient connection strings is http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx. My goal is not to rehash all of the keywords or illustrate the many combinations here but rather show the ones most commonly used along with best practices. I use the primary keywords rather than synonyms or equivalent keywords in the examples.
The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based in user input (e.g. user id and password prompt). But you still need to know which connection string properties (keywords) you need to set along with the default values. The examples here apply regardless of whether or not you use yjr SqlConnectionStringBuilder class.
SqlClient Connection String Keyword Examples
Unlike other providers, there is no “Provider” or “Driver” connection string keyword in a SqlClient connection string.  The .Net Framework Provider for SQL Server is implicit with a SqlConnection class so it is redundant to also specify the provider.
I’ll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method.  The example below specifies Windows authentication using “Integrated Security=SSPI”. This connection string will connect the default instance on the local machine using Windows authentication under the current process Windows security credentials. 

Integrated Security=SSPI
Listing 1: Connect to local default instance using Windows authentication
To connect to the local default instance using SQL authentication, just specify the credentials using the “User ID” and “Password” keywords instead of “Integrated Security=SSPI” keyword. SQL authentication is the default when “Integrated Security” or “Trused_Connection” keyword is not specified. Although I commonly see “Persist Security Info=False” also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.

User ID=MyLogin;Password=MiP@ssw0rd
Listing 2: Connect to local default instance using SQL authentication
One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.

Data Source=SQLSERVERNAME;Integrated Security=SSPI
Listing 3: Connect to default instance on host SQLSERVERNAME using Windows authentication

Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd
Listing 4: Connect to instance on host SQLSERVERNAME using SQL authentication
Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine using with the same configuration and helps avoid oversights.
It is usually best to let SqlClient determine the appropriate network library to use rather than an explicit specification. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value “(local)”), Shared Memory is used by default. SqlClient will use TCP/IP if a FQDN (e.g. SQLSERVERNAME.MyDOMAIN.COM) or IP address is specified regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I’ll focus on TCP/IP in this article and use a FQDN in the subsequent examples to avoid ambiguity.
It is often desirable to specify the initial database context in the connection sting. If omitted, the default database of the authenticated account is used. This is accomplished using either the “Initial Catalog” or “Database” keyword. I suggest always including the “Initial Catalog” keyword.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 4: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
Named Instances
The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:

Data Source=SQLSERVERNAME.MYDOMAIN.COM\MYINSTANCE;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 5: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Brower service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Brower service is disabled by default so you need to enable and start it in order to connect by the instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Brower service communicates over UDP port 1434, that port must be allowed through firewalls.
You can specify a port number instead of instance name to directly to a named instance (or to a default instance listing on a non-standard port). The port may be specified by appending a comma and port number to the data source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.

Data Source=SQLSERVERNAME.MYDOMAIN.COM,60086;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 6: Connect to instance on host SQLSERVERNAME listening on port 60086 using Windows authentication with initial database context of MyDatabase
Additional Keywords

In addition to the “Data Source”, “Initial Catalog” and “Integrated Security” (or “User Id” and “Password”) keywords I’ve discussed so far, I recommend that “Application Name” also be specified. The specified string is helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=Connection String Example
Listing 7: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase with application name specification
In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment. 
Summary

You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I’ve discussed here. I suggest you establish a connection string standard that includes the “Data Source”, “Initial Catalog”, “Application Name” keywords plus the authentication method, “Integrated Security=SSPI” or “User Id” and “Password”.

Move a Partition to a Different File Group Efficiently

SQL Server table partitioning can reduce storage costs associated with large tables while maintaining performance SLAs.  Table partitioning, available in Enterprise and above SKUs, allows you to keep frequently used current data on fast storage while storing infrequently accessed older data on slower, less expensive storage.  But moving vast amounts of data efficiently as data ages can be a challenge.  This post will discuss alternate techniques to accomplish this task.

Consider the scenario of a table partitioned on a datetime column by month.  Your objective is to keep recent (current and prior month) data on a solid state disk and older data on traditional spinning media.  2 filegroups are used for this table, one with files on a solid state device and the other with files on spinning disks.  The table is partitioned with a RANGE RIGHT partition function (inclusive date boundary) and monthly sliding window maintenance is scheduled to create a partition for the new month and perhaps remove the oldest month.  Every month after the slide, you want to move an older partition (prior month minus 1) from fast to slow storage to make room for new data on the fast file group.

The Simple Method

The easiest way to move a partition from the NewerData file group to the OlderData filegroup is with MERGE and SPLIT.  The example below will move the February partition from the NewerData to the OlderData filegroup:  

Simple maintenance script example:

– Monthly Partition Move Scipt

– merge month to be moved into prior month partition

ALTER PARTITION FUNCTION PF_Last12Months()

MERGE RANGE (’20110201′);

 

– set partition scheme next used to the OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

NEXT USED OlderData;

 

– move data from NewData to OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

SPLIT RANGE (’20110201′);

 

The figures below show the partitions before and after this script was run against a 10M row test table (setup script with complete DDL and sample data at the end of this post).  Although this method is quite easy, it can take quite a bit of time with large partitions.  This MERGE command will merge February data into the January partition on the OlderData filegroup, requiring all of February’s data to be moved in the process, and then remove the February partition.  The SPLIT will then create a new February partition on the OlderData filegroup, move February data to the new partition and finally remove the February data from the source partition.  So February data is actually moved twice, once by the MERGE and again by the SPLIT. 

This MERGE/SPLIT process took 52 seconds on my test system with a cold buffer cache but I was only moving 738,780 rows.  Think about the performance impact of this method against a much larger production table partition.  The atomic MERGE and SPLIT are offline operations so the entire table is unavailable while those statements are running.  Also, these operations are resource intensive when a lot of data needs to be moved and/or you have many indexes.

Before maintenance:

Rows

Partition Number

Filegroup

Lower Boundary

Upper Boundary

0

1

PartitioningDemo_OlderData

 

4/1/2010 12:00:00 AM

791,549

2

PartitioningDemo_OlderData

4/1/2010 12:00:00 AM

5/1/2010 12:00:00 AM

817,935

3

PartitioningDemo_OlderData

5/1/2010 12:00:00 AM

6/1/2010 12:00:00 AM

791,550

4

PartitioningDemo_OlderData

6/1/2010 12:00:00 AM

7/1/2010 12:00:00 AM

817,935

5

PartitioningDemo_OlderData

7/1/2010 12:00:00 AM

8/1/2010 12:00:00 AM

817,935

6

PartitioningDemo_OlderData

8/1/2010 12:00:00 AM

9/1/2010 12:00:00 AM

791,550

7

PartitioningDemo_OlderData

9/1/2010 12:00:00 AM

10/1/2010 12:00:00 AM

817,935

8

PartitioningDemo_OlderData

10/1/2010 12:00:00 AM

11/1/2010 12:00:00 AM

791,550

9

PartitioningDemo_OlderData

11/1/2010 12:00:00 AM

12/1/2010 12:00:00 AM

817,935

10

PartitioningDemo_OlderData

12/1/2010 12:00:00 AM

1/1/2011 12:00:00 AM

817,935

11

PartitioningDemo_OlderData

1/1/2011 12:00:00 AM

2/1/2011 12:00:00 AM

738,780

12

PartitioningDemo_NewerData

2/1/2011 12:00:00 AM

3/1/2011 12:00:00 AM

817,935

13

PartitioningDemo_NewerData

3/1/2011 12:00:00 AM

4/1/2011 12:00:00 AM

369,476

14

PartitioningDemo_NewerData

4/1/2011 12:00:00 AM

5/1/2011 12:00:00 AM

0

15

PartitioningDemo_NewerData

5/1/2011 12:00:00 AM

 

 

After maintenance:

Rows

Partition Number

Filegroup

Lower Boundary

Upper Boundary

0

1

PartitioningDemo_OlderData

 

4/1/2010 12:00:00 AM

791,549

2

PartitioningDemo_OlderData

4/1/2010 12:00:00 AM

5/1/2010 12:00:00 AM

817,935

3

PartitioningDemo_OlderData

5/1/2010 12:00:00 AM

6/1/2010 12:00:00 AM

791,550

4

PartitioningDemo_OlderData

6/1/2010 12:00:00 AM

7/1/2010 12:00:00 AM

817,935

5

PartitioningDemo_OlderData

7/1/2010 12:00:00 AM

8/1/2010 12:00:00 AM

817,935

6

PartitioningDemo_OlderData

8/1/2010 12:00:00 AM

9/1/2010 12:00:00 AM

791,550

7

PartitioningDemo_OlderData

9/1/2010 12:00:00 AM

10/1/2010 12:00:00 AM

817,935

8

PartitioningDemo_OlderData

10/1/2010 12:00:00 AM

11/1/2010 12:00:00 AM

791,550

9

PartitioningDemo_OlderData

11/1/2010 12:00:00 AM

12/1/2010 12:00:00 AM

817,935

10

PartitioningDemo_OlderData

12/1/2010 12:00:00 AM

1/1/2011 12:00:00 AM

817,935

11

PartitioningDemo_OlderData

1/1/2011 12:00:00 AM

2/1/2011 12:00:00 AM

738,780

12

PartitioningDemo_OlderData

2/1/2011 12:00:00 AM

3/1/2011 12:00:00 AM

817,935

13

PartitioningDemo_NewerData

3/1/2011 12:00:00 AM

4/1/2011 12:00:00 AM

369,476

14

PartitioningDemo_NewerData

4/1/2011 12:00:00 AM

5/1/2011 12:00:00 AM

0

15

PartitioningDemo_NewerData

5/1/2011 12:00:00 AM

 

 

SWITCH and DROP_EXISTING Method

An alternative to the method above is to employ SWITCH along with the DROP EXISTING option of CREATE INDEX.  As you may know, SWITCH of an aligned partition is a metadata-only operation and is very fast because no physical data movement is required.  Furthermore, CREATE INDEX…WITH DROP_EXISTING = ON avoids sorting when the existing table index is already suitably sorted and is especially appropriate for improving performance of large index rebuilds.  Using these commands, instead of relying on SPLIT and MERGE to move data, will greatly reduce the time needed to move a partition from one filegroup to another.  The maintenance script below reduced the time of the partition move from 52 seconds down to 7 seconds, reducing maintenance time by over 85% compared to the MERGE/SPLIT script above.  

Demo Maintenance Script

– Monthly Partition Move Scipt

DECLARE @MonthToMove datetime = ’20110201′;

 

– create staging table on NewerData filegroup with aligned indexes

IF OBJECT_ID(N’dbo.PartitionMoveDemoStaging’) IS NOT NULL

      DROP TABLE dbo.PartitionMoveDemoStaging;

CREATE TABLE dbo.PartitionMoveDemoStaging(

      PartitioningDateTimeColumn datetime NOT NULL

      ,Column1 bigint NOT NULL

) ON PartitioningDemo_NewerData;

 

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      ON PartitioningDemo_NewerData;     

 

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      ON PartitioningDemo_NewerData;     

 

– switch partition into staging table

ALTER TABLE dbo.PartitionMoveDemo

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemoStaging;

 

– remove partition

ALTER PARTITION FUNCTION PF_Last12Months()

      MERGE RANGE (@MonthToMove);

     

– set next used to OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

      NEXT USED PartitioningDemo_OlderData;

 

– recreate partition on OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

      SPLIT RANGE (@MonthToMove);

     

– recreate staging table indexes using the partition scheme

– this will move the staging table to OlderData filegroup with aligned indexes

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

 

– switch staging table back into primary table partition

ALTER TABLE dbo.PartitionMoveDemoStaging

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemo PARTITION $PARTITION.PF_Last12Months(@MonthToMove);

 

The maintenance steps here are similar to the first method except that the partition is SWITCHed into a staging table before the MERGE and SPLIT.  This way, no data movement is needed during the MERGE or SPLIT.  After the MERGE and SPLIT, staging table indexes are recreated using the same partition scheme as the primary table.  This will move the staging table from the NewerData to the OlderData filegroup and ensure staging table indexes are aligned for the SWITCH.  The DROP_EXISTING = ON option allows the CREATE INDEX to leverage the existing staging table index sequence, thus eliminating the need to sort the index keys.  Finally, the staging table is SWITCHed back into the moved partition.

I hope you find this method useful.  Below is the script I used to create the demo database and objects. 

Demo Setup Script

–create database with monthly filegroups

CREATE DATABASE PartitioningDemo

ON(

      NAME=‘Primary’,

      FILENAME=‘S:\SolidState\PartitioningDemo.mdf’,

      SIZE=10MB),

FILEGROUP NewerData (

      NAME=‘PartitioningDemo_NewerData’,

      FILENAME=‘S:\SolidState\PartitioningDemo_NewerData.ndf’,

      SIZE=400MB,

      FILEGROWTH=10MB),

FILEGROUP OlderData (

      NAME=‘PartitioningDemo_OlderData’,

      FILENAME=‘D:\SpinningDisks\PartitioningDemo_OlderData.ndf’,

      SIZE=600MB,

      FILEGROWTH=10MB)

LOG ON(

      NAME=‘PartitioningDemo_Log’,

      FILENAME=‘L:\LogFiles\PartitioningDemo_Log.ldf’,

      SIZE=10MB,

      FILEGROWTH=10MB);

     

ALTER DATABASE PartitioningDemo

      SET RECOVERY SIMPLE;

GO

 

USE PartitioningDemo;

 

CREATE PARTITION FUNCTION PF_Last12Months( datetime )

AS RANGE RIGHT

FOR VALUES

(               – older_than_current_minus_12

      ’20100401′  – current_minus_12

      ,’20100501′ – current_minus_11

      ,’20100601′ – current_minus_10

      ,’20100701′ – current_minus_9

      ,’20100801′ – current_minus_8

      ,’20100901′ – current_minus_7

      ,’20101001′ – current_minus_6

      ,’20101101′ – current_minus_5

      ,’20101201′ – current_minus_4

      ,’20110101′ – current_minus_3

      ,’20110201′ – current_minus_2

      ,’20110301′ – current_minus_1

      ,’20110401′ – current

      ,’20110501′ – future

);

 

CREATE PARTITION SCHEME PS_Last12Months

AS PARTITION PF_Last12Months

TO

      (

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      NewerData, – minus 2 month (to be moved to OlderData)

      NewerData, – minus 1 month

      NewerData, – current month

      NewerData  – future month+

      );

 

– create table with 10,000,000 rows

ALTER DATABASE PartitioningDemo

      MODIFY FILEGROUP NewerData DEFAULT;

 

WITH

      t1 AS (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2

                            UNION ALL SELECT 3 UNION ALL SELECT 4

                            UNION ALL SELECT 5 UNION ALL SELECT 6

                              UNION ALL SELECT 7 UNION ALL SELECT 8

                              UNION ALL SELECT 9),

      t2 AS (SELECT a.n

                    FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g)

SELECT

      ISNULL(

            DATEADD(

                  day

                  , (ROW_NUMBER() OVER(ORDER BY t2.n))/26385, ’20100401′)

                  , ’20100401′) AS PartitioningDateTimeColumn

      ,ISNULL((ROW_NUMBER() OVER(ORDER BY t2.n)), 0) AS Column1

INTO dbo.PartitionMoveDemo

FROM t2;

 

– create indexes partitioned indexes on table

CREATE CLUSTERED INDEX cdx_PartitionMoveDemo_PartitioningColumn

      ON dbo.PartitionMoveDemo(PartitioningDateTimeColumn)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemo_Column1

      ON dbo.PartitionMoveDemo(Column1)

      ON PS_Last12Months(PartitioningDateTimeColumn);

GO