Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.

T-SQL Error Handling Ugliness
Unlike compiled application code, which halts execution upon an unhandled exception, a T-SQL batch might continue execution after an error. T-SQL developers must include error checking/handling to ensure code doesn't continue down the "happy" path oblivious to an error, to report the error to the caller, to perform any necessary cleanup operations (typically ROLLBACK), and to continue or halt execution as desired. The script below shows how one might accomplish this without structured error handling:
--Unstructured error handling example
BEGIN TRAN
SELECT 1/0 AS CauseAnError --error reported to caller
IF @@ERROR <> 0 GOTO ErrorHandler --detect error
COMMIT
GOTO Done
ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK --cleanup after error
RETURN --stop further code execution
Done:
PRINT 'Done' --not executed after error
GO
This script results in the error:
Msg 8134, Level 16, State 1, Line 3
Divide by zero error encountered.
Unstructured error handling like this is especially painful for multi-statement scripts and stored procedures. One has to include a repetitive "IF @@ERROR" check after each statement along with error-prone, unstructured GOTO code. It's easy to miss error checking/handling bugs in unit testing.
On a positive note, no T-SQL code is necessary to raise the error; SQL Server automatically reports errors to the calling application (unless TRY/CATCH is used). This guarantees the calling application is notified of errors during execution.
Two Steps Forward, One Step Back
The introduction of structured error handling (TRY/CATCH) in SQL 2005 is both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error-prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is also more aligned with most application languages.
Consider the equivalent script with TRY/CATCH:
--Structured error handling example
DECLARE
      @ErrorNumber int
      ,@ErrorMessage nvarchar(2048)
      ,@ErrorSeverity int
      ,@ErrorState int
      ,@ErrorLine int;
BEGIN TRY --detect errors
      BEGIN TRAN;
      SELECT 1/0 AS CauseAnError;
      COMMIT;
END TRY
BEGIN CATCH
      SELECT
            @ErrorNumber = ERROR_NUMBER()
            ,@ErrorMessage = ERROR_MESSAGE()
            ,@ErrorSeverity = ERROR_SEVERITY()
            ,@ErrorState = ERROR_STATE()
            ,@ErrorLine = ERROR_LINE();
      IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
      RAISERROR('Error %d caught at line %d: %s' --report error to caller
            ,@ErrorSeverity
            ,@ErrorState
            ,@ErrorNumber
            ,@ErrorLine
            ,@ErrorMessage);
      RETURN; --stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO
Msg 50000, Level 16, State 1, Line 21
Error 8134 caught at line 10: Divide by zero error encountered.
I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.
Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.
THROW to the Rescue
Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.
The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:
--Structured error handling example in Denali CTP3
BEGIN TRY --detect errors
      BEGIN TRAN;
      SELECT 1/0 AS CauseAnError;
      COMMIT;
END TRY
BEGIN CATCH
      IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
      THROW; --report error to caller and stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.
There are only a couple of scenarios I can think of where one should not use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost CATCH block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.
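For example, an outermost CATCH block in a fire-and-forget or logging process might record the error and deliberately swallow it rather than re-throw. Below is a minimal sketch of that pattern; dbo.usp_DoWork and the dbo.ErrorLog table are hypothetical stand-ins.

DECLARE @ErrorNumber int, @ErrorMessage nvarchar(2048);
BEGIN TRY
      EXEC dbo.usp_DoWork; --hypothetical work proc
END TRY
BEGIN CATCH
      IF @@TRANCOUNT > 0 ROLLBACK;
      SELECT @ErrorNumber = ERROR_NUMBER(), @ErrorMessage = ERROR_MESSAGE();
      --log instead of re-throwing; the error is intentionally not returned to the client
      INSERT INTO dbo.ErrorLog(ErrorNumber, ErrorMessage) --hypothetical logging table
      VALUES(@ErrorNumber, @ErrorMessage);
END CATCH;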
Summary
THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

SQL Server Connection Strings

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose. 
In this post, I'll provide a connection string overview and discuss SqlClient connection strings and examples. I'll discuss OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail in future articles.
Overview
SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific, so one first needs to decide on a client API before the proper string can be formulated.
All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also varies by provider, and providers often share the same keywords (or provide synonyms) to minimize connection string changes when switching between providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password="a&==b=;1@23").
The purpose of a connection string is to supply a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance; it may also specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:
- SQL Server name (or address)
- Authentication method (Windows or SQL Server)
- Login credentials (login and password for SQL Server authentication)
SqlClient
One typically uses the .Net Framework Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.
The authoritative reference for SqlClient connection strings is http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx. My goal is not to rehash all of the keywords or illustrate the many combinations here but rather show the ones most commonly used along with best practices. I use the primary keywords rather than synonyms or equivalent keywords in the examples.
The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by the SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based on user input (e.g. a user id and password prompt). But you still need to know which connection string properties (keywords) to set along with the default values. The examples here apply regardless of whether or not you use the SqlConnectionStringBuilder class.
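For example, the sketch below builds a connection string equivalent to the final example in this post; the server, database and application names are placeholders.

using System.Data.SqlClient;

SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
builder.DataSource = "SQLSERVERNAME.MYDOMAIN.COM";      // Data Source keyword
builder.IntegratedSecurity = true;                      // Integrated Security=SSPI
builder.InitialCatalog = "MyDatabase";                  // Initial Catalog keyword
builder.ApplicationName = "Connection String Example";  // Application Name keyword

SqlConnection connection = new SqlConnection(builder.ConnectionString);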
SqlClient Connection String Keyword Examples
Unlike other providers, there is no “Provider” or “Driver” connection string keyword in a SqlClient connection string.  The .Net Framework Provider for SQL Server is implicit with a SqlConnection class so it is redundant to also specify the provider.
I'll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method. The example below specifies Windows authentication using "Integrated Security=SSPI". This connection string will connect to the default instance on the local machine using Windows authentication under the current process Windows security credentials.
Integrated Security=SSPI
Listing 1: Connect to local default instance using Windows authentication
To connect to the local default instance using SQL authentication, just specify the credentials using the "User ID" and "Password" keywords instead of the "Integrated Security=SSPI" keyword. SQL authentication is the default when neither the "Integrated Security" nor "Trusted_Connection" keyword is specified. Although I commonly see "Persist Security Info=False" also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.
User ID=MyLogin;Password=MiP@ssw0rd
Listing 2: Connect to local default instance using SQL authentication
One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.
Data Source=SQLSERVERNAME;Integrated Security=SSPI
Listing 3: Connect to default instance on host SQLSERVERNAME using Windows authentication
Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd
Listing 4: Connect to default instance on host SQLSERVERNAME using SQL authentication
Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine with the same configuration and helps avoid oversights.
It is usually best to let SqlClient determine the appropriate network library rather than specifying one explicitly. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value "(local)"), Shared Memory is used by default. SqlClient will use TCP/IP if an FQDN (e.g. SQLSERVERNAME.MYDOMAIN.COM) or IP address is specified, regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I'll focus on TCP/IP in this article and use an FQDN in the subsequent examples to avoid ambiguity.
It is often desirable to specify the initial database context in the connection string. If omitted, the default database of the authenticated account is used. This is accomplished using either the "Initial Catalog" or "Database" keyword. I suggest always including the "Initial Catalog" keyword.
Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 5: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
Named Instances
The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:
Data Source=SQLSERVERNAME.MYDOMAIN.COM\MYINSTANCE;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 6: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Browser service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Browser service is disabled by default, so you need to enable and start it in order to connect by instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Browser service communicates over UDP port 1434, that port must be allowed through firewalls.
You can specify a port number instead of an instance name to connect directly to a named instance (or to a default instance listening on a non-standard port). The port may be specified by appending a comma and the port number to the Data Source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.
Data Source=SQLSERVERNAME.MYDOMAIN.COM,60086;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 7: Connect to instance on host SQLSERVERNAME listening on port 60086 using Windows authentication with initial database context of MyDatabase
Additional Keywords
In addition to the "Data Source", "Initial Catalog" and "Integrated Security" (or "User Id" and "Password") keywords I've discussed so far, I recommend that "Application Name" also be specified. The specified string helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.
Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=Connection String Example
Listing 8: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase and application name specification
In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment. 
Summary

You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I've discussed here. I suggest you establish a connection string standard that includes the "Data Source", "Initial Catalog" and "Application Name" keywords plus the authentication method: "Integrated Security=SSPI", or "User Id" and "Password".

Move a Partition to a Different File Group Efficiently

SQL Server table partitioning can reduce storage costs associated with large tables while maintaining performance SLAs.  Table partitioning, available in Enterprise and above SKUs, allows you to keep frequently used current data on fast storage while storing infrequently accessed older data on slower, less expensive storage.  But moving vast amounts of data efficiently as data ages can be a challenge.  This post will discuss alternate techniques to accomplish this task.

Consider the scenario of a table partitioned on a datetime column by month.  Your objective is to keep recent (current and prior month) data on a solid state disk and older data on traditional spinning media.  Two filegroups are used for this table, one with files on a solid state device and the other with files on spinning disks.  The table is partitioned with a RANGE RIGHT partition function (inclusive date boundary) and monthly sliding window maintenance is scheduled to create a partition for the new month and perhaps remove the oldest month.  Every month after the slide, you want to move an older partition (prior month minus 1) from fast to slow storage to make room for new data on the fast filegroup.

The Simple Method

The easiest way to move a partition from the NewerData filegroup to the OlderData filegroup is with MERGE and SPLIT.  The example below will move the February partition from the NewerData to the OlderData filegroup:

Simple maintenance script example:

-- Monthly Partition Move Script

-- merge month to be moved into prior month partition
ALTER PARTITION FUNCTION PF_Last12Months()
MERGE RANGE ('20110201');

-- set partition scheme next used to the OlderData filegroup
ALTER PARTITION SCHEME PS_Last12Months
NEXT USED PartitioningDemo_OlderData;

-- move data from NewerData to OlderData filegroup
ALTER PARTITION FUNCTION PF_Last12Months()
SPLIT RANGE ('20110201');

 

The figures below show the partitions before and after this script was run against a 10M row test table (setup script with complete DDL and sample data at the end of this post).  Although this method is quite easy, it can take quite a bit of time with large partitions.  This MERGE command will merge February data into the January partition on the OlderData filegroup, requiring all of February’s data to be moved in the process, and then remove the February partition.  The SPLIT will then create a new February partition on the OlderData filegroup, move February data to the new partition and finally remove the February data from the source partition.  So February data is actually moved twice, once by the MERGE and again by the SPLIT. 

This MERGE/SPLIT process took 52 seconds on my test system with a cold buffer cache but I was only moving 738,780 rows.  Think about the performance impact of this method against a much larger production table partition.  The atomic MERGE and SPLIT are offline operations so the entire table is unavailable while those statements are running.  Also, these operations are resource intensive when a lot of data needs to be moved and/or you have many indexes.
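For reference, the partition information shown in the before/after figures can be gathered with a metadata query along these lines (a sketch; it assumes the table's clustered index is index_id 1, and the column aliases are mine):

SELECT
      p.rows AS Rows
      ,p.partition_number AS PartitionNumber
      ,fg.name AS Filegroup
      ,lb.value AS LowerBoundary
      ,ub.value AS UpperBoundary
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces AS dds ON dds.partition_scheme_id = ps.data_space_id
      AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values AS lb ON lb.function_id = ps.function_id
      AND lb.boundary_id = p.partition_number - 1
LEFT JOIN sys.partition_range_values AS ub ON ub.function_id = ps.function_id
      AND ub.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID(N'dbo.PartitionMoveDemo')
      AND i.index_id = 1
ORDER BY p.partition_number;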

Before maintenance (all boundary times are midnight; shown as dates for brevity):

Rows    | Partition Number | Filegroup                  | Lower Boundary | Upper Boundary
--------|------------------|----------------------------|----------------|---------------
0       | 1                | PartitioningDemo_OlderData |                | 4/1/2010
791,549 | 2                | PartitioningDemo_OlderData | 4/1/2010       | 5/1/2010
817,935 | 3                | PartitioningDemo_OlderData | 5/1/2010       | 6/1/2010
791,550 | 4                | PartitioningDemo_OlderData | 6/1/2010       | 7/1/2010
817,935 | 5                | PartitioningDemo_OlderData | 7/1/2010       | 8/1/2010
817,935 | 6                | PartitioningDemo_OlderData | 8/1/2010       | 9/1/2010
791,550 | 7                | PartitioningDemo_OlderData | 9/1/2010       | 10/1/2010
817,935 | 8                | PartitioningDemo_OlderData | 10/1/2010      | 11/1/2010
791,550 | 9                | PartitioningDemo_OlderData | 11/1/2010      | 12/1/2010
817,935 | 10               | PartitioningDemo_OlderData | 12/1/2010      | 1/1/2011
817,935 | 11               | PartitioningDemo_OlderData | 1/1/2011       | 2/1/2011
738,780 | 12               | PartitioningDemo_NewerData | 2/1/2011       | 3/1/2011
817,935 | 13               | PartitioningDemo_NewerData | 3/1/2011       | 4/1/2011
369,476 | 14               | PartitioningDemo_NewerData | 4/1/2011       | 5/1/2011
0       | 15               | PartitioningDemo_NewerData | 5/1/2011       |

After maintenance:

Rows    | Partition Number | Filegroup                  | Lower Boundary | Upper Boundary
--------|------------------|----------------------------|----------------|---------------
0       | 1                | PartitioningDemo_OlderData |                | 4/1/2010
791,549 | 2                | PartitioningDemo_OlderData | 4/1/2010       | 5/1/2010
817,935 | 3                | PartitioningDemo_OlderData | 5/1/2010       | 6/1/2010
791,550 | 4                | PartitioningDemo_OlderData | 6/1/2010       | 7/1/2010
817,935 | 5                | PartitioningDemo_OlderData | 7/1/2010       | 8/1/2010
817,935 | 6                | PartitioningDemo_OlderData | 8/1/2010       | 9/1/2010
791,550 | 7                | PartitioningDemo_OlderData | 9/1/2010       | 10/1/2010
817,935 | 8                | PartitioningDemo_OlderData | 10/1/2010      | 11/1/2010
791,550 | 9                | PartitioningDemo_OlderData | 11/1/2010      | 12/1/2010
817,935 | 10               | PartitioningDemo_OlderData | 12/1/2010      | 1/1/2011
817,935 | 11               | PartitioningDemo_OlderData | 1/1/2011       | 2/1/2011
738,780 | 12               | PartitioningDemo_OlderData | 2/1/2011       | 3/1/2011
817,935 | 13               | PartitioningDemo_NewerData | 3/1/2011       | 4/1/2011
369,476 | 14               | PartitioningDemo_NewerData | 4/1/2011       | 5/1/2011
0       | 15               | PartitioningDemo_NewerData | 5/1/2011       |

SWITCH and DROP_EXISTING Method

An alternative to the method above is to employ SWITCH along with the DROP_EXISTING option of CREATE INDEX.  As you may know, SWITCH of an aligned partition is a metadata-only operation and is very fast because no physical data movement is required.  Furthermore, CREATE INDEX…WITH DROP_EXISTING = ON avoids sorting when the existing table index is already suitably sorted and is especially appropriate for improving performance of large index rebuilds.  Using these commands, instead of relying on SPLIT and MERGE to move data, will greatly reduce the time needed to move a partition from one filegroup to another.  The maintenance script below reduced the time of the partition move from 52 seconds down to 7 seconds, cutting maintenance time by over 85% compared to the MERGE/SPLIT script above.

Demo Maintenance Script

-- Monthly Partition Move Script
DECLARE @MonthToMove datetime = '20110201';

-- create staging table on NewerData filegroup with aligned indexes
IF OBJECT_ID(N'dbo.PartitionMoveDemoStaging') IS NOT NULL
      DROP TABLE dbo.PartitionMoveDemoStaging;
CREATE TABLE dbo.PartitionMoveDemoStaging(
      PartitioningDateTimeColumn datetime NOT NULL
      ,Column1 bigint NOT NULL
) ON PartitioningDemo_NewerData;

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn
      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)
      ON PartitioningDemo_NewerData;

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1
      ON dbo.PartitionMoveDemoStaging(Column1)
      ON PartitioningDemo_NewerData;

-- switch partition into staging table
ALTER TABLE dbo.PartitionMoveDemo
      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)
      TO dbo.PartitionMoveDemoStaging;

-- remove partition
ALTER PARTITION FUNCTION PF_Last12Months()
      MERGE RANGE (@MonthToMove);

-- set next used to OlderData filegroup
ALTER PARTITION SCHEME PS_Last12Months
      NEXT USED PartitioningDemo_OlderData;

-- recreate partition on OlderData filegroup
ALTER PARTITION FUNCTION PF_Last12Months()
      SPLIT RANGE (@MonthToMove);

-- recreate staging table indexes using the partition scheme
-- this will move the staging table to the OlderData filegroup with aligned indexes
CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn
      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)
      WITH (DROP_EXISTING = ON)
      ON PS_Last12Months(PartitioningDateTimeColumn);

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1
      ON dbo.PartitionMoveDemoStaging(Column1)
      WITH (DROP_EXISTING = ON)
      ON PS_Last12Months(PartitioningDateTimeColumn);

-- switch staging table back into the primary table partition
ALTER TABLE dbo.PartitionMoveDemoStaging
      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)
      TO dbo.PartitionMoveDemo PARTITION $PARTITION.PF_Last12Months(@MonthToMove);

 

The maintenance steps here are similar to the first method except that the partition is SWITCHed into a staging table before the MERGE and SPLIT.  This way, no data movement is needed during the MERGE or SPLIT.  After the MERGE and SPLIT, staging table indexes are recreated using the same partition scheme as the primary table.  This will move the staging table from the NewerData to the OlderData filegroup and ensure staging table indexes are aligned for the SWITCH.  The DROP_EXISTING = ON option allows the CREATE INDEX to leverage the existing staging table index sequence, thus eliminating the need to sort the index keys.  Finally, the staging table is SWITCHed back into the moved partition.

I hope you find this method useful.  Below is the script I used to create the demo database and objects. 

Demo Setup Script

--create database with monthly filegroups
CREATE DATABASE PartitioningDemo
ON(
      NAME='Primary',
      FILENAME='S:\SolidState\PartitioningDemo.mdf',
      SIZE=10MB),
FILEGROUP PartitioningDemo_NewerData (
      NAME='PartitioningDemo_NewerData',
      FILENAME='S:\SolidState\PartitioningDemo_NewerData.ndf',
      SIZE=400MB,
      FILEGROWTH=10MB),
FILEGROUP PartitioningDemo_OlderData (
      NAME='PartitioningDemo_OlderData',
      FILENAME='D:\SpinningDisks\PartitioningDemo_OlderData.ndf',
      SIZE=600MB,
      FILEGROWTH=10MB)
LOG ON(
      NAME='PartitioningDemo_Log',
      FILENAME='L:\LogFiles\PartitioningDemo_Log.ldf',
      SIZE=10MB,
      FILEGROWTH=10MB);

ALTER DATABASE PartitioningDemo
      SET RECOVERY SIMPLE;
GO

USE PartitioningDemo;

CREATE PARTITION FUNCTION PF_Last12Months( datetime )
AS RANGE RIGHT
FOR VALUES
(                 -- older_than_current_minus_12
      '20100401'  -- current_minus_12
      ,'20100501' -- current_minus_11
      ,'20100601' -- current_minus_10
      ,'20100701' -- current_minus_9
      ,'20100801' -- current_minus_8
      ,'20100901' -- current_minus_7
      ,'20101001' -- current_minus_6
      ,'20101101' -- current_minus_5
      ,'20101201' -- current_minus_4
      ,'20110101' -- current_minus_3
      ,'20110201' -- current_minus_2
      ,'20110301' -- current_minus_1
      ,'20110401' -- current
      ,'20110501' -- future
);

CREATE PARTITION SCHEME PS_Last12Months
AS PARTITION PF_Last12Months
TO
      (
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_OlderData,
      PartitioningDemo_NewerData, -- minus 2 month (to be moved to OlderData)
      PartitioningDemo_NewerData, -- minus 1 month
      PartitioningDemo_NewerData, -- current month
      PartitioningDemo_NewerData  -- future month+
      );

-- create table with 10,000,000 rows
ALTER DATABASE PartitioningDemo
      MODIFY FILEGROUP PartitioningDemo_NewerData DEFAULT;

WITH
      t1 AS (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
                  UNION ALL SELECT 3 UNION ALL SELECT 4
                  UNION ALL SELECT 5 UNION ALL SELECT 6
                  UNION ALL SELECT 7 UNION ALL SELECT 8
                  UNION ALL SELECT 9),
      t2 AS (SELECT a.n
            FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g)
SELECT
      ISNULL(
            DATEADD(
                  day
                  ,(ROW_NUMBER() OVER(ORDER BY t2.n))/26385
                  ,'20100401')
            ,'20100401') AS PartitioningDateTimeColumn
      ,ISNULL(ROW_NUMBER() OVER(ORDER BY t2.n), 0) AS Column1
INTO dbo.PartitionMoveDemo
FROM t2;

-- create partitioned indexes on the table
CREATE CLUSTERED INDEX cdx_PartitionMoveDemo_PartitioningColumn
      ON dbo.PartitionMoveDemo(PartitioningDateTimeColumn)
      ON PS_Last12Months(PartitioningDateTimeColumn);

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemo_Column1
      ON dbo.PartitionMoveDemo(Column1)
      ON PS_Last12Months(PartitioningDateTimeColumn);
GO

Stairway Series on SQLServerCentral.com

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server.  Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic.  The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge one needs to use a SQL Server feature confidently in a production environment.  Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the world-wide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing.  For years I've cautioned against using Profiler indiscriminately, both here and in the MSDN forums and newsgroups.  But it seems many DBAs still don't differentiate between Profiler and server-side tracing.  I'm hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months.  The subject of this post is the primary reason; there are only so many hours in the day.

Calendar Table and Date/Time Functions

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation.  Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable.  I’ve used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers.  I’ve included a sample calendar table (and numbers table) later in this post and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value.  This is easy in SQL 2008 since you can simply "CAST(SomeDateTimeValue AS date)".  But the date data type is not available in older SQL Server versions so you need an alternate method.  In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below, with an arbitrary base date value specified in a format that is independent of the session DATEFORMAT setting:

SELECT CAST(GETDATE() AS date); --SQL 2008 and later

SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101'); --SQL 2005 and earlier

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date.  Although this may provide the expected results (I've done it myself), I caution against it because it relies on implicit conversion from the internal SQL Server integer date/time storage format.  If you want to be concise, a better approach is to specify an empty string for the base date value, since its default value is '1900-01-01 00:00:00'.  In my opinion, an explicit date value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

 

I also sometimes see code that extracts the year, month and day date parts and concatenates with separators.  However, that method is dependent on session DATEFORMAT settings and slower than other methods.  See Tibor Karaszi’s The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period.  The script below shows how to accomplish this if you don't have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], DATEADD(year, DATEDIFF(year,'19000101',@Date), '19000101') AS [CalendarDate]
UNION ALL
SELECT 'Last day of year', DATEADD(day,-1,DATEADD(year,0,DATEADD(year,DATEDIFF(year,'19000101',@Date)+1,'19000101')))
UNION ALL
SELECT 'First day of month', DATEADD(month, DATEDIFF(month,'19000101',@Date), '19000101')
UNION ALL
SELECT 'Last day of month', DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'19000101',@Date)+1,'19000101')))
UNION ALL
SELECT 'First day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1),DATEDIFF(day,'19000101', @Date))
UNION ALL
SELECT 'Last day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1)+6,DATEDIFF(day,'19000101', @Date));

With a calendar table like the one later in this post:

DECLARE @Date date = GETDATE();
SELECT 'First day of year' [DateDescription],(SELECT FirstDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of year',(SELECT LastDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'First day of month',(SELECT FirstDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of month',(SELECT LastDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'First day of week (based on DATEFIRST setting)',(SELECT FirstDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date)
UNION ALL
SELECT 'Last day of week (based on DATEFIRST setting)',(SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

Calendar and Numbers Table

I think auxiliary calendar and numbers tables are a must-have on every database server.  These objects allow you to easily perform set-based processing in a number of scenarios.  In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second.  This is much more efficient than a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly.  In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles.  Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table.  For example, here is a function that returns the date that is a specified number of business days from the date provided:

CREATE FUNCTION dbo.udf_AddBusinessDays
(@Date date, @BusinessDays int)
RETURNS date
AS
BEGIN
    RETURN (
        SELECT TOP (1) CalendarDate AS BusinessDate
        FROM (SELECT TOP (@BusinessDays) CalendarDate
        FROM dbo.Calendar
        WHERE
            CalendarDate > @Date
            AND BusinessDay = 1
        ORDER BY CalendarDate) AS BusinessDays
        ORDER BY CalendarDate DESC
    );
END;
GO

Script 1: Example calendar table utility function
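For example, the following returns the date 10 business days after April 15, 2011 (results depend, of course, on the holidays loaded into your calendar table):

SELECT dbo.udf_AddBusinessDays('20110415', 10) AS TenBusinessDaysOut;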

--auxiliary number table
CREATE TABLE dbo.Numbers(
    Number int NOT NULL
        CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED
);

--load Numbers table with 1,000,000 numbers
WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
    ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
    ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)
    ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO dbo.Numbers WITH (TABLOCKX) (Number)
SELECT number
FROM numbers
WHERE number < 1000000;

Script 2: Create and populate numbers table.

CREATE TABLE dbo.Calendar(
	CalendarDate date NOT NULL
		CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
	,CalendarYear int NOT NULL
	,CalendarMonth int NOT NULL
	,CalendarDay int NOT NULL
	,DayOfWeekName varchar(10) NOT NULL
	,FirstDateOfWeek date NOT NULL
	,LastDateOfWeek date NOT NULL	
	,FirstDateOfMonth date NOT NULL
	,LastDateOfMonth date NOT NULL
	,FirstDateOfQuarter date NOT NULL
	,LastDateOfQuarter date NOT NULL
	,FirstDateOfYear date NOT NULL
	,LastDateOfYear date NOT NULL
	,BusinessDay bit NOT NULL
	,NonBusinessDay bit NOT NULL
	,Weekend bit NOT NULL
	,Holiday bit NOT NULL
	,Weekday bit NOT NULL
	,CalendarDateDescription varchar(50) NULL
);
GO

--load dates from 2000-01-01 through 2099-12-31
WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
	,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
	,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b)
	,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO dbo.Calendar WITH (TABLOCKX) (
	CalendarDate
	,CalendarYear
	,CalendarMonth
	,CalendarDay
	,DayOfWeekName
	,FirstDateOfWeek
	,LastDateOfWeek
	,FirstDateOfMonth
	,LastDateOfMonth
	,FirstDateOfQuarter
	,LastDateOfQuarter
	,FirstDateOfYear
	,LastDateOfYear
	,BusinessDay
	,NonBusinessDay
	,Weekend
	,Holiday
	,Weekday
	,CalendarDateDescription
	) 
SELECT
	CalendarDate = DATEADD(day, number, '20000101')
	,CalendarYear = DATEPART(year, DATEADD(day, number, '20000101'))
	,CalendarMonth = DATEPART(month, DATEADD(day, number, '20000101'))
	,CalendarDay = DATEPART(day, DATEADD(day, number, '20000101'))
	,DayOfWeekName = DATENAME(weekday, DATEADD(day, number, '20000101'))
	,FirstDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1),DATEADD(day, number, '20000101'))
	,LastDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1)+6,DATEADD(day, number, '20000101'))
	,FirstDateOfMonth = DATEADD(month, DATEDIFF(month,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfMonth = DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'20000101',DATEADD(day, number, '20000101'))+1,'20000101')))
	,FirstDateOfQuarter = DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfQuarter = DATEADD(day, -1, DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
	,FirstDateOfYear = DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101')), '20000101')
	,LastDateOfYear = DATEADD(day,-1,DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
	--initially, all weekdays are business days
	,BusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
	--all weekends are non-business days
	,NonBusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
	,Weekend = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
	,Holiday = 0 --initially no holidays
	,Weekday = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
	,CalendarDateDescription = NULL
FROM numbers
WHERE number < DATEDIFF(day, '20000101', '20991231') + 1;

--New Year's Day
UPDATE dbo.calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'New Year''s Day'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 1;

--New Year's Day celebrated on Friday, December 31 when January 1 falls on Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 31
    AND DayOfWeekName = 'Friday';

--New Year's Day celebrated on Monday, January 2 when January 1 falls on Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 2
    AND DayOfWeekName = 'Monday';    

--Martin Luther King Day - 3rd Monday in January
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Martin Luther King Day'
WHERE
    CalendarMonth = 1
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;

--President's Day - 3rd Monday in February
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'President''s Day'
WHERE
    CalendarMonth = 2
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;

--Easter - first Sunday after the full moon following the vernal (March 21) equinox 
WITH
	t4 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
    ,t256 AS (SELECT 0 AS n FROM t4 t4a, t4 t4b, t4 t4c, t4 t4d)
    ,years AS (SELECT ROW_NUMBER() OVER(ORDER BY n) + 1999 AS year FROM t256)
	,n AS (SELECT years.year, years.year - (19 * (years.year / 19)) AS n FROM years)
	,century AS (SELECT years.year, years.year / 100 AS century FROM years)
	,i AS (SELECT century.year, century.century - (century.century / 4) - ((century.century - ((century.century - 17) / 25)) / 3) + (19 * n.n) + 15  AS i
		FROM century 
		JOIN n ON n.year = century.year)
	,i2 AS (SELECT i.year, i.i - (30 * (i.i / 30 ) ) AS i2
		FROM i)
	,i3 AS (SELECT i2.year, i2.i2 - ((i2.i2 / 28) * (1 - (i2.i2 / 28) * (29 / (i2.i2 + 1)) * ((21 - n.n) / 11)) ) AS i3 
		FROM i2
		JOIN n ON n.year = i2.year)
	,j AS (SELECT i3.year, i3.year + (i3.year / 4) + i3.i3 + 2 - century.century + (century.century / 4 ) AS j 
		FROM i3
		JOIN century ON century.year = i3.year)
	,j2 AS (SELECT j.year, j.j - (7 * (j.j / 7) ) AS j2 
		FROM j)
	,month AS (SELECT j2.year, 3 + (((i3.i3 - j2.j2) + 40) / 44 ) AS month 
		FROM j2
		JOIN i3 ON i3.year = j2.year)
	,day AS (SELECT month.year, month.month, i3.i3 - j2.j2 + 28 - (31 * ( month.month / 4 ) ) AS day 
		FROM i3
		JOIN j2 ON j2.year = i3.year
		JOIN month ON month.year = j2.year)
	,easter AS (SELECT CAST(DATEADD(year, month.year-1900, DATEADD(day, day.day-1, DATEADD(month, month.month-1, ''))) AS date) AS easter
		FROM month
		JOIN day ON day.month = month.month AND day.year = month.year)
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Easter'
WHERE
    CalendarDate IN(
		SELECT easter
		FROM easter
		);

--Good Friday - 2 days before Easter Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Good Friday'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, -2, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Easter'
        );

--Memorial Day - last Monday in May
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Memorial Day'
WHERE
    CalendarMonth = 5
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MAX(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = 5
            AND c2.DayOfWeekName = 'Monday'
        );

--Independence Day - July 4th
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Independence Day'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 4;

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 3
    AND DayOfWeekName = 'Friday';

--Independence Day celebrated on Monday, July 5 when July 4 falls on a Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 5
    AND DayOfWeekName = 'Monday';

--Labor Day - first Monday in September
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Labor Day'
WHERE
    CalendarMonth = 9
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MIN(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = calendar.CalendarYear
            AND c2.CalendarMonth = 9
            AND c2.DayOfWeekName = 'Monday'
        );

--Columbus Day - second Monday in October
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Columbus Day'
WHERE
    CalendarMonth = 10
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 2;

--Veteran's Day - November 11
UPDATE dbo.Calendar
SET
	Holiday = 1
    ,CalendarDateDescription = 'Veteran''s Day'
WHERE
    CalendarMonth = 11
    AND CalendarDay = 11;

--Thanksgiving - fourth Thursday in November
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Thanksgiving'
WHERE
    CalendarMonth = 11
    AND DayOfWeekName = 'Thursday'
    AND (SELECT COUNT(*) 
		FROM dbo.Calendar c2 
        WHERE 
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Thursday'
        ) = 4;

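--Day after Thanksgiving - the Friday following Thanksgiving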
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Day after Thanksgiving'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, 1, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Thanksgiving'
        );

--Christmas Day - December 25th
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
	,Holiday = 1
    ,CalendarDateDescription = 'Christmas Day'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 25;

--Christmas day celebrated on Friday, December 24 when December 25 falls on a Saturday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 24
    AND DayOfWeekName = 'Friday';

--Christmas day celebrated on Monday, December 26 when December 25 falls on a Sunday
UPDATE dbo.Calendar
SET
	BusinessDay = 0
	,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 26
    AND DayOfWeekName = 'Monday';

Script 3: Create and populate calendar table and update with holidays

 

Secret of SQL Trace Duration Column

Why would a trace of long-running queries not show all queries that exceeded the specified duration filter?  We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000.  Nearly all of the queries on this busy OLTP server run in under this 100 millisecond threshold so any that appear in the trace are candidates for root cause analysis and/or performance tuning opportunities.

After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem.  Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts even though the application’s error log clearly showed detail of the problem (query text, duration, start time, etc.).  The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe.  We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.

So why didn't the trace include all of the long-running queries?  The reason is that the SQL Trace event duration doesn't include the time a request was queued while awaiting a worker thread.  Remember that the server was under considerable stress at the time due to the severe blocking episode.  Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: "SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;").  Technically, those queued requests had not started.  As worker threads became available, queries were dequeued and completed quickly.  These weren't included in the trace because the duration was under the 100ms duration filter.  The duration reflected the time it took to actually run the query but didn't include the time spent queued waiting for a worker thread.

The important point here is that duration is not end-to-end response time.  Duration of RPC:Completed and SQL:BatchCompleted events doesn't include time before a worker thread is assigned, nor does it include the time required to return the last result buffer to the client.  In other words, duration only includes time after the worker thread is assigned until the last buffer is filled.  But be aware that duration does include the time needed to return intermediate result set buffers back to the client, which is a factor when large query results are returned.  Clients that are slow in consuming result sets can increase the duration value reported by the trace "completed" events.

Ad-Hoc Rollup by date/time Interval

I often use aggregate queries to roll up data by an arbitrary date/time interval.  I'll share some techniques I use to accomplish the task in case you find them useful; the examples all use the sample table below:

CREATE TABLE dbo.WebStats
(
      RequestTimestamp datetime NOT NULL,
      Page varchar(255) NOT NULL
);
CREATE CLUSTERED INDEX WebStats_cdx ON dbo.WebStats(RequestTimestamp, Page);

INSERT INTO dbo.WebStats (RequestTimestamp, Page)
VALUES
      ('2010-01-01T00:00:00', 'Default.aspx')
      ,('2010-01-01T00:00:15', 'Default.aspx')
      ,('2010-01-01T00:01:05', 'Order.aspx')
      ,('2010-01-01T00:01:30', 'Default.aspx')
      ,('2010-01-01T00:01:40', 'OrderStatus.aspx')
      ,('2010-01-01T00:02:05', 'Default.aspx')
      ,('2010-01-01T00:03:05', 'ProductInfo.aspx')
      ,('2010-01-01T00:03:30', 'Default.aspx');
GO

 

Simple Rollup

Without an auxiliary table, a little DATEADD magic can do the trick.  Here’s an example that summarizes web page requests by minute for the specified date/time range:

DECLARE
      @StartTimestamp datetime = '2010-01-01T00:00:00'
      ,@EndTimestamp datetime = '2010-01-02T00:00:00';

SELECT
      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp) AS Interval,
      COUNT(*) AS PageRequests
FROM dbo.WebStats
GROUP BY
      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp)
ORDER BY
      Interval;

 

Results:

Interval                 PageRequests
-----------------------  ------------
2010-01-01 00:00:00.000  2
2010-01-01 00:01:00.000  3
2010-01-01 00:02:00.000  1
2010-01-01 00:03:00.000  2
2010-01-01 00:29:00.000  1
2010-01-01 00:31:00.000  1
2010-01-01 00:42:00.000  1
2010-01-01 02:01:00.000  2
2010-01-01 02:03:00.000  2
2010-01-01 02:31:00.000  1
2010-01-01 02:44:00.000  1
2010-01-01 02:49:00.000  1

Arbitrary Intervals

The simple rollup method works well for any of the pre-defined units provided by the DATEADD function (year, quarter, month, day, hour, minute, second or week).  However, it lacks the flexibility to roll up to an arbitrary interval like 15 minutes or 30 seconds.  A little DATEADD/DATEDIFF math addresses this gap.  Below is an example of a 30-minute interval rollup using this technique:

DECLARE
      @StartTimestamp datetime = '2010-01-01T00:00:00'
      ,@EndTimestamp datetime = '2010-01-01T04:00:00'
      ,@IntervalSeconds int = 1800; --30 minutes

SELECT
      DATEADD(second
            ,DATEDIFF(second, @StartTimestamp, RequestTimestamp)
            / @IntervalSeconds * @IntervalSeconds, @StartTimestamp) AS Interval
      ,COUNT(*) AS PageRequests
FROM dbo.WebStats
WHERE
      RequestTimestamp >= @StartTimestamp
      AND RequestTimestamp < @EndTimestamp
GROUP BY
      DATEADD(second
            ,DATEDIFF(second, @StartTimestamp, RequestTimestamp)
            / @IntervalSeconds * @IntervalSeconds, @StartTimestamp)
ORDER BY
      Interval;

 

Interval                 PageRequests
-----------------------  ------------
2010-01-01 00:00:00.000  9
2010-01-01 00:30:00.000  2
2010-01-01 02:00:00.000  4
2010-01-01 02:30:00.000  3

 

Missing Intervals

You probably noticed that periods with no activity at all are omitted rather than reporting a zero value.  One method to include the missing intervals is with an outer join to a temporal table containing all the desired intervals.  Ideally, the temporal table would be a permanent one but I’ve found it impractical to maintain such a table for ad-hoc needs.  Fortunately, a utility numbers CTE is a handy way to generate the needed intervals dynamically.  The example below provides up to 65,536 interval values and can be easily extended as needed.

DECLARE
      @StartTimestamp datetime = '2010-01-01T00:00:00'
      ,@EndTimestamp datetime = '2010-01-01T04:00:00'
      ,@IntervalSeconds int = 1800; --30 minutes

WITH
      T2 AS (SELECT 0 AS Num UNION ALL SELECT 0),
      T4 AS (SELECT 0 AS Num FROM T2 AS A CROSS JOIN T2 AS B),
      T256 AS (SELECT 0 AS Num FROM T4 AS A CROSS JOIN T4 AS B CROSS JOIN T4 AS C CROSS JOIN T4 AS D),
      T65536 AS (SELECT ROW_NUMBER() OVER(ORDER BY A.Num) AS Num FROM T256 AS A CROSS JOIN T256 AS B)
SELECT
      DATEADD(second
            ,(Num - 1) * @IntervalSeconds, @StartTimestamp) AS Interval
      ,COUNT(WebStats.RequestTimestamp) AS PageRequests
FROM T65536
LEFT JOIN dbo.WebStats ON
      WebStats.RequestTimestamp >= DATEADD(second, (Num - 1) * @IntervalSeconds, @StartTimestamp)
      AND WebStats.RequestTimestamp < DATEADD(second, Num * @IntervalSeconds, @StartTimestamp)
WHERE
      Num <= DATEDIFF(second, @StartTimestamp, @EndTimestamp) / @IntervalSeconds
GROUP BY
      DATEADD(second
            ,(Num - 1) * @IntervalSeconds, @StartTimestamp)
ORDER BY
      Interval;

 

Interval                 PageRequests
-----------------------  ------------
2010-01-01 00:00:00.000  9
2010-01-01 00:30:00.000  2
2010-01-01 01:00:00.000  0
2010-01-01 01:30:00.000  0
2010-01-01 02:00:00.000  4
2010-01-01 02:30:00.000  3
2010-01-01 03:00:00.000  0
2010-01-01 03:30:00.000  0

 

Don’t Bloat Proc Cache with Parameters

Most of us have long been aware that non-parameterized ad-hoc queries can pollute procedure cache with entries that will never be reused.  Each distinct query string is a separate cache entry and, unless the exact query with the same values is executed again, the plan will remain in cache unnecessarily until aged out.
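You can gauge the extent of the problem yourself; a diagnostic query along these lines (a sketch using the plan cache DMVs) lists single-use ad-hoc plans currently in cache:

--single-use ad-hoc plans currently in cache
SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = N'Adhoc'
      AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;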

However, I was surprised to learn recently that even parameterized queries can bloat procedure cache if one isn't careful.  This was brought to my attention by SQL Server MVP Erland Sommarskog and confirmed by fellow MVP Adam Machanic, who experienced this problem at a customer site.

The issue is that parameter declarations are part of cache entry statement text.  Consider the following code:

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

 

SqlClient defined the parameter as nvarchar(10) because the supplied string value length was 10 and .NET strings are Unicode.  Below is the resulting statement text from sys.dm_exec_query_plan:

(@LastName nvarchar(10))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

Now run the same code with a different last name value:

string lastName = "Machanic";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

 

You probably guessed it: an additional cache entry was created for the same query due to the different name length:

(@LastName nvarchar(8))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

It is likely that no more than a dozen or so cache entries will exist for this particular query since the only differentiator is the last name length.  However, with queries that contain multiple string parameters of varying lengths, the number of cache entries increases dramatically while the likelihood of reuse diminishes.  In Adam’s case, the main culprit was a string with a large list of values (between 1 and 1000+ items) that was split using a technique described in Erland’s Arrays and Lists articles http://www.sommarskog.se. 
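If you want to observe this for yourself, a plan cache query along these lines (a sketch using the sys.dm_exec_cached_plans and sys.dm_exec_sql_text DMVs) lists the near-duplicate entries, with the differing parameter declarations visible at the start of each statement text:

SELECT
      cp.usecounts
      ,cp.objtype
      ,st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%Person.Contact%'
      AND st.text NOT LIKE N'%dm_exec_cached_plans%' --exclude this query itself
ORDER BY st.text;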

Preventing Procedure Cache Bloat

An easy way to prevent unneeded procedure cache entries is to explicitly specify the maximum data length for string and other variable length values.  In fact, it’s a good idea to specify both the length and SqlDbType to ensure the parameter declaration matches the target column.  Not only will this help prevent cache bloat due to different parameter lengths, the matching data type can avoid the performance gotcha of a non-sargable expression caused by implicit data type conversion.

There are a number of techniques to specify the parameter length and SqlDbType in .NET.  For example, the code below uses an overloaded SqlCommand.Parameters.Add method.  If you use AddWithValue, be sure to set the Size and SqlDbType properties of the newly created parameter afterward.

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.Add("@LastName", SqlDbType.NVarChar, 50).Value = lastName;
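The same consideration applies to dynamic SQL executed server-side with sp_executesql.  Declaring the parameter with the column’s maximum length keeps the statement text, and therefore the cache entry, identical across executions.  A minimal sketch:

DECLARE @Sql nvarchar(max) =
      N'SELECT ContactID, FirstName, LastName
        FROM AdventureWorks.Person.Contact
        WHERE LastName = @LastName;';

--both executions share a single cached plan because the declaration doesn't vary with the value
EXECUTE sp_executesql @Sql, N'@LastName nvarchar(50)', @LastName = N'Sommarskog';
EXECUTE sp_executesql @Sql, N'@LastName nvarchar(50)', @LastName = N'Machanic';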

 

 

Why Parameters are a Best Practice

Parameterized queries are arguably the single most important database development best practice. Parameters, with proper SQL data types, provide many benefits. These include:

– improve security by preventing injection of unwanted SQL (barring non-parameterized server-side dynamic SQL)
– eliminate the need to enclose literals in quotes (or not) depending on data type
– eliminate the need to prefix Unicode literal strings with N
– allow single quotes within strings without the need to escape them
– avoid the need to format date/time/datetime literals in a particular way (formats vary by culture)
– eliminate concerns about decimal separators (which vary by culture)
– improve performance by reducing compilation costs and promoting execution plan cache reuse
– allow WHERE clause predicates with Always Encrypted columns
– make code cleaner and easier to maintain

Despite these many benefits, it is all too common to see applications build SQL statements by concatenating program variable values as SQL literals for values that vary by execution. I believe this is simply due to ignorance: one often learns SQL using an ad-hoc tool like SSMS in order to focus on SQL syntax and database concepts, overlooking the database programming practices needed in application code. Developers then unwittingly build SQL queries in application code as if they were typing statements into an SSMS query window. Sadly, instructors sometimes teach database programming using string concatenation instead of parameterized queries, and many online examples use improper techniques too, spreading the poor practice. When one posts non-parameterized query code in online help forums, the egregious practice often distracts answerers from the real question, attracts downvotes, and in some cases a parameterized query would have avoided the need for the question in the first place.

This article discusses parameter basics in commonly used programming languages and Microsoft SQL Server APIs, along with code examples and best practices. Additionally, internals of how parameters are actually sent to SQL Server are described to provide insight into why parameterized queries are so important. This article focuses specifically on Microsoft SQL Server but parameters and their benefits apply to other DBMS products as well.

Introduction

Here’s a C# example of the common mistake of building an ad-hoc query with string concatenation:

connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
using (var connection = new SqlConnection(connectionString))
using (var selectCommand =
    new SqlCommand("SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = '" + userName + "';", connection))
{
    connection.Open();
    using (var reader = selectCommand.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
        }
    }
}
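To see why this is dangerous, consider what SQL Server receives when a malicious value is supplied for userName. A value like x' OR 1 = 1; -- turns the concatenated batch into the following, which returns every row in the table (and a craftier value could execute any statement the connection’s permissions allow):

SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = 'x' OR 1 = 1; --';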

A common misconception is that using stored procedures inherently prevents SQL injection. Stored procedures can help mitigate security risks, but they must also be called properly to realize parameterization benefits. The stored procedure example below has the same problems as the previous ad-hoc query because the parameter value is passed as a literal.

connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
using (var connection = new SqlConnection(connectionString))
using (var executeCommand =
            new SqlCommand("EXECUTE dbo.usp_GetUser '" + userName + "';", connection))
{
    connection.Open();
    using (var reader = executeCommand.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
        }
    }
}

Techniques for building parameterized SQL statements vary by client API, driver, programming language, and/or framework, but the underlying concept is the same: instead of including literals in SQL statements for values that differ for each execution, specify parameter tokens in their place and add parameters with values to the command separately. The SQL statement text then never changes between executions; only the parameter values do.

The parameter creation techniques shown here are not the only ones available. I’ve included documentation links that show other parameter creation methods along with guidance as to which should be avoided.

Most ORM frameworks (Entity Framework, Hibernate, et al.) build parameterized queries on your behalf. These are beyond the focus of this article, but I suggest you peruse the Parameters Under the Covers section to verify that parameters are indeed being passed as an RPC request by the framework you are using.

Parameters with ADO.NET

I’ll first mention that one should use a managed ADO.NET provider in .NET applications whenever possible. ADO.NET does allow one to use unmanaged ODBC and OLE DB drivers via the System.Data.Odbc and System.Data.OleDb namespace objects, but those are intended to be used only when no suitable managed ADO.NET provider is available. For Microsoft SQL Server, the managed providers are System.Data.SqlClient (included with the .NET Framework) or the newer Microsoft.Data.SqlClient (.NET Core package, currently in preview). Managed providers provide the best performance in .NET applications by avoiding the cost of interop with unmanaged code, which is especially costly with large result sets.

Also, it’s not uncommon to see VB.NET applications use the legacy ADODB (ADO classic) API instead of ADO.NET because they started life in the VB6 world and were never changed. I’ll discuss ADO classic later in the context of unmanaged code; I suggest such legacy code be remediated to use ADO.NET if possible.

Below are C# parameterized equivalents of the earlier queries. The same objects and techniques can be used in VB.NET and managed C++ with only linguistic differences. The only changes to the original SELECT statement are the parameter token within the query text and the parameter added to the command’s Parameters collection (instead of concatenating the value as SQL literal text). Parameter names ('@' prefix) are used as tokens for parameter values in SqlClient queries and allow mnemonic names that make it easy to identify individual parameters within the query.

Importantly, note that parameter tokens within a query are never enclosed in quotes because these are not literals. Also, the same parameter may be specified more than once within a query with SqlClient without the need to add an additional parameter with the same value.

connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        using (var connection = new SqlConnection(connectionString))
        using (var selectCommand =
                new SqlCommand("SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = @UserName;", connection))
        {
            selectCommand.Parameters.Add("@UserName", SqlDbType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = selectCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

It’s generally best to specify CommandType.StoredProcedure when calling a stored procedure because this reduces server-side parsing and enforces that stored procedure parameter arguments be passed as client parameters instead of literals. With SqlClient and CommandType.StoredProcedure, only the stored procedure name is specified as the command text (i.e. no parameter tokens). The procedure name should be schema-qualified for maximum performance, a consideration in high-volume OLTP workloads. Parameters are added to the Parameters collection and implicitly mapped to the actual stored procedure parameters by name rather than by ordinal so the parameter collection order does not matter when multiple parameters are involved.

        connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        using (var connection = new SqlConnection(connectionString))
        using (var executeCommand =
            new SqlCommand("dbo.usp_GetUser", connection))
        {
            executeCommand.CommandType = CommandType.StoredProcedure;
            executeCommand.Parameters.Add("@UserName", SqlDbType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = executeCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

One can also execute a stored procedure as a parameterized query using a T-SQL EXECUTE statement and CommandType.Text similarly to a parameterized query. There is no benefit in doing so compared to CommandType.StoredProcedure for a single proc call (and it does slightly bloat the request). However, CommandType.Text allows one to execute a parameterized batch of multiple statements in a single round trip. The batch may be a mix of EXECUTE and other DML statements.

With an EXECUTE statement, stored procedure parameters can be specified using either named (i.e. @stored_proc_parameter_name1 = @command_parameter_name1, @stored_proc_parameter_name2 = @command_parameter_name2, …) or positional (@command_parameter_name1, @command_parameter_name2, …) arguments. Stating the obvious, positional syntax necessitates the parameter argument order on the EXECUTE statement match the stored procedure definition order. However, the order of parameters within the parameters collection isn’t significant with either named or positional arguments because the parameters collection is mapped to the parameters within the SQL text by name with SqlClient.

Below is a named parameter example. The stored procedure parameter name and the client parameter name are the same in this example.

connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        using (var connection = new SqlConnection(connectionString))
        using (var executeCommand =
            new SqlCommand("EXECUTE dbo.usp_GetUser @UserName = @UserName", connection))
        {
            executeCommand.CommandType = CommandType.Text;
            executeCommand.Parameters.Add("@UserName", SqlDbType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = executeCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

Note that the EXECUTE keyword could have been omitted in this example because SQL Server assumes EXECUTE when a batch begins with a stored procedure name. I recommend EXECUTE be explicitly specified even for the first statement in a batch.
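As a sketch of such a parameterized batch (dbo.usp_LogAccess is a hypothetical logging proc), the command text below executes a proc and a query in one round trip, with both statements sharing a single @UserName parameter added once to the Parameters collection:

EXECUTE dbo.usp_LogAccess @UserName = @UserName;
SELECT UserName, FirstName, LastName
FROM dbo.UserTable
WHERE UserName = @UserName;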

Parameters with ADO.NET System.Data.OleDb

Again, .NET applications should always use a managed provider when one is available (SqlClient in the case of SQL Server). That being said, I use unmanaged SQL Server drivers in these examples to illustrate how queries can be parameterized in ADO.NET for other DBMS products that don’t have a managed .NET provider. The only difference would be the connection string.

Here are the equivalent OLE DB parameterized queries in ADO.NET using the latest SQL Server OLE DB provider. The differences compared to the same SqlClient query are the connection string, the question mark parameter tokens, and the System.Data.OleDb objects.

        connectionString = "Provider=MSOLEDBSQL;Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        using (var connection = new OleDbConnection(connectionString))
        using (var selectCommand =
                new OleDbCommand("SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = ?;", connection))
        {
            selectCommand.Parameters.Add("@UserName", OleDbType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = selectCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }
        connectionString = "Provider=MSOLEDBSQL;Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        using (var connection = new OleDbConnection(connectionString))
        using (var executeCommand =
                new OleDbCommand("dbo.usp_GetUser", connection))
        {
            executeCommand.CommandType = CommandType.StoredProcedure;
            executeCommand.Parameters.Add("@UserName", OleDbType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = executeCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

Parameters with ADO.NET System.Data.Odbc

Below is the ODBC parameterized query in ADO.NET. The only differences compared to the earlier OLE DB version are the connection string and the System.Data.Odbc namespace objects. Although this example uses a DSN-less connection string, one could also specify an ODBC DSN in the connection string (e.g. "DSN=YourDSN") and omit the attributes that are redundant with the DSN configuration.

        connectionString = "Driver={ODBC Driver 17 for SQL Server};Server=.;Database=YourDatabase;Trusted_Connection=Yes";
        using (var connection = new OdbcConnection(connectionString))
        using (var selectCommand =
            new OdbcCommand("SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = ?;", connection))
        {
            selectCommand.Parameters.Add("@UserName", OdbcType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = selectCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

Similarly, here is the ODBC version of the previous stored procedure call. However, with ODBC and CommandType.StoredProcedure, the command text specifies an ODBC procedure call escape sequence with parameter marker(s) rather than just the stored procedure name.

    connectionString = "Driver={ODBC Driver 17 for SQL Server};Server=.;Database=YourDatabase;Trusted_Connection=Yes";
        using (var connection = new OdbcConnection(connectionString))
        using (var executeCommand =
            new OdbcCommand("{call dbo.usp_GetUser(?)}", connection))
        {
            executeCommand.CommandType = CommandType.StoredProcedure;
            executeCommand.Parameters.Add("@UserName", OdbcType.VarChar, 50).Value = userName;
            connection.Open();
            using (var reader = executeCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"{reader["UserName"]} {reader["FirstName"]} {reader["LastName"]}");
                }
            }
        }

Parameters with ADO Classic

ADO “Classic” (not to be confused with ADO.NET) is a mature legacy technology that hasn’t been enhanced for about 20 years. An unmanaged COM-based API, ADO classic is still used today in classic ASP web applications, VBA (Visual Basic for Applications), legacy VB 6 code, and VB.NET code that was converted from VB 6 without being refactored to use modern managed ADO.NET APIs.

It is best to use an OLE DB driver with ADO classic because these unmanaged APIs natively use COM-based technologies.  That being said, one can use ODBC (which is a call-level interface rather than COM) with ADO via the Microsoft OLE DB Provider for ODBC Drivers (MSDASQL, the default OLE DB provider when none is specified).  This is needed when no OLE DB provider is available for a non-SQL Server DBMS but should otherwise be avoided since ADO must go through the extra MSDASQL layer.  A SQL Server OLE DB provider (i.e. SQLOLEDB, SQLNCLI, or MSOLEDBSQL) should be used directly to access SQL Server from ADO classic.

Regardless of the underlying driver, ADO programming and parameterization is similar with either OLE DB or ODBC. The ADO classic object model is different than its ADO.NET cousin but parameterization concepts and techniques are similar. I’ve included C# examples here only to illustrate the ADO parameter object model differences compared to the previous ADO.NET examples. In practice, one would not typically use ADO classic in managed code.

Parameterized ADO query example:

        connectionString = "Provider=MSOLEDBSQL;Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        var connection = new ADODB.Connection();
        connection.Open(connectionString);
        var selectCommand = new ADODB.Command();
        selectCommand.ActiveConnection = connection;
        selectCommand.CommandText = "SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = ?;";
        selectCommand.Parameters.Append(
        selectCommand.CreateParameter("@UserName", ADODB.DataTypeEnum.adVarChar, ADODB.ParameterDirectionEnum.adParamInput, 50, userName));
        object recordCount = 0;
        var recordSet = selectCommand.Execute(out recordCount);
        while (!recordSet.EOF)
        {
            Console.WriteLine($"{recordSet.Fields["UserName"].Value} {recordSet.Fields["LastName"].Value} {recordSet.Fields["LastName"].Value}");
            recordSet.MoveNext();
        }
        recordSet.Close();
        connection.Close();

Parameterized ADO proc call example:

        connectionString = "Provider=MSOLEDBSQL;Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI";
        var connection = new ADODB.Connection();
        connection.Open(connectionString);
        var executeCommand = new ADODB.Command();
        executeCommand.ActiveConnection = connection;
        executeCommand.CommandText = "dbo.usp_GetUser";
        executeCommand.CommandType = ADODB.CommandTypeEnum.adCmdStoredProc;
        executeCommand.Parameters.Append(
        executeCommand.CreateParameter("@UserName", ADODB.DataTypeEnum.adVarChar, ADODB.ParameterDirectionEnum.adParamInput, 50, userName));
        object recordCount = 0;
        var recordSet = executeCommand.Execute(out recordCount);
        while (!recordSet.EOF)
        {
            Console.WriteLine($"{recordSet.Fields["UserName"].Value} {recordSet.Fields["LastName"].Value} {recordSet.Fields["LastName"].Value}");
            recordSet.MoveNext();
        }
        recordSet.Close();
        connection.Close();

ADO Parameters with VBScript

Below is a stand-alone VBScript example of the query and the proc call.

connectionString = "Provider=MSOLEDBSQL;Data Source=.;Database=YourDatabase;Integrated Security=SSPI"
Set connection = CreateObject("ADODB.Connection")
Set command = CreateObject("ADODB.Command")
connection.Open connectionString
command.ActiveConnection = connection
command.CommandText = "SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = ?;"
Set parmUserName = command.CreateParameter("@UserName", adVarChar, adParamInput, 50, strUserName)
command.Parameters.Append(parmUserName)
connectionString = "Provider=MSOLEDBSQL;Data Source=.;Database=YourDatabase;Integrated Security=SSPI"
Set connection = CreateObject("ADODB.Connection")
Set command = CreateObject("ADODB.Command")
command.ActiveConnection = connection
command.CommandType = adCmdStoredProcedure
command.CommandText = "dbo.usp_GetUser"
Set parmUserName = command.CreateParameter("@UserName", adVarChar, adParamInput, 50, strUserName)
command.Parameters.Append(parmUserName)

ADO Parameters with ASP classic

The ADO code for classic ASP is the same as the preceding stand-alone VBScript example except that the objects are instantiated using the IIS Server.CreateObject method, results are written with Response.Write, and the ADO constants come from an included adovbs.inc or a METADATA type library declaration:

connectionString = "Provider=MSOLEDBSQL;Data Source=.;Database=tempdb;Integrated Security=SSPI"
Set connection = Server.CreateObject("ADODB.Connection")
Set command = Server.CreateObject("ADODB.Command")
command.ActiveConnection = connection
command.CommandText = "SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = ?;"
Set parmUserName = command.CreateParameter("@UserName", adVarChar, adParamInput, 50, strUserName)
command.Parameters.Append(parmUserName)
connectionString = "Provider=MSOLEDBSQL;Data Source=.;Database=tempdb;Integrated Security=SSPI"
Set connection = Server.CreateObject("ADODB.Connection")
Set command = Server.CreateObject("ADODB.Command")
command.ActiveConnection = connection
command.CommandType = adCmdStoredProcedure
command.CommandText = "dbo.usp_GetUser"
Set parmUserName = command.CreateParameter("@UserName", adVarChar, adParamInput, 50, strUserName)
command.Parameters.Append(parmUserName)

Parameters Under the Covers

The job of all SQL Server client APIs regardless of programming language is to manage the Tabular Data Stream (TDS) protocol session used for SQL Server communication. This involves establishing the session (network connection and authentication), converting application requests into internal TDS request messages (e.g. batch and RPC requests), sending those to SQL Server, and receiving the results returned by SQL Server over TDS. Although details of the TDS protocol session are an implementation detail transparent to applications, delving into the way TDS requests are sent to the database engine provides valuable insight into the concepts, benefits, and use of parameters in the SQL Server world.

When a non-parameterized query is executed (i.e. an empty parameter collection and CommandType.Text in ADO.NET), the client API sends the command text to SQL Server as a SQL batch request over TDS. There is no provision for parameters in a SQL batch request. The database engine parses the batch text (which may contain multiple statements), generates an execution plan, and executes the batch. If the batch text (including literal values) exactly matches that of a previously executed query (or an auto-parameterized query) that is still cached, the plan is reused. Because a batch request has no parameters, the application must include T-SQL literals within the text for values that vary by execution. This requires the app code to build the query by concatenating SQL language text with literal strings and for SQL Server to parse those literals.

The onus is on the application to construct literal strings embedded within the SQL batch text properly so that they are parsed as desired by SQL Server. This necessitates enclosing literal values in single quotes when required and formatting the string value according to the data type it represents. The app code must double up single quotes within string literal values so they are interpreted as quote characters instead of literal delimiters. Date/time literals should be formatted as ISO 8601 values to ensure the intended value is used regardless of the session DATEFORMAT setting, a common mistake with non-parameterized queries. Decimal literals must specify the ANSI/ISO standard period as the decimal separator instead of the comma used by some cultures. Source values must also be validated to ensure they adhere to the intended data type and are not interpreted as SQL language elements (injecting unintended SQL).
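For illustration, the literals below are formatted defensively the way concatenating code must emit them (the columns on the dbo.UserTable example table are hypothetical): a doubled embedded quote, an ISO 8601 datetime, and a period decimal separator:

SELECT UserName, FirstName, LastName
FROM dbo.UserTable
WHERE LastName = 'O''Brien'                   --embedded quote doubled
      AND CreatedOn >= '2010-01-01T00:00:00'  --ISO 8601, immune to DATEFORMAT
      AND Balance > 1234.56;                  --period decimal separator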

Parameterized queries (e.g. a non-empty Parameters collection or CommandType.StoredProcedure) are processed differently. The client API sends the command to SQL Server as a remote procedure call (RPC batch) TDS request instead of a SQL batch request. This RPC request includes the batch text with the supplied parameter tokens (CommandType.Text) or the stored procedure name (CommandType.StoredProcedure). The actual parameters, containing parameter definition metadata and values as specified by the client application, are included in separate structures as part of the TDS RPC request stream sent to SQL Server. Most importantly, the parameter values are passed as native data types as specified by the client application. This has many positive implications.

The database engine need not parse parameter values since they are already in the native structure of the specified parameter data type, such as a 32-bit integer, datetime structure, decimal structure, Unicode string, etc. SQL Server only checks that parameter values adhere to the TDS protocol spec and then uses the supplied values directly. There is no notion of datetime formatting, string escape sequences, or decimal separators because no parsing occurs. Unwanted SQL cannot be injected via parameters (barring server-side dynamic SQL) because the values are never part of a T-SQL statement.

I’ll add that the query text you see in a server-side trace of an RPC request is not necessarily what the client API actually sent to SQL Server. The query text in trace data is a reverse-engineered rendering of the TDS request in familiar T-SQL format. For example, when you pass a table-valued parameter, you’ll see an RPC batch request with batch text containing a table variable declaration, a bunch of INSERT statements into the variable, followed by its use as a parameter in the query or stored procedure. In actuality, the TVP column metadata tokens were passed over TDS followed by the stream of TVP rows, and the database engine bulk-inserted the TVP data into a tempdb structure matching the column metadata, not individual inserts as the trace suggests.

Similarly, scalar parameter values show as literal representations in the trace instead of the actual binary structures passed. For example, a datetime value shows as a 'yyyy-MM-dd HH:mm:ss.fff' literal (not an unambiguous ISO 8601 format) rather than the binary datetime structure actually passed to SQL Server.
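For example, a trace would render the RPC request for the earlier parameterized SELECT roughly like the sp_executesql call below, even though no such T-SQL text was actually sent over the wire:

exec sp_executesql N'SELECT UserName, FirstName, LastName FROM dbo.UserTable WHERE UserName = @UserName;'
      ,N'@UserName varchar(50)'
      ,@UserName = 'Sommarskog'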

Much of this is implementation detail, but it is something you should be aware of when interpreting RPC requests in traces.

Best Practices

Parameter definitions should specify the desired data type, maximum length, precision, and scale of each parameter. Attention to these details helps maximize performance by avoiding implicit conversions, promoting sargable expressions, and encouraging cached plan reuse.

A common problem (i.e. mistake) is using the AddWithValue method to create parameters. AddWithValue infers the data type and length from the client variable value and may result in non-sargable expressions and cache bloat, as I detailed earlier. The bottom line is that one should use a method that allows specification of the desired SqlDbType, length, etc., such as Add.

Security with Ownership Chains

Ownership chaining is one of my favorite SQL Server security features.  I like ownership chaining because it allows me to confine data access to specific stored procedures, views and functions while preventing direct access to the underlying tables.  This approach provides a well-defined application interface that insulates applications from the physical implementation and allows me to change things on the back-end as long as I don’t change the interface.  Also, thanks to ownership chaining, I have many options to vertically and horizontally partition data to meet security objectives while ensuring tables are not accessed directly.  

Ignorance is not bliss when it comes to ownership chaining security, though.  I’ve seen cases where data access was inadvertently allowed because ownership chaining wasn’t considered.  It is imperative that SQL Server DBAs understand ownership chaining in the SQL Server security model so that data is properly secured.  I want to point out a couple of important security concepts as it relates to ownership chaining with the hope that it will help SQL Server DBAs implement a security scheme that is appropriate for their environment and also easy to manage.

Ownership Chain Behavior

SQL Server always checks permissions on directly referenced objects using the current connection security context.  However, permissions are evaluated differently when objects are accessed indirectly, such as those used in a trigger, view or stored procedure.  The behavior of ownership chaining is that object permissions are not evaluated when the calling and referenced object owners are the same.  Also, because object permission checking is short-circuited in an unbroken chain, ownership chaining takes precedence over an impersonated security context (EXECUTE AS clause) as well as denied object permissions.   Consequently, permissions on indirectly referenced objects are irrelevant in an unbroken ownership chain.  This ownership chaining behavior gives a DBA control over exactly who can use which objects while preventing ad-hoc access to other objects.

Ownership chaining behavior might not be intuitive at first.  I like to think of it from the perspective that the purpose of granting object permissions is so the object can actually be used.  For example, it is of no use to grant a user SELECT permissions on a view unless data can be retrieved from the view.  The same applies even when a stored procedure or function impersonates with EXECUTE AS; the impersonated principal must be able to use objects referenced by the proc in order for the procedure to be useful.  I keep in mind that with ownership chaining, I effectively grant permissions on exposed object functionality regardless of how the functionality is implemented.
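Here’s a minimal sketch (hypothetical objects, everything owned by dbo) that demonstrates the behavior:

CREATE USER AppUser WITHOUT LOGIN;
GO
CREATE TABLE dbo.Customer (CustomerID int PRIMARY KEY, Name varchar(50) NOT NULL);
GO
CREATE PROCEDURE dbo.usp_GetCustomers
AS
SELECT CustomerID, Name FROM dbo.Customer;
GO
GRANT EXECUTE ON dbo.usp_GetCustomers TO AppUser;
GO
EXECUTE AS USER = 'AppUser';
EXECUTE dbo.usp_GetCustomers;  --succeeds; dbo owns both proc and table so table permissions aren't checked
SELECT * FROM dbo.Customer;    --fails; direct access requires SELECT permission
REVERT;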

Ownership chaining is limited to a single owner in order to prevent escalation of privileges in environments where non-privileged users can create objects.  Ownership chaining also does not apply to dynamic SQL executed from a stored procedure; the invoking (or impersonated) user needs permissions on objects referenced by dynamic SQL statements.  I should add that one of the useful purposes of EXECUTE AS and signing modules with certificates is to provide needed permissions in a broken ownership chain or to grant statement permissions without granting them to users directly.  See SQL Server MVP Erland Sommarskog’s article Giving Permissions through Stored Procedures for a discussion of these techniques.

Ownership Best Practices

Ownership is obviously central to ownership chaining.  In the vast majority of environments, it’s probably best to simply inherit the owner (a.k.a. AUTHORIZATION) from the object’s schema instead of overriding it on individual objects.  It is possible to assign ownership to individual objects (using ALTER AUTHORIZATION) to implement chaining independent of schema, but I think this is overly complex for most environments.  One reason to specify a different owner at the object level is when a table contains particularly sensitive data and one wants to deliberately prevent inadvertent access via ownership chaining.  I think such cases are rare, though.

I described some of the practices I follow for schema and object ownership in Keep Schema and Ownership Simple.  In summary, I specify dbo as the schema authorization (owner) unless I want to break the ownership chain between schemas to create a security boundary.  When a security boundary is desired, I create a role (or user without login) with the same name as the schema for the schema authorization.
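For example, this sketch (hypothetical names) creates a schema owned by a role rather than dbo; references from dbo-owned objects into the Sales schema then break the ownership chain and require explicit permissions:

CREATE ROLE SalesOwner;
GO
CREATE SCHEMA Sales AUTHORIZATION SalesOwner;
GO
--objects subsequently created in the Sales schema inherit SalesOwner as their owner
CREATE TABLE Sales.SalesOrder (SalesOrderID int PRIMARY KEY);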