The Curious Case of Undetected SQL Exceptions

Undetected database errors are insidious. When an error gets dropped on the floor, the result can be incomplete or wrong data with no indication that anything went amiss. Consider that this simple SELECT query returns an empty result set instead of raising a SqlException for the divide-by-zero error:

string sqlBatchText = @"
BEGIN TRY

    SELECT 1/0 AS DivideByZeroError;

END TRY
BEGIN CATCH

    THROW;

END CATCH;
";

int rowCount = 0;

try
{

    using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
    using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { ++rowCount; };
        }
        Console.WriteLine("{0} rows read", rowCount);
    }
}
catch
{
    throw;
}

One can generally assume SQL errors raised during batch execution will also raise ADO.NET database exceptions. However, there are cases involving multi-statement batches and stored procedures where exceptions are not raised by ADO.NET automatically following SQL errors, as the above example shows. These scenarios can be distilled as:

1) An error is caught in T-SQL while executing a row-returning statement.
2) An error occurs after a row-returning statement successfully executes.

To ensure database exceptions are raised, one must either code T-SQL to avoid these scenarios entirely or ensure the ADO.NET data access layer invokes the data reader NextResult method to consume all results returned, even when no more than one result set is expected. Row-returning statements include SELECT (excluding variable assignment), an OUTPUT clause in an INSERT/UPDATE/DELETE/MERGE statement that returns rows to the client, as well as some specialized commands like RESTORE FILELISTONLY, DBCC commands executed WITH TABLERESULTS, etc.

The defensive programming technique example below executes a command that is expected to return a single result set but still calls NextResult afterwards to ensure database exceptions are raised when SQL errors are subsequently raised in the T-SQL batch.

string sqlBatchText = @"
BEGIN TRY

    SELECT 1/0 AS DivideByZeroError;

END TRY
BEGIN CATCH

    THROW;

END CATCH;
";

int rowCount = 0;

try
{

    using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
    using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { ++rowCount; };
            //get any subsequent errors returned
            while (reader.NextResult()) { };
        }
        Console.WriteLine("{0} rows read", rowCount);
    }
}
catch
{
    throw;
}

I use System.Data.SqlClient namespace objects (.NET Framework Data Provider for SQL Server) in this article, but the same considerations apply to other ADO.NET managed providers (System.Data.Odbc and System.Data.OleDb namespaces) when accessing SQL Server.

Regardless of the method one uses to execute SQL Server queries, ADO.NET uses a data reader to return command results even when higher-level objects (e.g. Dataset) or ORMs (e.g. Entity Framework) are used. The low-level ADO.NET command ExecuteReader method exposes the data reader whereas ExecuteScalar and ExecuteNonQuery do not expose the internal reader.

ExecuteScalar returns the first column of the first row returned as a scalar value but doesn’t call NextResult on the internal data reader to retrieve subsequent results. Consequently, errors may go undetected with ExecuteScalar. ExecuteScalar will not raise an exception if a T-SQL error occurs after the first row is returned. Also, if no rows are returned because the row-returning statement erred and the error was caught in T-SQL, ExecuteScalar returns a null object without raising an exception.
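
To illustrate, here is a minimal sketch (reusing the sqlBatchText and connection string from the earlier examples) showing how ExecuteScalar can silently miss the error that was caught and re-thrown in T-SQL:

object result;

using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
{
    connection.Open();
    //the caught divide-by-zero error is returned as a separate result that
    //ExecuteScalar never consumes, so no SqlException is raised here
    result = command.ExecuteScalar();
}

Console.WriteLine(result == null ? "null returned, error undetected" : result.ToString());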

ExecuteNonQuery executes the entire batch of statements and returns the accumulated count of affected rows as a scalar value, discarding any rows returned. The returned value will be -1 if SET NOCOUNT ON is specified. Because ExecuteNonQuery internally consumes all results in the process, errors will be raised without additional ADO.NET programming, albeit one doesn’t typically use ExecuteNonQuery to execute a batch that returns rows. Again, the ADO.NET error detection issue only applies to row-returning statements.

The remainder of this article covers T-SQL error handling and ADO.NET defensive programming techniques in more detail, along with ways to avoid undetected database errors in ADO.NET.

T-SQL Error Handling Objectives
T-SQL and ADO.NET data access code must work in concert with one another to ensure SQL errors are detected in application code. The T-SQL constructs used in multi-statement batches can affect if and when errors are reported by ADO.NET during batch execution. I’ll start by citing core T-SQL error handling objectives, which can be summarized as:

1) Ensure a multi-statement T-SQL batch doesn’t continue after an error occurs.
2) Roll back the transaction after an error occurs.
3) Raise the error so that the client application is aware a problem occurred.

The T-SQL building blocks used to achieve these objectives are:
1) SET XACT_ABORT ON
2) Structured error handling (SEH) (a.k.a. TRY/CATCH)
3) Control-of-flow (e.g. IF @@ERROR GOTO ErrorHandler)

T-SQL Behavior Without SEH and with XACT_ABORT ON
When a runtime error occurs with the XACT_ABORT session setting ON outside a TRY block, SQL Server will stop batch execution immediately, rollback the transaction (if any), and raise the error. Consequently, a single SET XACT_ABORT ON statement will meet all aforementioned error handling objectives without T-SQL procedural code. However, the XACT_ABORT setting is not considered when user-defined errors are raised with RAISERROR so control-of-flow statements are required to meet objectives #2 and #3 when RAISERROR is employed.

SET XACT_ABORT ON also rolls back open transactions following an attention event like an explicit cancel or query timeout, which would otherwise leave the transaction open. This is one reason why I strongly recommend using SET XACT_ABORT ON, especially in procs that include BEGIN TRAN, regardless of whether or not SEH is also used.

T-SQL Behavior Without SEH and with XACT_ABORT OFF
When an error occurs with the SET XACT_ABORT session setting OFF and SEH is not used, SQL Server will raise the error immediately but, depending on the error and severity, batch execution might continue and the transaction might not be rolled back. The T-SQL batch must use control-of-flow statements after each statement to avoid continuing after errors and to roll back the transaction (objectives #1 and #2).

T-SQL Behavior With T-SQL Structured Error Handling
When an error occurs during statement execution while a T-SQL structured error handler is in scope, the CATCH block error handler is entered, after marking the transaction uncommittable if SET XACT_ABORT is ON. SEH meets the first error handling objective by skipping subsequent statements in the TRY block after an error. It is the responsibility of the error handling code in the CATCH block to roll back the transaction if needed and raise the error. The simple T-SQL handler below achieves objectives #2 and #3 in Azure SQL Database and SQL Server 2012 and later:

BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH;

THROW is not available in SQL Server 2008 R2 and earlier, so one must use RAISERROR instead in older versions. The error handler below provides similar functionality to THROW, although RAISERROR obfuscates the original error as a user error with message number 50000+.

BEGIN CATCH
    DECLARE
          @ErrorNumber int = ERROR_NUMBER()
        , @ErrorMessage nvarchar(2048) = ERROR_MESSAGE()
        , @ErrorSeverity int = ERROR_SEVERITY()
        , @ErrorState int = ERROR_STATE()
        , @ErrorLine int = ERROR_LINE();

    RAISERROR('Error %d caught at line %d: %s'
        , @ErrorSeverity
        , @ErrorState
        , @ErrorNumber
        , @ErrorLine
        , @ErrorMessage);
END CATCH;

T-SQL Interaction with ADO.NET
ADO.NET will reliably detect errors in batches without T-SQL SEH when no result sets are returned or only the last statement in the batch returns rows. When a row-returning statement is not the last statement in the batch, ADO.NET code must call NextResult to ensure errors are raised for subsequent statements in the batch that may have erred. Avoid row-returning statements that are not the last one in a batch unless you are certain the calling code consumes all results with NextResult.
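
A simple defensive pattern is to drain every result regardless of how many are expected. The sketch below assumes a SqlCommand built as in the earlier examples:

using (SqlDataReader reader = command.ExecuteReader())
{
    do
    {
        while (reader.Read())
        {
            //process the current row
        }
        //NextResult raises a SqlException if a subsequent statement in the batch erred
    } while (reader.NextResult());
}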

Using T-SQL SEH when result sets are returned has implications on ADO.NET data access code to ensure database exceptions are raised after SQL errors. As illustrated with the ExecuteReader example at the beginning of this article, when an error is caught in T-SQL during a row-returning statement, ADO.NET will not raise the database exception immediately and instead return the partial or empty result set generated by the failed statement. The next data reader Read method call will return false after the error. The error raised in the CATCH block by the T-SQL THROW or RAISERROR statement is considered a separate result by ADO.NET and will not be raised as a database exception until NextResult is called.

Unless you have a specific reason to use SEH in row-returning batches, I suggest instead using SET XACT_ABORT ON alone as this will address core T-SQL error handling objectives and allow ADO.NET to detect errors without calling NextResult. However, as mentioned earlier, ADO.NET code will still need to call NextResult if the row-returning statement is not the last statement in the batch.
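
For example, this sketch relies on SET XACT_ABORT ON instead of TRY/CATCH in the row-returning batch; because no T-SQL handler intercepts the error, it remains part of the result set and SqlClient raises the SqlException while the result is consumed, with no NextResult call required:

string sqlBatchText = @"
SET XACT_ABORT ON;

SELECT 1/0 AS DivideByZeroError;
";

using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        //the divide-by-zero SqlException surfaces while the result is consumed
        while (reader.Read()) { }
    }
}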

Summary
I hope this information will help you ensure database errors in multi-statement batches are detected. The interaction between T-SQL and ADO.NET isn’t as intuitive as it could be.

Deprecated SQL Server Data Access Technologies

I hope the warning excerpt below from the SQL Server Books Online is not a surprise to you (emphasis mine):

Warning: SQL Server Native Client (SNAC) is not supported beyond SQL Server 2012. Avoid using SNAC in new development work, and plan to modify applications that currently use it. The Microsoft ODBC Driver for SQL Server provides native connectivity from Windows to Microsoft SQL Server and Microsoft Azure SQL Database.

Let me first mention that the Books Online should have made it clear that this warning applies only to SNAC data access by applications. The warning does not apply to Microsoft SQL Server features and products that use SNAC as part of the OLE DB stack, like Linked Servers, SSIS, and SSAS. Microsoft SQL Server 2012 Native Client is still installed and used as a component dependency to support these features in SQL Server 2014 and later versions, although that may change in the future.

More concerning is OLE DB in general. Even before SQL Server 2012 was released, Microsoft announced in August 2011 that ODBC was the preferred technology for relational database access and that OLE DB was deprecated. Below is an excerpt from the Microsoft is Aligning with ODBC for Native Relational Data Access FAQ, which should be alarming to folks who continue to use Microsoft OLE DB providers to access SQL Server 2014 and later databases (note that Denali is the code name for SQL Server 2012 and, again, emphasis mine):

Question 6: If I have an OLE DB application that I write for Denali, will it be supported on a post Denali version of SQL Server that is released during the life of Denali?

Answer: No, in fact we may explicitly block the OLE DB applications on post-Denali versions of SQL Server. It is recommended that you plan your migration soon to ODBC, if you want to start using newer versions of SQL Server as soon as they release.

The explicit block of OLE DB data access has not yet happened (as of this writing) but I think it’s prudent to heed the announcement and warning, at least for applications that currently use SQL Server 2014 and later versions or Azure SQL Database, or may need to use newer SQL versions in the future. Microsoft has a history of going out of their way to provide backwards compatibility in the SQL Server space, but it’s been 3 SQL Server versions (soon to be 4) since the deprecation announcement.

The bottom line is that native Windows applications (e.g. unmanaged C++, VB6, VBA, Classic ASP, etc.) should generally use ODBC and the latest Microsoft ODBC Driver for SQL Server. This stand-alone ODBC driver is a free download from https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server. Not only should SNAC not be used against SQL Server 2014 and later versions, the deprecated SQL Server OLE DB provider and ODBC driver that ship with Windows should also be avoided. This practice will facilitate more seamless upgrades to new versions of SQL Server and Azure SQL Database.

Preparing for the Future
The message is loud and clear that ODBC is the supported and preferred path for native applications going forward. The Data Access Technologies Road Map provides an overview and history of Microsoft data access technologies, which I recommend you peruse to ensure you are not inadvertently using deprecated or unsupported technologies for new development and, for existing applications, consider moving from legacy data access technologies to current ones when practical.

The current Microsoft ODBC Driver for SQL Server as of this writing is ODBC Driver 13 for SQL Server. Note that both the 13.0 and 13.1 versions of this driver have the same “ODBC Driver 13 for SQL Server” display name listed under installed programs and in the ODBC Data Source Administrator. If installed, the driver will be listed under installed programs along with the corresponding driver version (when viewed in detail mode). The 13.1 version adds support for the Always Encrypted feature. These ODBC drivers are available from the link I mentioned earlier.

The remainder of this article reviews common data access technologies for native (unmanaged) Windows applications that are explicitly identified as unsupported, deprecated, or may have a limited future along with remediation considerations. This isn’t to say these technologies won’t work, just that using them may block upgrades to new versions of SQL Server and Azure SQL Database as well as prohibit using new SQL Server features. Applications that are aligned with current technologies are much better positioned for seamless SQL Server upgrades, both on-prem and in the cloud.

Deprecated Windows Data Access Components
Windows Data Access Components (WDAC), formerly known as Microsoft Data Access Components (MDAC) in older Windows versions, is included with Windows to provide data access infrastructure for ODBC, OLE DB, ADO Classic, and managed ADO.NET out of the box. WDAC includes a mix of deprecated, mature, and modern components. WDAC is part of the operating system and maintained by Windows Update.

Both the SQL Server ODBC driver and OLE DB provider included with WDAC are deprecated; these are provided only for legacy application backwards compatibility and should not be used for new application development. The WDAC ODBC driver named “SQL Server” and the OLE DB provider named “Microsoft OLE DB Provider for SQL Server” (SQLOLEDB) were both deprecated in favor of SQL Server Native Client when SQL Server 2005 was released over 10 years ago. Not only do these old components lack direct support for data types introduced after SQL Server 2000, newer features like Availability Groups, MARS, Always Encrypted, and idle connection resiliency are not supported either. Native applications should install and use a separately installed SQL Server ODBC driver instead of the WDAC “SQL Server” ODBC driver, the SQLOLEDB provider, or the SNAC ODBC driver/OLE DB provider.

WDAC also includes the Microsoft OLE DB Provider for ODBC Drivers (MSDASQL), which acts as a bridge to allow OLE DB applications to use an ODBC driver instead of an OLE DB provider for low-level data access. The main purpose of MSDASQL was to allow OLE DB programs to access RDBMS products other than SQL Server that had an ODBC driver but no OLE DB provider, with the expectation that vendors would eventually jump on the OLE DB bandwagon. However, that didn’t happen widely because OLE DB relies on Windows-only COM interfaces and many vendors preferred to support only ODBC interfaces. Microsoft recognized the value of ODBC for cross-platform SQL Server data access, which is one of the reasons called out for OLE DB deprecation. Although MSDASQL is an OLE DB provider, it is not deprecated for SQL Server relational data access because it uses ODBC to access the data store. MSDASQL can facilitate transitioning to ODBC in existing OLE DB applications. I’ll discuss this in more detail shortly.

ADO (ActiveX Data Objects, not to be confused with ADO.NET) components are included with WDAC. Although not deprecated, ADO is a mature OLE DB technology that hasn’t been enhanced since ADO 6.0 was released with WDAC in Windows Vista over 10 years ago. ADO types do not directly support data types introduced after SQL Server 2000, regardless of the underlying provider/driver used. I’ll leave speculation on the future of ADO as an exercise for the reader.

Changing Existing ODBC Applications
Changing an existing ODBC application (ADO or direct ODBC function calls) to use the Microsoft ODBC Driver for SQL Server is often simply a matter of installing the driver and changing the connection string or DSN to use it. Unmanaged C++ applications that use the ODBC call level interface directly may need to be recompiled using the header files included with the SDK install of the ODBC driver. It’s often trivial to switch from SNAC or the WDAC SQL Server ODBC driver. Application testing should be done to ensure compatibility but ODBC driver upgrades are usually transparent.

Changing from OLE DB to ODBC in ADO Classic
There is still quite a bit of OLE DB SQL Server access in the wild in my experience, varying from C++, VBA, and Classic ASP to VB.NET apps migrated from VB6 that were never retrofitted to use SqlClient (which should be done, as SqlClient provides high-performance managed data access for .NET applications). From unmanaged code, ADO provides an easy-to-use object-oriented COM interface on top of the 100+ complex OLE DB interfaces. Low-level data access is performed by the OLE DB provider specified in the connection string or connection object. MSDASQL (Microsoft OLE DB Provider for ODBC Drivers) is the default provider in ADO, so ODBC is used when no OLE DB provider is specified. When MSDASQL is used explicitly or by default, the connection string must specify either a DSN or an ODBC driver.

Below are DSN-less ADO ODBC connection string examples that use the Microsoft ODBC Driver for SQL Server with and without a trusted connection. “Provider=MSDASQL” could have been specified in the connection string but is not technically required since it’s the default provider:

Driver={ODBC Driver 13 for SQL Server};Server=YourServer;Database=YourDatabase;Trusted_Connection=Yes
Driver={ODBC Driver 13 for SQL Server};Server=YourServer;Database=YourDatabase;UID=YourUser;PWD=YourPassword

The examples below use an ODBC DSN with the default database specified in the DSN configuration. The first uses Integrated Windows authentication (specified in the DSN configuration) and the second example is functionally identical except using SQL Server authentication. Be aware that the Microsoft ODBC Driver for SQL Server does not store user credentials in the DSN configuration so those must be specified by the application:

DSN=YourOdbcDataSource
DSN=YourOdbcDataSource;UID=YourUser;PWD=YourPassword

ADO applications use a handful of objects to interact with SQL Server (mostly connection, command, recordset, record, transaction, parameter, and field objects). These objects abstract the implementation details such that it is possible to use ADO with any DBMS product, OLE DB provider, or ODBC driver as long as the low-level driver/provider supports the requested functionality. One can theoretically transition from OLE DB to ODBC with only a connection string change. However, the devil is in the details when switching from OLE DB to ODBC via MSDASQL.

MSDASQL converts OLE DB method calls into their equivalent ODBC function calls. Commonly used fast-forward read-only client-side cursors (a.k.a. firehose cursor, same as the only type ADO.NET uses) tend to work well in my experience. However, ADO provides a plethora of options for cursor location, execution, cursor types, and locking modes that ADO and MSDASQL may or may not translate well to ODBC equivalents. If you run into problems during testing with ODBC, make sure the requested ADO options are appropriate for the task at hand. It is common for ADO applications to use advanced options and pessimistic locking inappropriately. A firehose cursor with optimistic locking is often best for the task at hand and will perform better too.

Like ADO, MSDASQL is a mature technology that hasn’t gotten much love lately. I could be wrong but I don’t expect patches that are not security related. If you run into issues with MSDASQL, you may be better off developing a work-around rather than wait for a fix.

Changing OLE DB Interface Calls to ODBC
C++ applications that use OLE DB interfaces directly rather than the higher-level ADO API can be converted to ODBC by using the MSDASQL provider, much as discussed in the ADO topic above. The same considerations apply.

Note that MSDASQL does add another layer, which may be an issue for applications that are especially performance-sensitive. The best approach for C++ applications that are expected to be around for a while might be to move from OLE DB to ODBC function calls (or via MFC). This will provide the highest level of performance and alignment with ODBC.

Remediation Summary
Inasmuch as I recommend staying aligned with product technology lifecycles, whether or not one should modify an existing application to use current data access technologies is ultimately a business decision. Business benefits include not only supportability but also agility in making functional enhancements that leverage newer features, as well as non-functional ones such as support for Availability Groups. The costs of the development and testing needed to use the Microsoft ODBC Driver for SQL Server will vary considerably depending on application design. Those costs need to be weighed against the benefits of the effort.

That being said, I always recommend one avoid deprecated technologies in new development work even if that requires moving outside one’s comfort zone. Technical debt is a hidden cost that often needs to be paid back with interest.

p.s. For history buffs, this post by Hal Berenson, retired Microsoft Distinguished Engineer and General Manager, provides additional background on OLE DB and data access technologies.

SQL Server TVP Performance Gotchas

Table-valued parameters have important considerations that developers and DBAs need to be aware of. It is essential that application code specify the proper data type and length for TVP columns in order to achieve optimal performance and reduce unnecessary overhead. Additionally, a trace (Extended Events, server-side SQL Trace, Profiler) that captures RPC events of an inappropriately defined TVP can not only exacerbate performance issues, but affect stability of the SQL Server instance in some cases.

The graph below summarizes the impact the application code max column length specification can have on performance, without and with a trace running. All tests used the same table type of 10 varchar(50) columns and a 10,000-row TVP passed via a DataTable object of 10 string columns. The only variables were the max column length specified by the app code and whether a trace was running on the database server. The client application was run on a different machine than the database server and elapsed time was measured by the application.

Figure 1: Impact of TVP maximum column length specification with and without tracing

The average duration was 113ms without a trace running when the app code used the default -1 max column length. However, when max column length 50 was specified (matching the varchar(50) column of the table type), the average duration dropped significantly to 75ms. The trivial code change of specifying the string column max length of 50 improved performance by 33%.

I then ran the same pair of tests while a trace captured the RPC completed events. The average duration of the default max length test increased from 113ms to 9,324ms with the trace running, an over 80x degradation in performance! The test with the explicit 50 character max length was not impacted nearly as much, increasing from 75ms to 89ms (which is tolerable, IMHO).

TVP Internals
Under the hood, TVP data are passed to SQL Server over the Tabular Data Stream (TDS) protocol. The client API sends TVP column meta-data to SQL Server describing the data type, length, and other meta-data for each TVP column followed by data rows with each column in native format matching the preceding data type specification. SQL Server uses the provided column meta-data to prepare and fill buffers for efficient processing on the server side, leveraging native types to eliminate parsing overhead similarly to other parameterized queries.

Before a query or stored proc with a TVP starts executing, SQL Server creates a table in tempdb with the same schema as the parameter table type and uses bulk insert internally to efficiently load the table with TVP rows streamed by the client application. The size of a TVP is constrained only by available tempdb storage. SQL Server executes the query/proc after the TVP temp table is loaded and the parameterized T-SQL query/proc can then use the TVP data.

The TVP columns provided by the client application do not have to match the schema of the target table type; SQL Server implicitly converts TVP values to match the target table type column when data types differ. Although not optimal, implicit conversion is generally not a major factor in overall TVP performance.

The application-specified TVP max column length can impact performance significantly, and in some cases dramatically, as illustrated by the performance tests shown earlier. SQL Server prepares to receive TVP data up to the max length specified by the client application rather than the defined size of the target table type column. When the specified max length of a variable length column exceeds the 8000-byte tipping point, SQL Server uses a different code path to allow for large object (LOB) values up to 2GB. Unless the table type actually contains LOB values (varchar(MAX), nvarchar(MAX)), database server resources are wasted unnecessarily when an inappropriate max column length is specified.

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and LOB columns are passed via a TVP with a trace running. For example, the 10,000-row TVP with 10 LOB columns used in the earlier test required over 800MB of memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash due to tracing of TVP queries.

Specifying Proper TVP Parameter Column Meta-Data
A SQL Server development best practice has long been to use strongly-typed parameters with attention to detail regarding the parameter data type and length such that it is consistent with the types on the server. This practice improves performance by avoiding implicit data type conversions, promotes sargable expressions, avoids unnecessary procedure cache bloat, and inherently validates data for proper typing on the client before it is sent to the database server. With scalar parameters, one need only specify the correct SqlDbType along with the proper length, precision/scale (avoiding AddWithValue method to add parameters) and all is well in the world.
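
As a hedged illustration, the sketch below queries a hypothetical dbo.Customer table with an nvarchar(30) LastName column; the commented-out AddWithValue call is the shortcut the explicit Add replaces (connectionString is assumed to be defined elsewhere):

string lastName = "Smith";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT CustomerID FROM dbo.Customer WHERE LastName = @LastName;", connection))
{
    //avoid: AddWithValue infers nvarchar(<value length>), which can cause implicit
    //conversions against varchar columns and a separate cached plan per distinct length
    //command.Parameters.AddWithValue("@LastName", lastName);

    //prefer: data type and length explicitly match the column definition
    command.Parameters.Add("@LastName", SqlDbType.NVarChar, 30).Value = lastName;

    connection.Open();
    //execute the command and process results here
}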

TVP parameters require additional column meta-data not applicable to scalar parameters. The parameter data type of a TVP in .NET is always SqlDbType.Structured. The additional TVP column meta-data is inferred from the supplied parameter value, which may be a DbDataReader, IEnumerable<SqlDataRecord>, or DataTable object. These objects inherently contain column meta-data and methods to enumerate rows, which the SqlClient API uses to send the TVP to SQL Server.

DataTable objects are most commonly used as TVP values. DataTables are easy to use and can serve as containers for data beyond just TVP usage. But unlike DbDataReader and IEnumerable objects, a big gotcha with a DataTable is that the default column data type is String with a maximum length of -1 (a 2GB LOB). This is the .NET equivalent of the SQL Server nvarchar(MAX) data type and has many insidious and negative implications with a TVP. First, values of types other than string that are added to a DataTable string column will be converted to string (DateTime, Integer, GUID, etc.). Consequently, using the default DataTable column string data type for non-string types will:

• increase client memory requirements compared to more compact native types
• incur conversion overhead
• prevent strong-typed data validation on client side
• require date format aware formatting of date and datetime values
• require using a period as the decimal separator
• increase network usage compared with smaller native types

When String is the proper column data type, developers must be especially mindful of the max length specification when the DataTable is used as the TVP value. Strings in Windows and .NET are Unicode, requiring 2 bytes per character. This means a max length of over 4000 characters will cross the 8000 byte threshold for LOB data on the server side regardless of the table type on the server. Avoid using long string columns in TVPs when many rows are passed and never use the default -1 length unless a MAX type is actually intended.
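
Below is a minimal sketch, assuming a table type with varchar(50) columns like the one used in the earlier performance test, of how DataTable columns can be declared with an explicit MaxLength (and native types for non-string columns) so the TVP columns are not described to SQL Server as LOBs. The table type name, parameter name, and command object are illustrative assumptions:

var tvpDataTable = new DataTable();

for (int i = 0; i < 10; ++i)
{
    var column = tvpDataTable.Columns.Add("Column" + i.ToString(), typeof(string));
    column.MaxLength = 50; //matches the varchar(50) table type column; the default -1 implies a 2GB LOB
}

//non-string columns should use the native .NET type rather than string, e.g.:
//tvpDataTable.Columns.Add("OrderDate", typeof(DateTime));

//rows would be added to tvpDataTable here

var tvpParameter = command.Parameters.Add("@TvpParameterName", SqlDbType.Structured);
tvpParameter.TypeName = "dbo.YourTableType"; //hypothetical table type name
tvpParameter.Value = tvpDataTable;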

DBAs who support applications that use TVPs should be aware of the repercussions tracing can have on performance and SQL Server memory. Avoid capturing TVP RPC completed events of large TVP requests, if possible. When tracing TVP RPC completed events, monitor the OBJECTSTORE_LBSS memory pool for excessive memory usage.

Acknowledgements
I’d like to thank SQL Server MVP Ola Hallengren for suggesting that I write this article.

Help Stop SQL Injection Madness

SQL injection continues to be the most exploited security vulnerability on the internet. A SQL injection vulnerability can put companies out of business and expose innocent people to sensitive data disclosure and damage, all due to developer ignorance or carelessness. Sadly, although there is much information available on SQL injection exploits and mitigation, SQL injection exploits continue because developers use flawed database access techniques. This article details the best, and simplest, method to prevent SQL injection in SQL Server: parameterized SQL statements.

Parameterized SQL queries and stored procedure calls (with CommandType.StoredProcedure and no dynamic SQL within proc code) prevent unintended SQL from being executed. I’ll discuss exactly why shortly. Parameterized SQL also provides many benefits besides security. Notably:

  • allows quotes in data without programmatic escaping
  • avoids the need to format date strings according to DATEFORMAT session settings
  • handles decimal separators automatically
  • improves performance due to plan reuse
  • provides cleaner code

Most articles on SQL injection focus on validating user input rather than parameterized queries.  Input validation certainly has benefits related to data integrity and user-friendly validation messages but it is at best a secondary defense against injection compared to parameterized queries.  Input validation using a blacklist of prohibited keywords and characters is especially weak.

SQL Injection Overview
I am puzzled by the number of examples in articles and forum posts that show building SQL statements with literals using a string concatenation technique.  I suspect the main reason for this practice is that developers write code that generates the exact same ad-hoc SQL statement they would write using a query tool like SQL Server Management Studio or Visual Studio.  This leads to the nasty habit of not using parameters.  For example, a developer might develop and test a query like this in SSMS:

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = 12345;

After testing the query, the C# developer includes the SQL statement in the application code but, instead of a hard-coded value for CustomerID, the desired value is concatenated with the remainder of the SQL statement as a literal.

var command = new SqlCommand(
             "SELECT FirstName, LastName, EmailAddress" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = '" + txtBoxCustomerID.Text + "';"
            ,connection);

The query runs perfectly well and returns the expected result in testing.  It is not until after production deployment that a script kiddie or hacker finds the SQL injection vulnerability.  A malicious person can easily manipulate this SQL statement in ways unintended by the developer.  For example, a hacker could enter the CustomerID value “1' OR '1'='1” in the text box to harvest the names and email addresses of all customers in the database.  This vulnerability can also be exploited to execute additional SQL statements in the same batch, which is often used to update data with malicious html script that is subsequently returned and executed by unsuspecting client browsers.  There are many ways to exploit a SQL injection vulnerability that result in sensitive data disclosure and/or database manipulation.  I won’t detail those here but rather recommend a very simple defense; use a parameterized statement instead of literals.

Mitigating SQL Injection
Below is a functionally identical parameterized query.

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = @CustomerID;

This query can be incorporated into the C# application code with the parameter and corresponding value added to the parameterized SQL command as a strongly-typed parameter:

var command = new SqlCommand(
             "SELECT FirstName, LastName, EmailAddress" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = @CustomerID;"
            ,connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = int.Parse(txtBoxCustomerID.Text);

As you can see, the effort needed to do the same job with a parameterized command is about the same as the string concatenation method, yet the security benefits are immeasurable.

Why Parameters Prevent SQL Injection
Parameterized SQL is more secure for a couple of reasons.  In the case of an ad-hoc query, the SQL statement with parameter markers is hard-coded in the application (or generated via an ORM framework) and passed to SQL Server as-is.  Parameter values are passed separately rather than inside the SQL statement itself.  This ensures the SQL statement cannot be modified by the values provided.

Similarly, a stored procedure call (using CommandType.StoredProcedure) sends only the stored procedure name to SQL Server with separately passed parameter values.  Using stored procedures also allows one to more strictly adhere to the security principle of least privilege.  By granting only stored procedure execute permissions, permissions on indirectly referenced objects are not needed as long as the ownership chain is unbroken.  This practice limits ad-hoc access to underlying data should an attacker find a back door into the database under the application security context.  Personally, I’m a staunch advocate of stored procedures for this and other reasons too.  Stored procedures provide a well-defined database interface, allowing queries and database schema to be refactored without breaking changes.

I should add that it is perfectly acceptable to build a SQL statement dynamically in either application code or stored procedures, but only if parameterized and actually needed for the task at hand.  A common use case is a dynamic search query that is constructed based on user criteria. Instead of coding a separate query for each possible search permutation, one can add parameterized WHERE clauses as needed.  Under no circumstances should a SQL statement string be built with concatenation of untrusted values.
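
Here is a sketch of that pattern; the dbo.Customer columns, the search criteria variables, and the open SqlConnection are illustrative assumptions. Predicates are appended only when criteria are supplied, but every value still travels as a parameter:

var sql = new StringBuilder(
    "SELECT FirstName, LastName, EmailAddress FROM dbo.Customer WHERE 1 = 1");
var command = new SqlCommand();

if (!string.IsNullOrEmpty(lastNameCriteria))
{
    sql.Append(" AND LastName LIKE @LastName");
    command.Parameters.Add("@LastName", SqlDbType.NVarChar, 50).Value = lastNameCriteria + "%";
}

if (customerIdCriteria.HasValue)
{
    sql.Append(" AND CustomerID = @CustomerID");
    command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = customerIdCriteria.Value;
}

command.CommandText = sql.ToString() + ";";
command.Connection = connection; //an already-opened SqlConnection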

Additional Precautions
Use Windows Authentication for SQL Server database access.  This practice eliminates the need to store security credentials in application configuration files.  If SQL authentication must be used, protect the credentials with encryption.  This is easily done in .NET by encrypting the connectionStrings section of the app.config file.  See http://msdn.microsoft.com/en-us/library/ms254494.aspx.
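
A hedged sketch of the programmatic approach using the System.Configuration API is below (a reference to System.Configuration.dll is required); the aspnet_regiis.exe tool is another option for web.config files:

using System.Configuration;

static void ProtectConnectionStrings()
{
    //encrypt the connectionStrings section of the application's .config file using DPAPI
    Configuration config =
        ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
    ConfigurationSection section = config.GetSection("connectionStrings");

    if (section != null && !section.SectionInformation.IsProtected)
    {
        //DataProtectionConfigurationProvider uses Windows DPAPI with the machine key by default
        section.SectionInformation.ProtectSection("DataProtectionConfigurationProvider");
        config.Save(ConfigurationSaveMode.Modified);
    }
}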

Adhere to the principle of least privilege.  Never use a privileged account, such as a sysadmin role member, for routine application access.  Grant permissions only on database objects which are used directly by applications.  Use stored procedures so that one can leverage ownership chaining, eliminating the need to grant permissions on tables directly.

Specify strongly-typed parameters.  The parameter data type can be inferred (e.g. using SqlCommand.Parameters.AddWithValue method) but a best practice is to explicitly specify a parameter data type that matches the underlying column data type and maximum length.  These practices will minimize the size of the SQL Server procedure cache and avoid implicit conversions that can negatively affect performance.

Maximizing Performance with Table-Valued Parameters

I’ve seen TVPs improve performance by orders of magnitude when used appropriately.  This feature isn’t used as often as it should be so I’ll show how to implement a TVP using a C# application example and discuss TVP considerations.

Why Table-Valued Parameters Improve Performance
TVPs allow one to pass multiple rows or values at once to a query or stored procedure.  By doing so, the query can leverage set-based operations to greatly improve performance compared to executing the query/proc many times.  TVPs decrease network latency by reducing network round trips.  Prior to TVPs, one had to employ workarounds like passing a string containing a delimited list of records (or XML) and parsing on the SQL side.  Those techniques are at best a kludge.  Instead TVPs are the right tool for the job in SQL Server 2008 and later.

TVPs (and table variables) are temp tables.  TVPs and table variables are persisted in tempdb much the same way as regular temp tables (# prefix).  The big differences are that TVPs and table variables are limited in scope to the current batch and do not have statistics.  Also, table parameters/variables allow only indexes declared via primary key or unique constraints; non-unique indexes are not supported.

When a TVP is passed to SQL Server from a client application, the data is bulk-inserted into tempdb where it is made available to the SQL statement or stored procedure using the declared TVP.  This bulk insert is handled by the client API transparently to the application and allows a large number of rows to be passed to SQL Server efficiently.  In T-SQL, one can declare and load a table variable, where it can be used directly in a SQL statement or passed as a TVP to a stored procedure or query (invoked with sp_executesql).

The exact threshold where TVPs outperform single-row operations is much lower than you might expect.  Your mileage may vary but, at least in my experience, it’s only a few rows.  The performance tests I ran here show that the threshold where a TVP outperforms individual requests with a simple list of customers is only 4 rows.  That said, I would not recommend TVPs for single-row operations due to the start-up overhead but it’s a good choice when the most common use case is multiple rows.

How to Use TVPs
In order to pass a TVP to a SQL statement or stored procedure, one must first create a user-defined table type in SQL Server.  The table type defines the schema of the table SQL Server expects for the TVP.  Listing 1 shows the CREATE DDL for the table type I use in the AdventureWorks2012 database to pass a list of customers to the stored procedure in Listing 2 that returns all orders for these customers.  Note that the stored procedure TVP must be declared as READONLY.

Listing 1: Table type DDL

CREATE TYPE dbo.CustomerList AS TABLE(
CustomerID int NOT NULL
PRIMARY KEY
);

Listing 2: Stored procedure to return orders for multiple customers

CREATE PROC dbo.usp_select_orders_for_multiple_customers
@CustomerList dbo.CustomerList READONLY
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
JOIN @CustomerList AS cl ON
cl.CustomerID = soh.CustomerID;
RETURN @@ERROR;

TVPs are passed from a .NET application using parameter data type SqlDbType.Structured.  The actual parameter value can be an object of type DataTable, DbDataReader or IEnumerable<SqlDataRecord>.  A DataTable is appropriate when the required TVP data is already in a DataTable object needed for other purposes.  A DbDataReader is a good choice when the source data is the result of a query (note that any DbDataReader may be used, not just a SqlDataReader).   For other cases, I recommend an IEnumerable<SqlDataRecord>.  It is very easy to implement IEnumerable<SqlDataRecord>, especially if you already have an enumerable object like an array or collection containing the values you need to pass.  See the code at the end of this article for an example class that implements IEnumerable<SqlDataRecord> for a TVP.

A benefit with DbDataReader and IEnumerable<SqlDataRecord> is that the source data can be streamed to SQL Server without first loading all the TVP data into memory in either the client application or SQL Server.  This is a big consideration when a lot of data needs to be passed because you are not constrained by memory on either the client or SQL Server side.

Listing 3 shows the stored procedure I use to compare performance with the TVP method.  This proc takes only a single @CustomerID parameter so it must be called once for each customer.

Listing 3: Stored procedure to return orders for a single customer

CREATE PROC dbo.usp_select_orders_for_single_customer
@CustomerID int
AS
SELECT soh.CustomerID, soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE
soh.CustomerID = @CustomerID;
RETURN @@ERROR;

The complete C# console application I used for the performance tests is in Listing 4.  The code demonstrates the 3 ways to pass a TVP from C# and logs timings for each invocation.  As you can see in Figure 1, the TVP outperformed individual calls once more than 3 customers were passed.  Note that I had to use a logarithmic y-axis scale because of the huge performance difference.  For example, 10,000 individual calls averaged about 1.5 seconds whereas passing 10,000 customers via a TVP took less than 100 milliseconds.

Figure 1:  TVP versus individual call performance

Listing 4: Passing TVPs in C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using Microsoft.SqlServer.Server;
using System.IO;

namespace TVPTest
{
    class Program
    {

        private static string connectionString = @"Data Source=MyServer;Initial Catalog=AdventureWorks2012;Integrated Security=SSPI";
        private static StreamWriter logFile = new StreamWriter(@"TvpTest.log", false);

        static void Main(string[] args)
        {

            // write column headers to log file
            logFile.WriteLine("{0},{1},{2}"
                , "Test Name"
                , "Customers"
                , "Durations");

            // run each performance test 100 times
            for (int i = 0; i < 100; ++i)
            {
                runPerformanceTest(1);
                runPerformanceTest(2);
                runPerformanceTest(3);
                runPerformanceTest(4);
                runPerformanceTest(5);
                runPerformanceTest(10);
                runPerformanceTest(25);
                runPerformanceTest(50);
                runPerformanceTest(100);
                runPerformanceTest(500);
                runPerformanceTest(1000);
                runPerformanceTest(2500);
                runPerformanceTest(5000);
                runPerformanceTest(10000);
                logFile.Flush();
            }

            logFile.Close();

            Console.WriteLine("Press any key to close.");
            Console.ReadKey();
        }

        private static void runPerformanceTest(int customerCount)
        {

            Console.WriteLine("{0} customers:", customerCount);

            DataTable orderDataTable;
            Stopwatch sw;
            DateTime duration;

            //individual call baseline
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersIndividually(customerCount);
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tIndividual call: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "Individual call"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataTable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataTable(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataTable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataTable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from DataReader
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListDataReader(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP DataReader: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP DataReader"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            //TVP from IEnumerable
            sw = Stopwatch.StartNew();
            orderDataTable = getOrdersForCustomersUsingTvp(getCustomerListSqlDataRecords(customerCount));
            sw.Stop();
            duration = new DateTime(sw.Elapsed.Ticks);
            Console.WriteLine("\tTVP IEnumerable: {0} orders returned, duration is {1}"
                , orderDataTable.Rows.Count, duration.ToString("HH:mm:ss.ffffff"));
            logFile.WriteLine("{0},{1},{2}"
                , "TVP IEnumerable"
                , customerCount
                , duration.ToString("HH:mm:ss.ffffff"));

            Console.WriteLine();

        }

        /// <summary>
        /// Call proc individually for each customer
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataTable of customer orders</returns>
        private static DataTable getOrdersForCustomersIndividually(int customerCount)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_single_customer", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                var customerIdParameter = command.Parameters.Add("@CustomerID", System.Data.SqlDbType.Int);
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
                {
                    customerIdParameter.Value = customerId;
                    orderListDataAdapter.Fill(orderDataTable);
                }
                return orderDataTable;
            }
        }

        /// <summary>
        /// Call proc with TVP of CustomerIDs
        /// </summary>
        /// <param name="customerList">List of customers</param>
        /// <returns>DataTable of customer orders</returns>
        private static DataTable getOrdersForCustomersUsingTvp(Object customerList)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(@"dbo.usp_select_orders_for_multiple_customers", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@CustomerList", System.Data.SqlDbType.Structured).Value = customerList;
                var orderListDataAdapter = new SqlDataAdapter(command);
                var orderDataTable = new DataTable();
                orderListDataAdapter.Fill(orderDataTable);
                return orderDataTable;
            }
        }

        /// <summary>
        /// Create a DataTable containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataTable with single CustomerID column</returns>
        private static DataTable getCustomerListDataTable(int customerCount)
        {
            var customerListDataTable = new DataTable();
            customerListDataTable.Columns.Add("CustomerID", typeof(int));

            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                var row = customerListDataTable.NewRow();
                customerListDataTable.Rows.Add(row);
                row[0] = customerId;
            }
            return customerListDataTable;
        }

        /// <summary>
        /// Create a DataReader containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>DataReader with single CustomerID column</returns>
        private static IDataReader getCustomerListDataReader(int customerCount)
        {
            var connection = new SqlConnection(connectionString);
            var command = new SqlCommand(@"SELECT CustomerID FROM Sales.Customer WHERE CustomerID BETWEEN 11000 AND 11000 + @CustomerCount - 1;", connection);
            command.Parameters.Add("@CustomerCount", System.Data.SqlDbType.Int).Value = customerCount;
            connection.Open();
            command.CommandType = CommandType.Text;
            return command.ExecuteReader();

        }

        /// <summary>
        /// Create an IEnumerable<SqlDataRecord> object containing list of customers for TVP
        /// </summary>
        /// <param name="customerCount">Number of customers to return</param>
        /// <returns>IEnumerable<SqlDataRecord> with single CustomerID column</returns>
        private static IEnumerable<SqlDataRecord> getCustomerListSqlDataRecords(int customerCount)
        {
            var customerList = new List<int>();
            for (int customerId = 11000; customerId < 11000 + customerCount; ++customerId)
            {
                customerList.Add(customerId);
            }
            return new CustomerListSqlDataRecords(customerList);
        }

        /// <summary>
        /// Implement IEnumerable<SqlDataRecord> to return a SqlDataRecord for each CustomerID
        /// in the provided IEnumerable<int>
        /// </summary>
        private class CustomerListSqlDataRecords : IEnumerable<SqlDataRecord>
        {
            private SqlMetaData metaData = new SqlMetaData("CustomerID", SqlDbType.Int);

            private IEnumerable<int> customerList;

            public CustomerListSqlDataRecords(IEnumerable<int> customerList)
            {
                this.customerList = customerList;
            }

            public IEnumerator<SqlDataRecord> GetEnumerator()
            {
                foreach (var customerId in customerList)
                {
                    var record = new SqlDataRecord(metaData);
                    record.SetInt32(0, customerId);
                    yield return record;
                }
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                throw new NotImplementedException();
            }
        }
    }
}

RIP OLE DB

I was very surprised when Microsoft announced the deprecation of the OLE DB provider for SQL Server data access last week on the Data Access Blog and MSDN Forums Announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB provider will continue to be supported for 7 years after the Denali release, so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB provider for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB provider (and the deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB provider and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use System.Data.SqlClient. SqlClient is part of the .NET Framework and is the preferred way to access SQL Server from managed code (C#, VB.NET, managed C++). The only reason I can think of not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case, System.Data.Odbc is an alternative for accessing different database products.

·         Native code (e.g. unmanaged C++): Use ODBC with the SQL Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download.

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: “Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI”

New ODBC connection string: "Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes"

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Were Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by changes in development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing. 

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)

Summary

I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.

 

SQL Server Connection Strings

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose. 
In this post, I'll provide a connection string overview and discuss SqlClient connection strings with examples. I'll cover OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail in future articles.
Overview
SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific, so one first needs to decide on a client API before the proper string can be formulated.
All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also varies by provider, and providers often share the same keywords (or provide synonyms) to minimize connection string changes when switching between providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password="a&==b=;1@23").
The purpose of a connection string is to supply a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance and may also be used to specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:
·         SQL Server name (or address)
·         Authentication method (Windows or SQL Server)
·         Login credentials (login and password for SQL Server authentication)
SqlClient
One typically uses the .NET Framework Data Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.
The authoritative reference for SqlClient connection strings is http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx. My goal is not to rehash all of the keywords or illustrate the many combinations here but rather show the ones most commonly used along with best practices. I use the primary keywords rather than synonyms or equivalent keywords in the examples.
The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by the SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based on user input (e.g. user id and password prompt). But you still need to know which connection string properties (keywords) you need to set along with the default values. The examples here apply regardless of whether or not you use the SqlConnectionStringBuilder class.
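Below is a minimal sketch of the SqlConnectionStringBuilder approach; the server, database and application names are placeholders, and the properties map to the keywords discussed in the examples that follow:

SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
builder.DataSource = "SQLSERVERNAME.MYDOMAIN.COM";      // Data Source keyword
builder.InitialCatalog = "MyDatabase";                  // Initial Catalog keyword
builder.IntegratedSecurity = true;                      // Integrated Security=SSPI
builder.ApplicationName = "Connection String Example";  // Application Name keyword

using (SqlConnection connection = new SqlConnection(builder.ConnectionString))
{
    connection.Open();
    // builder.ConnectionString renders a properly formatted and quoted string
}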
SqlClient Connection String Keyword Examples
Unlike other providers, there is no "Provider" or "Driver" connection string keyword in a SqlClient connection string.  The .NET Framework Data Provider for SQL Server is implied by the SqlConnection class, so it is redundant to also specify the provider.
I'll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method.  The example below specifies Windows authentication using "Integrated Security=SSPI". This connection string will connect to the default instance on the local machine using Windows authentication under the current process Windows security credentials.

Integrated Security=SSPI
Listing 1: Connect to local default instance using Windows authentication
To connect to the local default instance using SQL authentication, specify the credentials using the "User ID" and "Password" keywords instead of the "Integrated Security=SSPI" keyword. SQL authentication is the default when neither the "Integrated Security" nor the "Trusted_Connection" keyword is specified. Although I commonly see "Persist Security Info=False" also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.

User ID=MyLogin;Password=MiP@ssw0rd
Listing 2: Connect to local default instance using SQL authentication
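One possible way to act on the encryption advice above (a sketch only, not a prescription; ASP.NET protected configuration is another common approach) is to protect the connection string with Windows DPAPI before storing it and unprotect it at run time. This requires a reference to System.Security, and the connection string value shown is hypothetical:

using System;
using System.Security.Cryptography;
using System.Text;

string connectionString = "Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd";

// Encrypt under the current user's Windows credentials before writing to a config file
byte[] protectedBytes = ProtectedData.Protect(
    Encoding.UTF8.GetBytes(connectionString), null, DataProtectionScope.CurrentUser);
string storedValue = Convert.ToBase64String(protectedBytes);

// Decrypt at run time when building the SqlConnection
string clearText = Encoding.UTF8.GetString(
    ProtectedData.Unprotect(Convert.FromBase64String(storedValue), null, DataProtectionScope.CurrentUser));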
One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.

Data Source=SQLSERVERNAME;Integrated Security=SSPI
Listing 3: Connect to default instance on host SQLSERVERNAME using Windows authentication

Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd
Listing 4: Connect to default instance on host SQLSERVERNAME using SQL authentication
Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine with the same configuration and helps avoid oversights.
It is usually best to let SqlClient determine the appropriate network library rather than specifying one explicitly. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value "(local)"), Shared Memory is used by default. SqlClient will use TCP/IP if an FQDN (e.g. SQLSERVERNAME.MYDOMAIN.COM) or IP address is specified, regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I'll focus on TCP/IP in this article and use an FQDN in the subsequent examples to avoid ambiguity.
It is often desirable to specify the initial database context in the connection string. If omitted, the default database of the authenticated account is used. This is accomplished using either the "Initial Catalog" or "Database" keyword. I suggest always including the "Initial Catalog" keyword.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 5: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
Named Instances
The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:

Data Source=SQLSERVERNAME.MYDOMAIN.COM\MYINSTANCE;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 6: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Browser service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Browser service is disabled by default so you need to enable and start it in order to connect by instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Browser service communicates over UDP port 1434, that port must be allowed through firewalls.
You can specify a port number instead of an instance name to connect directly to a named instance (or to a default instance listening on a non-standard port). The port may be specified by appending a comma and port number to the Data Source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.

Data Source=SQLSERVERNAME.MYDOMAIN.COM,60086;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 7: Connect to instance on host SQLSERVERNAME listening on port 60086 using Windows authentication with initial database context of MyDatabase
Additional Keywords

In addition to the "Data Source", "Initial Catalog" and "Integrated Security" (or "User ID" and "Password") keywords I've discussed so far, I recommend that "Application Name" also be specified. The specified string helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=Connection String Example
Listing 8: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase and an application name specification
In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment. 
Summary

You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I've discussed here. I suggest you establish a connection string standard that includes the "Data Source", "Initial Catalog" and "Application Name" keywords plus the authentication method: "Integrated Security=SSPI" or "User ID" and "Password".

Why Parameters are a Best Practice

Perhaps the single most important SQL Server application development Best Practice is the consistent use of parameters in application code.  Parameterized database calls are more secure, easier to program and promote query plan reuse.  Yet I continue to see code with SQL statements constructed by concatenating literal strings with variable values.  I think this is largely out of ignorance rather than design so I decided to show how easy it is for .NET application developers to use parameters and peek into the internals to show why parameters are so important.

How to Use Parameters

Below is a C# snippet of a SqlClient query built and executed using variable concatenation.  This is an example of how not to supply variable values to a SQL query.

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("SELECT LastName, CreateDate, Amount FROM dbo.MyTable " +
        "WHERE MyTableID = " + myTableId.ToString(), connection);

Here is a parameterized equivalent of the same query.  The only changes are the parameter token in the query string and that the parameter value is added to the parameter collection instead of being embedded in the query text:

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("SELECT LastName, CreateDate, Amount FROM dbo.MyTable " +
        "WHERE MyTableID = @MyTableID", connection);

selectCommand.Parameters.AddWithValue("@MyTableID", myTableId);

So all an application developer needs to do is specify parameter tokens in a query string and add values to the parameter collection.  In the case of a parameterized stored procedure call, no tokens are necessary; simply specify the procedure name as the CommandText, specify CommandType.StoredProcedure and add parameters to the collection:

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("dbo.usp_SelectMyTableData", connection);
selectCommand.CommandType = CommandType.StoredProcedure;

selectCommand.Parameters.AddWithValue("@MyTableID", myTableId);

Parameters Under the Covers

The real differences between a parameterized and non-parameterized statement can be gleaned from a SQL trace and a network trace.  Here is the trace event captured with SQL Profiler for the non-parameterized code example:

[Figure: Profiler trace showing the SQL:BatchCompleted event]

Note the “SQL:BatchCompleted” event.  The SqlClient API simply sent the query to SQL Server as a batch (text containing one or more SQL statements) because the SqlCommand.Parameters collection was empty.

Digging deeper with a tool like Microsoft Network Monitor, we can see the TDS (Tabular Data Stream) protocol data that SqlClient sent to SQL Server for the request:

[Figure: TDS packet containing the SQL statement batch]

The TDS packet contains the entire Unicode SQL statement string, including the text value of 1 (Unicode character 0x0031, boxed in blue) that the application passed in the CommandText.

Now let’s take a look at the parameterized query trace event:

[Figure: Profiler trace showing the RPC:Completed event]

The EventClass shows an "RPC:Completed" event instead of "SQL:BatchCompleted", which indicates that SqlClient didn't really wrap the query in sp_executesql to achieve parameterization the way the TextData seems to show.  An RPC (Remote Procedure Call) is a specialized type of SQL request in which parameter values are passed separately from the SQL statement in the TDS request, and in native form.  The SqlClient API knew to use an RPC instead of a batch because of the non-empty parameters collection.

We can see from the dump of the TDS packet that it contains no reference to sp_executesql nor does it contain the string value 1 for the parameter value that the Profiler trace showed (see SQL Trace Parameter values are not always as they seem for more info).

[Figure: TDS packet showing the RPC request]

The important point with the RPC request is that the value 1 (binary integer value, boxed in blue) was passed separately from the query text.  This fact has a number of important implications for security, performance and programming. The difference between the TDS packets shows the essence of parameters in communication between the application code and SQL Server. 

Use Parameters for Security

If you don't know what SQL injection is, Google is your friend.  Basically, an injection vulnerability allows a user to manipulate the intended SQL by supplying malformed values.  Injection can allow a malicious user to retrieve sensitive data or change database information, limited only by the database security context.

Parameterized calls protect against SQL injection because the executed SQL statement is constant regardless of the supplied values.  As long as no dynamic SQL is constructed, the intended SQL statement cannot be manipulated.  An application developer only needs to ensure that the desired parameter values can be passed (e.g. key value from session state instead of from http query string) to guard against unintended data disclosure or manipulation.  Using stored procedures (with CommandType.StoredProcedure) always results in an RPC call and provides an additional security layer.
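To illustrate (a hypothetical sketch reusing the dbo.MyTable query from earlier), a crafted input value changes the meaning of a concatenated query but is treated purely as data when passed as a parameter:

string userInput = "1; DROP TABLE dbo.MyTable; --";

// Concatenation: the injected statement becomes part of the executed batch:
// SELECT LastName FROM dbo.MyTable WHERE MyTableID = 1; DROP TABLE dbo.MyTable; --
SqlCommand unsafeCommand = new SqlCommand(
    "SELECT LastName FROM dbo.MyTable WHERE MyTableID = " + userInput,
    connection);

// Parameterized: the entire string is just a value compared against MyTableID
// (here it would simply fail to convert to int rather than execute the injected statement)
SqlCommand safeCommand = new SqlCommand(
    "SELECT LastName FROM dbo.MyTable WHERE MyTableID = @MyTableID",
    connection);
safeCommand.Parameters.AddWithValue("@MyTableID", userInput);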

Use Parameters for Performance

Parameterized statements are cached to facilitate reuse and avoid compilation overhead, which is especially important in high-volume OLTP environments.  Using stored procedures further promotes plan caching and reuse.

The only downside to parameters from a performance perspective is that a given plan may be optimal for some values but not others.  In cases where sub-optimal cached plans are an issue, techniques such as RECOMPILE hints or compromise plan guides can help avoid sub-optimal execution plans due to varying parameter values.  See the article Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 for more information.
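For example (a sketch only, reusing the earlier dbo.MyTable query; whether a recompile is worthwhile depends on the workload), a statement-level OPTION (RECOMPILE) hint can be appended to a parameterized query so the plan is compiled for the current parameter value instead of reusing a cached plan:

string lastName = "O'Leary";

SqlCommand selectCommand =
    new SqlCommand("SELECT LastName, CreateDate, Amount FROM dbo.MyTable " +
        "WHERE LastName = @LastName " +
        "OPTION (RECOMPILE);", connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);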

Use Parameters for Cleaner Code

Parameters are easier to program and provide cleaner code because there is no need to escape (double-up) quotes, format datetime strings or use "." as a decimal separator.  Remember that the API takes care of passing parameter values in native form so string formatting is not needed when parameters are used.  Compare the concatenated INSERT statement below with the parameterized equivalent that follows it.

string lastName = "O'Leary";
DateTime createDate = DateTime.Now;
decimal amount = 12.34m;

SqlCommand insertCommand =
    new SqlCommand(
        "INSERT INTO dbo.MyTable " +
            "(LastName, CreateDate, Amount) " +
            "VALUES('" + lastName.Replace("'", "''") + "', '" +
            createDate.ToString("yyyy-MM-ddTHH:mm:ss.fff") + "', " +
            amount.ToString("0.00") + ")",
        connection);

string lastName = "O'Leary";
DateTime createDate = DateTime.Now;
decimal amount = 12.34m;

SqlCommand insertCommand =
    new SqlCommand(
        "INSERT INTO dbo.MyTable " +
            "(LastName, CreateDate, Amount) " +
            "VALUES(@LastName, @CreateDate, @Amount)",
        connection);

insertCommand.Parameters.AddWithValue("@LastName", lastName);
insertCommand.Parameters.AddWithValue("@CreateDate", createDate);
insertCommand.Parameters.AddWithValue("@Amount", amount);

Security with Ownership Chains


Ownership chaining is one of my favorite SQL Server security features.  I like ownership chaining because it allows me to confine data access to specific stored procedures, views and functions while preventing direct access to the underlying tables.  This approach provides a well-defined application interface that insulates applications from the physical implementation and allows me to change things on the back-end as long as I don’t change the interface.  Also, thanks to ownership chaining, I have many options to vertically and horizontally partition data to meet security objectives while ensuring tables are not accessed directly.  

Ignorance is not bliss when it comes to ownership chaining security, though.  I've seen cases where data access was inadvertently allowed because ownership chaining wasn't considered.  It is imperative that SQL Server DBAs understand ownership chaining in the SQL Server security model so that data is properly secured.  I want to point out a couple of important security concepts as they relate to ownership chaining, with the hope that this will help SQL Server DBAs implement a security scheme that is appropriate for their environment and also easy to manage.

Ownership Chain Behavior

SQL Server always checks permissions on directly referenced objects using the current connection security context.  However, permissions are evaluated differently when objects are accessed indirectly, such as those used in a trigger, view or stored procedure.  The behavior of ownership chaining is that object permissions are not evaluated when the calling and referenced object owners are the same.  Also, because object permission checking is short-circuited in an unbroken chain, ownership chaining takes precedence over an impersonated security context (EXECUTE AS clause) as well as denied object permissions.   Consequently, permissions on indirectly referenced objects are irrelevant in an unbroken ownership chain.  This ownership chaining behavior gives a DBA control over exactly who can use which objects while preventing ad-hoc access to other objects.

Ownership chaining behavior might not be intuitive at first.  I like to think of it from the perspective that the purpose of granting object permissions is so the object can actually be used.  For example, it is of no use to grant a user SELECT permissions on a view unless data can be retrieved from the view.  The same applies even when a stored procedure or function impersonates with EXECUTE AS; the impersonated principal must be able to use objects referenced by the proc in order for the procedure to be useful.  I keep in mind that with ownership chaining, I effectively grant permissions on exposed object functionality regardless of how the functionality is implemented.

Ownership chaining is limited to a single owner in order to prevent escalation of privileges in environments where non-privileged users can create objects.  Ownership chaining also does not apply to dynamic SQL executed from a stored procedure; the invoking (or impersonated) user needs permissions on objects referenced by dynamic SQL statements.  I should add that one of the useful purposes of EXECUTE AS and signing modules with certificates is to provide needed permissions in a broken ownership chain or to grant statement permissions without granting permissions to users directly.  See SQL Server MVP Erland Sommarskog's article Giving Permissions through Stored Procedures for a discussion of these techniques.

Ownership Best Practices

Ownership is obviously central to ownership chaining.  In the vast majority of environments, it’s probably best to simply inherit the owner (a.k.a. AUTHORIZATION) from the object’s schema instead of overriding on individual objects.  It is possible to assign ownership to individual objects (using ALTER AUTHORIZATION) in order to implement chaining that is independent of schema but I think this is overly complex for most environments.  One reason I can think of why one might specify a different owner at the object level is in the case where a table contains particularly sensitive data and one wants to deliberately prevent inadvertent access via ownership chaining.  I think such cases are rare, though. 

I described some of the practices I follow for schema and object ownership in Keep Schema and Ownership Simple.  In summary, I specify dbo as the schema authorization (owner) unless I want to break the ownership chain between schemas to create a security boundary.  When a security boundary is desired, I create a role (or user without login) with the same name as the schema for the schema authorization.

Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same: group related objects into different schemas to bring organization to an otherwise unwieldy collection of objects.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schemas also simplify security administration because permissions can be granted en masse at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like "GRANT EXECUTE ON SCHEMA::Sales TO SalesRole".  I can also grant CONTROL permissions on a schema to give privileged users full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db_owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schemas because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don't inadvertently provide access to data in another schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.