AddWithValue is Evil

AddWithValue is the root cause of many SQL Server performance and concurrency problems. Although the SqlParameterCollection.AddWithValue method is a slightly more convenient way to add parameters and values to a SqlCommand, the implications are insidious because the high cost in SQL Server resource utilization is not apparent during development. It's often not until the application is in production and under load that you discover queries are slow due to full scans, excessive resource usage, lock escalation, blocking, cache bloat, and frequent deadlocks.

A best practice in SQL Server development has long been to use strongly-typed parameters. Not only does this practice thwart SQL injection (barring non-parameterized dynamic SQL statements), it also generally improves performance due to caching of queries that differ only by values, avoiding the cost of compiling the query each time it is executed. Importantly, matching parameter definitions with referenced columns promotes efficiency by avoiding implicit conversions that can preclude efficient index use.

To follow best practices, avoid AddWithValue and instead add parameters and values using a method that allows specification of the desired SqlDbType along with the appropriate length, precision, and scale. Although slightly more verbose than AddWithValue, the performance benefits are huge and worth a few keystrokes.

The Problem with AddWithValue
The nastiness with AddWithValue is that ADO.NET infers the parameter definition from the supplied object value. Parameters in SQL Server are inherently strongly-typed, including the SQL Server data type, length, precision, and scale. Types in .NET don't always map precisely to SQL Server types, and are sometimes ambiguous, so AddWithValue has to make guesses about the intended parameter type.

The guesses AddWithValue makes can have huge implications when wrong because SQL Server uses well-defined data type precedence rules when expressions involve unlike data types; the value with the lower precedence is implicitly converted to the higher type. The implicit conversion itself isn't particularly costly but becomes a major performance concern when it is the column value rather than the parameter value that must be converted, especially in a WHERE or JOIN clause predicate. The implicit column value conversion can prevent indexes on the column from being used with an index seek (i.e. a non-sargable expression), resulting in a full scan of every row in the table or index.

Below are some of the most common pitfalls with AddWithValue.

.NET Strings
Strings are arguably the single biggest problem with AddWithValue. AddWithValue infers SqlDbType.NVarChar when a string object is provided because strings in .NET are Unicode. The resultant SQL Server parameter type is nvarchar(n) or nvarchar(MAX), depending on the length of the string value provided. The type is nvarchar(n) when the string length is 4000 characters or less and nvarchar(MAX) when over 4000 characters.

The inferred nvarchar parameter type is especially problematic when the parameter value is compared to varchar columns in queries because nvarchar has a higher precedence. As mentioned earlier, this prevents indexes on the varchar columns from being used efficiently with an index seek. Because varchar columns are widely used when Unicode characters are not required (saving disk space and memory), the resultant performance problems due to AddWithValue are quite common in the wild.

Another issue with strings and AddWithValue is cache bloat and lack of reuse. AddWithValue uses the actual length of the supplied value as the nvarchar parameter length. This results in a separate cached plan for queries that differ only by parameter length. For example, you'll end up with 20 different cached plans for 20 different @LastName parameter value lengths. The number of plans is greatly multiplied with many string parameters (e.g. @LastName, @FirstName, @Address) and is especially an issue with large parameterized IN clauses.

A better method to add string parameters is with the SqlParameterCollection.Add method. In addition to the desired name and data type, this overload allows one to specify the length, which should match the column maximum length (-1 for MAX types). The value can be easily assigned to the returned SqlParameter instance Value property in a one-liner:
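
For example, a minimal sketch assuming a varchar(30) LastName column in a hypothetical dbo.Contact table, an open SqlConnection named connection, and the System.Data/System.Data.SqlClient namespaces:

string lastName = "Sommarskog";

SqlCommand selectCommand = new SqlCommand(
    "SELECT ContactID, FirstName, LastName FROM dbo.Contact WHERE LastName = @LastName;",
    connection);

// Strongly-typed parameter: SqlDbType.VarChar with length matching the assumed varchar(30) column
selectCommand.Parameters.Add("@LastName", SqlDbType.VarChar, 30).Value = lastName;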

.NET DateTime
AddWithValue infers the SqlDbType.DateTime parameter type when a DateTime object value is provided. Although a .NET DateTime aligns more closely with SQL Server datetime2(7), AddWithValue uses datetime for backwards compatibility.

The SQL Server datetime type has a fixed precision of 3 with accuracy to 1/300 fractional seconds so ADO.NET will round the value accordingly and sub-millisecond values will be lost, resulting in a value that ends with 0, 3, or 7 milliseconds. This behavior is compatible with a SQL Server datetime column value but loses accuracy and precision when used with a datetime2 column of precision 3 or greater. Values will be rounded to the lesser datetime precision and sub-millisecond values will be lost.

One can explicitly specify SqlDbType.DateTime2 to avoid loss of accuracy when working with the SQL Server datetime2 type. Note that the parameter Scale property is used to specify datetime2 precision in ADO.NET rather than the Precision property as one might expect. The default Scale is 7, the same as the default datetime2 precision in SQL Server. Although the value will automatically be rounded when the target column precision is less, I recommend matching the column precision regardless.

This example uses Add instead of AddWithValue, specifying Scale of 3 to match a target column datetime2(3), which is more precise than datetime with 1 byte less storage.
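
A sketch along those lines, with an illustrative @ModifiedDate parameter targeting an assumed datetime2(3) column (command is an existing SqlCommand):

SqlParameter modifiedDateParam = command.Parameters.Add("@ModifiedDate", SqlDbType.DateTime2);
modifiedDateParam.Scale = 3; // match the datetime2(3) column precision
modifiedDateParam.Value = DateTime.UtcNow;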

.NET Decimal
AddWithValue infers SqlDbType.Decimal with the same precision and scale as the supplied object value. Varying precision and scale can bloat the plan cache similarly to the varying string lengths described earlier, so it's best to explicitly specify the precision and scale matching the column.

Here’s an example that adds a decimal value without AddWithValue, specifying precision and scale (decimal(18,4)):
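
A sketch with an illustrative @Amount parameter targeting an assumed decimal(18,4) column (command is an existing SqlCommand):

decimal amount = 12345.6789m;

SqlParameter amountParam = command.Parameters.Add("@Amount", SqlDbType.Decimal);
amountParam.Precision = 18; // match the column precision
amountParam.Scale = 4;      // match the column scale
amountParam.Value = amount;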

Summary
I hope this article helps you understand the importance of the strongly-typed parameter best practice. Not only will this help avoid performance and concurrency problems, the attention to detail will save compute costs in the cloud.

Bulk Load Batch Size Considerations in SQL Server 2016

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.

SQL Server 2016 and later improves performance further by turning on bulk load context and minimal logging by default when bulk loading into SIMPLE and BULK LOGGED recovery model databases, which previously required turning on trace flags as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here but do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Bulk Load Batch Size Implications
An important consideration in SQL 2016 and later with bulk load context and minimal logging is that each batch allocates new extent(s) (8 contiguous 8K pages, 64K) rather than using existing unused pages in existing extents. This improves concurrency and space allocation performance, but possibly at the cost of significantly more unused space than previous versions when few rows are loaded per batch.

The optimal batch size for bulk load operations involves trade-offs between throughput, data space utilization, concurrency, and transaction log space (FULL recovery model). To avoid excessive unused space with bulk load context, adjust the batch size when possible such that data are loaded in multiples of the 64K extent size.

There are cases when one must load with small batches, such as when loading small files. One solution to mitigate unused space in this scenario is to not use TABLOCK so that bulk load context isn't used. This isn't much of a performance concern with small batches anyway. Another method is to turn on trace flag 692 to disable the default bulk load context in SQL Server 2016, effectively reverting to pre-SQL 2016 behavior.
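
From a client application, the batch size and TABLOCK behavior are controlled through the bulk load API. Below is a minimal SqlBulkCopy sketch; the destination table and source IDataReader are illustrative:

using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    // TableLock requests bulk load context; omit SqlBulkCopyOptions.TableLock for small
    // loads to avoid allocating new extent(s) for every batch
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.SalesStaging";
        bulkCopy.BatchSize = 100000; // large batches fill whole 64K extents; 0 loads all rows in a single batch
        bulkCopy.WriteToServer(sourceReader); // sourceReader is any IDataReader over the source data
    }
}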

Don’t confuse batch size with the rows per batch hint. The rows per batch hint is used by the SQL Server optimizer to help optimize the load process because the number of rows that will be loaded is otherwise unknown, defaulting to an estimate of 10,000 rows.

SQL Server 2016 SP1 Standard Edition Enhancements

I seldom get excited about service packs but the changes released with SQL Server 2016 SP1 are the most significant I've seen in a SQL Server service pack in 20+ years. Microsoft announced this week at the Microsoft Connect(); developer's conference that SQL Server 2016 SP1, which is available for download immediately, allows features previously available only in Enterprise/Developer Editions to be used in the lesser Standard, Web, Express, and LocalDB Editions too. Features like table partitioning, In-Memory OLTP, and columnstore are now options for developers and DBAs using SQL Server Standard Edition and even the free Express Edition in production. See SQL Server 2016 Service Pack 1 (SP1) released !!! for the complete matrix of programmability features by edition along with other cool SP1 information.

The implications are huge now that SQL Server has the same programmability surface area among editions. The choice of the production edition can be made independently based on operational needs rather than programmability features. Developers can use a free edition (i.e. LocalDB, Express or Developer) without fear that a feature won't be available in production as long as prod is running SQL Server 2016 SP1 or greater. DBAs can now choose the appropriate edition for production based on other considerations like advanced high availability, TDE, and Auditing, as well as performance features like higher supported memory, more cores, and advanced scanning. This separation of concerns avoids the need to lock in the production edition early in the application lifecycle, making development easier and production implementation more flexible.

Real World Use Case Scenario
I work with an ISV with hundreds of customers running a mix of Standard and Enterprise Edition. Their needs vary widely, and SQL Server Enterprise Edition is not an option for some due to budget constraints. Some tables are quite large, so partitioning is required for manageability and, for their reporting workload, partitioning also improves performance of large scans due to partition elimination. The ugly part, though, is that table partitioning (and/or columnstore) is the right tool for the job but was not an option for customers on Standard Edition.

The ISV initially compromised and used view partitioning instead of table partitioning so that the same code would run regardless of edition. Although that provided the expected manageability benefits, there were some downsides. Compilation time increased significantly as the number of partitioned view member tables grew, as did query plan complexity. This sometimes resulted in poor query plans against very large tables and especially impacted their larger and most valued customers, most of which were running Enterprise Edition.

To address the problem before SQL Server 2016 SP1, the ISV added conditional code to the application so that either view or table partitioning could be used depending on the SQL Server edition. This wasn't ideal as it added code complexity and doubled the number of QA test cases for application features that performed partition maintenance. However, since the resultant benefits for their larger customers on Enterprise Edition were quite significant, the additional costs of development and testing were well-justified.

Now that table partitioning is available in SQL Server 2016 SP1 Standard Edition, they plan to require SQL Server 2016 SP1 (or later) going forward, use table partitioning unconditionally, and perhaps introduce usage of other features like columnstore that were previously Enterprise only. Not only will this simplify the code base and test cases, customers on Standard Edition will be happier with their experience and can upgrade to Enterprise if they so choose without reinstalling or reconfiguring the application. It will of course take some time before all their customers upgrade to the latest product version and SQL 2016 SP1+ but the future is much brighter now.

Perform Due Diligence
If you are new to features previously available only in Enterprise Edition, I suggest you perform due diligence before using them. Memory-optimized features like columnstore and In-Memory OLTP require additional physical memory, and insufficient memory will be a production show-stopper. Make sure your hardware is sized appropriately regardless of edition and, in the case of editions other than Enterprise or Developer, that memory requirements don't exceed the maximum capacity limits for that edition. Although very powerful, In-Memory OLTP is a fundamentally different paradigm than you might be accustomed to regarding transactional behavior and isolation levels. Be sure you fully understand these features before using them in development or production.

Summary
I hope these changes are enough motivation for you to consider upgrading to SQL Server 2016 SP1, especially if you are running Standard Edition or are currently on an older SQL Server version. Together with the fact that SQL Server 2016 just runs faster, the time and effort spent in upgrading is a solid investment that will pay dividends regardless of edition.

SQL Saturday – St. Louis 2016

Please join us for SQL Saturday, St. Louis 2016. We have 25 sessions scheduled covering a wide range of topics on the Microsoft Data Platform stack.

As in the past few years, the event will be held at the SLU Center for Workforce & Organizational Development, 3545 Lindell Blvd, Saint Louis, Missouri 63103. Visit the event home page for registration, directions, and session details.

See you there!

SSMS 2016 is Free

The generally available (GA) release of SQL Server Management Studio 2016 is free. No SQL Server license is required to install and use it for either development or production purposes. This GA version supports SQL Server versions 2008 through 2016 plus Azure SQL Database. You might be able to use it against earlier SQL versions too, but that's not officially supported and features will be limited. SSMS 2016, now based on the Visual Studio 2015 isolated shell, can be downloaded from the SSMS Download Page.

Prior to SQL Server 2016, one typically installed SSMS from the SQL Server installation media, and it was upgraded only by CUs, service packs, or SQL Server version upgrades. This changed with the release of SQL Server 2016; SSMS is now a separate download. After installation, regular updates are made available (monthly) and can be installed from the SSMS Tools–>Check for Updates… menu option. These frequent updates allow the tooling to stay aligned with both on-prem SQL Server and Azure SQL Database while delivering enhancements and bug fixes to SSMS much more quickly. I am quite impressed by the responsiveness of the SSMS development team in this regard, having used it since the initial preview versions of the VS 2015 isolated shell were made available.

An important change in this GA version compared to earlier versions of SSMS is that SET XACT_ABORT ON is now the default session setting. That's a good thing, in my opinion, because it helps ensure one doesn't accidentally leave a transaction open in a query window following an error (I have many horror stories about that). But it does change script execution behavior because queries, with or without error handling, will generally not continue after an error with XACT_ABORT ON. Be aware of this when running scripts from SSMS. This and other SET options can be specified from the SSMS Query–>Query Options–>Execution–>Advanced page. I strongly recommend specifying SET XACT_ABORT ON unless you have a specific reason not to do so.

Kudos to Microsoft for providing SSMS as a free tool and to the SSMS development team for their ongoing commitment to SQL Server tooling improvements and timely delivery.

QUERYTRACEON Permissions

I recently had a SQL Server 2014 stored procedure that ran for several hours with high CPU usage. I narrowed the problem down to a single query with dozens of joins that was stuck in compilation. Suspecting an issue with the new cardinality estimator, I changed the database compatibility level from 120 (SQL Server 2014) to 110 (SQL Server 2012) so that the legacy CE would be used. The query then compiled immediately and executed quickly, which confirmed the new CE as the culprit of the excessive compilation time.

OPTION QUERYTRACEON (9481) to the Rescue
Changing the database compatibility level was a heavy-handed option I didn't want to pursue because some queries benefited from the new CE behavior. Instead, I specified trace flag 9481 using the OPTION (QUERYTRACEON 9481) query hint to use the legacy CE for the problem query. This is an acceptable workaround for me until the bug is addressed. However, you might not be so lucky. A more common use case for OPTION (QUERYTRACEON 9481) is performance regression with the new CE. Although not documented as far as I know, the QUERYTRACEON query hint requires sysadmin permissions unless executed from within a stored procedure. Consequently, QUERYTRACEON won't be an option for those who need the hint for queries run by non-sysadmins outside stored procs. I found SQL Server MVP Ola Hallengren's Connect suggestion to allow QUERYTRACEON with no additional permissions. I upvoted it and encourage you to do so too.
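
For illustration, the hint is simply appended to the query text. Here is a sketch of an ad hoc query using the legacy CE (table, columns, and parameter are hypothetical); remember that outside a stored procedure this requires sysadmin permissions:

SqlCommand command = new SqlCommand(
    "SELECT OrderID, CustomerID, OrderDate " +
    "FROM dbo.Orders " +
    "WHERE CustomerID = @CustomerID " +
    "OPTION (QUERYTRACEON 9481);", // use the legacy cardinality estimator for this query only
    connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = customerId;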

Beware Job Recruiting SCAM

I’ve had multiple reports of a scam where persons have used my identity to entice IT professionals into applying for positions they have no business recruiting for. The scammers later ask for a PayPal payment for their services. The authorities have been notified of this fraudulent activity.

I urge everyone to be vigilant with regard to social engineering scams like this. Beware of unsolicited emails from persons you don't know personally. It is easy for unscrupulous individuals to glean much information from publicly available sources (job descriptions, professional profiles, etc.) and tailor a message customized for a particular job seeker.

Stairway Series on SQLServerCentral.com

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server.  Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic.  The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge one needs to use a SQL Server feature confidently in a production environment.  Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the world-wide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing.  For years I’ve cautioned against using Profiler indiscriminately both here as well as in the MSDN forums and newsgroups.  But it seems many DBAs still don’t differentiate between Profiler and server-side tracing.  I’m hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months. The subject of this post is the primary reason; there are only so many hours in the day. :-(

Don’t Bloat Proc Cache with Parameters

Most of us have long been aware that non-parameterized ad-hoc queries can pollute procedure cache with entries that will never be reused.  Each distinct query string is a separate cache entry and, unless the exact query with the same values is executed again, the plan will remain in cache unnecessarily until aged out.

However, I was surprised to learn recently that even parameterized queries can bloat procedure cache if one isn't careful. This was brought to my attention by SQL Server MVP Erland Sommarskog and confirmed by fellow MVP Adam Machanic, who experienced this problem at a customer site.

The issue is that parameter declarations are part of cache entry statement text.  Consider the following code:

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

SqlClient defined the parameter as nvarchar(10) because the supplied string value length was 10 and .NET strings are Unicode.  Below is the resulting statement text from sys.dm_exec_query_plan:

(@LastName nvarchar(10))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

Now run the same code with a different last name value:

string lastName = "Machanic";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

You probably guessed it – an additional cache entry was created for the same query due to the different name length:

(@LastName nvarchar(8))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

It is likely that no more than a dozen or so cache entries will exist for this particular query since the only differentiator is the last name length.  However, with queries that contain multiple string parameters of varying lengths, the number of cache entries increases dramatically while the likelihood of reuse diminishes.  In Adam's case, the main culprit was a string with a large list of values (between 1 and 1000+ items) that was split using a technique described in Erland's Arrays and Lists articles at http://www.sommarskog.se.

Preventing Procedure Cache Bloat

An easy way to prevent unneeded procedure cache entries is to explicitly specify the maximum data length for string and other variable length values.  In fact, it’s a good idea to specify both the length and SqlDbType to ensure the parameter declaration matches the target column.  Not only will this help prevent cache bloat due to different parameter lengths, the matching data type can avoid the performance gotcha of a non-sargable expression caused by implicit data type conversion.

There are a number of techniques to specify the parameter length and SqlDbType in .NET.  For example, the code below uses an overloaded SqlCommand.Parameters.Add method.  If you use AddWithValue, be sure to set the Size and SqlDbType properties of the newly created parameter afterward.

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.Add("@LastName", SqlDbType.NVarChar, 50).Value = lastName;
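
Alternatively, when existing AddWithValue calls can't be replaced right away, the returned SqlParameter can be adjusted afterward; a sketch:

SqlParameter lastNameParam = selectCommand.Parameters.AddWithValue("@LastName", lastName);
lastNameParam.SqlDbType = SqlDbType.NVarChar;
lastNameParam.Size = 50; // match the LastName column definition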

 

 

Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same: group related objects into different schemas to bring organization to an otherwise unwieldy collection of objects.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schema also simplifies security administration because permissions can be granted en masse at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like "GRANT EXECUTE ON SCHEMA::Sales TO SalesRole".  I can also grant CONTROL permissions on a schema to give privileged users full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db_owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schema because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don’t inadvertently provide access to data in different schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.