Don’t Bloat Proc Cache with Parameters

Most of us have long been aware that non-parameterized ad-hoc queries can pollute procedure cache with entries that will never be reused.  Each distinct query string is a separate cache entry and, unless the exact query with the same values is executed again, the plan will remain in cache unnecessarily until aged out.
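To see this pollution firsthand, you can query the plan cache DMVs. Below is a minimal sketch (not from the original article) that lists single-use ad-hoc plans; it assumes VIEW SERVER STATE permission:

```sql
-- List single-use ad-hoc plans currently taking up cache space
SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Adhoc'
  AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;
```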

However, I was surprised to learn recently that even parameterized queries can bloat the procedure cache if one isn’t careful.  This was brought to my attention by SQL Server MVP Erland Sommarskog and confirmed by fellow MVP Adam Machanic, who experienced this problem at a customer site.

The issue is that parameter declarations are part of the cache entry’s statement text.  Consider the following code:

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

 

SqlClient defined the parameter as nvarchar(10) because the supplied string value was 10 characters long and .NET strings are Unicode.  Below is the resulting statement text from sys.dm_exec_sql_text:

(@LastName nvarchar(10))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

Now run the same code with a different last name value:

string lastName = "Machanic";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

 

You probably guessed it – an additional cache entry was created for the same query due to the different name length:

(@LastName nvarchar(8))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

It is likely that no more than a dozen or so cache entries will exist for this particular query, since the only differentiator is the last name length.  However, with queries that contain multiple string parameters of varying lengths, the number of cache entries increases dramatically while the likelihood of reuse diminishes.  In Adam’s case, the main culprit was a string containing a large list of values (between 1 and 1000+ items) that was split using a technique described in Erland’s Arrays and Lists articles (http://www.sommarskog.se).
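One way to spot this kind of bloat is to look for many cached entries for textually similar statements. On SQL Server 2008 and later, the query_hash column groups such queries; this illustrative sketch (my own, not from the article) counts cached statements per hash:

```sql
-- Query hashes with many cached statements are candidates for parameter-length bloat
SELECT qs.query_hash, COUNT(DISTINCT qs.sql_handle) AS cached_statements
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.sql_handle) > 1
ORDER BY cached_statements DESC;
```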

Preventing Procedure Cache Bloat

An easy way to prevent unneeded procedure cache entries is to explicitly specify the maximum data length for string and other variable-length values.  In fact, it’s a good idea to specify both the length and SqlDbType to ensure the parameter declaration matches the target column.  Not only will this help prevent cache bloat due to different parameter lengths, but the matching data type can also avoid the performance gotcha of a non-sargable expression caused by implicit data type conversion.

There are a number of techniques to specify the parameter length and SqlDbType in .NET.  For example, the code below uses an overload of the SqlCommand.Parameters.Add method.  If you use AddWithValue, be sure to set the Size and SqlDbType properties of the newly created parameter afterward.

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.Add("@LastName", SqlDbType.NVarChar, 50).Value = lastName;

 

 

Why Parameters are a Best Practice

Perhaps the single most important SQL Server application development Best Practice is the consistent use of parameters in application code.  Parameterized database calls are more secure, easier to program and promote query plan reuse.  Yet I continue to see code with SQL statements constructed by concatenating literal strings with variable values.  I think this is largely out of ignorance rather than design, so I decided to show how easy it is for .NET application developers to use parameters and to peek into the internals to show why parameters are so important.

How to Use Parameters

Below is a C# snippet of a SqlClient query built and executed using variable concatenation.  This is an example of how not to supply variable values to a SQL query.

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("SELECT LastName, CreateDate, Amount FROM dbo.MyTable " +
    "WHERE MyTableID = " + myTableId.ToString(), connection);

 

Here is a parameterized equivalent of the same query.  The only changes are the parameter token in the query string and that the value is added to the parameter collection instead of being embedded in the query text:

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("SELECT LastName, CreateDate, Amount FROM dbo.MyTable " +
    "WHERE MyTableID = @MyTableID", connection);

selectCommand.Parameters.AddWithValue("@MyTableID", myTableId);

 

So all an application developer needs to do is specify parameter tokens in a query string and add values to the parameter collection.  In the case of a parameterized stored procedure call, no tokens are necessary; simply specify the procedure name as the CommandText, specify CommandType.StoredProcedure and add parameters to the collection:

int myTableId = 1;

SqlCommand selectCommand =
    new SqlCommand("dbo.usp_SelectMyTableData", connection);
selectCommand.CommandType = CommandType.StoredProcedure;

selectCommand.Parameters.AddWithValue("@MyTableID", myTableId);
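For completeness, a stored procedure matching that call might look like the sketch below. Only the procedure name and parameter come from the snippet; the body is an assumption for illustration:

```sql
CREATE PROCEDURE dbo.usp_SelectMyTableData
    @MyTableID int
AS
SELECT LastName, CreateDate, Amount
FROM dbo.MyTable
WHERE MyTableID = @MyTableID;
```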

 

Parameters Under the Covers

The real differences between a parameterized and non-parameterized statement can be gleaned from a SQL trace and a network trace.  Here is the trace event for the non-parameterized code example, captured with SQL Profiler:

Profiler trace showing SQL:BatchCompleted

Note the “SQL:BatchCompleted” event.  The SqlClient API simply sent the query to SQL Server as a batch (text containing one or more SQL statements) because the SqlCommand.Parameters collection was empty.

Digging deeper with a tool like Microsoft Network Monitor, we can see the TDS (Tabular Data Stream) protocol data that SqlClient sent to SQL Server for the request:

TDS Packet with SQL statement batch

The TDS packet contains the entire Unicode SQL statement string, including the text value of 1 (Unicode character 0x0031, boxed in blue) that the application passed in the CommandText.

Now let’s take a look at the parameterized query trace event:

Profiler trace showing RPC:Completed

The EventClass shows an “RPC:Completed” event instead of “SQL:BatchCompleted”, indicating that SqlClient didn’t really wrap the query in sp_executesql to achieve parameterization as the TextData seems to show.  An RPC (Remote Procedure Call) is a specialized type of SQL request in which parameter values are passed separately from the SQL statement in the TDS request, and in native form.  The SqlClient API knew to use an RPC instead of a batch because of the non-empty parameters collection.

We can see from the dump of the TDS packet that it contains no reference to sp_executesql, nor does it contain the string value 1 for the parameter that the Profiler trace showed (see SQL Trace Parameter values are not always as they seem for more info).

TDS packet showing RPC

The important point about the RPC request is that the value 1 (a binary integer value, boxed in blue) was passed separately from the query text.  This has a number of important implications for security, performance and programming.  The difference between the TDS packets shows the essence of parameters in the communication between application code and SQL Server.

Use Parameters for Security

If you don’t know what SQL injection is, Google is your friend.  Basically, an injection vulnerability allows a user to manipulate the intended SQL statement by supplying malformed values.  Injection can allow a malicious user to retrieve sensitive data or change database information, limited only by the database security context.

Parameterized calls protect against SQL injection because the executed SQL statement is constant regardless of the supplied values.  As long as no dynamic SQL is constructed, the intended SQL statement cannot be manipulated.  An application developer only needs to ensure that the intended parameter values are passed (e.g. a key value from session state instead of from the http query string) to guard against unintended data disclosure or manipulation.  Using stored procedures (with CommandType.StoredProcedure) always results in an RPC call and provides an additional security layer.
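To make the risk concrete, suppose the earlier concatenation example accepted the id as a string taken from user input. A hypothetical value such as `1; DROP TABLE dbo.MyTable` would turn the single intended statement into a batch of two:

```sql
-- What SQL Server would receive from the concatenated CommandText
SELECT LastName, CreateDate, Amount FROM dbo.MyTable WHERE MyTableID = 1; DROP TABLE dbo.MyTable
-- With a parameter, the statement text is constant and the supplied value cannot change it
```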

Use Parameters for Performance

Parameterized statements are cached to facilitate reuse and avoid compilation overhead, which is especially important in high-volume OLTP environments.  Using stored procedures promotes caching and reuse even further.

The only downside to parameters from a performance perspective is that a given plan may be optimal for some values but not others.  In cases where sub-optimal cached plans are an issue, techniques such as RECOMPILE hints or plan guides that force a compromise plan can help avoid a sub-optimal execution plan due to varying parameter values.  See the article Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 for more information.
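For example, inside a stored procedure or an sp_executesql batch, a RECOMPILE hint trades compilation cost for a plan optimized for the current values. A sketch using the article’s earlier query:

```sql
SELECT ContactID, FirstName, LastName
FROM AdventureWorks.Person.Contact
WHERE LastName = @LastName
OPTION (RECOMPILE);
```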

Use Parameters for Cleaner Code

Parameters are easier to program and provide cleaner code because there is no need to escape (double-up) quotes, format datetime strings or use “.” as a decimal separator.  Remember that the API takes care of passing parameter values in native form, so string formatting is not needed when parameters are used.  Compare the concatenated version below with the parameterized equivalent that follows it.

string lastName = "O'Leary";
DateTime createDate = DateTime.Now;
decimal amount = 12.34m;

SqlCommand insertCommand =
    new SqlCommand(
        "INSERT INTO dbo.MyTable " +
            "(LastName, CreateDate, Amount) " +
            "VALUES('" + lastName.Replace("'", "''") + "', '" +
            createDate.ToString("yyyy-MM-ddTHH:mm:ss.fff") + "', " +
            amount.ToString("0.00") + ")",
        connection);

 

string lastName = "O'Leary";
DateTime createDate = DateTime.Now;
decimal amount = 12.34m;

SqlCommand insertCommand =
    new SqlCommand(
        "INSERT INTO dbo.MyTable " +
            "(LastName, CreateDate, Amount) " +
            "VALUES(@LastName, @CreateDate, @Amount)",
        connection);

insertCommand.Parameters.AddWithValue("@LastName", lastName);
insertCommand.Parameters.AddWithValue("@CreateDate", createDate);
insertCommand.Parameters.AddWithValue("@Amount", amount);

 

Security with Ownership Chains

 

Ownership chaining is one of my favorite SQL Server security features.  I like ownership chaining because it allows me to confine data access to specific stored procedures, views and functions while preventing direct access to the underlying tables.  This approach provides a well-defined application interface that insulates applications from the physical implementation and allows me to change things on the back-end as long as I don’t change the interface.  Also, thanks to ownership chaining, I have many options to vertically and horizontally partition data to meet security objectives while ensuring tables are not accessed directly.  

Ignorance is not bliss when it comes to ownership chaining security, though.  I’ve seen cases where data access was inadvertently allowed because ownership chaining wasn’t considered.  It is imperative that SQL Server DBAs understand ownership chaining in the SQL Server security model so that data is properly secured.  I want to point out a couple of important security concepts as they relate to ownership chaining, with the hope that this will help SQL Server DBAs implement a security scheme that is appropriate for their environment and also easy to manage.

Ownership Chain Behavior

SQL Server always checks permissions on directly referenced objects using the current connection security context.  However, permissions are evaluated differently when objects are accessed indirectly, such as those used in a trigger, view or stored procedure.  The behavior of ownership chaining is that object permissions are not evaluated when the calling and referenced object owners are the same.  Also, because object permission checking is short-circuited in an unbroken chain, ownership chaining takes precedence over an impersonated security context (EXECUTE AS clause) as well as denied object permissions.   Consequently, permissions on indirectly referenced objects are irrelevant in an unbroken ownership chain.  This ownership chaining behavior gives a DBA control over exactly who can use which objects while preventing ad-hoc access to other objects.
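A minimal sketch of an unbroken chain (all names are hypothetical): AppUser can execute the procedure even with no permissions on the table, because dbo owns both objects and the table’s permissions are never checked.

```sql
CREATE TABLE dbo.CustomerData (CustomerID int, Payload varchar(100));
GO
CREATE PROCEDURE dbo.usp_GetCustomerData
AS
SELECT CustomerID, Payload FROM dbo.CustomerData;
GO
-- EXECUTE on the procedure is the only permission AppUser needs
GRANT EXECUTE ON dbo.usp_GetCustomerData TO AppUser;
```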

Ownership chaining behavior might not be intuitive at first.  I like to think of it from the perspective that the purpose of granting object permissions is so the object can actually be used.  For example, it is of no use to grant a user SELECT permissions on a view unless data can be retrieved from the view.  The same applies even when a stored procedure or function impersonates with EXECUTE AS; the impersonated principal must be able to use objects referenced by the proc in order for the procedure to be useful.  I keep in mind that with ownership chaining, I effectively grant permissions on exposed object functionality regardless of how the functionality is implemented.

Ownership chaining is limited to a single owner in order to prevent escalation of privileges in environments where non-privileged users can create objects.  Ownership chaining also does not apply to dynamic SQL executed from a stored procedure; the invoking (or impersonated) user needs permissions on objects referenced by dynamic SQL statements.  I should add that one of the useful purposes of EXECUTE AS and signing modules with certificates is to provide needed permissions in a broken ownership chain, or to grant statement permissions without granting user permissions directly.  See SQL Server MVP Erland Sommarskog’s article Giving Permissions through Stored Procedures for a discussion of these techniques.

Ownership Best Practices

Ownership is obviously central to ownership chaining.  In the vast majority of environments, it’s probably best to simply inherit the owner (a.k.a. AUTHORIZATION) from the object’s schema instead of overriding on individual objects.  It is possible to assign ownership to individual objects (using ALTER AUTHORIZATION) in order to implement chaining that is independent of schema but I think this is overly complex for most environments.  One reason I can think of why one might specify a different owner at the object level is in the case where a table contains particularly sensitive data and one wants to deliberately prevent inadvertent access via ownership chaining.  I think such cases are rare, though. 

I described some of the practices I follow for schema and object ownership in Keep Schema and Ownership Simple.  In summary, I specify dbo as the schema authorization (owner) unless I want to break the ownership chain between schemas to create a security boundary.  When a security boundary is desired, I create a role (or user without login) with the same name as the schema for the schema authorization.
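That pattern can be sketched as follows (the Sales name is hypothetical): a role with the same name as the schema owns it, which breaks the chain between Sales-schema objects and dbo-owned objects.

```sql
CREATE ROLE Sales;
GO
CREATE SCHEMA Sales AUTHORIZATION Sales;
GO
-- Objects created in the Sales schema inherit the Sales owner, so references
-- between Sales and dbo objects form a broken chain and need explicit permissions
```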

Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same: group related objects into separate schemas to provide organization to an otherwise unwieldy database.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schemas also simplify security administration because permissions can be granted en masse at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like “GRANT EXECUTE ON SCHEMA::Sales TO SalesRole”.  I can also grant CONTROL permissions on a schema to give privileged users full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db_owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schemas because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don’t inadvertently provide access to data in a different schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 
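For instance, a user without a login can be created purely to own a schema (names hypothetical):

```sql
CREATE USER SalesOwner WITHOUT LOGIN;
GO
ALTER AUTHORIZATION ON SCHEMA::Sales TO SalesOwner;
```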

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.