Why Parameters are a Best Practice

Parameterized queries are arguably the single most important database development best practice. Parameters, with proper SQL data types, provide many benefits. These include:

– improve security by preventing injection of unwanted SQL (barring non-parameterized server-side dynamic SQL)
– eliminate need to enclose literals in quotes (or not) depending on data type
– allow single quotes within strings without the need to escape them
– avoid the need to format date/time/datetime string literals in a particular way (which vary by culture)
– do not require decimal separators (which vary by culture)
– improve performance by reducing compilation costs and promoting execution plan cache reuse
– allow WHERE clause predicates with Always Encrypted columns
– code that’s cleaner and easier to maintain

Despite these many benefits, it is all too common to see applications build SQL statements by concatenating program variable values as SQL literals with values that vary by execution. I believe this simply due to ignorance because one often learns SQL using an ad-hoc tool like SSMS in order to focus on SQL syntax and database concepts, overlooking the important database programming aspects needed in practice. Developers unwittingly build SQL queries in application code as if they were typing SQL statements into an SSMS query window. Sadly, instructors sometimes teach database programming using string concatenation instead of parameterized queries and many online examples use improper techniques too, spreading this poor practice. When one posts questions with non-parameterized query code in online help forums, the egregious practice often distracts answerers from addressing the real question, attracts downvotes, and in some cases, a parameterized query would have avoided the need for the question to begin with.

This article discusses parameter basics in commonly used programming languages and Microsoft SQL Server APIs, along with code examples and best practices. Additionally, internals of how parameters are actually sent to SQL Server are described to provide insight into why parameterized queries are so important. This article focuses specifically on Microsoft SQL Server but parameters and their benefits apply to other DBMS products as well.

Introduction

Here’s a C# example of the common mistake of building an ad-hoc query with string concatenation:

A common misconception is that using stored procedures inherently prevent SQL injection. Stored procedures can help mitigate security risks but must also be called properly to release parameterization benefits. The same problems as the previous ad-hoc query exists with the stored procedure example below because parameter values are passed as literals.

Techniques for building parameterized SQL statements vary by client API, driver, programming language, and/or framework. The underlying concept is the same, though. Instead of including literals in SQL statements for values that differ for each execution, specify parameter tokens in their place and add parameters with values to the command separately. The SQL statement itself need not change between executions but parameter values can differ for each execution without modifying the SQL statement itself.

The parameter creation techniques used here are not the only ones available to create parameters. I’ve included documentation links that show other parameter creation methods and guidance as to which should be avoided.

Most ORM frameworks (Entity Framwork, Hibernate, et.al.) build parameterized queries on your behalf. These are beyond the focus of this article but I suggest you peruse the Parameters Under the Covers section of this article to ensure parameters are indeed being passed as expected as an RPC request by the framework you are using.

Parameters with ADO.NET

I’ll first mention one should use an ADO.NET and a managed ADO.NET provider in .NET applications whenever possible. ADO.NET does allow one to use unmanaged ODBC and OLE DB drivers via the System.Data.Odbc and System.Data.OleDb namespace objects but those are intended to be used only when no suitable managed ADO.NET provider is available. For Microsoft SQL Server, the managed providers are System.Data.SqlClient (included with the .NET framework) or the newer Microsoft.Data.SqlClient (.NET Core package, currently in preview). Managed providers provide the best performance in .NET applications by avoiding the cost of interop with unmanaged code, which is especially costly with large result sets.

Also, it’s not uncommon to see VB.NET applications sometimes use the legacy ADODB (ADO classic) API instead of ADO.NET because they started life in the VB6 world and were never changed to use ADO.NET. I’ll discuss ADO classic later in the context of unmanaged code. I suggest such lacy code be remediated to use ADO.NET if possible.

Below are C# parameterized equivalents of the earlier queries. These same objects and techniques can be used in VB.NET and managed C++ with only linguistic differences. The only change to the original SELECT statement is the parameter tokens within the query text plus adding parameters to the command’s Parameters collection (instead of concatenating values as SQL literal text). Parameter names (‘@’ prefix) are used as tokens for parameter values in SqlClient queries and allow for mnemonic names to make it easy to identify individual parameters within the query.

Importantly, note that parameter tokens within a query are never enclosed in quotes because these are not literals. Also, the same parameter may be specified more than once within a query with SqlClient without the need to add an additional parameter with the same value.

It’s generally best to specify CommandType.StoredProcedure when calling a stored procedure because this reduces server-side parsing and enforces that stored procedure parameter arguments be passed as client parameters instead of literals. With SqlClient and CommandType.StoredProcedure, only the stored procedure name is specified as the command text (i.e. no parameter tokens). The procedure name should be schema-qualified for maximum performance, a consideration in high-volume OLTP workloads. Parameters are added to the Parameters collection and implicitly mapped to the actual stored procedure parameters by name rather than by ordinal so the parameter collection order does not matter when multiple parameters are involved.

One can also execute a stored procedure as a parameterized query using a T-SQL EXECUTE statement and CommandType.Text similarly to a parameterized query. There is no benefit in doing so compared to CommandType.StoredProcedure for a single proc call (and it does slightly bloat the request). However, CommandType.Text allows one to execute a parameterized batch of multiple statements in a single round trip. The batch may be a mix of EXECUTE and other DML statements.

With an EXECUTE statement, stored procedure parameters can be specified using either named (i.e. @stored_proc_parameter_name1 = @command_parameter_name1, @stored_proc_parameter_name2 = @command_parameter_name2, …) or positional (@command_parameter_name1, @command_parameter_name2, …) arguments. Stating the obvious, positional syntax necessitates the parameter argument order on the EXECUTE statement match the stored procedure definition order. However, the order of parameters within the parameters collection isn’t significant with either named or positional arguments because the parameters collection is mapped to the parameters within the SQL text by name with SqlClient.

Below is a named parameter example. The stored procedure parameter name and the client parameter name are the same in this example.

Note that the EXECUTE keyword could have be omitted in this example because SQL Server assumes the first statement of a batch is EXECUTE. I recommend the EXECUTE be explicitly specified even when the first statement in a batch.

Parameters with ADO.NET System.Data.OleDb

Again, .NET applications should always use a managed provider when one is available (SqlClient in the case of SQL Server). That being said, I use unmanaged SQL Server drivers in these examples to illustrate how queries can be parameterized in ADO.NET for other DBMS products that don’t have a managed .NET provider. The only difference would be the connection string.

Here are the equivalent OLE DB parameterized queries in ADO.NET using the latest SQL Server OLE DB provider. The differences compared to same SqlClient query are the connection string, question mark parameter tokens, and the System.Data.OleDb objects.

Parameters with ADO.NET System.Data.Odbc

Below is the ODBC parameterized query in ADO.NET. The only difference compared to the earlier OLE DB version is the connection string and System.Data.Odbc namespace objects. Although this example uses a DSN-less connection string, one could also specify an ODBC DSN in the connection string (e.g. “DSN=YourDSN”) and omit the attributes that are redundant with the DSN configuration.

Similarly, here is the ODBC version of the same previous stored procedure calls. However, with ODBC and CommandType.StoredProcedure, the command text specifies an ODBC procedure call escape sequence with parameter marker(s) rather than just the stored procedure name.

Parameters with ADO Classic

ADO “Classic” (not to be confused with ADO.NET) is a mature legacy technology that hasn’t been enhanced for about 20 years. An unmanaged COM-based API, ADO classic is still used today in classic ASP web applications, VBA (Visual Basic for Applications), legacy VB 6 code, and VB.NET code that was converted from VB 6 without being refactored to use modern managed ADO.NET APIs.

It is best to use an OLE DB driver with ADO classic because these unmanaged APIs natively use COM-based technologies. That being said, one can use ODBC (which is ultimately a call-level interface rather than COM) with ADO via the Microsoft OLE DB Provider for ODBC drivers (MSDASQL, which is the default OLE DB provider when no provider is specified). This is needed when no OLE DB provider available for a non SQL Server DBMS but should otherwise be avoided since ADO must go through the extra MSDASQL layer. A SQL Server OLE DB provider/driver (i.e. SQLOLEDB, SQLNCLI, and MSOLEDBSQL) should be used directly to access SQL Server from ADO classic.

Regardless of the underlying driver, ADO programming and parameterization is similar with either OLE DB or ODBC. The ADO classic object model is different than its ADO.NET cousin but parameterization concepts and techniques are similar. I’ve included C# examples here only to illustrate the ADO parameter object model differences compared to the previous ADO.NET examples. In practice, one would not typically use ADO classic in managed code.

Parameterized ADO query example:

Parameterized ADO proc call example:

ADO Parameters with VBScript

Below is an example stand-alone VBScript example of the query and proc call.

ADO Parameters with ASP classic

The ADO code for classic ASP is the same as the preceding stand-alone VBScript example except that the objects are instantiated using the IIS Server.CreateObject method:

Parameters Under the Covers

The job of all SQL Server client APIs regardless of programming language is to manage the Tabular Data Stream (TDS) protocol session used for SQL Server communication. This involves establishing the session (network connection and authentication), converting application requests into internal TDS request messages (e.g. batch and RPC requests), sending those to SQL Server, and receiving the results returned by SQL Server over TDS. Although details of the TDS protocol session are an implementation detail transparent to applications, delving into the way TDS requests are sent to the database engine provides valuable insight into the concepts, benefits, and use of parameters in the SQL Server world.

When a non-parameterized query is executed (i.e. an empty parameter collection and CommandType.Text in ADO.NET), the client API sends the command text to SQL Server as a SQL batch request over TDS. There is no provision for parameters in a SQL batch request. The database engine parses the batch text (which may contain multiple statements), generates an execution plan, and executes the batch. If the batch text (including literal values) exactly matches that of a previously executed query (or an auto-parameterized query) that is still cached, the plan is reused. Because a batch request has no parameters, the application must include T-SQL literals within the text for values that vary by execution. This requires the app code to build the query by concatenating SQL language text with literal strings and for SQL Server to parse those literals.

The onus is on the application to construct literal strings embedded within the SQL batch text properly so that they are parsed as desired by SQL Server. This necessitates enclosing literal values in single quotes when required and formatting the string value according to the data type it represents. The app code must double up single quotes within string literal values so that it’s interpreted as a quote instead of literal enclosure. Date/time literals should be formatted as ISO 8601 literal values to ensure the intended value is used regardless of session the DATEFORMAT setting, a common mistake with non-parameterized queries. Decimal literals must specify the ANSI/ISO standard period as the decimal separator instead of a comma used by some cultures. Source values must be validated to ensure adherence to the intended data type and not interpreted as SQL language elements (injecting unintended SQL).

Parameterized queries (e.g. non-empty parameters collection or CommandType.StoredProcedure) are processed differently. The client API sends the command to SQL Server as a remote procedure call (RPC batch) TDS request instead of a SQL batch request. This RPC batch request includes the batch text with the supplied parameter tokens (e.g. CommandType.Text) or stored procedure (e.g. CommandType.StoredProcedure). The actual parameters, containing parameter definition meta-data and values as specified by the client application, are included in separate structures as part the TDS RPC request stream sent to SQL Server. Most importantly, the actual parameter values as passed as native data types as specified by the client application. This has a many positive implications.

The database engine need not parse parameter values since they are already in the native structure of the specified parameter data type, such as a 32-bit integer, datetime structure, decimal structure, Unicode string, etc. SQL Server only checks that parameter values adhere to the TDS protocol spec and then uses the supplied parameter values directly without parsing. There is no notion of datetime formatting, string escape sequences, decimal separators, etc. because no parsed occurs. Unwanted SQL cannot be injected via parameters (barring server-side dynamic SQL) because the values are not part of a T-SQL statement.

I’ll add that the query text you see in a server-side trace of an RPC request is not necessarily what SQL Server actually received by the client API. The query text in trace data is actually a reversed-engineered rendering of the TDS protocol request in familiar T-SQL format. For example, when you pass a table-valued parameter, you’ll see an RPC batch request with batch text containing table variable declared, a bunch of INSERT statements into the variable, followed by it’s use as a parameter in the query or stored procedure. In actuality, the TVP column meta-data tokens were passed over TDS followed by the stream of TVP rows. The database engine bulk-inserted the TVP data into a tempdb structure matching the column meta-data, not individual inserts as the trace suggested.

Similarly, scalar parameter values will show as literal representations in the trace instead of the actual binary structure passed. For example, a datetime value will show as a ‘yyyy-MM-dd HH:mm:ss.fff’ literal (not an unambiguous ISO 8601 datetime format) rather than the actual binary datetime structure passed to SQL Server.

Much of this is implementation details but something you should be aware of when interpreting RPC requests in traces.

Best Practices

Parameter definitions should specify the desired data type, maximum length, precision, and scale of each parameter. Attention to these parameter definition details will help maximize performance by avoiding implicit conversions, promoting sargable expressions, and reusable cached plans.

A common problem (i.e. mistake) is using the AddWithValue method to create parameters. AddWithValue infers the data type and length based on the client variable value and may result in non-sargable expressions and cache bloat as I detailed in this post. The bottom line is that one should use a method that allows specification of the desired SqlDbType, length, etc. such as Add.