Help Stop SQL Injection Madness

SQL injection continues to be the most exploited security vulnerability on the internet. A SQL injection vulnerability can put companies out of business and expose innocent people to sensitive data disclosure and damage, all due to developer ignorance or carelessness. Sadly, although there is much information available on SQL injection exploits and mitigation, SQL injection exploits continue because developers use flawed database access techniques. This article details the best, and simplest, method to prevent SQL injection in SQL Server: parameterized SQL statements.

Parameterized SQL queries and stored procedure calls (with CommandType.StoredProcedure and no dynamic SQL within proc code) prevent unintended SQL from being executed. I’ll discuss exactly why shortly. Parameterized SQL also provides many benefits besides security. Notably:

  • allows quotes in data without programmatic escaping
  •  avoids the need to format date strings according to DATEFORMAT session settings
  • handles decimal separators automatically
  •  improves performance due to plan reuse
  •  provides cleaner code

Most articles on SQL injection focus on validating user input rather than parameterized queries.  Input validation certainly has benefits related to data integrity and user-friendly validation messages but it is at best a secondary defense against injection compared to parameterized queries.  Input validation using a blacklist of prohibited keywords and characters is especially weak.

SQL Injection Overview
I am puzzled by the number of examples in articles and forum posts that show building SQL statements with literals using a string concatenation technique.  I suspect the main reason for this practice is that developers write code that generates the exact same ad-hoc SQL statement they would write using a query tool like SQL Server Management Studio or Visual Studio.  This leads to the nasty habit of not using parameters.  For example, a developer might develop and test a query like this in SSMS:

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = 12345;

After testing the query, the C# developer includes the SQL statement in the application code but, instead of a hard-coded value for CustomerID, the desired value is concatenated with the remainder of the SQL statement as a literal.

var command = new SqlCommand(
             "SELECT FirstName, LastName, EmailAddress" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = '" + txtBoxCustomerID.Text + "';"
            ,connection);

The query runs perfectly well and returns the expected result in testing.  It is not until after production deployment that a script kiddie or hacker finds the SQL injection vulnerability.  A malicious person can easily manipulate this SQL statement in ways unintended by the developer.  For example, a hacker could enter CustomerID value “1 OR 1 = 1” in the text box to harvest the names and email addresses of all customers in the database.  This vulnerability can also be exploited to execute additional SQL statements in the same batch, which is often used to update data with malicious html script that is subsequently returned and executed by unsuspecting client browsers.  There are many ways to exploit a SQL injection vulnerability that result in sensitive data disclosure and/or database manipulation.  I won’t detail those here but rather recommend a very simple defense; use a parameterized statement instead of literals.

Mitigating SQL Injection
Below is a functionally identical parameterized query.

SELECT FirstName, LastName, EmailAddress
FROM dbo.Customer
WHERE CustomerID = @CustomerID;

This query can be incorporated into the C# application code with the parameter and corresponding value added to the parameterized SQL command as a strongly-typed parameter:

var command = new SqlCommand(
             "SELECT LastName" +
            " FROM dbo.Customer " +
            " WHERE CustomerID = @CustomerID;"
            ,connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = int.Parse(textBoxCustomerID.Text);

As you can see, the effort needed to do the same job with a parameterized command is about the same as the string concatenation method, yet the security benefits are immeasurable.

Why Parameters Prevent SQL Injection
Parameterized SQL is more secure for a couple of reasons.  In the case of an ad-hoc query, the SQL statement with parameter markers is hard-coded in the application (or generated via an ORM framework) and passed to SQL Server as-is.  Parameter values are passed separately rather than inside the SQL statement itself.  This ensures the SQL statement cannot be modified by the values provided.

Similarly, a stored procedure call (using CommandType.StoredProcedure) sends only the stored procedure name to SQL Server with separately passed parameter values.  Using stored procedures also allow one to more strictly adhere to the security principal of least privilege.  By granting only stored procedure execute permissions, permissions on indirectly referenced objects are not needed as long as the ownership chain is unbroken.  This practices limits ad-hoc access to underlying data should an attacker find a back door into the database under the application security context.  Personally, I’m a staunch advocate of stored procedures for this and other reasons too.  Stored procedures provide a well-defined database interface, allowing queries and database schema to be refactored without breaking changes.

I should add that it is perfectly acceptable to build a SQL statement dynamically in either application code or stored procedures, but only if parameterized and actually needed for the task at hand.  A common use case is a dynamic search query that is constructed based on user criteria. Instead of coding a separate query for each possible search permutation, one can add parameterized WHERE clauses as needed.  Under no circumstances should a SQL statement string be built with concatenation of untrusted values.

Additional Precautions
Use Windows Authentication for SQL Server database access.  This practice eliminates the need to store security credentials in application configuration files.  If SQL authentication must be used, protect the credentials with encryption.  This is easily done in .NET by encrypting the connectionStrings section of the app.config file.  See http://msdn.microsoft.com/en-us/library/ms254494.aspx.

Adhere to Principal of least privilege.  Never use a privileged account, such as a sysadmin role member, for routine application access.  Grant permissions only on database objects which are used directly by applications.  Use stored procedures so that one can leverage ownership chaining, eliminating the need to grant permissions on tables directly.

Specify strongly-typed parameters.  The parameter data type can be inferred (e.g. using SqlCommand.Parameters.AddWithValue method) but a best practice is to explicitly specify a parameter data type that matches the underlying column data type and maximum length.  These practices will minimize the size of the SQL Server procedure cache and avoid implicit conversions that can negatively affect performance.

Why Parameters are a Best Practice

Perhaps the single most important SQL Server application development Best Practice is the consistent use of parameters in application code.  Parameterized database calls are more secure, easier to program and promote query plan reuse.  Yet I continue to see code with SQL statements constructed by concatenating literal strings with variable values.  I think this is largely out of ignorance rather than design so I decided to show how easy it is for .NET application developers to use parameters and peek into the internals to show why parameters are so important.

How to Use Parameters

Below is a C# snippet of a SqlClient query built and executed using variable concatenation.  This is an example of how not to supply variable values to a SQL query.

string myTableId = 1;

 

SqlCommand selectCommand =

    new SqlCommand(“SELECT LastName, CreateDate, Amount FROM dbo.MyTable “ +

    “WHERE MyTableID = “ + myTableId.ToString(), connection);

 

Here is a parameterized equivalent of the same query.  The only changes are the parameter token in the query string and that the parameter value was added to the parameter collection instead of embedding in the query text:

int myTableId = 1;

 

SqlCommand selectCommand =

    new SqlCommand(“SELECT LastName, CreateDate, Amount FROM dbo.MyTable “ +

    “WHERE MyTableID = @MyTableID”, connection);

 

selectCommand.Parameters.AddWithValue(“@MyTableID”, myTableId);

 

So all an application developer needs to do is specify parameter tokens in a query string and add values to the parameter collection.  In the case of a parameterized stored procedure call, no tokens are necessary; simply specify the procedure name as the CommandText, specify CommandType.StoredProcedure and add parameters to the collection:

int myTableId = 1;

 

SqlCommand selectCommand =

    new SqlCommand(“dbo.usp_SelectMyTableData”, connection);

selectCommand.CommandType = CommandType.StoredProcedure;

 

selectCommand.Parameters.AddWithValue(“@MyTableID”, myTableId);

 

Parameters Under the Covers

The real differences between a parameterized and non-parameterized statement can be gleaned from a SQL and network trace.  Here is the associated trace event captured with SQL Profiler of the non-parameterized code example:

Profiler trace showing SQL:BatchCompleted

Note the “SQL:BatchCompleted” event.  The SqlClient API simply sent the query to SQL Server as a batch (text containing one or more SQL statements) because the SqlCommand.Parameters collection was empty.

Digging deeper with a tool like Microsoft Network Monitor, we can see the TDS (Tabular Data Stream) protocol data that SqlClient sent to SQL Server for the request:

TDS Packet with SQL statement batch

The TDS packet contains the entire Unicode SQL statement string, including the text value of 1 (Unicode character 0x0031, boxed in blue) that the application passed in the CommandText.

Now let’s take a look at the parameterized query trace event:

Profiler trace showing RPC:Completed

The fact that the EventClass shows “RPC:Completed” event instead of “SQL:BatchCompleted” indicates that SqlClient didn’t really wrap the query in sp_executesql to achieve parameterization like the TextData seems to show.  A RPC (Remote Procedure Call) is a specialized type of SQL request in which parameter values are passed separately from the SQL statement in the TDS request and in native form.  The SqlClient API knew to use a RPC instead of a batch because the non-empty parameters collection.

We can see from the dump of the TDS packet that it contains no reference to sp_executesql nor does it contain the string value 1 for the parameter value that the Profiler trace showed (see SQL Trace Parameter values are not always as they seem for more info).

TDS packet showing RPC

The important point with the RPC request is that the value 1 (binary integer value, boxed in blue) was passed separately from the query text.  This fact has a number of important implications for security, performance and programming. The difference between the TDS packets shows the essence of parameters in communication between the application code and SQL Server. 

Use Parameters for Security

If you don’t know what SQL injection is, Google is your friend.  Basically, injection vulnerability allows a user to manipulate the intended SQL by supplying mal-formed values.  Injection can allow a malicious user to retrieve sensitive data or change database information, limited only by the database security context.

Parameterized calls protect against SQL injection because the executed SQL statement is constant regardless of the supplied values.  As long as no dynamic SQL is constructed, the intended SQL statement cannot be manipulated.  An application developer only needs to ensure that the desired parameter values can be passed (e.g. key value from session state instead of from http query string) to guard against unintended data disclosure or manipulation.  Using stored procedures (with CommandType.StoredProcedure) always results in an RPC call and provides an additional security layer.

Use Parameters for Performance

Parameterized statements are cached to facilitate reuse and avoid compilation overhead, which is especially important in high-volume OLTP environments.  Furthermore, using stored procedures further promotes caching and reuse. 

The only downside to parameters from a performance perspective is that a given plan may be optimal for some values but not others.  In cases where sub-optimal cached plans are an issue, techniques such as RECOMPILE hints or compromise plan guides can help avoid a sub-optimal execution plans due to varying parameter values.  See article Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 for more information.

Use Parameters for Cleaner Code

Parameters are easier to program and provide cleaner code because there is no need to escape (double-up) quotes, format datetime strings or use “.” as a decimal separator.   Remember that the API takes care of passing parameter values in native form so string formatting is not needed when parameters are used.

String lastName = “O’Leary”;

DateTime hireDate = DateTime.Now;

decimal amount = 12.34m;

 

SqlCommand insertCommand =

    new SqlCommand(

        “INSERT INTO dbo.MyTable “ +

            “(LastName, CreateDate, Amount) “ +

            “VALUES(‘” + lastName.Replace(“‘”, “””) + “‘, ‘” +

            DateTime.Now.ToString(“yyyy-MM-ddTHH:mm:ss.fff”) + “‘, “ +

            amount.ToString(“0.00”) + “)”,

        connection);

 

string lastName = “O’Leary”;

DateTime createDate = DateTime.Now;

decimal amount = 12.34m;

 

SqlCommand insertCommand =

    new SqlCommand(

        “INSERT INTO dbo.MyTable “ +

            “(LastName, CreateDate, Amount) “ +

            “VALUES(@LastName, @CreateDate, @Amount)”,

        connection);

 

insertCommand.Parameters.AddWithValue(“@LastName”, lastName);

insertCommand.Parameters.AddWithValue(“@CreateDate”, createDate);

insertCommand.Parameters.AddWithValue(“@Amount”, amount);

 

Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same; group related objects into different schema to provide organization to an otherwise unwieldy schema.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schema also simplifies security administration because permissions can be granted en mass at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like “GRANT EXECUTE ON SCHEMA::Sales TO SalesRole”.  I can grant CONTROL permissions on a schema to allow privileged users to full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db-owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no a need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schema because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don’t inadvertently provide access to data in different schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.

Database Owner Troubles

Do you know who owns your databases?  Execute sp_helpdb on your SQL Server instances and you might find some surprises under the “owner” column.  It isn’t uncommon to see accounts of people who have left the company or moved on to other roles in the organization that don’t require privileged database access.  Yet these owners still have full database permissions, including the ability to drop the database.  To prevent these security issues and other problems, consider establishing an appropriate database ownership standard for your environments.

Database ownership is an often forgotten detail because it is implicitly set to the database creator’s account.  The owner will initially be a Windows account or SQL login, depending on the authentication method used by the creator.  Note that the owner is always an individual account, not a group or role, so a database created by a sysadmin role member is actually owned by the creator’s individual account instead of a built-in security principal (unless the creator logged in using the “sa” account).

A Best Practice is to change the database owner immediately after creating, restoring or attaching a database.  Unless I have a reason to do otherwise, I specify “sa” as the database owner.  This can be done with sp_changedbowner in SQL 2000 or with ALTER AUTHORIZATION in SQL 2005:

SQL 2000:

EXEC MyDatabase..sp_changedbowner ‘sa’;

SQL 2005 and SQL 2008

ALTER AUTHORIZATION ON DATABASE::MyDatabase to sa;

 

The Significance of the Database Owner

Database ownership is important from a security perspective because the owner account is mapped to the built-in “dbo” user.   The “dbo” user, sysadmin role members and db_owner role members all have full database permissions and can also DROP the database.  The database owner is also used as the authorization of the “dbo” schema, which comes into play with ownership chaining.  With cross-database chaining, the databases involved must have the same owner in order to provide an unbroken chain for “dbo” schema objects.

A difference between the database owner and db_owner role members is that there is exactly one “dbo” user (the database owner) but there may be many users that are db_owner role members.  The owner’s account cannot be explicitly added to the database because the owner is already implicitly mapped to the “dbo” user and an account can be mapped to no more than one user per database.  If you attempt to add the owner as a database user, error message “The proposed new database owner is already a user or aliased in the database” results.

 

Troubleshooting Database Ownership

The database owner is ultimately identified by the account SID (security identifier).  The creator’s account SID is recorded in 2 places:  1)  at the server level in sys.databases/sysdatabases and  2)  in the database as the dbo user SID in sys.database_principals/sysusers.  These SIDs will normally match but can get out-of-sync following a database restore or attach.  You will also end up with a NULL database owner if the owner’s Windows account is deleted because of the orphaned SID.

Mismatched owner SIDs can result in problems such as

·         Problems executing system stored procedures

·         Problems with tools

·         Broken cross-database ownership chains

The sample queries below will help identify problem database owners.  Problem owners will have mismatched SIDs and/or NULL owner names.  Fortunately, problem database ownership is easy to remedy.  Simply change the database owner using sp_changedbowner or ALTER AUTHORIZATION as you would after creating a new database.  In some cases, you might get an erroneous “The proposed new database owner is already a user or aliased in the database” due to the mismatch.  A workaround in this situation is to temporarily change the database owner to a non-conflicting login and then back to the desired owner.

 

SQL 2000:

IF OBJECT_ID(N‘tempdb..#owners’, ‘U’) IS NOT NULL

      DROP TABLE #owners;

 

CREATE TABLE #owners

(

      database_name sysname NOT NULL,

      sys_databases_sid varbinary(85) NOT NULL,

      sys_databases_owner nvarchar(256) NULL,

      sys_users_sid varbinary(85) NULL,

      sys_users_owner nvarchar(256) NULL

);

 

INSERT INTO #owners

      (

            database_name,

            sys_databases_sid,

            sys_databases_owner

      )

      SELECT

            name,

            sid,

            SUSER_SNAME(sid)

      FROM master.dbo.sysdatabases;

 

EXEC sp_MSforeachdb

      UPDATE #owners

      SET sys_users_sid = (

                  SELECT sid

                  FROM [?].dbo.sysusers

                  WHERE name = ”dbo”),

            sys_users_owner = (

                  SELECT SUSER_SNAME(sid)

                  FROM [?].dbo.sysusers

                  WHERE name = ”dbo”)

      WHERE database_name = ”?”

      ;

 

SELECT * FROM #owners

WHERE sys_databases_sid <> sys_users_sid;

 

IF OBJECT_ID(N‘tempdb..#owners’, ‘U’) IS NOT NULL

      DROP TABLE #owners;

GO

SQL 2005:

IF OBJECT_ID(N‘tempdb..#owners’, ‘U’) IS NOT NULL

      DROP TABLE #owners;

 

CREATE TABLE #owners

(

      database_name sysname NOT NULL,

      sys_databases_sid varbinary(85) NOT NULL,

      sys_databases_owner nvarchar(256) NULL,

      sys_users_sid varbinary(85) NULL,

      sys_users_owner nvarchar(256) NULL

);

 

INSERT INTO #owners

      (

            database_name,

            sys_databases_sid,

            sys_databases_owner

      )

      SELECT

            name,

            owner_sid,

            SUSER_SNAME(owner_sid)

      FROM sys.databases;

 

EXEC sp_MSforeachdb

      UPDATE #owners

      SET sys_users_sid = (

                  SELECT sid

                  FROM [?].sys.database_principals

                  WHERE name = ”dbo”),

            sys_users_owner = (

                  SELECT SUSER_SNAME(sid)

                  FROM [?].sys.database_principals

                  WHERE name = ”dbo”)

      WHERE database_name = ”?”

      ;

 

SELECT * FROM #owners

WHERE

      sys_databases_sid <> sys_users_sid

      OR sys_databases_owner IS NULL;

 

IF OBJECT_ID(N‘tempdb..#owners’, ‘U’) IS NOT NULL

      DROP TABLE #owners;

GO