Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same; group related objects into different schema to provide organization to an otherwise unwieldy schema.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schema also simplifies security administration because permissions can be granted en mass at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like “GRANT EXECUTE ON SCHEMA::Sales TO SalesRole”.  I can grant CONTROL permissions on a schema to allow privileged users to full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db-owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no a need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schema because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don’t inadvertently provide access to data in different schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.

Conditional INSERT/UPDATE Race Condition

Conditional INSERT/UPDATE Race Condition

 

 

I often see conditional INSERT/UPDATE code like:

CREATE PROCEDURE dbo.Insert_Or_Update_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT ON

 

IF EXISTS(SELECT * FROM dbo.Foo WHERE ID = @ID)

BEGIN

      UPDATE dbo.Foo

      SET bar = @bar

      WHERE ID = @ID

END

ELSE

BEGIN

      INSERT INTO dbo.Foo (ID, Bar)

      VALUES (@ID, @Bar)

END

 

RETURN @@ERROR

 

Despite its prevalence, this “UPSERT” code is fundamentally flawed because it assumes only a single user will execute the proc at a time.  A primary key violation error can occur when executed simultaneously on different connections with the same @ID value instead of the intended UPDATE. 

An even worse situation is when a surrogate key (e.g. IDENTITY) is used as the primary key and the conditional insert is based on a natural key that has no unique constraint or unique index.  In that case, the both INSERTs will succeed but duplicate data (same natural key) will be inserted.  A unique constraint (or unique index) should of course be specified on every key but that’s sometimes overlooked.

You can test the concurrency problem yourself by creating the following table for use with the above stored procedure on a multi-processor box:

CREATE TABLE dbo.Foo

(

      ID int NOT NULL

            CONSTRAINT PK_Foo PRIMARY KEY,

      Bar int NOT NULL

)

 

Then run the following script on different database connections after changing the WAITFOR TIME to the near future:

WAITFOR TIME ’08:00:00′

 

EXEC dbo.Insert_Or_Update_Foo

      @ID = 1,

      @Bar = 1

 

I ran this test and got a primary key violation because parallel execution of the IF EXISTS test returned false on both connections so the INSERT was attempted on both. 

Fortunately, there’s a simple solution to prevent the error:  1) add an explicit transaction and 2) specify SELECT locking hints.  Below is a modified version of the original stored procedure with these changes:

CREATE PROCEDURE dbo.Insert_Or_Update_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT, XACT_ABORT ON

 

BEGIN TRAN

 

IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)

BEGIN

      UPDATE dbo.Foo

      SET bar = @bar

      WHERE ID = @ID

END

ELSE

BEGIN

      INSERT INTO dbo.Foo (ID, Bar)

      VALUES (@ID, @Bar)

END

 

COMMIT

 

RETURN @@ERROR

 

The UPDLOCK hint instructs SQL Server to use an update lock instead of the shared lock that would normally be acquired for the SELECT.  HOLDLOCK is needed in the default READ COMMITTED isolation level to ensure that the lock is held until the end of the transaction.  This more restrictive update lock will prevent simultaneous queries from either selecting or changing the locked resource.  If the row exists during the SELECT, the locked resource is the existing row.  If no row exists with the specified key, a range lock on the primary key is acquired to prevent inserts of the same key until the lock is released. 

Although not actually required, I also added SET XACT_ABORT ON to help ensure the explicit transaction is closed following a timeout or unexpected error.  See Use Caution with Explicit Transactions in Stored Procedures to see why I recommend this as a standard practice.

I should add that this explicit transaction and locking hints technique will prevent the above mentioned problems but it doesn’t technically prevent a race condition.  The issue with any “UPSERT”/MERGE technique is that the last one in the race wins.  If you run the above proc simultaneously on different connections with the same @ID value but different @Bar values, the last Bar value UPDATE will of course overwrite any previous Bar value.

The issue doesn’t apply only to IF statements.  Even intra-query subqueries like the example below can fall victim to the issue.  Just like IF EXISTS, there is nothing to prevent the same concurrency problem when the NOT EXISTS predicate is evaluated simultaneously on different connections.

CREATE PROCEDURE dbo.Insert_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT ON

 

INSERT INTO dbo.Foo (ID, Bar)

VALUES (@ID, @Bar)

WHERE NOT EXISTS

      (

      SELECT *

      FROM dbo.Foo

      WHERE ID = @ID

      )

 

RETURN @@ERROR

 

Below is the modified proc with the transaction and locking hint changes:

CREATE PROCEDURE dbo.Insert_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT, XACT_ABORT ON

 

BEGIN TRAN

 

INSERT INTO dbo.Foo (ID, Bar)

VALUES (@ID, @Bar)

WHERE NOT EXISTS

      (

      SELECT *

      FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK)

      WHERE ID = @ID

      )

 

COMMIT

 

RETURN @@ERROR

 

I’m not certain if the SQL 2008 MERGE statement suffers from the same concurrency issues since I don’t currently have an adequate (multi-processor) SQL 2008 test environment for the test.  I plan to install the next SQL 2008 CTP on a real (not virtual) machine and test the proc below.  I’ll post the results.

MERGE Stored Procedure

Use Caution with Explicit Transactions in Stored Procedures

Use Caution with Explicit Transactions in Stored Procedures

 

 

 

Explicit transactions are often used within stored procedures to guarantee all-or-nothing data integrity.  However, a little known fact is that a query timeout will leave the transaction open unless non-default session settings and/or special exception handling are used.  I’ll describe how to protect your application from problems following timeouts and other unexpected errors.

Consider the following stored procedure containing an explicit transaction:

CREATE PROCEDURE dbo.UpdateWithExplicitTransaction
	@MyKey int,
	@MyColumnValue int
AS

DECLARE @Error int

BEGIN TRAN

UPDATE dbo.Foo
SET MyColumn = @MyColumnValue
WHERE MyKey = @MyKey

SET @Error = @@ERROR
IF @Error <> 0
BEGIN
	GOTO Done
END

UPDATE dbo.Bar
SET MyColumn = @MyColumnValue
WHERE MyKey = @MyKey

SET @Error = @@ERROR
IF @Error <> 0
BEGIN
	GOTO Done
END

COMMIT

Done:

IF @Error <> 0
BEGIN
	ROLLBACK
END

RETURN @Error

You execute the script below from SQL Server Management Studio or Query Analyzer with the query timeout option set to 30 seconds and the second UPDATE statement in the proc times out. 

timeout option set to 30 seconds and the second UPDATE statement in the proc times out.

EXEC dbo.UpdateWithExplicitTransaction
PRINT ‘execution completed’
GO
SELECT
	@@ERROR AS [@@ERROR],
	@@TRANCOUNT AS [@@TRANCOUNT]
GO

Assuming default session settings, check all that apply:

a)      Proc execution continues after the failed UPDATE

b)      @@ERROR is zero

c)       @@TRANCOUNT is zero

d)      The PRINT statement is executed

Let me first mention something important about timeouts before I provide the correct answer(s).  A command timeout occurs in the client application, not the SQL Server backend.  A timeout is basically just a cancel request that is sent by the client API when a command executes longer than the specified interval.  A timeout is very much like pressing the stop button in Query Analyzer or Management Studio because you feel a query has been running too long.  The only difference is that the stop is issued by the client API on behalf of the application.

Both “A” (proc continues) and “D” (PRINT executes) are false because the attention event from the client instructed SQL Server to cancel the currently executing batch in its entirety.  No code after the UPDATE executes, including the PRINT statement following the stored procedure execute.  This is logical since a query cancel or timeout wouldn’t be much use if SQL Server continued executing statements afterward.

“B” (zero @@ERROR) is true.  @@ERROR is zero because no error occurred on the backed; SQL Server successfully canceled the batch per the client cancel request after the timeout.  The timeout error is raised only on the client by the API to notify the application (SSMS/QA in this example) that the command timed out.  SSMS and QA simply catch the error and display the error message from the API.

“C” (zero @@TRANCOINT) is false because XACT_ABORT OFF is the default session setting.  With XACT_ABORT OFF, it is the client application’s responsibility to trap the timeout error and rollback the transaction if necessary.  The transaction is left open and uncommitted following the timeout error.  This can have serious and undesirable consequences if the application performs other work on the connection, unaware of the open transaction.

Using SET XACT_ABORT

SET XACT_ABORT specifies what action SQL Server should take following run-time errors.  The default session setting is SET XACT_ABORT OFF, which indicates that only the Transact-SQL statement that raised the error is rolled back and the transaction continues.  Depending on the severity of the error, the entire transaction may be rolled back and batch aborted, even with SET XACT_ABORT is OFF.   

A side effect of SET XACT_ABORT OFF is that a cancel/timeout error can leave an open transaction so it’s the client’s responsibility to cleanup following cancel/timeout.  To safeguard against leaving an open transaction, applications that execute transactions with SET XACT_ABORT OFF need to roll back transactions and perhaps close the connection following SQL exceptions.  

Note that with connection pooling, simply closing the connection without a rollback will only return the connection to the pool and the transaction will remain open until later reused or removed from the pool.  This can result in locks begin held unnecessary and cause other timeouts and rolling blocks.

SET XACT_ABORT ON instructs SQL Server to rollback the entire transaction and abort the batch when a run-time error occurs.  Compile errors (e.g. syntax errors) are not affected by SET XACT_ABORT. 

In my experience, SET XACT_ABORT ON provides the desired behavior in most cases.  I’ve never run into a situation where I wouldn’t want to rollback a transaction following a cancel or timeout.   I nearly always specify SET XACT_ABORT ON in stored procedures that contain explicit transactions to ensure that transactions are rolled back even if the application code doesn’t clean up properly.  The only time I don’t use XACT_ABORT is in rare cases where I need to trap and handle specific errors in Transact-SQL and continue.

I strongly recommend that SET XACT_ABORT ON be included in all stored procedures with explicit transactions unless you have a specific reason to do otherwise.  The consequences of an application unwittingly performing work on a connection with an open transaction are disastrous.

SQL Server error handling in general is a huge topic I focused on only on timeout errors and SET XACT_ABORT here.  For a thorough discussion of SQL Server error handling, I suggest perusing articles Implementing Error Handling with Stored Procedures and Error Handling in SQL Server – a Background by SQL Server MVP Erland Sommarskog.