Conditional INSERT/UPDATE Race Condition

Conditional INSERT/UPDATE Race Condition

 

 

I often see conditional INSERT/UPDATE code like:

CREATE PROCEDURE dbo.Insert_Or_Update_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT ON

 

IF EXISTS(SELECT * FROM dbo.Foo WHERE ID = @ID)

BEGIN

      UPDATE dbo.Foo

      SET bar = @bar

      WHERE ID = @ID

END

ELSE

BEGIN

      INSERT INTO dbo.Foo (ID, Bar)

      VALUES (@ID, @Bar)

END

 

RETURN @@ERROR

 

Despite its prevalence, this “UPSERT” code is fundamentally flawed because it assumes only a single user will execute the proc at a time.  A primary key violation error can occur when executed simultaneously on different connections with the same @ID value instead of the intended UPDATE. 

An even worse situation is when a surrogate key (e.g. IDENTITY) is used as the primary key and the conditional insert is based on a natural key that has no unique constraint or unique index.  In that case, the both INSERTs will succeed but duplicate data (same natural key) will be inserted.  A unique constraint (or unique index) should of course be specified on every key but that’s sometimes overlooked.

You can test the concurrency problem yourself by creating the following table for use with the above stored procedure on a multi-processor box:

CREATE TABLE dbo.Foo

(

      ID int NOT NULL

            CONSTRAINT PK_Foo PRIMARY KEY,

      Bar int NOT NULL

)

 

Then run the following script on different database connections after changing the WAITFOR TIME to the near future:

WAITFOR TIME ’08:00:00′

 

EXEC dbo.Insert_Or_Update_Foo

      @ID = 1,

      @Bar = 1

 

I ran this test and got a primary key violation because parallel execution of the IF EXISTS test returned false on both connections so the INSERT was attempted on both. 

Fortunately, there’s a simple solution to prevent the error:  1) add an explicit transaction and 2) specify SELECT locking hints.  Below is a modified version of the original stored procedure with these changes:

CREATE PROCEDURE dbo.Insert_Or_Update_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT, XACT_ABORT ON

 

BEGIN TRAN

 

IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)

BEGIN

      UPDATE dbo.Foo

      SET bar = @bar

      WHERE ID = @ID

END

ELSE

BEGIN

      INSERT INTO dbo.Foo (ID, Bar)

      VALUES (@ID, @Bar)

END

 

COMMIT

 

RETURN @@ERROR

 

The UPDLOCK hint instructs SQL Server to use an update lock instead of the shared lock that would normally be acquired for the SELECT.  HOLDLOCK is needed in the default READ COMMITTED isolation level to ensure that the lock is held until the end of the transaction.  This more restrictive update lock will prevent simultaneous queries from either selecting or changing the locked resource.  If the row exists during the SELECT, the locked resource is the existing row.  If no row exists with the specified key, a range lock on the primary key is acquired to prevent inserts of the same key until the lock is released. 

Although not actually required, I also added SET XACT_ABORT ON to help ensure the explicit transaction is closed following a timeout or unexpected error.  See Use Caution with Explicit Transactions in Stored Procedures to see why I recommend this as a standard practice.

I should add that this explicit transaction and locking hints technique will prevent the above mentioned problems but it doesn’t technically prevent a race condition.  The issue with any “UPSERT”/MERGE technique is that the last one in the race wins.  If you run the above proc simultaneously on different connections with the same @ID value but different @Bar values, the last Bar value UPDATE will of course overwrite any previous Bar value.

The issue doesn’t apply only to IF statements.  Even intra-query subqueries like the example below can fall victim to the issue.  Just like IF EXISTS, there is nothing to prevent the same concurrency problem when the NOT EXISTS predicate is evaluated simultaneously on different connections.

CREATE PROCEDURE dbo.Insert_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT ON

 

INSERT INTO dbo.Foo (ID, Bar)

VALUES (@ID, @Bar)

WHERE NOT EXISTS

      (

      SELECT *

      FROM dbo.Foo

      WHERE ID = @ID

      )

 

RETURN @@ERROR

 

Below is the modified proc with the transaction and locking hint changes:

CREATE PROCEDURE dbo.Insert_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT, XACT_ABORT ON

 

BEGIN TRAN

 

INSERT INTO dbo.Foo (ID, Bar)

VALUES (@ID, @Bar)

WHERE NOT EXISTS

      (

      SELECT *

      FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK)

      WHERE ID = @ID

      )

 

COMMIT

 

RETURN @@ERROR

 

I’m not certain if the SQL 2008 MERGE statement suffers from the same concurrency issues since I don’t currently have an adequate (multi-processor) SQL 2008 test environment for the test.  I plan to install the next SQL 2008 CTP on a real (not virtual) machine and test the proc below.  I’ll post the results.

MERGE Stored Procedure