Performance testing with DBCC DROPCLEANBUFFERS

Running DBCC DROPCLEANBUFFERS is a common practice when unit testing SQL Server performance on an isolated test instance. It allows one to evaluate candidates for query, stored procedure, and index tuning based on execution times in a worst-case cold buffer cache scenario, and it improves test repeatability by leveling the playing field before each test. However, clearing the cache this way has considerations one should be aware of.

An important detail sometimes overlooked is that one must first execute a CHECKPOINT command in the context of the database(s) to be tested before executing DBCC DROPCLEANBUFFERS. DBCC DROPCLEANBUFFERS frees only pages that are not dirty (the cached version matches the on-disk version), so modified pages remain in cache when CHECKPOINT isn’t executed first. Overlooking the CHECKPOINT can result in non-repeatable test timings, so always run CHECKPOINT before DBCC DROPCLEANBUFFERS, as shown below.
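The sequence is simply this (a minimal sketch; YourTestDatabase is a placeholder for the database under test):

--prime a cold buffer cache before each test run
USE YourTestDatabase; --placeholder: use the database(s) under test
CHECKPOINT; --write dirty pages to disk so DBCC DROPCLEANBUFFERS can free them
DBCC DROPCLEANBUFFERS; --remove clean pages from the buffer cache
--now run the query or procedure under test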

One can argue that DBCC DROPCLEANBUFFERS is not particularly valuable for testing. First, the storage engine in SQL Server Enterprise Edition (or Developer Edition, which is often used for testing) behaves differently with a cold cache than with a warm one. With a warm cache, a page not already in cache (e.g. one fetched by an index seek on the primary key) is read from disk with a single 8K page IO request, as one would expect. However, when the cache isn’t fully warmed up (the Buffer Manager’s Target Pages value not yet met), the entire 64K extent (8 contiguous 8K pages) is read to satisfy the single page request, regardless of whether the adjacent pages are actually needed by the query. This warms the cache much more quickly than would otherwise occur, but since the normal steady state of a production SQL Server is a warm cache, cold cache timings aren’t a fair comparison of different plans: more data than normal is transferred from storage, so timings may not be indicative of actual production performance.

The storage engine also behaves differently during scans when data is not already cached, regardless of the SQL Server edition. During sequential scans, read-ahead prefetches multiple extents from storage at a time so that data is already in cache by the time the query actually needs it. This greatly reduces the time needed for large scans because fewer IOPS are required and sequential access reduces costly seek time against spinning media. Furthermore, Enterprise and Developer editions perform read-ahead more aggressively than lesser editions, reading up to 4MB (512 contiguous 8K pages) in a single scatter-gather IO in later SQL Server versions.

The implication for cold cache performance testing is that full extent reads and read-ahead prefetches are both much more likely to occur, so test timings of different execution plans are not fairly comparable. These timings will overemphasize hardware (storage) performance rather than query performance as intended. Given hardware differences on a test system, and that a cold cache is not the typical production state, cold cache testing isn’t a good indicator of the query performance and resource usage one will experience in a production system.

I recommend using logical reads as the primary performance measure when evaluating query and index tuning candidates. Logical reads count the pages touched by a query regardless of whether the data was read from storage or already cached, making them a better comparative indicator of data access resource utilization. The count can be obtained by running the query or procedure with SET STATISTICS IO ON and will be the same whether or not physical IO was needed. Query times may be used as a secondary measure by running the query more than once, discarding the first run, and averaging subsequent executions. This is not to say these logical read measurements and timings will predict actual production performance, but they allow one to more accurately evaluate the relative resource usage of different execution plans.
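For example, logical reads can be captured like this (a minimal sketch; dbo.SomeTable is a hypothetical table standing in for the query under test):

--capture logical reads for the query under test
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.SomeTable; --hypothetical query under test
SET STATISTICS IO OFF;
--the Messages output reports per-table counts along the lines of:
--Table 'SomeTable'. Scan count 1, logical reads 142, physical reads 0, read-ahead reads 0, ...

The logical reads value is the figure to compare between tuning candidates; the physical reads and read-ahead reads counts merely show how much of the IO was satisfied from storage rather than cache.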

SQL Server System Table Statistics Update

I’ve seen a few questions in the SQL Server forums recently reporting slow performance of catalog view and DMV queries, such as those executed by SQL Server Data Tools. This can result in query timeouts and is particularly an issue with databases that contain many objects and/or columns. Like many query performance problems, the root cause may be stale statistics. The last statistics update date on system table indexes can be gleaned with the query below.

--identify last stats update date
SELECT
       QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + N'.' + QUOTENAME(OBJECT_NAME(i.object_id)) + N'.' + QUOTENAME(i.name) AS index_name
     , STATS_DATE(i.object_id, i.index_id) AS stats_date
FROM sys.indexes AS i
JOIN sys.partitions AS p ON
     p.object_id = i.object_id
     AND p.index_id = i.index_id
WHERE
     OBJECTPROPERTYEX(i.object_id, 'IsSystemTable') = 1
     AND i.index_id > 0
     AND p.rows > 0;

One can often simply update stats on the underlying system tables to improve catalog view query performance. Below is a script for SQL Server 2017+ and Azure SQL Database that generates and executes UPDATE STATISTICS statements for all non-empty system tables.

--system table stats update for SQL Server 2017+ and Azure SQL Database
DECLARE @SQL nvarchar(MAX) =
(
     SELECT
          --CONVERT to nvarchar(MAX) avoids the 8000-byte STRING_AGG result limit
          STRING_AGG(CONVERT(nvarchar(MAX), N'UPDATE STATISTICS ' + t.table_name), N';')
     FROM
     (
          --DISTINCT so each table appears once regardless of its index count
          SELECT DISTINCT
               QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id))
               + N'.'
               + QUOTENAME(OBJECT_NAME(i.object_id)) AS table_name
          FROM sys.indexes AS i
          JOIN sys.partitions AS p ON
               p.object_id = i.object_id
               AND p.index_id = i.index_id
          WHERE
               OBJECTPROPERTYEX(i.object_id, 'IsSystemTable') = 1
               AND i.index_id > 0
               AND p.rows > 0
     ) AS t
);
EXEC sp_executesql @SQL;
GO

Similarly, the script below updates system table stats on SQL Server 2016 and earlier versions. The only difference is the XML PATH technique for aggregate string concatenation, because STRING_AGG() isn’t available in older SQL Server versions.

--system table stats update for SQL Server 2016 and earlier using XML PATH
DECLARE @SQL nvarchar(MAX) =
     (SELECT
          N'UPDATE STATISTICS ' + t.table_name + N';'
     FROM
     (
          --DISTINCT so each table appears once regardless of its index count
          SELECT DISTINCT
               QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id))
               + N'.'
               + QUOTENAME(OBJECT_NAME(i.object_id)) AS table_name
          FROM sys.indexes AS i
          JOIN sys.partitions AS p ON
               p.object_id = i.object_id
               AND p.index_id = i.index_id
          WHERE
               OBJECTPROPERTYEX(i.object_id, 'IsSystemTable') = 1
               AND i.index_id > 0
               AND p.rows > 0
     ) AS t
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)');
EXEC sp_executesql @SQL;
GO

On a side note, I’ll mention that the “SELECT @SQL = @SQL + …” technique is sometimes used for set-based aggregate string concatenation instead of XML PATH or STRING_AGG(). However, it should be avoided because it is neither supported nor reliable. A quote from Microsoft in response to a bug report on the now-retired Connect feedback site: “Even without ORDER BY, we do not guarantee that @var = @var + <expr> will produce the concatenated value for any statement that affects multiple rows. The right-hand side of the expression can be evaluated either once or multiple times during query execution and the behavior as I said is plan dependent.”
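For illustration only, the unreliable pattern looks like the sketch below; do not use it in production code:

--avoid: aggregate string concatenation via variable assignment is plan dependent
DECLARE @SQL nvarchar(MAX) = N'';
SELECT @SQL = @SQL + N'UPDATE STATISTICS '
     + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + N'.' + QUOTENAME(name) + N';'
FROM sys.tables;
--the result may silently omit rows; use STRING_AGG() or XML PATH instead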

The Curious Case of Undetected SQL Exceptions

Undetected database errors are insidious. It can be really bad when an error gets dropped on the floor, resulting in incomplete or wrong results. Consider that this simple SELECT query returns an empty result set instead of raising a SqlException for the divide-by-zero error:

string sqlBatchText = @"
    BEGIN TRY
        SELECT 1/0 AS DivideByZeroError;
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH;
";
int rowCount = 0;
try
{
    using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
    using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { ++rowCount; }
        }
        Console.WriteLine("{0} rows read", rowCount);
    }
}
catch
{
    throw;
}

One generally assumes SQL errors raised during batch execution will also raise a SQL exception in the application. However, there are cases involving multi-statement batches and stored procedures where errors that occur might not be raised in the client application, as the above example shows. These scenarios can be distilled as:

1) An error is caught by T-SQL TRY/CATCH while executing a row-returning statement.
2) An error occurs after a row-returning statement successfully executes.

To ensure SQL exceptions are raised as expected, one must either code T-SQL to avoid these scenarios entirely or ensure the client application data access layer consumes all subsequent results returned by SQL Server, even when only a single result set is expected. Row-returning statements include SELECT (excluding variable assignment), an OUTPUT clause in an INSERT/UPDATE/DELETE/MERGE statement that returns rows to the client, as well as some specialized commands like RESTORE FILELISTONLY, DBCC commands with TABLERESULTS, etc.

Below is a C# ADO.NET example of this defensive programming technique. Even though a single result set is expected, the code still invokes NextResult afterwards to process the entire result stream and ensure SQL exceptions are raised in the app when errors occur.

string sqlBatchText = @"
    BEGIN TRY
        SELECT 1/0 AS DivideByZeroError;
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH;
";
int rowCount = 0;
try
{
    using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
    using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { ++rowCount; }
            //get any subsequent errors returned
            while (reader.NextResult()) { }
        }
        Console.WriteLine("{0} rows read", rowCount);
    }
}
catch
{
    throw;
}

This consideration applies to all SQL Server APIs and programming languages, although I focus on C# and ADO.NET with System.Data.SqlClient (the .NET Framework Data Provider for SQL Server) in this article. The specific methods for consuming all results vary by API (e.g. getMoreResults() in JDBC) but the basic concept is the same.

Regardless of the method one uses to execute SQL Server queries, ADO.NET uses a data reader to return command results, even when higher-level objects (e.g. DataSet) or ORMs (e.g. Entity Framework) are used. The low-level ADO.NET command ExecuteReader method exposes the data reader, whereas ExecuteScalar and ExecuteNonQuery do not expose the internal reader.

ExecuteScalar returns the first column of the first row returned as a scalar value but doesn’t call NextResult on the internal data reader to retrieve subsequent results, so errors may go undetected. ExecuteScalar will not raise an exception if a T-SQL error occurs after the first row is returned. Also, if no rows are returned because the row-returning statement erred and the error was caught in T-SQL, ExecuteScalar returns a null object without raising an exception.
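The minimal sketch below demonstrates the pitfall, reusing the divide-by-zero batch and placeholder connection string from the earlier examples:

//ExecuteScalar sketch: the error caught in the T-SQL CATCH block is not raised here
using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
{
    connection.Open();
    //returns a null object because the failed SELECT returned no rows and the
    //error was caught in T-SQL; no SqlException is thrown
    object result = command.ExecuteScalar();
    Console.WriteLine(result == null ? "null result, no exception raised" : result.ToString());
}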

ExecuteNonQuery executes the entire batch of statements and returns the accumulated count of affected rows as a scalar value, discarding any rows returned. The returned value will be -1 if SET NOCOUNT ON is specified. Because ExecuteNonQuery internally consumes all results in the process, errors will be raised without additional ADO.NET programming, albeit one doesn’t typically use ExecuteNonQuery to execute a batch that returns rows. Again, the ADO.NET error detection issue applies only to row-returning statements.

The remainder of this article discusses T-SQL error handling and ADO.NET defensive programming techniques in more detail, showing how to avoid undetected database errors.

T-SQL Error Handling Objectives
T-SQL and ADO.NET data access code must work in concert with one another to ensure SQL errors are detected in application code. The T-SQL constructs used in multi-statement batches can affect whether and when errors are reported by ADO.NET during batch execution. I’ll start by citing core T-SQL error handling objectives, which can be summarized as:

1) Ensure a multi-statement T-SQL batch doesn’t continue executing after an error occurs.
2) Roll back the transaction after an error.
3) Raise the error so that the client application is aware a problem occurred.

The T-SQL building blocks used to achieve these objectives are:
1) SET XACT_ABORT ON
2) Structured error handling (SEH) (a.k.a. TRY/CATCH)
3) Control-of-flow (e.g. IF @@ERROR GOTO ErrorHandler)

T-SQL Behavior Without SEH and XACT_ABORT ON
When a runtime error occurs outside a TRY block with the XACT_ABORT session setting ON, SQL Server stops batch execution immediately, rolls back the transaction (if any), and raises the error. Consequently, a single SET XACT_ABORT ON statement meets all of the aforementioned error handling objectives without T-SQL procedural code. However, the XACT_ABORT setting is not considered when user-defined errors are raised with RAISERROR, so control-of-flow statements are required to meet objectives #2 and #3 when RAISERROR is employed.
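The sketch below illustrates the behavior; dbo.Table1 and dbo.Table2 are hypothetical tables:

--XACT_ABORT ON alone meets all three objectives for system-raised errors
SET XACT_ABORT ON;
BEGIN TRAN;
UPDATE dbo.Table1 SET SomeColumn = 1; --hypothetical table
SELECT 1/0; --runtime error: batch stops here, transaction is rolled back, error is raised
UPDATE dbo.Table2 SET SomeColumn = 1; --never executes
COMMIT;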

SET XACT_ABORT ON also rolls back an open transaction following an attention event, such as an explicit cancel or query timeout, which would otherwise leave the transaction open. This is one reason why I strongly recommend using SET XACT_ABORT ON, especially in procs that include BEGIN TRAN, regardless of whether or not SEH is also used.

T-SQL Behavior Without SEH and XACT_ABORT OFF
When an error occurs with the XACT_ABORT session setting OFF and SEH is not used, SQL Server raises the error immediately, but, depending on the error and severity, batch execution might continue and the transaction might not be rolled back. The T-SQL batch must use control-of-flow statements after each statement to avoid continuing after an error and to roll back the transaction (objectives #1 and #2).
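Such a batch might look like the sketch below, using the IF @@ERROR GOTO building block mentioned earlier (dbo.Table1 and dbo.Table2 are hypothetical tables):

--control-of-flow error handling with XACT_ABORT OFF and no TRY/CATCH
BEGIN TRAN;
UPDATE dbo.Table1 SET SomeColumn = 1; --hypothetical table
IF @@ERROR <> 0 GOTO ErrorHandler;
UPDATE dbo.Table2 SET SomeColumn = 1; --hypothetical table
IF @@ERROR <> 0 GOTO ErrorHandler;
COMMIT;
RETURN;
ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK;
RAISERROR('An error occurred; the transaction was rolled back.', 16, 1);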

T-SQL Behavior With T-SQL Structured Error Handling
When an error occurs during statement execution while a T-SQL structured error handler is in scope, the CATCH block error handler is entered, after the transaction is marked uncommittable if SET XACT_ABORT is ON. SEH meets the first error handling objective by skipping subsequent statements in the TRY block after an error. It is the responsibility of the error handling code in the CATCH block to roll back the transaction if needed and raise the error. The simple T-SQL handler below achieves objectives #2 and #3 in Azure SQL Database and SQL Server 2012 and later:

BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH;

THROW is not available in SQL Server 2008 R2 and earlier, so one must use RAISERROR instead in older versions. The error handler below provides functionality similar to THROW, although RAISERROR obfuscates the original error as a user error with message number 50000.

BEGIN CATCH
    DECLARE
          @ErrorNumber int = ERROR_NUMBER()
        , @ErrorMessage nvarchar(2048) = ERROR_MESSAGE()
        , @ErrorSeverity int = ERROR_SEVERITY()
        , @ErrorState int = ERROR_STATE()
        , @ErrorLine int = ERROR_LINE();
    --roll back first so the handler also meets objective #2
    IF @@TRANCOUNT > 0 ROLLBACK;
    RAISERROR('Error %d caught at line %d: %s'
        , @ErrorSeverity
        , @ErrorState
        , @ErrorNumber
        , @ErrorLine
        , @ErrorMessage);
END CATCH;

Using T-SQL SEH when result sets are returned has implications for the ADO.NET data access code needed to ensure database exceptions are raised after SQL errors. As illustrated by the ExecuteReader example at the beginning of this article, when an error is caught in T-SQL during a row-returning statement, ADO.NET will not raise the database exception immediately and instead returns the partial or empty result set generated by the failed statement. The next data reader Read method call will return false after the error. The error raised in the CATCH block by the T-SQL THROW or RAISERROR statement is considered a separate result by ADO.NET and will not be raised as a database exception until NextResult is called.

Unless you have a specific reason to use SEH in row-returning batches, I suggest instead using SET XACT_ABORT ON alone, as this will address the core T-SQL error handling objectives and allow ADO.NET to detect errors without calling NextResult. However, as mentioned earlier, ADO.NET code will still need to call NextResult if the row-returning statement is not the last statement in the batch.
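A row-returning batch following this suggestion reduces to a sketch like this (dbo.SomeTable is hypothetical):

--no TRY/CATCH around the row-returning statement; XACT_ABORT ON ensures any error
--aborts the batch, rolls back the transaction, and is raised to the client immediately
SET XACT_ABORT ON;
SELECT SomeColumn FROM dbo.SomeTable; --hypothetical row-returning statement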

Summary
I hope this information will help you ensure database errors in multi-statement batches are detected. The interaction between T-SQL and ADO.NET isn’t as intuitive as it could be.