Microsoft SQL Operations Studio Preview

Microsoft made the new cross-platform SQL Operations Studio (SOS) tool available on GitHub this week as a free open-source project. This SOS preview allows one to develop and manage SQL Server and Azure SQL Database from Windows, Linux, and macOS. The current preview can be downloaded from the SOS portal page, which also contains links to impressive quick-start guides, how-tos, and tutorials. I encourage you to try out the preview and help improve it by reporting issues and offering suggestions.

If you are a developer, consider contributing to this project on GitHub. SOS is built on the Electron framework, which leverages JavaScript, HTML, and Node.js technologies to build rich cross-platform desktop applications. This is the same stack the popular VS Code IDE employs, so it’s not surprising that SOS has a similar look and feel.

SOS is yet another indicator of a significant culture shift at Microsoft. If you had asked me just a few years ago, I would have said SQL Server would run on Linux when pigs fly. Nowadays, SQL Server on Linux is a reality. Microsoft now embraces open-source and cross-platform technologies as part of the ecosystem and welcomes community contributions to the tooling that makes jobs easier for both developers and DBAs.

The release of SOS is not meant to suggest that SSMS (also free, but not open-source) is deprecated. The SOS FAQ specifically calls out that “investments in flagship Windows tools (SSMS, SSDT, PowerShell) will continue in addition to the next generation of multi-OS and multi-DB CLI and GUI tools. The goal is to offer customers the choice of using the tools they want on the platforms of their choice for their scenarios.”

Choices are good, IMHO, because there is no one-size-fits-all solution that will keep everyone happy. I’m also glad to see multi-DB mentioned as a next-generation tool direction because, like many data folks, I work with other DBMS products in addition to SQL Server and Azure SQL Database. I don’t really expect a single tool to fulfill all my needs, but the less I need to jump between tools for common tasks, the better.

What I like most about SOS is its easy extensibility. Dashboard and insight widgets for server and database views are easily built and customized for one’s particular needs, allowing favorite DMV queries to run automatically and display results in graph or tabular form. Code snippets are also very easy to create and use.

Visit the SOS portal page to see the power of SOS and try it out yourself.

Bulk Load Batch Size Considerations in SQL Server 2016

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.
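
For example, the ADO.NET SqlBulkCopy class exposes the bulk load API directly. Here’s a minimal hedged sketch; the target table and the tiny source DataTable are hypothetical placeholders for illustration:

DataTable sourceDataTable = new DataTable();
sourceDataTable.Columns.Add("OrderID", typeof(int));
sourceDataTable.Rows.Add(1);
using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.OrdersStaging"; //hypothetical target table
        bulkCopy.BatchSize = 32000; //rows per committed batch; see batch size discussion below
        bulkCopy.WriteToServer(sourceDataTable);
    }
}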

SQL Server 2016 and later versions improve performance further by turning on bulk load context and minimal logging by default when bulk loading into databases using the SIMPLE and BULK LOGGED recovery models, which previously required turning on trace flags, as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here, but I do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Bulk Load Batch Size Implications
An important consideration in SQL Server 2016 and later with bulk load context and minimal logging is that each batch allocates new extent(s) (8 contiguous 8K pages, 64K total) rather than using unused pages in existing extents. This improves concurrency and space allocation performance, but possibly at the cost of significantly more unused space than previous versions when few rows are loaded per batch.

The optimal batch size for bulk load operations involves trade-offs between throughput, data space utilization, concurrency, and transaction log space (in the FULL recovery model). To avoid excessive unused space with bulk load context, adjust the batch size when possible such that data are loaded in multiples of the 64K extent size.
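
As a rough illustration, here’s a hedged BULK INSERT sketch; the table, file path, and row size are assumptions for illustration only:

--hypothetical example: with an average row size of ~200 bytes, a page holds ~40 rows
--(~8,060 usable bytes), so one 64K extent holds ~320 rows; a BATCHSIZE of 32000
--fills roughly 100 extents per batch with little wasted space
BULK INSERT dbo.OrdersStaging
FROM 'C:\Data\orders.dat'
WITH (
      TABLOCK             --enables bulk load context and minimal logging
    , BATCHSIZE = 32000   --rows committed per batch; each batch allocates whole extents
    , FIELDTERMINATOR = ','
    , ROWTERMINATOR = '\n'
);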

There are cases when one must load with small batches, such as when loading small files. One way to mitigate unused space in this scenario is to omit the TABLOCK hint so that bulk load context isn’t used; this isn’t much of a performance concern with small batches anyway. Another method is to turn on trace flag 692 to disable the default bulk load context in SQL Server 2016 and later, effectively reverting to pre-SQL Server 2016 behavior. Both mitigations are sketched below; the table and file names are hypothetical.
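
--option 1: omit TABLOCK so this load doesn't use bulk load context
BULK INSERT dbo.OrdersStaging
FROM 'C:\Data\small_file.dat'
WITH (BATCHSIZE = 1000);

--option 2: disable the default bulk load context (fast inserts) instance-wide
DBCC TRACEON (692, -1);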

Don’t confuse the batch size setting with the ROWS_PER_BATCH hint. The ROWS_PER_BATCH hint is used by the SQL Server optimizer to help optimize the load process because the number of rows that will be loaded is otherwise unknown, defaulting to an estimate of 10,000 rows.
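
To make the distinction concrete, here’s a hedged sketch (hypothetical table and file): BATCHSIZE controls the commit unit, while ROWS_PER_BATCH is only a cardinality hint and doesn’t change commit behavior:

--ROWS_PER_BATCH merely hints the approximate total row count to the optimizer;
--the load below still commits as a single transaction because BATCHSIZE isn't specified
BULK INSERT dbo.OrdersStaging
FROM 'C:\Data\orders.dat'
WITH (TABLOCK, ROWS_PER_BATCH = 1000000);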

The Curious Case of Undetected SQL Exceptions

Undetected database errors are insidious. It can be really bad when an error gets dropped on the floor, resulting in incomplete or wrong results. Consider that this simple SELECT query returns an empty result set instead of raising a SqlException for the divide-by-zero error:

string sqlBatchText = @"
 BEGIN TRY

  SELECT 1/0 AS DivideByZeroError;

 END TRY
 BEGIN CATCH
  THROW;
 END CATCH;
";
int rowCount = 0;
try
{
 using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
 using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
 {
  connection.Open();
  using (SqlDataReader reader = command.ExecuteReader())
  {
   while (reader.Read()) { ++rowCount; };
  }
  Console.WriteLine("{0} rows read", rowCount);
 }
}
catch
{
 throw;
}

One generally assumes SQL errors raised during batch execution will also raise a SQL exception in the application. However, there are cases involving multi-statement batches and stored procedures where errors might not be raised in the client application, as the above example shows. These scenarios can be distilled as:

1) An error is caught by T-SQL TRY/CATCH while executing a row-returning statement.
2) An error occurs after a row-returning statement successfully executes.
3) An error occurs after a row count message is returned (depending on client API).

To ensure SQL exceptions are raised as expected, one must either code T-SQL to avoid these scenarios entirely or ensure the client application data access layer consumes all subsequent results returned by SQL Server even when only a single result set is expected. Row-returning statements include SELECT (excluding variable assignment), an OUTPUT clause in an INSERT/UPDATE/DELETE/MERGE statement that returns rows to the client, as well as some specialized commands like RESTORE FILELISTONLY and DBCC commands executed WITH TABLERESULTS.
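
For example, scenario #2 can be reproduced with a batch like this sketch, where the error follows a successful row-returning statement:

--the first SELECT succeeds and returns a result set; the divide-by-zero error
--occurs afterwards and isn't raised by ADO.NET until the app reads past the
--first result (e.g. via NextResult)
SELECT name FROM sys.objects;
SELECT 1/0 AS DivideByZeroError;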

Although not technically row-returning, some client APIs (ADO classic, JDBC, etc.) return row count messages (TDS protocol DONE_IN_PROC) as empty result sets with no rows or columns. This is one reason it is common practice to include SET NOCOUNT ON in stored procedures and batches to suppress row counts returned by INSERT/UPDATE/DELETE/MERGE statements unless those are needed by application code. Otherwise, the client app must consume those results before the SQL exception is raised.
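
Here’s a minimal sketch of the practice; the procedure and table are hypothetical:

--SET NOCOUNT ON suppresses the DONE_IN_PROC row count messages that some client
--APIs surface as empty result sets the application would otherwise have to consume
CREATE PROCEDURE dbo.UpdateWidgetPrice
      @WidgetID int
    , @Price money
AS
SET NOCOUNT ON;
UPDATE dbo.Widget SET Price = @Price WHERE WidgetID = @WidgetID;
GO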

Below is a C# ADO.NET example of this defensive programming technique. Even though a single result set is expected, the code still invokes NextResult afterwards to process the entire result stream and ensure SQL exceptions are raised in the app when errors occur.

string sqlBatchText = @"
    BEGIN TRY
        SELECT 1/0 AS DivideByZeroError;
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH;
";
int rowCount = 0;
try
{
    using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
    using (SqlCommand command = new SqlCommand(sqlBatchText, connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read()) { ++rowCount; }
            //consume any subsequent results so T-SQL errors are raised as SqlExceptions
            while (reader.NextResult()) { }
        }
        Console.WriteLine("{0} rows read", rowCount);
    }
}
catch
{
    throw;
}

This consideration applies to all SQL Server client APIs and programming languages, although I focus on C# and ADO.NET with System.Data.SqlClient (.NET Framework Data Provider for SQL Server) in this article. The specific methods for consuming all results vary depending on the API (e.g. getMoreResults() in JDBC), but the basic concept is the same.

Regardless of the method one uses to execute SQL Server queries, ADO.NET uses a data reader to return command results, even when higher-level objects (e.g. DataSet) or ORMs (e.g. Entity Framework) are used. The low-level ADO.NET command ExecuteReader method exposes the data reader, whereas ExecuteScalar and ExecuteNonQuery do not expose the internal reader.

ExecuteScalar returns the first column of the first row as a scalar value but doesn’t call NextResult on the internal data reader to retrieve subsequent results, so errors may go undetected. ExecuteScalar will not raise an exception if a T-SQL error occurs after the first row is returned. Also, if no rows are returned because the row-returning statement erred and the error was caught in T-SQL, ExecuteScalar returns a null object without raising an exception.
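
A minimal sketch of this pitfall, reusing the same batch as the earlier examples:

string sqlScalarBatchText = @"
    BEGIN TRY
        SELECT 1/0 AS DivideByZeroError;
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH;
";
using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
using (SqlCommand command = new SqlCommand(sqlScalarBatchText, connection))
{
    connection.Open();
    //no exception is raised here; the errored SELECT returns an empty result set,
    //so ExecuteScalar returns null and the THROW result is never consumed
    object scalarValue = command.ExecuteScalar();
    Console.WriteLine(scalarValue == null ? "null (error undetected)" : scalarValue.ToString());
}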

ExecuteNonQuery executes the entire batch of statements and returns the accumulated count of affected rows as a scalar value, discarding any rows returned. The returned value will be -1 if SET NOCOUNT ON is specified. Because ExecuteNonQuery internally consumes all results in the process, errors will be raised without additional ADO.NET programming, although one doesn’t typically use ExecuteNonQuery to execute a batch that returns rows. Again, the ADO.NET error detection issue only applies to row-returning statements.
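
For contrast, the same batch (sqlScalarBatchText above) executed with ExecuteNonQuery raises the SqlException without extra code:

using (SqlConnection connection = new SqlConnection(@"Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI"))
using (SqlCommand command = new SqlCommand(sqlScalarBatchText, connection))
{
    connection.Open();
    //ExecuteNonQuery consumes the entire result stream internally, so the error
    //re-thrown by the T-SQL CATCH block surfaces here as a SqlException
    command.ExecuteNonQuery();
}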

The remainder of this article discusses T-SQL error handling and ADO.NET defensive programming techniques in more detail, along with ways to avoid undetected database errors in ADO.NET.

T-SQL Error Handling Objectives
T-SQL and ADO.NET data access code must work in concert with one another to ensure SQL errors are detected in application code. The T-SQL constructs used in multi-statement batches can affect whether and how errors are reported by ADO.NET during batch execution. I’ll start by citing core T-SQL error handling objectives, which can be summarized as:

1) Ensure a multi-statement T-SQL batch doesn’t continue after an error occurs.
2) Roll back the transaction after an error.
3) Raise the error so that the client application is aware a problem occurred.

The T-SQL building blocks used to achieve these objectives are:
1) SET XACT_ABORT ON
2) Structured error handling (SEH) (a.k.a. TRY/CATCH)
3) Control-of-flow (e.g. IF @@ERROR GOTO ErrorHandler)

T-SQL Behavior Without SEH and XACT_ABORT ON
When a runtime error occurs with the XACT_ABORT session setting ON outside a TRY block, SQL Server stops batch execution immediately, rolls back the transaction (if any), and raises the error. Consequently, a single SET XACT_ABORT ON statement meets all of the aforementioned error handling objectives without T-SQL procedural code. However, the XACT_ABORT setting is not considered when user-defined errors are raised with RAISERROR, so control-of-flow statements are required to meet objectives #2 and #3 when RAISERROR is employed.
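
A minimal sketch (against a hypothetical table) of SET XACT_ABORT ON meeting all three objectives:

SET XACT_ABORT ON;
BEGIN TRAN;
UPDATE dbo.Widget SET Price = Price * 1.1;  --hypothetical table
SELECT 1/0 AS ForcedError;                  --error: batch stops, transaction rolls back
UPDATE dbo.Widget SET Price = Price * 2;    --never executes
COMMIT;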

SET XACT_ABORT ON also rolls back open transactions following an attention event like an explicit cancel or query timeout, which would otherwise leave the transaction open. This is one reason why I strongly recommend using SET XACT_ABORT ON, especially in procs that include BEGIN TRAN, regardless of whether or not SEH is also used.

T-SQL Behavior Without SEH and XACT_ABORT OFF
When an error occurs with the XACT_ABORT session setting OFF and SEH is not used, SQL Server raises the error immediately but, depending on the error and severity, batch execution might continue and the transaction might not be rolled back. The T-SQL batch must use control-of-flow statements after each statement to avoid continuing after errors and to roll back the transaction (objectives #1 and #2), as sketched below.
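
Here’s a sketch of the legacy control-of-flow pattern under XACT_ABORT OFF; the table is hypothetical:

SET XACT_ABORT OFF;
BEGIN TRAN;
UPDATE dbo.Widget SET Price = Price * 1.1;  --hypothetical table
IF @@ERROR <> 0 GOTO ErrorHandler;
UPDATE dbo.Widget SET Price = Price / 0;    --statement fails, but the batch continues
IF @@ERROR <> 0 GOTO ErrorHandler;
COMMIT;
RETURN;
ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK;
RAISERROR('Error occurred; transaction rolled back.', 16, 1);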

T-SQL Behavior With T-SQL Structured Error Handling
When an error occurs during statement execution while a T-SQL structured error handler is in scope, the CATCH block error handler is entered, after marking the transaction uncommittable if SET XACT_ABORT is ON. SEH meets the first error handling objective by skipping subsequent statements in the TRY block after an error. It is the responsibility of the error handling code in the CATCH block to roll back the transaction if needed and raise the error. The simple T-SQL handler below achieves objectives #2 and #3 in Azure SQL Database and SQL Server 2012 and later:

BEGIN CATCH
 IF @@TRANCOUNT > 0 ROLLBACK;
 THROW;
END CATCH;

THROW is not available in SQL Server 2008 R2 and earlier, so one must use RAISERROR instead in those versions. The error handler below provides functionality similar to THROW, although RAISERROR obfuscates the original error as a user error with message number 50000.

BEGIN CATCH
 DECLARE
       @ErrorNumber int = ERROR_NUMBER()
     , @ErrorMessage nvarchar(2048) = ERROR_MESSAGE()
     , @ErrorSeverity int = ERROR_SEVERITY()
     , @ErrorState int = ERROR_STATE()
     , @ErrorLine int = ERROR_LINE();
 IF @@TRANCOUNT > 0 ROLLBACK; --roll back as in the THROW handler above
 RAISERROR('Error %d caught at line %d: %s'
    , @ErrorSeverity
    , @ErrorState
    , @ErrorNumber
    , @ErrorLine
    , @ErrorMessage);
END CATCH;

Using T-SQL SEH when result sets are returned has implications for ADO.NET data access code to ensure database exceptions are raised after SQL errors. As illustrated by the ExecuteReader example at the beginning of this article, when an error is caught in T-SQL during a row-returning statement, ADO.NET will not raise the database exception immediately, instead returning the partial or empty result set generated by the failed statement. The next data reader Read method call will return false after the error. The error raised in the CATCH block by the T-SQL THROW or RAISERROR statement is considered a separate result by ADO.NET and will not be raised as a database exception until NextResult is called.

Unless you have a specific reason to use SEH in row-returning batches, I suggest instead using SET XACT_ABORT ON alone, as this will address the core T-SQL error handling objectives and allow ADO.NET to detect errors without calling NextResult. However, as mentioned earlier, ADO.NET code will still need to call NextResult if the row-returning statement is not the last statement in the batch.
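
In other words, a row-returning batch can often be reduced to a sketch like this:

--with XACT_ABORT ON and no TRY/CATCH, the divide-by-zero error aborts the batch
--and is raised within the result set itself, so ADO.NET surfaces the SqlException
--on the Read call without requiring NextResult
SET XACT_ABORT ON;
SELECT 1/0 AS DivideByZeroError;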

Summary
I hope this information will help you ensure database errors in multi-statement batches are detected. The interaction between T-SQL and ADO.NET isn’t as intuitive as it could be.