Bulk Load Batch Size Considerations in SQL Server 2016

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.

SQL Server 2016 and later improves performance further by turning on bulk load context and minimal logging by default when bulk loading into SIMPLE and BULK LOGGED recovery model databases, which previously required turning on trace flags as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here but do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Bulk Load Batch Size Implications
An important consideration in SQL 2016 and later with bulk load context and minimal logging is that each batch allocates new extent(s) (8 contiguous 8K pages, 64K) rather than using existing unused pages in existing extents. This improves concurrency and space allocation performance but possibly with the cost of significantly higher unused space than previous versions when few rows are loaded per batch.

The optimal batch size for bulk load operations involves trade-offs between throughput, data space utilization, concurrency, and transaction log space (FULL recovery model). To avoid excessive unused space with bulk load context, adjust the batch size when possible such that data are loaded in multiples of the 64K extent size.

There are cases when one must load with small batches, such as when loading small files. One solution to mitigate unused space in this scenario is to not use TABLOCK so that bulk load context isn’t used. This isn’t much of a performance concern with small batches anyway. Another method us to turn on trace flag 692 to disable the default bulk load context in SQL Server 2016, effectively reverting to pre-SQL 2016 behavior.

Don’t confuse batch size with the rows per batch hint. The rows per batch hint is used by the SQL Server optimizer to help optimize the load process because the number of rows that will be loaded is otherwise unknown, defaulting to an estimate of 10,000 rows.

Microsoft SQL Operations Studio Preview

Microsoft made the new cross-platform SQL Operations Studio (SOS) tool available on Github this week as a free open-source project. This SOS preview allows one to develop and manage SQL Server and Azure SQL Database from Windows, Linux, and macOS. The current preview can be downloaded from the SOS portal page, which also contains links to impressive quick start guides, how-to, and tutorials. I encourage you to try out the preview and improve it by reporting issues and offering suggestions.

If you are a developer, consider contributing to this project on Github. SOS is built on the Electron framework, which leverages JavaScript, HTML, and Node.js technologies to build rich cross-platform desktop applications. This is the same stack that the popular VS Code IDE employs so it’s not surprising SOS has a similar look and feel.

SOS is yet another indicator of a significant culture shift at Microsoft. If you asked me just a few years ago, I would have said SQL Server would run Linux when pigs fly. Nowadays, SQL Server on Linux is reality. Microsoft now embraces open-source and cross-platform technologies as part of the eco system and welcomes community contributions to the tooling that makes jobs easier for both developers and DBAs.

The release of SOS does not mean to suggest that SSMS (also free but not open-source) is deprecated. The SOS FAQ specifically calls out that “investments in flagship Windows tools (SSMS, SSDT, PowerShell) will continue in addition to the next generation of multi-OS and multi-DB CLI and GUI tools. The goal is to offer customers the choice of using the tools they want on the platforms of their choice for their scenarios.”

Choices are good, IMHO, because there is no one-size fits all solution that will keep everyone happy. I’m also glad the see multi-DB mentioned as a next generation tool direction because, like many data folks, I work with DBMS products in addition to SQL Server and Azure SQL Database. I don’t really expect a single tool to fulfill all my needs but the less I need to jump between tools for common tasks, the better.

What I like most about SOS is its easy extensibility. Dashboard and insight widgets for server and database views are easily built and customized for one’s particular needs, allowing you to automatically run favorite DMV queries and show results in graph or tabular form. Code snippets are very easy to create and use.

Visit the SOS portal page to see the power of SOS and try it out yourself.