Performance testing with DBCC DROPCLEANBUFFERS

DBCC DROPCLEANBUFFERS is a common practice when unit testing SQL Server performance on an isolated test instance. This allows one to evaluate different candidates for query, stored procedure, and index tuning based on execution times in a worst-case cold buffer cache scenario and provides better test repeatability by leveling the playing field before each test. However, clearing cache in this way has considerations one should be aware of.

An important detail sometimes overlooked is that one must first execute a CHECKPOINT command in the context of the database(s) to be tested before executing DBCC DROPCLEANBUFFERS. DBCC DROPCLEANBUFFERS only frees pages that are not dirty (cached version same as on disk version) so modified pages will remain in cache when CHECKPOINT isn’t first executed. Overlooking the CHECKPOINT can result in non-repeatable test timings. One should always run CHECKPOINT before DBCC DROPCLEANBUFFERS.

One can make the argument that DBCC DROPCLEANBUFFERS might not be particularly valuable for testing. First, the storage engine in SQL Server Enterprise Edition (or Developer Edition, which is often used when testing) behaves differently with a cold cache versus a warm one. With a warm cache, a page not already in cache (e.g. index seek by primary key) will be fetched from disk using a single 8K page IO request as one expects. However, when the cache isn’t fully warmed up (Buffer Manager’s Target Pages not yet met), the entire 64K extent (8 contiguous 8K pages) is read for the single page request regardless of whether the adjacent pages are actually needed by the query. This has the benefit of warming the cache much more quickly than would otherwise occur, but given that the normal steady state of a production SQL Server is a warm cache, testing with a cold cache isn’t a fair comparison of different plans. More data than normal will be transferred from storage so timings may not be indicative of actual performance.

The storage engine also behaves differently during scans when data are not already cached regardless of the SQL Server edition. During sequential scans, read-ahead prefetches multiple extents from storage at a time so that data is in cache by the time it is actually needed by the query. This greatly reduces the time needed for large scans because fewer IOPS are required and sequential access reduces costly seek time against spinning media. Furthermore, Enterprise and Developer editions perform read-ahead reads more aggressively than lesser editions, up to 4MB (500 pages) in a single scatter-gather IO in later SQL Server versions.

The implication with cold cache performance testing is that both full extent reads and read-ahead prefetches are much more likely to occur such that test timings of different execution plans are not fairly comparable. These timings will over emphasize hardware (storage) performance rather than query performance as intended. Given hardware differences on a test system and that cold cache is not the typical production state, cold cache testing isn’t a good indicator of query performance and resource usage one will experience in a production system.

I recommend using logical reads as a primary performance measure when evaluating query and index tuning candidates. Logical reads is a count of the number of pages touched by the query regardless of whether data was read from storage or already cached, making it a better comparison indicator of data access resource utilization. The number of logical reads can be determined by running the query or procedure with SET STATISTICS IO ON and will be consistent regardless of whether physical IO was needed or not. Query times may be used as a secondary measure by running the query more than once, discarding the results of first run, and taking the average of subsequent executions. This is not to say these logical read measurements and timings will predict actual production performance but will allow one to more accurately evaluate resource usage of different execution plans.

Microsoft SQL Operations Studio Preview

Microsoft made the new cross-platform SQL Operations Studio (SOS) tool available on Github this week as a free open-source project. This SOS preview allows one to develop and manage SQL Server and Azure SQL Database from Windows, Linux, and macOS. The current preview can be downloaded from the SOS portal page, which also contains links to impressive quick start guides, how-to, and tutorials. I encourage you to try out the preview and improve it by reporting issues and offering suggestions.

If you are a developer, consider contributing to this project on Github. SOS is built on the Electron framework, which leverages JavaScript, HTML, and Node.js technologies to build rich cross-platform desktop applications. This is the same stack that the popular VS Code IDE employs so it’s not surprising SOS has a similar look and feel.

SOS is yet another indicator of a significant culture shift at Microsoft. If you asked me just a few years ago, I would have said SQL Server would run Linux when pigs fly. Nowadays, SQL Server on Linux is reality. Microsoft now embraces open-source and cross-platform technologies as part of the eco system and welcomes community contributions to the tooling that makes jobs easier for both developers and DBAs.

The release of SOS does not mean to suggest that SSMS (also free but not open-source) is deprecated. The SOS FAQ specifically calls out that “investments in flagship Windows tools (SSMS, SSDT, PowerShell) will continue in addition to the next generation of multi-OS and multi-DB CLI and GUI tools. The goal is to offer customers the choice of using the tools they want on the platforms of their choice for their scenarios.”

Choices are good, IMHO, because there is no one-size fits all solution that will keep everyone happy. I’m also glad the see multi-DB mentioned as a next generation tool direction because, like many data folks, I work with DBMS products in addition to SQL Server and Azure SQL Database. I don’t really expect a single tool to fulfill all my needs but the less I need to jump between tools for common tasks, the better.

What I like most about SOS is its easy extensibility. Dashboard and insight widgets for server and database views are easily built and customized for one’s particular needs, allowing you to automatically run favorite DMV queries and show results in graph or tabular form. Code snippets are very easy to create and use.

Visit the SOS portal page to see the power of SOS and try it out yourself.

Bulk Load Batch Size Considerations in SQL Server 2016

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.

SQL Server 2016 and later improves performance further by turning on bulk load context and minimal logging by default when bulk loading into SIMPLE and BULK LOGGED recovery model databases, which previously required turning on trace flags as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here but do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Bulk Load Batch Size Implications
An important consideration in SQL 2016 and later with bulk load context and minimal logging is that each batch allocates new extent(s) (8 contiguous 8K pages, 64K) rather than using existing unused pages in existing extents. This improves concurrency and space allocation performance but possibly with the cost of significantly higher unused space than previous versions when few rows are loaded per batch.

The optimal batch size for bulk load operations involves trade-offs between throughput, data space utilization, concurrency, and transaction log space (FULL recovery model). To avoid excessive unused space with bulk load context, adjust the batch size when possible such that data are loaded in multiples of the 64K extent size.

There are cases when one must load with small batches, such as when loading small files. One solution to mitigate unused space in this scenario is to not use TABLOCK so that bulk load context isn’t used. This isn’t much of a performance concern with small batches anyway. Another method us to turn on trace flag 692 to disable the default bulk load context in SQL Server 2016, effectively reverting to pre-SQL 2016 behavior.

Don’t confuse batch size with the rows per batch hint. The rows per batch hint is used by the SQL Server optimizer to help optimize the load process because the number of rows that will be loaded is otherwise unknown, defaulting to an estimate of 10,000 rows.

SQL Server 2016 SP1 Standard Edition Enhancements

I seldom get excited about service packs but the changes released with SQL Server 2016 SP1 are the most significant I’ve seen in a SQL Server service pack in 20+ years. Microsoft announced this week at the Microsoft Connect(); developer’s conference that SQL Server 2016 SP1, which is available for download immediately, allows features previously available only in Enterprise/Developer Editions to be used in lessor Standard, Web, Express, and LocalDB Editions too. Features like table partitioning, In-Memory OLTP, and columnstore are now options for developers and DBAs using SQL Server Standard Edition and even the free Express Edition in production. See SQL Server 2016 Service Pack 1 (SP1) released !!! for the complete matrix of programmability features by edition along with other cool SP1 information.

The implications are huge now that SQL Server has the same programmability surface area among editions. The choice of the production edition can be made independently based on operational needs rather than programmability features. Developers can use a free edition (i.e. LocalDB, Express or Developer) without fear a feature won’t be available in production as long as prod is running SQL Server 2016 SP1 or greater. DBAs can now choose the appropriate edition for production based on other considerations like advanced high availability, TDE, Auditing as well as performance features like higher supported memory, more number of cores, and advanced scanning. This separation of concerns avoids the need to lock in the production edition early in the application lifecycle, making development easier and production implementation more flexible.

Real World Use Case Scenario
I work with an ISV with hundreds of customers running a mix of Standard and Enterprise Edition. Their needs vary widely and SQL Server Enterprise Edition is not an option for some due to budget constraints. Some tables are often quite large so partitioning is required for manageability and, for their reporting workload, partitioning also improves performance of large scans due to partition elimination. The ugliness though, is that table partitioning (and/or columnstore) is the right tool for the job but was not an option for customers on Standard Edition.

The ISV initially compromised and used view partitioning instead of table partitioning so that the same code would run regardless of edition. Although that provided the expected manageability benefits, there were some downsides. Compilation time increased significantly as the number of partitioned view member tables increased as did the query plan complexity. This sometimes resulted in poor query plans against very large tables and especially impacted larger and most valued customers, most of which were running Enterprise Edition.

To address the problem before SQL Server 2016 SP1, the ISV added conditional code to the application so that either view or table partitioning could be used depending on the SQL Server edition. This wasn’t ideal as it added code complexity and doubled the number of QA test cases for application features that performed partition maintenance. However, since the resultant benefits for their larger customers on Enterprise Edition were quite significant; the additional costs of development and testing were well-justified.

Now that table partitioning is available in SQL Server 2016 SP1 Standard Edition, they plan to require SQL Server 2016 SP1 (or later) going forward, use table partitioning unconditionally, and perhaps introduce usage of other features like columnstore that were previously Enterprise only. Not only will this simplify the code base and test cases, customers on Standard Edition will be happier with their experience and can upgrade to Enterprise if they so choose without reinstalling or reconfiguring the application. It will of course take some time before all their customers upgrade to the latest product version and SQL 2016 SP1+ but the future is much brighter now.

Perform Due Diligence
If you are new to features previously available only in Enterprise Edition, I suggest you perform due diligence before using these features. Memory-optimized features like columnstore and In-Memory OLTP require additional physical memory and insufficient memory with memory-optimized features will be a production show-stopper. Make sure your hardware is sized appropriately regardless of edition and, in the case of editions other than Enterprise or Developer, memory requirements don’t exceed the maximum capacity limits for that edition. Although very powerful, In-Memory OLTP is a fundamentally different paradigm that you might be accustomed to regarding transactional behavior and isolation levels. Be sure you fully understand these features before using it in development or production.

I hope these changes are enough motivation for you to consider upgrading to SQL Server 2016 SP1, especially if you are running Standard Edition or are currently on an older SQL Server version. Together with the fact that SQL Server 2016 just runs faster, the time and effort spend in upgrading is a solid investment that will pay dividends regardless of edition.

SQL Server 2016 and Azure SQL Database V12 Breaking Change

This post is to get the word out about a breaking change to datetime conversion and comparison behavior in SQL Server 2016 and Azure SQL Database V12. This change hasn’t been documented as of this writing in the Breaking Changes to Database Engine Features in SQL Server 2016 topic in the SQL Server Books Online.

In short, conversion from datetime to a higher precision temporal data type (datetime2, datetimeoffset, or time) may yield a different, but more accurate, time value than in prior versions. Also, predicates involving datetime consider the full precision of raw datetime internal value instead of the time value rounded to the nearest millisecond. These changes in conversion and comparison behavior may affect existing applications and are not intuitive unless one understands the underlying datetime data type implementation.

You may be aware that the accuracy of datetime is limited to 1/300 of a second. This is because values are internally an 8-byte structure consisting of 2 separate 32-bit integers, one with the number of day units since 1900-01-01 and the other with the number of 1/300 second interval units since midnight. The 1/300 second unit interval limits the time accuracy to 3.33333… milliseconds and the milliseconds value will be a repeating decimal when time interval units are not evenly divisible by 3. The raw decimal value is rounded to a scale of 3 in accordance with the fixed datetime precision of 3, resulting in a millisecond value of 0, 3, or 7 for all datetime values.

Pre-SQL Server 2016 Behavior
Before SQL Server 2016, conversion from datetime to another temporal type used the source datetime value after it was rounded to the nearest millisecond, which truncated repeating decimal fractional milliseconds. The rounded value was then rounded again according to the target type precision. When the target type precision was greater than 3, the time was extended to the target type precision with insignificant trailing zeros, resulting in zero for the sub-millisecond value.

Also, when datetime was compared to another temporal type, the rounded value was used. This script shows the result of the equality predicate is true after the datetime value is converted to datetime2.

SQL Server 2016 Behavior Change
SQL Server 2016 and Azure SQL Database V12 use the raw datetime internal value without rounding during conversion to another temporal type. The value is rounded only once during conversion, to the target type precision. The end result will be the same as before SQL Server 2016 when the target type precision is 3 or less. However, the converted value will be different when the target type precision is greater than 3 and the internal time unit interval is not evenly divisible by 3 (i.e. rounded source datetime millisecond value is 3 or 7). Note the non-zero microseconds and nanoseconds in the script results below and that rounding is based on the target type precision rather than the source.

This behavior change provides a more accurate converted value but may break applications that expect the converted value to be the same as the rounded datetime value as was the case before SQL Server 2016.

Be aware than the full raw datetime precision (instead of the rounded value) is also used when evaluating predicates involving a datetime type. The full precision of both arguments are used, resulting in the equality compare predicate to evaluate to false in both scripts below. The greater than predicate is true in the first script and the less than predicate is true in the second:

To provide insight into why the comparisons result in greater than and less than respectively, the script below shows the nanoseconds value of the compared data types:

The datetime2 type is accurate only to 100 nanosecond whereas datetime includes values to the nanosecond (and beyond) because the theoretical precision of repeating decimal values is unlimited. The implication is that a datetime type with a repeating decimal value will never compare equally with any temporal type except datetime.

Datetime conversion and comparison behavior is controlled by the database compatibility level. Databases in SQL Server 2016 level (130) use the new behavior and the legacy behavior is used with other levels.

These datetime behavior changes have the benefit of improved accuracy and performance of datetime conversion/comparison. Affected applications can use a pre-SQL Server 2016 database compatibility level until they can be remediated.

I recommend one avoid comparing datetime directly to other temporal types going forward. Instead convert the datetime value to the type being compared and use the converted value in the predicate. It’s generally best practice to match data types whenever possible for performance and to avoid ambiguity.

SQL Server TVP Performance Gotchas

Table-valued parameters have important considerations that developers and DBAs need to be aware of. It is essential that application code specify the proper data type and length for TVP columns in order to achieve optimal performance and reduce unnecessary overhead. Additionally, a trace (Extended Events, server-side SQL Trace, Profiler) that captures RPC events of an inappropriately defined TVP can not only exacerbate performance issues, but affect stability of the SQL Server instance in some cases.

The graph below summarizes the impact the application code max column length specification can have on performance, without and with a trace running. All tests used the same table type of 10 varchar(50) columns and a 10,000 row TVP rows passed via a DataTable object of 10 string columns. The only variables were the max column length specified by the app code and a trace running on the database server. The client application was run on a different machine than the database server and elapsed time measured by the application.

TVP Performance Comparison
Figure 1: Impact of TVP maximum column length specification with and without tracing

The average duration was 113ms without a trace running when the app code used the default -1 max column length. However, when max column length 50 was specified (matching the varchar(50) column of the table type), the average duration dropped significantly to 75ms. The trivial code change of specifying the string column max length of 50 improved performance by 33%.

I then ran the same pair of tests while a trace captured the RPC completed events. The average duration of the default max length test increased from 113ms to 9.324ms with the trace running, an over 80x degradation in performance! The test with the explicit 50 character max length was not nearly impacted as much, increasing from 75ms to 89ms (which is tolerable, IMHO).

TVP Internals
Under the hood, TVP data are passed to SQL Server over the Tabular Data Stream (TDS) protocol. The client API sends TVP column meta-data to SQL Server describing the data type, length, and other meta-data for each TVP column followed by data rows with each column in native format matching the preceding data type specification. SQL Server uses the provided column meta-data to prepare and fill buffers for efficient processing on the server side, leveraging native types to eliminate parsing overhead similarly to other parameterized queries.

Before a query or stored proc with a TVP starts executing, SQL Server creates a table in tempdb with the same schema as the parameter table type and uses bulk insert internally to efficiently load the table with TVP rows streamed by the client application. The size of a TVP is constrained only by available tempdb storage. SQL Server executes the query/proc after the TVP temp table is loaded and the parameterized T-SQL query/proc can then use the TVP data.

The TVP columns provided by the client application do not have to match the schema of the target table type; SQL Server implicitly converts TVP values to match the target table type column when data types differ. Although not optimal, implicit conversion is generally not a major factor in overall TVP performance.

The application-specified TVP max column length can impact performance significantly, and in some cases dramatically, as illustrated by the performance tests shown earlier. SQL Server prepares to receive TVP data up to the max length specified by the client application rather than the defined size of the target table type column. When the specified max length of variable length columns exceed the 8000 byte tipping point, SQL Server uses a different code path to allow for large object (LOB) values up to 2GB. Unless the table type actually contains LOB values (varchar(MAX), nvarchar(MAX)), database server resources are wasted unnecessarily when an inappropriate max column length is specified.

LOB values are especially problematic when a trace captures the RPC completed event of a TVP query. Tracing uses memory from the OBJECTSTORE_LBSS memory pool to build trace records that contain TVP LOB values. From my observations of the sys.dm_os_memory_clerks DMV, each LOB cell of a TVP requires about 8K during tracing regardless of the actual value length. This memory adds up very quickly when many rows and lob columns are passed via a TVP with a trace running. For example, the 10,000 row TVP with 10 LOB columns used in the earlier test required over 800MB memory for a single trace record. Consider that a large number of TVP LOB cells and/or concurrent TVP queries can cause queries to fail with insufficient memory errors. In extreme cases, the entire instance can become unstable and even crash under due to tracing of TVP queries.

Specifying Proper TVP Parameter Column Meta-Data
A SQL Server development best practice has long been to use strongly-typed parameters with attention to detail regarding the parameter data type and length such that it is consistent with the types on the server. This practice improves performance by avoiding implicit data type conversions, promotes sargable expressions, avoids unnecessary procedure cache bloat, and inherently validates data for proper typing on the client before it is sent to the database server. With scalar parameters, one need only specify the correct SqlDbType along with the proper length, precision/scale (avoiding AddWithValue method to add parameters) and all is well in the world.

TVP parameters require additional column meta-data not applicable to scalar parameters. The parameter data type of a TVP in .NET is always SqlDbType.Structured. The additional TVP column meta-data is inferred from the supplied parameter value, which may be a DbDataReader, IEnumerable, or DataTable object. These objects inherently contain column meta-data and methods enumerate rows, which the SqlClient API uses to send the TVP to SQL Server.

DataTable objects are most commonly used as TVP values. DataTables are easy to use and can serve as containers for data beyond just TVP usage. But unlike DbDataReader and IEnumerable objects, a big gotcha with a DataTable is that the default data type String with maximum length of -1 (2GB LOB). This is the .NET equivalent of the SQL Server nvarchar(MAX) data type and has many insidious and negative implications with a TVP. First, values of types other than string that are added to a DataTable string column will be converted to string (DateTime, Integer, GUID, etc.). Consequently, using the default DataTable column string data type for non-string types will:

• increase client memory requirements compared to more compact native types
• incur conversion overhead
• prevent strong-typed data validation on client side
• require date format aware formatting of date and datetime values
• require using a period as decimal separators
• increase network usage compared with smaller native types

When String is the proper column data type, developers must be especially mindful of the max length specification when the DataTable is used as the TVP value. Strings in Windows and .NET are Unicode, requiring 2 bytes per character. This means a max length of over 4000 characters will cross the 8000 byte threshold for LOB data on the server side regardless of the table type on the server. Avoid using long string columns in TVPs when many rows are passed and never use the default -1 length unless a MAX type is actually intended.

DBAs who support applications that use TVPs should be aware of the repercussions tracing can have on performance and SQL Server memory. Avoid capturing TVP RPC completed events of large TVP requests, if possible. When tracing TVP RPC completed events, monitor the OBJECTSTORE_LBSS memory pool for excessive memory usage.

I’d like the thank SQL Server MVP Ola Hallengren for his suggestion for me to write this article.

ORDER BY Is Required

I often see folks assume rows will be returned in order by the clustered index when ORDER BY is not specified in the query. This is despite the fact that the SQL Server Books Online ORDER BY topic specifically states “The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.”

For those that want to save a few keystrokes and omit the needed ORDER BY, here’s one example that shows rows may be returned in an order other than the clustered index.

The reason the query without the ORDER BY returns rows in a different order than the clustered index logical order is because SQL Server chose to scan the table in physical allocation order using the IAM (Index Allocation Map) instead of following the clustered index linked list. For details of the allocation and page linkage of this table, you can use the undocumented (don’t use in in prod) sys.dm_db_database_page_allocations table-valued function in SQL 2012 and later:

The IAM scan was done here due to no ordering requirement for this query and the allocation order scan was deemed more efficient, and permissible in the READ UNCOMMITTED isolation level because data integrity isn’t needed. Other factors can also affect the ordering of results, including available indexes, execution plan operators, parallelism, and other concurrent activity.

Remember that SQL Server strives to execute queries as efficiently as possible as long as it adheres to the data contract. The chosen plan and storage engine internals that vary by SQL Server version and edition will influence ordering of results. The ordering of rows is by happenstance unless ORDER BY is specified.

Tiered Storage Partition Copy

In my last tiered storage sliding window post, I shared a sliding window script that also moves an older partition to a different filegroup as part of a tiered storage strategy. This approach allows one to keep actively used partitions on the fastest storage available while keeping older less frequently used data on less expensive storage. That version of the script moves data efficiently using a staging table and CREATE INDEX…DROP EXISTING but the downside is the data being moved is unavailable for querying from the time the partition is switched out of the main table until it is switched back in. Consequently, the maintenance needs to be scheduled during a window where data in the partition being moved isn’t needed.

This follow-up article shows an alternative copy technique instead. Data in the partition being moved is left in the main table until the after the copy completes. Once data are copied to the different filegroup, partition maintenance is performed to switch out the original partition and switch in the copied data, now on the older data filegroup. This allows data to remain online except during the final partition switches, which are fast meta-data operations. It is assumed that the data in the partition being moved is read-only during this process.

Copy Data to Slower Storage
Unfortunately, neither ALTER INDEX nor CREATE INDEX…DROP EXISTING provide the capability to repartition an individual partition of a table/index. ALTER INDEX allows one to target a specific partition during a REORGANIZE or REBUILD but not change the filegroup or partition scheme in the process. CREATE INDEX…DROP EXISTING allows a filegroup or partition scheme specification but applies to the entire index; individual partitions cannot be specified. Consequently DML (INSERT…SELECT) instead of DDL must be used in order to keep data online during the copy process.

I’ll assume you’ve already slid the window by purging expired data and preparing for new data as detailed in my last article with the only task remaining to move an older partition to a slower storage filegroup. The following steps will keep the data being moved online during the potentially long copy process. The offline operations performed in steps 3-6 are metadata operations that will complete quickly, once an exclusive table lock (to avoid deadlocks) can be granted. Note that all partition functions and schemes are identical and the staging table is empty before these steps are performed.

1) Move the empty staging table partition to the older data filgegroup
2) Load data into the moved partition of the staging table
3) Switch the copied data out of the main table into a second staging table
4) Switch the adjacent partition out of the main table into the first staging table
5) Move the main table partition to the older data filegroup
6) Switch the partitions from the first staging table back into the main table

The complete DDL and script is at the end of this article. It differs from my last article (which I’ll assume you’ve already perused) by the addition of a second staging table and this alternative move technique. Below are the individual steps.

Move the empty staging table partition to the older data filgegroup
Remember, the staging table is partitioned using the secondary partition function/scheme so that it can be repartitioned independently of the main table. The first task is to move the partition of the unused secondary partition scheme to the slower storage filegroup using MERGE and SPLIT:

1) MERGE the secondary partition function boundary of the moved month to remove it from the secondary partition function and scheme:

2) Set the secondary partition scheme NEXT USED to slower storage filegroup:

3) SPLIT the secondary partition function for the moved month to recreate the partition on the slower storage filegroup:

Load the moved partition of the staging table
Now that the staging table partition is on the FG_OlderData filegroup, copy data for the partition to be moved into a staging table using INSERT…SELECT. Note that the WHERE clause specifies the source partition boundaries:

Switch the copied data out of the main table into a second staging table
The second staging table is used to permanently remove data from the original partition, still on the NewerData filegroup.

Switch the adjacent partition out of the main table into the first staging table
Before we move the main table partition (now empty) to the OlderData filegroup, the adjacent partition is also switched out. This is technically not required but is done to follow Microsoft’s recommended best practice of merging only empty partitions.

Move the main table partition to the older data filegroup
The main table partition is moved from the NewerData filegroup to OlderData with merge. Keep in mind that the adjacent partition on the OlderData filegroup is empty. Data for both these partitions are in the staging table.

Switch the partitions from the first staging table back into the main table
Both main and staging table partition schemes are now identical. The partitions can now be switch back into the main table.

Below are the complete example scripts to implement this copy sliding window technique. My last tiered storage sliding window post also includes the CREATE DATABASE, including filegroups.

SQL Server File Naming Standards

Attention to detail in naming SQL Server physical files and logical file names will make a DBAs life easier. This is especially important when using RESTORE or attach. I suggest one establish and follow a naming standard for physical and logical database file names.

SQL Server does not enforce any particular naming standard for files.  SQL Server is perfectly happy with a data file named “readme.txt” or a log file named “word.doc”.  Of course, such an inappropriate name and extension will lead to confusion so the best practice is to name files appropriately.  An appropriate SQL Server database file name is essentially one that is self-documenting; one should be able to determine the exact purpose of a file simply by examining the name.  I recommend a naming standard that includes the associated database name, filegroup name, and file type.

Physical File Names
I propose adopting a physical file naming convention of “<database-name>_< filegroup-name>_<uniqueifier>.<file-type>” where:

  • <database-name> is the name of the associated database
  • <filegroup-name> is the name of the filegroup containing the file, or the literal “Log” for log files
  • <uniqueifier> is a integer to ensure the file name is unique
  • <file-type> is the standard SQL Server extension for the file type (“mdf”, “ndf”, or “ldf”) as documented in the SQL Server Books Online Database Files and Filegroups topic

Personally, I use an underscore to separate the name components but a dash will also suffice.  Ideally, the separator character should never be used in database or filegroup names to avoid ambiguity.  I recommend one avoid using special characters in database and filegroup names (e.g. use proper case) and ensure database and filegroup names conform to the rules for regular identifiers as outlined in the Books Online Database Identifiers topic.

There is some wiggle room as to how strictly one adheres to this naming convention.  One could omit the filegroup name and uniqueifier components for the primary data file (mdf) because this file is implicitly in the PRIMARY filegroup and there can be only primary data file in the filegroup/database.  Similarly, the uniqueifier could be omitted for the first or only file within a filegroup.   That being said, a more strict adherence this naming convention provides better consistency and makes naming file more of a no-brainer.

Note that the uniqueifier in the name need not imply a sequential value with no gaps.  For example, consider a database named MyDatabase with filegroup DataFG containing 3 files named ‘MyDatabase_DataFG_1.ndf’, ‘MyDatabase_DataFG_2.ndf’, and ‘MyDatabase_DataFG_3.ndf’.  If the second file is removed, there is no requirement to rename file ‘MyDatabase_DataFG_3.ndf’ to ‘MyDatabase_DataFG_2.ndf’.  However, if one is anal about such things, there is no harm in doing so other than the unavailability of the database during the maintenance.

The physical file naming convention described above will guarantee physical file names are unique within a SQL Server instance and facilitate relocation to different drives/folders without naming conflicts.  On a server with multiple instances, I suggest placing files in separate folders for each instance.  This practice will better organize database files while avoiding file name conflicts when a database with the same name exists on different instances.

Logical File Names
Although I commonly see database names embedded within logical file names, I suggest one avoid that practice.  The scope of a logical file name is the context database so including the database name is redundant.  More importantly, the original logical file name is retained when a database is subsequently restored or attached with a different name so the name becomes out-of-sync with the actual database name unless one explicitly changes the names afterward to match the new database name.  This extra administrative work is often overlooked and can be avoided entirely by not including the database name in the logical file name when the initial database is created or altered.

I suggest one use the same naming convention for the logical file name as the physical file name but without the database name.  The logical name will therefore match the last part of physical name (< filegroup-name>_<uniqueifier>).

Unfortunately, SQL Server likes to include the database name in logical names of the primary data file and log file during initial creation.  Executing a minimal create database T-SQL statement like “CREATE DATABASE Foo;” will result in the files below created in the default data and log folder for the instance.  This also applies to the logical names suggested by the SSMS GUI, although one can specify different names as desired.

Logical Name Physical Name
Foo D:\SqlDataFiles\Foo.mdf
Foo_log L:\SqlLogFiles\Foo_log.ldf

I recommend using the expanded form of CREATE DATABASE so that you have complete control over names, locations, and sizes.  Consider creating a custom SSMS template or code snippet to facilitate creating databases with the proper names.


Below are examples of the naming conventions detailed in this article.

Simple database with only a primary data and log file:

Logical Name Physical Name
PRIMARY_1 D:\SqlDataFiles\ExampleDatabase_PRIMARY_1.mdf
Log_1 L:\SqlLogFiles\ExampleDatabase_Log_1.ldf

Database with 2 files in PRIMARY filegroup, 2 secondary filegroups containing 3 files each, and 2 log files:

Logical Name Physical Name
PRIMARY_1 D:\SqlDataFiles\ExampleDatabase_PRIMARY_1.mdf
PRIMARY_2 E:\SqlDataFiles\ExampleDatabase_PRIMARY_2.ndf
DataFG_1 F:\SqlDataFiles\ExampleDatabase_DataFG_1.ndf
DataFG_2 G:\SqlDataFiles\ExampleDatabase_DataFG_2.ndf
DataFG_3 H:\SqlDataFiles\ExampleDatabase_DataFG_3.ndf
IndexFG_1 I:\SqlDataFiles\ExampleDatabase_IndexFG_1.ndf
IndexFG_2 J:\SqlDataFiles\ExampleDatabase_IndexFG_2.ndf
IndexFG_3 K:\SqlDataFiles\ExampleDatabase_IndexFG_3.ndf
Log_1 L:\SqlLogFiles\ExampleDatabase_Log_1.ldf
Log_2 M:\SqlLogFiles\ExampleDatabase_Log_2.ldf

Improving Uniqueidentifier Performance

A common anti-pattern I run into is the random primary key, commonly a GUID. This design is insidious because the performance implications of random access aren’t immediately obvious and exacerbated when the primary key index is clustered. It is often only after the table grows to a larger size that the performance problems become apparent. Symptoms include slowly degrading performance over time, with increased blocking and deadlocking as a side effect.

Figure 1 shows the performance profile of a random inserts with a random GUID (SQL Server uniqueidentifier data type) clustered primary key. The red line indicates the rate of batch requests per second (inserts) while the blue line shows the total number of rows in the table, scaled such that the top of the graph represents 3M rows. Only about 700, 000 rows could be inserted during this 15 minute single-threaded random key insert test, even though the insert rate was fast initially.

Figure 1: Random key insert performance
Random insert performance graph

Incremental Primary Keys

As you might guess, the cure for the random primary key anti-pattern is an incremental key pattern. With a uniqueidentifier data type, a sequential value can be assigned by SQL Server using the NEWSEQUENTIALID function (in a default constraint expression) or in application code using the UuidCreateSequential Win32 API call along with some byte swapping (code example below). Alternatively, one can use an integral data type (int, bigint, etc.) along with a value generated by an IDENTITY property or a SEQUENCE object. The advantage of an integral type is the reduced space requirements compared to a 16-byte uniqueidentifier. The advantage of a uniqueidentifier is that it can easily be generated in application code before database persistence without a database round trip, which is desirable for distributed applications and when keys of related tables are assigned in application code before writing to the database.

Figure 2 shows the same test using a sequential key value. Over 2.2M rows were inserted in 15 minutes. As you can see, significant performance improvement is achieved with this trivial application change.

Figure 2: Incremental key insert performance
Random insert performance graph

Listing 1 shows the T-SQL code I used for these performance tests and listing 2 contains the C# code (with the random GUID commented out). I generated the uniqueidentifier value via application code in the tests but performance with NEWID() is comparable to the first test and NEWSEQUENTIALID() is similar to the second test.

Listing 1: T-SQL scripts for test table and stored procedure

Listing 2: C# insert test console application

Why Random Keys Are Bad

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality (, which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.

All things being equal, single-row performance will be roughly the same with both sequential and random keys as long as data are memory resident. This is why the random and sequential key insert tests show the same good performance initially. But once the table size exceeded the size of the buffer pool, the random key test showed a precipitous drop in throughput and steady degradation thereafter. In short, random keys diminish temporal reference locality because there is no correlation between time (most recently accessed data) and the key value.

Why Incremental Keys Good

An incremental key value naturally improves temporal reference locality; the next key value is adjacent to the last one inserted and is likely still in memory. An incremental key provides excellent insert performance regardless of table size as the insert performance test shows. Also, applications typically use recently inserted data more often than older data. This allows the same amount of work to done with much less physical I/O than a random key value.

Random Notes about GUIDs

According to the Globally unique identifier Wiki (, the random 122 bits of a GUID can generate 2122 unique values. That’s an incomprehensibly large 5.3 x 1036 (or 5,300,000,000,000,000,000,000,000,000,000,000,000) number unique values.

The value returned by NEWSEQUENTIALID and UuidCreateSequential is guaranteed to be unique on a given computer. Furthermore, it is globally unique if the computer has a network card because the MAC address is used as part of the GUID generation algorithm.