Tiered Storage Partition Copy

In my last tiered storage sliding window post, I shared a sliding window script that also moves an older partition to a different filegroup as part of a tiered storage strategy. This approach allows one to keep actively used partitions on the fastest storage available while keeping older less frequently used data on less expensive storage. That version of the script moves data efficiently using a staging table and CREATE INDEX…DROP EXISTING but the downside is the data being moved is unavailable for querying from the time the partition is switched out of the main table until it is switched back in. Consequently, the maintenance needs to be scheduled during a window where data in the partition being moved isn’t needed.

This follow-up article shows an alternative copy technique instead. Data in the partition being moved is left in the main table until the after the copy completes. Once data are copied to the different filegroup, partition maintenance is performed to switch out the original partition and switch in the copied data, now on the older data filegroup. This allows data to remain online except during the final partition switches, which are fast meta-data operations. It is assumed that the data in the partition being moved is read-only during this process.

Copy Data to Slower Storage
Unfortunately, neither ALTER INDEX nor CREATE INDEX…DROP EXISTING provide the capability to repartition an individual partition of a table/index. ALTER INDEX allows one to target a specific partition during a REORGANIZE or REBUILD but not change the filegroup or partition scheme in the process. CREATE INDEX…DROP EXISTING allows a filegroup or partition scheme specification but applies to the entire index; individual partitions cannot be specified. Consequently DML (INSERT…SELECT) instead of DDL must be used in order to keep data online during the copy process.

I’ll assume you’ve already slid the window by purging expired data and preparing for new data as detailed in my last article with the only task remaining to move an older partition to a slower storage filegroup. The following steps will keep the data being moved online during the potentially long copy process. The offline operations performed in steps 3-6 are metadata operations that will complete quickly, once an exclusive table lock (to avoid deadlocks) can be granted. Note that all partition functions and schemes are identical and the staging table is empty before these steps are performed.

1) Move the empty staging table partition to the older data filgegroup
2) Load data into the moved partition of the staging table
3) Switch the copied data out of the main table into a second staging table
4) Switch the adjacent partition out of the main table into the first staging table
5) Move the main table partition to the older data filegroup
6) Switch the partitions from the first staging table back into the main table

The complete DDL and script is at the end of this article. It differs from my last article (which I’ll assume you’ve already perused) by the addition of a second staging table and this alternative move technique. Below are the individual steps.

Move the empty staging table partition to the older data filgegroup
Remember, the staging table is partitioned using the secondary partition function/scheme so that it can be repartitioned independently of the main table. The first task is to move the partition of the unused secondary partition scheme to the slower storage filegroup using MERGE and SPLIT:

1) MERGE the secondary partition function boundary of the moved month to remove it from the secondary partition function and scheme:

2) Set the secondary partition scheme NEXT USED to slower storage filegroup:

3) SPLIT the secondary partition function for the moved month to recreate the partition on the slower storage filegroup:

Load the moved partition of the staging table
Now that the staging table partition is on the FG_OlderData filegroup, copy data for the partition to be moved into a staging table using INSERT…SELECT. Note that the WHERE clause specifies the source partition boundaries:

Switch the copied data out of the main table into a second staging table
The second staging table is used to permanently remove data from the original partition, still on the NewerData filegroup.

Switch the adjacent partition out of the main table into the first staging table
Before we move the main table partition (now empty) to the OlderData filegroup, the adjacent partition is also switched out. This is technically not required but is done to follow Microsoft’s recommended best practice of merging only empty partitions.

Move the main table partition to the older data filegroup
The main table partition is moved from the NewerData filegroup to OlderData with merge. Keep in mind that the adjacent partition on the OlderData filegroup is empty. Data for both these partitions are in the staging table.

Switch the partitions from the first staging table back into the main table
Both main and staging table partition schemes are now identical. The partitions can now be switch back into the main table.

Scripts
Below are the complete example scripts to implement this copy sliding window technique. My last tiered storage sliding window post also includes the CREATE DATABASE, including filegroups.

Partitioned Tables and Tiered Storage

A sliding window has long been a mainstay use case for table partitioning. A table partitioning sliding window pattern provides an efficient way to purge data according to retention needs and allows individual partitions to be placed on segregated storage. Well-planned data placement can improve performance of parallel scans of related data, meet RTO objectives with piecemeal restores, and performance SLAs with tiered storage. One can place more actively used partitions on the fastest storage available while keeping older less often used data on slower storage, which is typically much less expensive and more abundant.

I’ve posted articles on this site that show how to automate retention of temporal data using partitioned tables and T-SQL scripts. In this post, I’ll combine those building blocks to implement a tiered storage strategy with an automated sliding window. This technique will keep the most current data on the fastest storage, move data to slower storage as temporal partitions age, purge the oldest partition, and finally prepare for new data. I’ll assume you’re already familiar with partitioning concepts, objects (PARTTION SCHEME and FUNCTION), DDL (SWITCH, SPLIT, MERGE), and partition alignment. Alignment is required in order to use the methods described here.

Tiered Storage Planning
Partitioned tables or not, remember that SQL Server provides control of data placement only at the filegroup level. Without partitioning, an entire table or index is placed on the same filegroup. SQL Server then distributes data among the underlying files according to its proportional fill algorithm. Table partitioning provides more granular control over data placement by allowing individual table/index partitions within a table to be placed on different filegroups when it is advantageous to do so to meet the objectives I mentioned earlier.

In order control placement of individual partitions on different storage, at least one filegroup for each storage tier is needed, each with at least one file on that storage tier. Avoid mixing files within a filegroup that reside on different types of storage if you want maximum control over performance. Consider that there may be multiple levels of storage abstraction (OS, virtualization, SAN, NAS, etc.) that ultimately determines where data are physical stored so one must be mindful of those implications in file placement within a storage tier filegroup.

I’ll walk through the implementation with this scenario. Let’s assume 1TB of solid state storage is available and an additional 5TB available on a spinning media SAN. Although I use only 2 storage tiers here, this technique can be extended for additional tiers if needed.

A 12 month sliding window is planned with a single partitioned table. We’ll keep the current month plus 3 prior months on solid state storage and the remaining 9 months of historical data on the SAN. This necessitates a monthly maintenance script that that performs the following tasks:

1) Purge expired data (older than 12 months)
2) Prepare for new data (on solid state storage)
3) Move data to slower storage (move the 4-month old partition from solid state to spinning media storage
4) Cleanup

I start by creating the partition function, partition scheme, partitioned table, and a staging table partitioned using the same partition scheme (complete DDL at the end of this article). The partitioned staging table avoids the need to create check constraints on the staging table in order to use switch partitions back into the main table. I also create a secondary partition function and partition scheme, identical the others but initially not used. These objects are used to move the partition and data, which I’ll discuss in detail later.

Note that I use a RANGE RIGHT partition function, which I consider to be more natural for incremental temporal data than RANGE LEFT. I also follow the recommended practice of a SPLIT and MERGE of only empty partitions in order to avoid data movement and excessive logging (about 4 times that of normal DML).

Monthly maintenance tasks are performed by a stored procedure scheduled by SQL Agent job after the start of each new month. This proc takes 3 date parameters and is run scheduled for execution after the start of a new month:

    @ExpiredMonthStartDate: partition boundary of expired data to purge.
    @MonthStartDateToMoveFromCurrentToOld: partition boundary of expired data to move to slower storage.
    @FutureMonthStartDate: partition boundary for future data to be created on fast storage.

Below are the details of the maintenance tasks.

Purge Expired data
The first task is the same with any sliding window where oldest data are purged:

1) Truncate the staging table, which is partitioned using a partition scheme identical to the main table:

2) Switch expired data partitions from to staging table into the corresponding staging table partition using SWITCH:

3) Truncate staging table again:

4) Remove the first partition boundary with MERGE of the main table partition function:

The secondary partition function is also merged to keep it identical to the main partition. Again, no objects yet use the secondary partition function/scheme.

Prepare for New Data
The second task is also similar to the typical sliding window pattern, except the new partition is specifically placed on the solid state filegroup and the secondary partition function/scheme is also modified:

1) Set NEXT USED partition schemes to solid state filegroup

2) SPLIT main table partition function to create future boundary

Move Data to Slower Storage
The final task is to move the 4 month old partition from solid state to spinning media storage. Unlike the previous MERGE and SPLIT, which are metadata-only operations due to the empty partitions, moving the non-empty partition requires physical movement of a potentially large amount of data. Although it is possible to do this with MERGE and SPLIT alone, that would be very inefficient. Instead, I use the DROP_EXISITNG clause of CREATE INDEX after switching the partition to the staging table. The DROP_EXISITNG clause leverages the existing index order to avoid an expensive sort operation.

Note that this technique switches the partition to be moved out of the main table so data is not available for querying during the move process. In my next post, I’ll show how to keep read-only data online during the move using INSERT…SELECT with detailed attention to performance.

1) SWITCH the partition of the month to be moved into the staging table (partitioned using the same partition scheme as the main table to ensure alignment):

2) Move the partition of the unused secondary partition scheme to the slower storage filegroup:

a. MERGE the secondary partition function boundary of the moved month to remove it from the secondary partition function and scheme:

b. Set the secondary partition scheme NEXT USED to slower storage filegroup:

c. SPLIT the secondary partition function for the moved month to recreate the partition on the slower storage filegroup:

3) Rebuild the staging table on the secondary partition scheme. This will physically move the partition and data of the month to move (the only non-empty partition in the staging table) to the slower storage filegroup:

4) SWITCH the partition prior to the moved month into the staging table so that the adjacent partitions of the main table are empty:

5) MERGE the main table partition function to remove the moved month boundary:

6) Recreate the main table moved month partition on slower storage:

a. Set the main table partition scheme NEXT USED to the slower storage filegroup:

b. SPLIT the main table partition scheme to recreate the moved month partition on the slower storage filegroup:

7) SWITCH the moved and prior month partitions back into the staging table:

Cleanup
1) Rebuild the staging table specifying the main table partition scheme to prepare for the next months maintenance. The secondary partition function and scheme are again unused and are identical to the main table partition function/scheme:

Complete Script
Below is a complete script to create the initial database, partition function, partition scheme, table DDL, sample data, partition maintenance stored procedure, and example usage. The D drive is on solid state storage, E and L drives are on a spinning disk SAN, with the L drive dedicated to log.

Note that the stored procedure acquires an exclusive table lock on the partitioned table to prevent deadlocking during partition maintenance. However, the duration of the transaction should be very short since only metadata operations are performed.

If you have multiple partition schemes (e.g. indexes on separate filegroups) using the same partition functions, you’ll need to create secondary schemes for those too and set the NEXT USED accordingly prior to SPLIT. I can’t think of a reason to also have different partition functions in this aligned table scenario since the functions must be identical anyway to provide alignment. If you also have different functions for the index partition schemes, those will need to be split and merged at the same time as the data partition scheme.