AddWithValue is Evil

AddWithValue is the root cause of many SQL Server performance and concurrency problems. Although the SqlParameterCollection.AddWithValue method is a slightly more convenient way to add parameters and values to a SqlCommand, the implications are insidious because the high cost in SQL Server resource utilization is not apparent during development. It's often not until the application is in production and under load that you discover queries are slow due to full scans, excessive resource usage, lock escalation, blocking, cache bloat, and frequent deadlocks.

A best practice in SQL Server development has long been to use strongly-typed parameters. Not only does this practice thwart SQL injection (barring non-parameterized dynamic SQL statements), it also generally improves performance due to caching of queries that differ only by values, avoiding the cost of compiling the query each time it is executed. Importantly, matching parameter definitions with referenced columns promotes efficiency by avoiding implicit conversions that can preclude efficient index use.

To follow best practices, avoid AddWithValue and instead add parameters and values using a method that allows specification of the desired SqlDbType along with the appropriate length, precision, and scale. Although slightly more verbose than AddWithValue, the performance benefits are huge and worth a few keystrokes.

The Problem with AddWithValue
The nastiness with AddWithValue is that ADO.NET infers the parameter definition from the supplied object value. Parameters in SQL Server are inherently strongly-typed, including the SQL Server data type, length, precision, and scale. Types in .NET don't always map precisely to SQL Server types, and are sometimes ambiguous, so AddWithValue has to make guesses about the intended parameter type.

The guesses AddWithValue makes can have huge implications when wrong because SQL Server uses well-defined data type precedence rules when expressions involve unlike data types; the value with the lower precedence is implicitly converted to the higher type. The implicit conversion itself isn't particularly costly but becomes a major performance concern when it is the column value rather than the parameter value that must be converted, especially in a WHERE or JOIN clause predicate. The implicit column value conversion can prevent indexes on the column from being used with an index seek (i.e. a non-sargable expression), resulting in a full scan of every row in the table or index.

Below are some of the most common pitfalls with AddWithValue.

.NET Strings
Strings are arguably the single biggest problem with AddWithValue. AddWithValue infers SqlDbType.NVarChar when a string object is provided because strings in .NET are Unicode. The resultant SQL Server parameter type is nvarchar(n) or nvarchar(MAX), depending on the length of the string value provided. The type is nvarchar(n) when the string length is 4000 characters or less and nvarchar(MAX) when over 4000 characters.

The inferred nvarchar parameter type is especially problematic when the parameter value is compared to varchar columns in queries because nvarchar has a higher precedence. As mentioned earlier, this prevents indexes on the varchar columns from being used efficiently with an index seek. Because varchar columns are widely used when Unicode characters are not required (saving disk space and memory), the resultant performance problems due to AddWithValue are quite common in the wild.

Another issue with strings and AddWithValue is cache bloat and lack of reuse. AddWithValue uses the actual length of the supplied value as the nvarchar parameter length. This results in a separate cached plan for queries that differ only by parameter length. For example, you'll end up with 20 different cached plans for 20 different @LastName parameter value lengths. The number of plans is greatly multiplied with many string parameters (e.g. @LastName, @FirstName, @Address) and is especially an issue with large parameterized IN clauses.

A better method to add string parameters is with the SqlParameterCollection.Add method. In addition to the desired name and data type, this overload allows one to specify the length, which should match the column maximum length (-1 for MAX types). The value can be easily assigned to the returned SqlParameter instance Value property in a one-liner:
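
For example, a minimal sketch assuming a varchar(30) LastName column in a hypothetical dbo.Contact table, an open SqlConnection named connection, and the System.Data/System.Data.SqlClient namespaces:

string lastName = "Sommarskog";

SqlCommand selectCommand = new SqlCommand(
    "SELECT ContactID, FirstName, LastName FROM dbo.Contact WHERE LastName = @LastName;",
    connection);

// Strongly-typed parameter: SqlDbType.VarChar with length matching the assumed varchar(30) column
selectCommand.Parameters.Add("@LastName", SqlDbType.VarChar, 30).Value = lastName;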

.NET DateTime
AddWithValue infers the SqlDbType.DateTime parameter type when a DateTime object value is provided. Although a .NET DateTime aligns more closely with SQL Server datetime2(7), AddWithValue uses datetime for backwards compatibility.

The SQL Server datetime type has a fixed precision of 3 with accuracy to 1/300 fractional seconds so ADO.NET will round the value accordingly and sub-millisecond values will be lost, resulting in a value that ends with 0, 3, or 7 milliseconds. This behavior is compatible with a SQL Server datetime column value but loses accuracy and precision when used with a datetime2 column of precision 3 or greater. Values will be rounded to the lesser datetime precision and sub-millisecond values will be lost.

One can explicitly specify SqlDbType.DateTime2 to avoid loss of accuracy when working with the SQL Server datetime2 type. Note that the parameter Scale property is used to specify datetime2 precision in ADO.NET rather than the Precision property as one might expect. The default Scale is 7, the same as the default datetime2 precision in SQL Server. Although the value will automatically be rounded when the target column precision is less, I recommend matching the column precision regardless.

This example uses Add instead of AddWithValue, specifying Scale of 3 to match a target column datetime2(3), which is more precise than datetime with 1 byte less storage.
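
A sketch along those lines, with an illustrative @ModifiedDate parameter targeting an assumed datetime2(3) column (command is an existing SqlCommand):

SqlParameter modifiedDateParam = command.Parameters.Add("@ModifiedDate", SqlDbType.DateTime2);
modifiedDateParam.Scale = 3; // match the datetime2(3) column precision
modifiedDateParam.Value = DateTime.UtcNow;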

.NET Decimal
AddWithValue infers SqlDbType.Decimal with the same precision and scale as the supplied object value. Varying precision and scale can bloat the plan cache similarly to the varying string lengths described earlier, so it's best to explicitly specify the precision and scale matching the column.

Here’s an example that adds a decimal value without AddWithValue, specifying precision and scale (decimal(18,4)):
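
A sketch with an illustrative @Amount parameter targeting an assumed decimal(18,4) column (command is an existing SqlCommand):

decimal amount = 12345.6789m;

SqlParameter amountParam = command.Parameters.Add("@Amount", SqlDbType.Decimal);
amountParam.Precision = 18; // match the column precision
amountParam.Scale = 4;      // match the column scale
amountParam.Value = amount;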

Summary
I hope this article helps you understand the importance of the strongly-typed parameter best practice. Not only will this help avoid performance and concurrency problems, the attention to detail will save compute costs in the cloud.

Bulk Load Batch Size Considerations in SQL Server 2016

Bulk load has long been the fastest way to mass insert rows into a SQL Server table, providing orders of magnitude better performance compared to traditional INSERTs. SQL Server database engine bulk load capabilities are leveraged by T-SQL BULK INSERT, INSERT…SELECT, and MERGE statements as well as by SQL Server client APIs like ODBC, OLE DB, ADO.NET, and JDBC. SQL Server tools like BCP and components like SSIS leverage these client APIs to optimize insert performance.

SQL Server 2016 and later improves performance further by turning on bulk load context and minimal logging by default when bulk loading into SIMPLE and BULK LOGGED recovery model databases, which previously required turning on trace flags as detailed in this blog post by Parikshit Savjani of the MSSQL Tiger team. That post also includes links to other great resources that thoroughly cover minimal logging and data loading performance, which I recommend you peruse if you use bulk load often. I won’t repeat all that information here but do want to call attention to the fact that these new bulk load optimizations can result in much more unused space when a small batch size is used compared to SQL Server 2014 and older versions.

Bulk Load Batch Size Implications
An important consideration in SQL 2016 and later with bulk load context and minimal logging is that each batch allocates new extent(s) (8 contiguous 8K pages, 64K) rather than using existing unused pages in existing extents. This improves concurrency and space allocation performance, but possibly at the cost of significantly more unused space than previous versions when few rows are loaded per batch.

The optimal batch size for bulk load operations involves trade-offs between throughput, data space utilization, concurrency, and transaction log space (FULL recovery model). To avoid excessive unused space with bulk load context, adjust the batch size when possible such that data are loaded in multiples of the 64K extent size.

There are cases when one must load with small batches, such as when loading small files. One solution to mitigate unused space in this scenario is to not use TABLOCK so that bulk load context isn't used. This isn't much of a performance concern with small batches anyway. Another method is to turn on trace flag 692 to disable the default bulk load context in SQL Server 2016, effectively reverting to pre-SQL 2016 behavior.
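
From a client application, the batch size and TABLOCK behavior are controlled through the bulk load API. Below is a minimal SqlBulkCopy sketch; the destination table and source IDataReader are illustrative:

using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    // TableLock requests bulk load context; omit SqlBulkCopyOptions.TableLock for small
    // loads to avoid allocating new extent(s) for every batch
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.SalesStaging";
        bulkCopy.BatchSize = 100000; // large batches fill whole 64K extents; 0 loads all rows in a single batch
        bulkCopy.WriteToServer(sourceReader); // sourceReader is any IDataReader over the source data
    }
}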

Don’t confuse batch size with the rows per batch hint. The rows per batch hint is used by the SQL Server optimizer to help optimize the load process because the number of rows that will be loaded is otherwise unknown, defaulting to an estimate of 10,000 rows.

SQL Server 2016 SP1 Standard Edition Enhancements

I seldom get excited about service packs but the changes released with SQL Server 2016 SP1 are the most significant I've seen in a SQL Server service pack in 20+ years. Microsoft announced this week at the Microsoft Connect(); developer's conference that SQL Server 2016 SP1, which is available for download immediately, allows features previously available only in Enterprise/Developer Editions to be used in the lesser Standard, Web, Express, and LocalDB Editions too. Features like table partitioning, In-Memory OLTP, and columnstore are now options for developers and DBAs using SQL Server Standard Edition and even the free Express Edition in production. See SQL Server 2016 Service Pack 1 (SP1) released !!! for the complete matrix of programmability features by edition along with other cool SP1 information.

The implications are huge now that SQL Server has the same programmability surface area among editions. The choice of the production edition can be made independently based on operational needs rather than programmability features. Developers can use a free edition (i.e. LocalDB, Express or Developer) without fear that a feature won't be available in production as long as prod is running SQL Server 2016 SP1 or greater. DBAs can now choose the appropriate edition for production based on other considerations like advanced high availability, TDE, and Auditing, as well as performance features like higher supported memory, more cores, and advanced scanning. This separation of concerns avoids the need to lock in the production edition early in the application lifecycle, making development easier and production implementation more flexible.

Real World Use Case Scenario
I work with an ISV with hundreds of customers running a mix of Standard and Enterprise Edition. Their needs vary widely, and SQL Server Enterprise Edition is not an option for some due to budget constraints. Some tables are quite large, so partitioning is required for manageability and, for their reporting workload, partitioning also improves performance of large scans due to partition elimination. The ugly part, though, is that table partitioning (and/or columnstore) is the right tool for the job but was not an option for customers on Standard Edition.

The ISV initially compromised and used view partitioning instead of table partitioning so that the same code would run regardless of edition. Although that provided the expected manageability benefits, there were some downsides. Compilation time increased significantly as the number of partitioned view member tables grew, as did query plan complexity. This sometimes resulted in poor query plans against very large tables and especially impacted their larger and most valued customers, most of which were running Enterprise Edition.

To address the problem before SQL Server 2016 SP1, the ISV added conditional code to the application so that either view or table partitioning could be used depending on the SQL Server edition. This wasn't ideal as it added code complexity and doubled the number of QA test cases for application features that performed partition maintenance. However, since the resultant benefits for their larger customers on Enterprise Edition were quite significant, the additional costs of development and testing were well-justified.

Now that table partitioning is available in SQL Server 2016 SP1 Standard Edition, they plan to require SQL Server 2016 SP1 (or later) going forward, use table partitioning unconditionally, and perhaps introduce usage of other features like columnstore that were previously Enterprise only. Not only will this simplify the code base and test cases, customers on Standard Edition will be happier with their experience and can upgrade to Enterprise if they so choose without reinstalling or reconfiguring the application. It will of course take some time before all their customers upgrade to the latest product version and SQL 2016 SP1+ but the future is much brighter now.

Perform Due Diligence
If you are new to features previously available only in Enterprise Edition, I suggest you perform due diligence before using them. Memory-optimized features like columnstore and In-Memory OLTP require additional physical memory, and insufficient memory will be a production show-stopper. Make sure your hardware is sized appropriately regardless of edition and, in the case of editions other than Enterprise or Developer, that memory requirements don't exceed the maximum capacity limits for that edition. Although very powerful, In-Memory OLTP is a fundamentally different paradigm than you might be accustomed to regarding transactional behavior and isolation levels. Be sure you fully understand these features before using them in development or production.

Summary
I hope these changes are enough motivation for you to consider upgrading to SQL Server 2016 SP1, especially if you are running Standard Edition or are currently on an older SQL Server version. Together with the fact that SQL Server 2016 just runs faster, the time and effort spent in upgrading is a solid investment that will pay dividends regardless of edition.

SQL Saturday – St. Louis 2016

Please join us for SQL Saturday, St. Louis 2016. We have 25 sessions scheduled covering a wide range of topics on the Microsoft Data Platform stack.

As in the past few years, the event will be held at the SLU Center for Workforce & Organizational Development, 3545 Lindell Blvd, Saint Louis, Missouri 63103. Visit the event home page for registration, directions, and session details.

See you there!

SSMS 2016 is Free

The generally available (GA) release of SQL Server Management Studio 2016 is free. No SQL Server license is required to install and use it for either development or production purposes. This GA version supports SQL Server versions 2008 through 2016 plus Azure SQL Database. You might be able to use it against earlier SQL versions too, but that's not officially supported and features will be limited. SSMS 2016, now based on the Visual Studio 2015 isolated shell, can be downloaded from the SSMS Download Page.

Prior to SQL Server 2016, one typically installed SSMS from the SQL Server installation media, and it was upgraded only by CUs, service packs, or SQL Server version upgrades. This changed with the release of SQL Server 2016; SSMS is now a separate download. After installation, regular updates are made available (monthly) and can be installed from the SSMS Tools–>Check for Updates… menu option. These frequent updates allow the tooling to stay aligned with both on-prem SQL Server and Azure SQL Database while delivering enhancements and bug fixes to SSMS much more quickly. I am quite impressed by the responsiveness of the SSMS development team in this regard, having used it since the initial preview versions of the VS 2015 isolated shell were made available.

An important change in this GA version compared to earlier versions of SSMS is that SET XACT_ABORT ON is now the default session setting. That's a good thing, in my opinion, because it helps ensure one doesn't accidentally leave a transaction open in a query window following an error (I have many horror stories about that). But it does change script execution behavior because queries, with or without error handling, will generally not continue after an error with XACT_ABORT ON. Be aware of this when running scripts from SSMS. This and other SET options can be specified from the SSMS Query–>Query Options–>Execution–>Advanced page. I strongly recommend specifying SET XACT_ABORT ON unless you have a specific reason not to do so.

Kudos to Microsoft for providing SSMS as a free tool and to the SSMS development team for their ongoing commitment to SQL Server tooling improvements and timely delivery.

QUERYTRACEON Permissions

I recently had a SQL Server 2014 stored procedure that ran for several hours with high CPU usage. I narrowed the problem down to a single query with dozens of joins that was stuck in compilation. Suspecting an issue with the new cardinality estimator, I changed the database compatibility level from 120 (SQL Server 2014) to 110 (SQL Server 2012) so that the legacy CE would be used. The query then compiled immediately and executed quickly, which confirmed the new CE as the culprit of the excessive compilation time.

OPTION QUERYTRACEON (9481) to the Rescue
Changing the database compatibility level was a heavy-handed option I didn't want to pursue because some queries benefited from the new CE behavior. Instead, I specified trace flag 9481 using the OPTION (QUERYTRACEON 9481) query hint to use the legacy CE for the problem query. This is an acceptable workaround for me until the bug is addressed. However, you might not be so lucky. A more common use case for OPTION (QUERYTRACEON 9481) is performance regression with the new CE. Although not documented as far as I know, the QUERYTRACEON query hint requires sysadmin permissions unless executed from within a stored procedure. Consequently, QUERYTRACEON won't be an option for those who need the hint for queries run by non-sysadmins outside stored procs. I found SQL Server MVP Ola Hallengren's Connect suggestion to allow QUERYTRACEON with no additional permissions. I upvoted it and encourage you to do so too.
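
For illustration, the hint is simply appended to the query text. Here is a sketch of an ad hoc query using the legacy CE (table, columns, and parameter are hypothetical); remember that outside a stored procedure this requires sysadmin permissions:

SqlCommand command = new SqlCommand(
    "SELECT OrderID, CustomerID, OrderDate " +
    "FROM dbo.Orders " +
    "WHERE CustomerID = @CustomerID " +
    "OPTION (QUERYTRACEON 9481);", // use the legacy cardinality estimator for this query only
    connection);
command.Parameters.Add("@CustomerID", SqlDbType.Int).Value = customerId;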

Beware Job Recruiting SCAM

I’ve had multiple reports of a scam where persons have used my identity to entice IT professionals into applying for positions they have no business recruiting for. The scammers later ask for a PayPal payment for their services. The authorities have been notified of this fraudulent activity.

I urge everyone to be vigilant with regard to social engineering scams like this. Beware of unsolicited emails from persons you don't know personally. It is easy for unscrupulous individuals to glean much information from publicly available sources (job descriptions, professional profiles, etc.) and tailor a message customized for a particular job seeker.

Stairway Series on SQLServerCentral.com

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server.  Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic.  The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge one needs to use a SQL Server feature confidently in a production environment.  Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the world-wide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing.  For years I’ve cautioned against using Profiler indiscriminately both here as well as in the MSDN forums and newsgroups.  But it seems many DBAs still don’t differentiate between Profiler and server-side tracing.  I’m hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months. The subject of this post is the primary reason; there are only so many hours in the day. :-(

Don’t Bloat Proc Cache with Parameters

Most of us have long been aware that non-parameterized ad-hoc queries can pollute procedure cache with entries that will never be reused.  Each distinct query string is a separate cache entry and, unless the exact query with the same values is executed again, the plan will remain in cache unnecessarily until aged out.

However, I was surprised to learn recently that even parameterized queries can bloat procedure cache if one isn't careful. This was brought to my attention by SQL Server MVP Erland Sommarskog and confirmed by fellow MVP Adam Machanic, who experienced this problem at a customer site.

The issue is that parameter declarations are part of cache entry statement text.  Consider the following code:

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

SqlClient defined the parameter as nvarchar(10) because the supplied string value length was 10 and .NET strings are Unicode.  Below is the resulting statement text from sys.dm_exec_query_plan:

(@LastName nvarchar(10))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

Now run the same code with a different last name value:

string lastName = "Machanic";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.AddWithValue("@LastName", lastName);

You probably guessed it – an additional cache entry was created for the same query due to the different name length:

(@LastName nvarchar(8))SELECT ContactID, FirstName, LastName FROM AdventureWorks.Person.Contact WHERE LastName = @LastName

 

It is likely that no more than a dozen or so cache entries will exist for this particular query since the only differentiator is the last name length.  However, with queries that contain multiple string parameters of varying lengths, the number of cache entries increases dramatically while the likelihood of reuse diminishes.  In Adam's case, the main culprit was a string with a large list of values (between 1 and 1000+ items) that was split using a technique described in Erland's Arrays and Lists articles at http://www.sommarskog.se.

Preventing Procedure Cache Bloat

An easy way to prevent unneeded procedure cache entries is to explicitly specify the maximum data length for string and other variable length values.  In fact, it’s a good idea to specify both the length and SqlDbType to ensure the parameter declaration matches the target column.  Not only will this help prevent cache bloat due to different parameter lengths, the matching data type can avoid the performance gotcha of a non-sargable expression caused by implicit data type conversion.

There are a number of techniques to specify the parameter length and SqlDbType in .NET.  For example, the code below uses an overloaded SqlCommand.Parameters.Add method.  If you use AddWithValue, be sure to set the Size and SqlDbType properties of the newly created parameter afterward.

string lastName = "Sommarskog";

SqlCommand selectCommand =
    new SqlCommand(
        "SELECT ContactID, FirstName, LastName " +
        "FROM AdventureWorks.Person.Contact " +
        "WHERE LastName = @LastName",
        connection);

selectCommand.Parameters.Add("@LastName", SqlDbType.NVarChar, 50).Value = lastName;
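
Alternatively, when existing AddWithValue calls can't be replaced right away, the returned SqlParameter can be adjusted afterward; a sketch:

SqlParameter lastNameParam = selectCommand.Parameters.AddWithValue("@LastName", lastName);
lastNameParam.SqlDbType = SqlDbType.NVarChar;
lastNameParam.Size = 50; // match the LastName column definition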

 

 

Keep Schema and Ownership Simple

I like to keep things simple because simplicity is easier to manage and less prone to error.  When I’m faced with schema design decisions, I pick the selection with the least complexity that meets my objectives.  Here are some of my thoughts regarding schema and ownership in SQL 2005.

Schema

A schema is basically a container that categorizes database objects and simplifies security administration.  As a namespace, schemas logically organize objects without the need for special object naming rules.  Different objects can have the same name as long as they exist in different schemas because the schema name is essentially an extension of the object name that will “uniqueify” the name within a database. 

Categorizing objects by schema is particularly useful in complex databases with many objects.  There is some subjectivity on exactly how one might draw schema boundaries but the basic concept is the same: group related objects into different schemas to bring organization to an otherwise unwieldy collection of objects.  Classifying related objects by schema makes complex databases easier to understand and manage.

Schema also simplifies security administration because permissions can be granted en masse at the schema level.  For example, I can grant EXECUTE permissions on all objects in a schema with a single statement like "GRANT EXECUTE ON SCHEMA::Sales TO SalesRole".  I can also grant CONTROL permissions on a schema to give privileged users full control over a specific schema but not others in the same database.

Even with the option to use multiple schemas, I tend to use the built-in dbo schema.  I do this because most of the applications I maintain were developed before SQL 2005 and all objects are already in the dbo schema.  Some of those legacy systems could benefit from multiple schemas but I’ll continue to use dbo for those applications to be consistent until I need to add a group of new objects that are appropriate for a separate schema.  The new SQL 2005 databases I’ve developed thus far have been fairly simple and haven’t warranted using multiple schemas for either classification or security purposes.

Ownership

The owner is important for two main reasons:  1) the owner has powerful CONTROL permissions over all owned objects and 2) the owner determines whether or not the ownership chain is broken between objects.  Schema-contained objects will inherit the schema owner unless explicitly overridden using ALTER AUTHORIZATION.  Personally, I think it best for objects to inherit the schema owner in the vast majority of cases; if an object warrants a different owner than the schema, the object probably belongs in a different schema.

I use the built-in dbo principal for ownership unless I have a reason to do otherwise.  This approach is perfect in environments where only db_owner role members can create objects and schemas are used solely as a namespace rather than a security boundary.  The dbo principal exists in all databases so there is no need to create a user or role for ownership purposes.  Simple is good.

Different schema owners provide a security boundary between objects in different schema because this breaks the ownership chain.  With a broken chain, explicit permissions on indirectly referenced objects are needed by the end user or the impersonated principal.  Different schema owners ensure that I don’t inadvertently provide access to data in different schema via ownership chaining.

Note that an owner can be any database principal and does not necessarily need to be associated with a login.  I find this feature especially handy in situations where I want to specify an owner other than dbo.  Since I can specify an owner that is not a real person, I won’t need to change ownership if the owner leaves the company or moves on to other roles in the organization. 

It’s probably best to create a principal (role or a user without login) with the same name as the schema for ownership purposes when different owners are desired.  The only case I can think of where it might be appropriate for a real person to own a schema (at least initially) is in a development environment when non-dbo users might create schemas and objects.