Microsoft SQL Server Script DOM

I once heard someone refer to the Microsoft SQL Server Script DOM library as the crown jewels of SQL Server tools. That may be a bit of an overstatement, but reliably parsing and analyzing T-SQL scripts programmatically is an exceedingly complex task due to the many variations of T-SQL constructs and the contexts in which T-SQL language elements may occur. Although one can parse T-SQL code using string functions, regular expressions, and the like, such parsing is fragile at best due to the difficulty of accounting for multi-line statements, multi-statement lines, whitespace, comments that contain T-SQL code, etc.
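To make that fragility concrete, here is a minimal sketch in plain PowerShell. The sample script and the pattern are made up for illustration; the point is only that a naive regular expression reports "SELECT *" where no executable statement uses it.

```powershell
# Hypothetical sample script: the text "SELECT *" appears only in a comment and in a string literal
$script = @'
-- TODO: replace the old SELECT * FROM dbo.Orders query
PRINT 'SELECT * is discouraged';
SELECT OrderID FROM dbo.Orders;
'@

# naive pattern: the word SELECT followed by whitespace and an asterisk (case-insensitive)
$regexMatches = [regex]::Matches($script, '(?i)\bSELECT\s+\*')

# every match here is a false positive because the only real SELECT statement lists its column
Write-Host "Naive regex reported $($regexMatches.Count) 'SELECT *' occurrence(s)"
```

One could keep refining the pattern to skip comments and literals, but each refinement tends to uncover another edge case, which is the problem a real parser avoids.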

Whether you’re a DBA needing to identify problem code in existing objects or a database developer wanting to incorporate robust code analysis rules in a CI/CD pipeline, the T-SQL Script Document Object Model (DOM) library is the right tool for these jobs and more.

The official documentation for the script DOM library is rather scant, consisting mostly of auto-generated class documentation without usage examples or behavior details. This introductory article reviews script DOM concepts and programming while showing how to leverage the power of this library using PowerShell (version 5 or later), which is especially handy for ad-hoc needs. One can also use the library in any .NET language (e.g. C#, VB.NET) for a variety of purposes such as database code analysis rules in SSDT. Expertise in .NET programming or PowerShell is not required, but a cursory knowledge of object-oriented programming and .NET is helpful to understand and extend the example code included here.

Script DOM Overview

The T-SQL script DOM library, Microsoft.SqlServer.TransactSql.ScriptDom.dll, includes classes and methods to parse, process, and format T-SQL scripts. The library is part of the Microsoft SQL Server Data-Tier Application Framework (DacFx) and, as a stand-alone client library, it can be used without installing SQL Server or connecting to a database engine. The library is available for .NET Framework and .NET Core versions so that it will run on Windows, macOS, or Linux.

The Windows PowerShell script examples in this article use the .NET Framework version installed with the sqlpackage MSI. Alternatively, one can use the assembly included in the DacFx NuGet package, the assembly distributed with an existing SQL Server tools installation, or a .NET Core version from the sqlpackage download page. The PowerShell scripts in this article will run on any machine (or container) with PowerShell or PowerShell Core installed simply by specifying the location of the assembly in the Add-Type PowerShell command. One can similarly add a reference to the NuGet package or assembly for use in compiled .NET Framework applications.

Note that the T-SQL script DOM library parses only T-SQL; it does not recognize SQLCMD commands or scripting variables understood by SQL Server tools like SQLCMD, SSMS (in SQLCMD mode), and Azure Data Studio. The library can, however, parse scripts containing GO batch terminators even though GO is not technically a T-SQL statement.
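As a quick illustration of the GO behavior, the sketch below parses a two-batch script and reports how many batches the parser found. The assembly path is an assumption (adjust it for your installation), and the parsing calls themselves are explained in the sections that follow.

```powershell
# Assumption: adjust this path to wherever the Script DOM assembly is installed on your machine
Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

# two batches separated by GO batch terminators
$script = @"
SELECT 1;
GO
SELECT 2;
GO
"@

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
$stringReader = New-Object System.IO.StringReader($script)

# GO is consumed by the parser as a batch separator rather than reported as a syntax error
$tSqlScript = $parser.Parse($stringReader, [ref]$parseErrors)

Write-Host "Parse errors: $($parseErrors.Count), batches: $($tSqlScript.Batches.Count)"
```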

Parsing Overview

Scripts are parsed by invoking the Parse method of the T-SQL script DOM library TSqlParser class. The parser understands the complex T-SQL syntax and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements.
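For the curious, the underlying tokens can be listed from the ScriptTokenStream property of a parsed fragment. This is a small sketch rather than a recommended technique; the assembly path and the one-line sample script are assumptions for illustration.

```powershell
# Assumption: adjust this path to wherever the Script DOM assembly is installed on your machine
Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

$script = "SELECT 1 AS One; -- trailing comment"

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
$stringReader = New-Object System.IO.StringReader($script)
$tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

# list every low-level token, including whitespace and comment tokens
foreach ($token in $tSqlFragment.ScriptTokenStream) {
    Write-Host "$($token.TokenType) at line $($token.Line), column $($token.Column): [$($token.Text)]"
}
```

Note that whitespace and comments show up as tokens of their own, which is why fragments are usually the more convenient level to work at.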

The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Each fragment includes properties applicable to the specific fragment type it represents plus the properties common to all fragments (inherited from TSqlFragment) listed below as extracted from the documentation.

Property             Description
FirstTokenIndex      Gets or sets the first index of the token.
FragmentLength       Defines the number of characters the fragment takes up in the script it was parsed.
LastTokenIndex       Gets or sets the last index of the token.
ScriptTokenStream    Gets or sets the script token stream.
StartColumn          Gets the start column.
StartLine            Gets the start line.
StartOffset          Defines the character offset of fragments starting location in the script it was parsed.
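One practical use of these common properties is recovering the source text of a fragment. The sketch below, which assumes the assembly path shown and a made-up two-statement script, uses StartOffset and FragmentLength with Substring to print the text of each statement.

```powershell
# Assumption: adjust this path to wherever the Script DOM assembly is installed on your machine
Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

$script = "SELECT 1; SELECT 2;"

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
$stringReader = New-Object System.IO.StringReader($script)
$tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

# collect the source text of each statement in the first (and only) batch
$statementTexts = New-Object System.Collections.Generic.List[string]
foreach ($statement in $tSqlFragment.Batches[0].Statements) {
    # StartOffset is the zero-based character offset; FragmentLength is the character count
    $text = $script.Substring($statement.StartOffset, $statement.FragmentLength)
    $statementTexts.Add($text)
    Write-Host "$($statement.GetType().Name) at offset $($statement.StartOffset): $text"
}
```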

The script source can be passed to Parse as a TextReader or as an IList<TSqlParserToken>. All the examples in this article use the TextReader method overload with a StringReader object containing the source script.

Parsing Examples

Here is an example that parses trigger code to check that it is syntactically valid for each SQL Server version from SQL Server 2000 to SQL Server 2019. This trigger uses RAISERROR syntax that was deprecated decades ago (SQL Server 2000?) and removed from the product entirely in SQL Server 2012. Parsing fails with an invalid syntax error when TSql110Parser or a later parser version is used.

# trigger with deprecated/obsolete RAISERROR syntax
$script = @"
CREATE TRIGGER dbo.tr_OrderHeader
ON dbo.OrderHeader
FOR DELETE
AS
IF EXISTS(SELECT 1 FROM dbo.OrderDetail AS od WHERE od.OrderID IN(SELECT d.OrderID FROM deleted AS d))
BEGIN
	RAISERROR 50001 'OrderDetail rows exist for order. Delete order detail before order header.'
	ROLLBACK
END
"@

# parse script with specified parser and display success or parsing errors
Function Parse-TSqlScript($parser, $script) {

    # create a ParseError collection for any errors returned by parser
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]

    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($script)

    # parse the script
    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

    # display parse success or error
    if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

}

# ############
# ### MAIN ###
# ############
try {

    # load Script DOM assembly for use by this PowerShell session
    Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

    # SQL Server 2000
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql80Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2005
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql90Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2008
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql100Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2012
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql110Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2014
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql120Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2016
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql130Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2017
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql140Parser($true)
    Parse-TSqlScript -parser $parser -script $script

    # SQL Server 2019
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    Parse-TSqlScript -parser $parser -script $script

}
catch {

    throw

}

Script output:

TSql80Parser: No parsing errors
TSql90Parser: No parsing errors
TSql100Parser: No parsing errors
TSql110Parser: 1 parsing error(s): {
    "Number":  46010,
    "Offset":  193,
    "Line":  7,
    "Column":  12,
    "Message":  "Incorrect syntax near 50001."
}
TSql120Parser: 1 parsing error(s): {
    "Number":  46010,
    "Offset":  193,
    "Line":  7,
    "Column":  12,
    "Message":  "Incorrect syntax near 50001."
}
TSql130Parser: 1 parsing error(s): {
    "Number":  46010,
    "Offset":  193,
    "Line":  7,
    "Column":  12,
    "Message":  "Incorrect syntax near 50001."
}
TSql140Parser: 1 parsing error(s): {
    "Number":  46010,
    "Offset":  193,
    "Line":  7,
    "Column":  12,
    "Message":  "Incorrect syntax near 50001."
}
TSql150Parser: 1 parsing error(s): {
    "Number":  46010,
    "Offset":  193,
    "Line":  7,
    "Column":  12,
    "Message":  "Incorrect syntax near 50001."
} 

I have seen invalid code like the above trigger in existing production databases due to past database upgrades. These objects can no longer be executed and should be dropped or remediated. The script below extends the previous example, parsing all existing T-SQL modules (stored procedures, views, triggers, and functions) in an existing database according to the desired target SQL Server version (SQL 2019) and the existing object quoted identifier setting passed as the TSqlParser constructor argument.

$connectionString = "Data Source=.;Initial Catalog=AdventureWorks2012;Integrated Security=SSPI"

# parse script with specified parser and display success or parsing errors
Function Parse-TSqlScript($parser, $script) {

    # create a ParseError collection for any errors returned by parser
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]

    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($script)

    # parse the script
    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

    # display parse success or error
    if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

}

# ############
# ### MAIN ###
# ############
try {

    # load Script DOM assembly for use by this PowerShell session
    Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

    $query = @"
SELECT 
	  QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + N'.' + QUOTENAME(OBJECT_NAME(object_id)) AS ObjectName
	, OBJECTPROPERTY(object_id, 'ExecIsQuotedIdentOn') AS ExecIsQuotedIdentOn
	, definition 
FROM sys.sql_modules;
"@

    $connection = New-Object Data.SqlClient.SqlConnection($connectionString)
    $command = New-Object Data.SqlClient.SqlCommand($query, $connection)
    $connection.Open()
    $reader = $command.ExecuteReader()
    while ($reader.Read()) {

        $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($reader["ExecIsQuotedIdentOn"])

        Write-Host "Parsing $($reader["ObjectName"]) ..."
        Parse-TSqlScript -parser $parser -script $($reader["definition"])

    }
    $connection.Close()
}    
catch {

    throw

} 

Below are the first few lines of output. No parsing errors were found in this case.

Parsing [Sales].[vStoreWithContacts] ...
TSql150Parser: No parsing errors
Parsing [Sales].[vStoreWithAddresses] ...
TSql150Parser: No parsing errors
Parsing [Purchasing].[vVendorWithContacts] ...
TSql150Parser: No parsing errors
Parsing [Purchasing].[vVendorWithAddresses] ...
TSql150Parser: No parsing errors
Parsing [dbo].[ufnGetAccountingStartDate] ...
TSql150Parser: No parsing errors 

The same technique as above can parse version-controlled T-SQL scripts or any scripts stored in files. This example parses all scripts in a given directory and subdirectories.

# parse script with specified parser and display success or parsing errors
Function Parse-TSqlScript($parser, $script) {

    # create a ParseError collection for any errors returned by parser
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]

    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($script)

    # parse the script
    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

    # display parse success or error
    if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

}

# ############
# ### MAIN ###
# ############
try {

    # load Script DOM assembly for use by this PowerShell session
    Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
    $scriptFiles = Get-ChildItem "C:\SqlScripts\*.sql" -Recurse
    # parse with quoted identifier on
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    foreach ($scriptFile in $scriptFiles) {

        $script = [IO.File]::ReadAllText($scriptFile.FullName)
        Write-Host "Parsing $($scriptFile.FullName) ..."
        Parse-TSqlScript -parser $parser -script $script

    }

}    
catch {

    throw

}

Partial output from the script above:

Parsing C:\SqlScripts\AdventureWorks2012\AdventureWorks2012.Database.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.AccountNumber.UserDefinedDataType.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.AWBuildVersion.Table.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.DatabaseLog.Table.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.ErrorLog.Table.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.Flag.UserDefinedDataType.sql ...
TSql150Parser: No parsing errors
Parsing C:\SqlScripts\AdventureWorks2012\dbo.Name.UserDefinedDataType.sql ...
TSql150Parser: No parsing errors 
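For use in a build or CI/CD step, a variation on the script above could report each error with its file location and then fail when any file does not parse. This is only a sketch: the assembly path, the script folder, and the message format are assumptions, and how a terminating error maps to a failed step depends on the pipeline tooling.

```powershell
# Assumptions: the assembly path and script folder are placeholders; adjust for your environment
Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

# parse with quoted identifier on
$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$filesWithErrors = 0

foreach ($scriptFile in Get-ChildItem "C:\SqlScripts\*.sql" -Recurse) {

    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    $stringReader = New-Object System.IO.StringReader([IO.File]::ReadAllText($scriptFile.FullName))

    # the returned fragment is not needed here; only the errors are examined
    $null = $parser.Parse($stringReader, [ref]$parseErrors)
    $stringReader.Dispose()

    # one line per error, with file, line, and column so the problem is easy to locate
    foreach ($parseError in $parseErrors) {
        Write-Host "$($scriptFile.FullName)($($parseError.Line),$($parseError.Column)): error $($parseError.Number): $($parseError.Message)"
    }
    if ($parseErrors.Count -gt 0) {
        $filesWithErrors++
    }
}

# a terminating error is typically surfaced as a failed step by build tooling
if ($filesWithErrors -gt 0) {
    throw "$filesWithErrors script file(s) failed to parse"
}
```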

Analyzing T-SQL Scripts

Parsing code for syntax errors is only a small subset of T-SQL Script DOM capability. One can also analyze T-SQL code after parsing using the TSqlScript fragment returned by the Parse method.

One way to examine scripts is by traversing the parsed DOM hierarchy of TSqlScript, TSqlBatch, and TSqlStatement fragments. This technique has shortcomings as illustrated with this example, which iterates over the TSqlBatch Statements collection looking for “SELECT *” constructs. The script displays a warning message and location within the script where a SelectStarExpression fragment is found.

# this is the script to parse
$script = @"
SELECT * FROM dbo.SomeTable;
IF 1 = 1
BEGIN
    SELECT * FROM dbo.SomeOtherTable;
END;
"@

try {

    # load Script DOM assembly for use by this PowerShell session
    Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
    # create a TSql150Parser parser (SQL Server 2019)
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)

    # create a ParseError collection for any errors returned by parser
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]

    # create a StringReader for the script for parsing
    $stringReader = New-Object System.IO.StringReader($script)

    # parse the script
    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

    # raise an exception if any parsing errors occur
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }

    # show number of batches in this script
    Write-Host "Script starts at line $($tSqlFragment.StartLine), column $($tSqlFragment.StartColumn), length $($tSqlFragment.FragmentLength), and contains $($tSqlFragment.Batches.Count) batches"

    # iterate over batches in this script
    foreach ($batchFragment in $tSqlFragment.Batches) {

        # show starting line number of this batch
        Write-Host "`tBatch at line $($batchFragment.StartLine), column $($batchFragment.StartColumn), length $($batchFragment.FragmentLength), and contains $($batchFragment.Statements.Count) statements"

        # iterate over statements in this batch
        foreach ($statementFragment in $batchFragment.Statements) {

            # show statement starting line number and statement type
            Write-Host "`t`tStatement at line $($statementFragment.StartLine), column $($statementFragment.StartColumn), length $($statementFragment.FragmentLength) and is a $($statementFragment.GetType().Name)"

            # for SELECT statements, look for SELECT * expression and display warning if found
            if($statementFragment.GetType().Name -eq "SelectStatement") {
                foreach($selectElementFragment in $statementFragment.QueryExpression.SelectElements) {
                    if($selectElementFragment.GetType().Name -eq "SelectStarExpression") {
                        Write-Host "`t`tWARNING: 'SELECT *' found at line $($selectElementFragment.StartLine), column $($selectElementFragment.StartColumn), length $($selectElementFragment.FragmentLength)" -ForegroundColor Yellow
                    }
                }
                
            }
        }
    }
}
catch {
    throw
} 

Output:

Script starts at line 1, column 1, length 90, and contains 1 batches
	Batch at line 1, column 1, length 90, and contains 2 statements
		Statement at line 1, column 1, length 28 and is a SelectStatement
		WARNING: 'SELECT *' found at line 1, column 8, length 1
		Statement at line 2, column 1, length 60 and is a IfStatement 

You might have noticed the second SELECT * statement in the script, which is conditional, was not identified. This is because the code examined only SELECT statements that are immediate children of a batch, and the unidentified SELECT statement is a child of an IfStatement fragment (grandchild of the batch). The missed statement could have been identified by adding code to evaluate IfStatement fragments too, examining the ThenStatement property and its descendants to see if a SelectStatement with a SelectStarExpression exists somewhere under the DOM tree. But as you can imagine, the code would become quite unwieldy, and one would also need to examine the ElseStatement property of IfStatement fragments, consider nested constructs, BEGIN/END blocks, etc., where a SELECT * might occur.
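To give a feel for that extra work, here is a sketch of a recursive helper that descends into IF and BEGIN/END constructs. The function name Find-SelectStar and the assembly path are assumptions for illustration, and the helper still ignores many places a SELECT * can hide (WHILE loops, TRY/CATCH blocks, subqueries, UNION queries, module bodies, and so on).

```powershell
# Assumption: adjust this path to wherever the Script DOM assembly is installed on your machine
Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

# hypothetical helper: emits a warning string for each SELECT * found in the statement types it knows about
Function Find-SelectStar($statement) {
    if ($null -eq $statement) { return }
    switch ($statement.GetType().Name) {
        "SelectStatement" {
            foreach ($selectElement in $statement.QueryExpression.SelectElements) {
                if ($selectElement.GetType().Name -eq "SelectStarExpression") {
                    "WARNING: 'SELECT *' found at line $($selectElement.StartLine), column $($selectElement.StartColumn)"
                }
            }
        }
        "IfStatement" {
            # both branches must be examined; ElseStatement is null when there is no ELSE
            Find-SelectStar $statement.ThenStatement
            Find-SelectStar $statement.ElseStatement
        }
        "BeginEndBlockStatement" {
            foreach ($childStatement in $statement.StatementList.Statements) {
                Find-SelectStar $childStatement
            }
        }
    }
}

$script = @"
SELECT * FROM dbo.SomeTable;
IF 1 = 1
BEGIN
    SELECT * FROM dbo.SomeOtherTable;
END;
"@

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
$stringReader = New-Object System.IO.StringReader($script)
$tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)

# collect warnings from every statement in every batch
$warnings = @(
    foreach ($batchFragment in $tSqlFragment.Batches) {
        foreach ($statementFragment in $batchFragment.Statements) {
            Find-SelectStar $statementFragment
        }
    }
)
$warnings | ForEach-Object { Write-Host $_ -ForegroundColor Yellow }
```

With this sample script the helper should report both SELECT * occurrences, but every additional statement type needs its own branch, so the approach does not scale well.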

Navigating the T-SQL DOM with the above technique may be useful for some specialized use cases but is too cumbersome with scripts containing statements of varying types and complex constructs. A more robust solution for this, and most other code analysis needs, is a visitor. A visitor class allows one to more easily process desired TSqlFragment types regardless of where they occur in the script while ignoring others. The remainder of the article focuses on fragment visitors.

Fragment Visitors

A custom fragment visitor class allows one to use a visitor pattern to more easily process relevant T-SQL fragment types with concise code while ignoring fragment types unneeded for the task at hand. A visitor class can be created in any .NET programming language that supports object-oriented inheritance, such as C#, VB.NET, and PowerShell version 5 or later. The class simply derives from the T-SQL Script DOM library TSqlConcreteFragmentVisitor type (or TSqlFragmentVisitor for specialized needs) and overrides the Visit base class method for fragment type(s) of interest. The base class includes plumbing to invoke the overridden Visit methods for the accepted fragment and its descendants without verbose code.

A custom visitor is used by invoking the Accept method on a fragment instance with the custom visitor object as the argument. Accept invokes the visitor’s Visit method for the current fragment type, which will be the overridden method if one is defined by the custom visitor, or the base class method otherwise. Each child of the fragment is then visited similarly, invoking the overridden Visit method for the fragment’s type if one exists. Consequently, Accept visits not only the current fragment but all of its descendants, and executes the overridden Visit method for each fragment along the way. This allows one to invoke Accept on the TSqlScript fragment returned by Parse to visit all overridden Visit methods of the custom visitor class.

Here’s a PowerShell visitor example that identifies SelectStarExpression fragments anywhere in a T-SQL script using a visitor that derives from TSqlConcreteFragmentVisitor. Be aware that in PowerShell, the T-SQL Script DOM assembly must be added with Add-Type before running a script because the visitor base class type is defined in the external T-SQL Script DOM assembly. I’ll discuss more about this and other PowerShell considerations after reviewing visitors.

# this is the script to parse
$script = @"
SELECT * FROM dbo.SomeTable;
IF 1 = 1
BEGIN
    SELECT * FROM dbo.SomeOtherTable;
END; 
"@

try {


    class MyVisitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor {

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) {
            Write-Host "`t`tWARNING: 'SELECT *' found at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)" -ForegroundColor Yellow
        }
    
    }

    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    $stringReader = New-Object System.IO.StringReader($script)

    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }

    $visitor = [MyVisitor]::new()

    $tSqlFragment.Accept($visitor)

}
catch {
    throw
}

Script output:

WARNING: 'SELECT *' found at line 1, column 8, length 1
WARNING: 'SELECT *' found at line 4, column 12, length 1 

The code invoked Accept on only the root TSqlScript fragment returned by Parse. The fragment Accept method did the rest of the work by visiting the fragment and its descendants, invoking the overridden Visit(SelectStarExpression) method whenever a SelectStarExpression fragment was encountered. This behavior can be observed in more detail by also overriding Visit(TSqlFragment) to display information about every fragment visited:

$script = @"
SELECT * FROM dbo.SomeTable;
IF 1 = 1
BEGIN
    SELECT * FROM dbo.SomeOtherTable;
END; 
"@

try {

    class MyVisitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor {

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) {
            Write-Host "WARNING: 'SELECT *' found at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)" -ForegroundColor Yellow
        }

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment] $fragment) {
            Write-Host "$($fragment.GetType().Name) found at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)"
        }
    
    }

    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    $stringReader = New-Object System.IO.StringReader($script)

    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }

    $visitor = [MyVisitor]::new()

    $tSqlFragment.Accept($visitor)

}
catch {

    throw

}

Output showing every visited fragment along with warnings:

TSqlScript found at line 1, column 1, length 91
TSqlBatch found at line 1, column 1, length 90
SelectStatement found at line 1, column 1, length 28
QuerySpecification found at line 1, column 1, length 27
WARNING: 'SELECT *' found at line 1, column 8, length 1
FromClause found at line 1, column 10, length 18
NamedTableReference found at line 1, column 15, length 13
SchemaObjectName found at line 1, column 15, length 13
Identifier found at line 1, column 15, length 3
Identifier found at line 1, column 19, length 9
IfStatement found at line 2, column 1, length 60
BooleanComparisonExpression found at line 2, column 4, length 5
IntegerLiteral found at line 2, column 4, length 1
IntegerLiteral found at line 2, column 8, length 1
BeginEndBlockStatement found at line 3, column 1, length 50
StatementList found at line -1, column -1, length -1
SelectStatement found at line 4, column 5, length 33
QuerySpecification found at line 4, column 5, length 32
WARNING: 'SELECT *' found at line 4, column 12, length 1
FromClause found at line 4, column 14, length 23
NamedTableReference found at line 4, column 19, length 18
SchemaObjectName found at line 4, column 19, length 18
Identifier found at line 4, column 19, length 3
Identifier found at line 4, column 23, length 14 
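A visitor does not have to write to the console. As a sketch of one alternative, the visitor below accumulates its findings in a list so the calling code can decide what to do with them. The class name SelectStarCollector and its Findings property are hypothetical names chosen for this illustration, and, as with the other visitor examples, the Script DOM assembly is assumed to be loaded before the script is run.

```powershell
# Assumption: the Script DOM assembly was already loaded with Add-Type before this script
# is parsed; otherwise the class definition below will not compile.

class SelectStarCollector: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor {

    # hypothetical property that accumulates one message per SELECT * found
    [System.Collections.Generic.List[string]] $Findings = [System.Collections.Generic.List[string]]::new()

    [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) {
        $this.Findings.Add("'SELECT *' found at line $($fragment.StartLine), column $($fragment.StartColumn)")
    }

}

$script = @"
SELECT * FROM dbo.SomeTable;
IF 1 = 1
BEGIN
    SELECT * FROM dbo.SomeOtherTable;
END;
"@

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
$stringReader = New-Object System.IO.StringReader($script)

$tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)
if($parseErrors.Count -gt 0) {
    throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
}

# visit the whole script, then report what the visitor collected
$visitor = [SelectStarCollector]::new()
$tSqlFragment.Accept($visitor)

Write-Host "$($visitor.Findings.Count) finding(s)"
$visitor.Findings | ForEach-Object { Write-Host $_ -ForegroundColor Yellow }
```

Keeping the findings in the visitor makes it easier to count them, write them to a file, or fail a build, without changing the visitor itself.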

TSqlFragmentVisitor versus TSqlConcreteFragmentVisitor

I mentioned earlier that a custom visitor class derives from either TSqlConcreteFragmentVisitor or TSqlFragmentVisitor. The difference between these base classes is that TSqlFragmentVisitor includes a Visit method for each of the nearly one thousand different TSqlFragment types (984 as of this writing to be precise), including the abstract base types that fragment types derive from. In contrast, TSqlConcreteFragmentVisitor includes an overridable Visit method for only TSqlFragment and concrete derived types (currently 856). The implication is that the same fragment can be visited more than once when TSqlFragmentVisitor is used as a visitor base class: once as TSqlFragment, again as base type(s) when applicable, and finally as the concrete derived type.

This example shows a trivial select statement visited as TSqlFragment, TSqlStatement, and SelectStatement when a visitor derives from TSqlFragmentVisitor:

$script = "SELECT 1;"

try {

    class MyVisitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor {

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment] $fragment) {
            Write-Host "$($fragment.GetType().Name) visited as TSqlFragment at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)"
        }

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlStatement] $fragment) {
            Write-Host "$($fragment.GetType().Name) visited as TSqlStatement at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)"
        }

        [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStatement] $fragment) {
            Write-Host "$($fragment.GetType().Name) visited as SelectStatement at line $($fragment.StartLine), column $($fragment.StartColumn), length $($fragment.FragmentLength)"
        }

    }

    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    $stringReader = New-Object System.IO.StringReader($script)

    $tSqlFragment = $parser.Parse($stringReader, [ref]$parseErrors)
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }

    $visitor = [MyVisitor]::new()

    $tSqlFragment.Accept($visitor)

}
catch {
    throw
}

Output showing all visited fragments:

TSqlScript visited as TSqlFragment at line 1, column 1, length 9
TSqlBatch visited as TSqlFragment at line 1, column 1, length 9
SelectStatement visited as TSqlStatement at line 1, column 1, length 9
SelectStatement visited as TSqlFragment at line 1, column 1, length 9
SelectStatement visited as SelectStatement at line 1, column 1, length 9
QuerySpecification visited as TSqlFragment at line 1, column 1, length 8
SelectScalarExpression visited as TSqlFragment at line 1, column 8, length 1
IntegerLiteral visited as TSqlFragment at line 1, column 8, length 1 

For most use cases, one derives from TSqlConcreteFragmentVisitor and overrides the Visit methods of only the relevant concrete types. This ensures a fragment is visited only once as the relevant concrete type (plus TSqlFragment if desired). Below is the output from the same script as above, except using TSqlConcreteFragmentVisitor as the visitor’s base class, showing the select statement fragment is visited only as the concrete SelectStatement type and not as the TSqlStatement abstract type:

TSqlScript visited as TSqlFragment at line 1, column 1, length 9
TSqlBatch visited as TSqlFragment at line 1, column 1, length 9
SelectStatement visited as SelectStatement at line 1, column 1, length 9
QuerySpecification visited as TSqlFragment at line 1, column 1, length 8
SelectScalarExpression visited as TSqlFragment at line 1, column 8, length 1
IntegerLiteral visited as TSqlFragment at line 1, column 8, length 1

Compiled languages like C# provide more type safety and will result in a compilation error if one tries to override a Visit method of a TSqlConcreteFragmentVisitor sealed base fragment type. PowerShell is more forgiving and will silently ignore the override of a TSqlConcreteFragmentVisitor sealed Visit method (i.e. TSqlStatement in this example).

PowerShell Considerations

A PowerShell class that derives from a type defined in an external assembly will not compile unless the assembly with the base type has already been loaded. Consequently, the Microsoft.SqlServer.TransactSql.ScriptDom.dll assembly must first be loaded before a script with a visitor class can be run.

With an interactive PowerShell session, one can run the Add-Type command manually before running a script with a visitor class. This only needs to be done once and the assembly will be available in the app domain for the duration of the PowerShell session.
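If it is unclear whether the assembly has already been loaded in the current session, one can check the app domain first. The following is a small sketch; the variable names are illustrative and the assembly path is an assumption to adjust for your installation.

```powershell
# Assumption: adjust the path to wherever the Script DOM assembly is installed on your machine
$assemblyName = "Microsoft.SqlServer.TransactSql.ScriptDom"
$assemblyPath = "C:\Program Files\Microsoft SQL Server\150\DAC\bin\${assemblyName}.dll"

# look for the assembly among those already loaded in the current app domain
$loadedAssembly = [System.AppDomain]::CurrentDomain.GetAssemblies() |
    Where-Object { $_.GetName().Name -eq $assemblyName }

if ($null -ne $loadedAssembly) {
    Write-Host "$assemblyName is already loaded in this session"
}
elseif (Test-Path -LiteralPath $assemblyPath) {
    Add-Type -LiteralPath $assemblyPath
    Write-Host "$assemblyName loaded from $assemblyPath"
}
else {
    Write-Warning "$assemblyName was not found at $assemblyPath"
}
```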

Unattended PowerShell script files with visitor classes are more problematic since one cannot execute Add-Type interactively. As a work-around, a wrapper script can be employed to execute the Add-Type command and then “dot source” the script file with the visitor class. For example, after saving any of the previous visitor examples to a file named “VisitorExample.ps1”, the wrapper script code below can be saved to a file named “Execute-VisitorExample.ps1” in the same directory:

Add-Type -Path "C:\Program Files\Microsoft SQL Server\150\DAC\bin\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
. ".\VisitorExample.ps1"

The wrapper script can then be run to execute the visitor:

powershell -ExecutionPolicy RemoteSigned -File "Execute-VisitorExample.ps1"

The .NET Framework NuGet package version can be used from PowerShell too by registering the NuGet package repo as a trusted source, installing the package (for current user or all users), and adding the assembly from the local source location. Here’s an example of this technique as part of a wrapper script:

# Create trusted NuGet package source, if needed
$packageSource = Get-PackageSource | where { ($_.Location -eq "https://www.nuget.org/api/v2") -and ($_.ProviderName -eq "NuGet") -and ($_.IsTrusted -eq $true) }
if($null -eq $packageSource) {
    # keep the newly registered source so its name can be used when installing the package
    $packageSource = Register-PackageSource NuGetV2 https://www.nuget.org/api/v2 -ProviderName NuGet -Trusted
}

# Install package, if needed. Note scope AllUsers requires admin
$dacFxPackage = Install-Package Microsoft.SqlServer.DacFx.x64 -Source ($packageSource.Name) -Scope CurrentUser
# Get package
$dacFxPackage = Get-Package -Name Microsoft.SqlServer.DacFx.x64

# Load Microsoft.SqlServer.TransactSql.ScriptDom.dll into app domain for use in PS scripts
$packageFolderPath = [System.IO.Path]::GetDirectoryName($dacFxPackage.Source)
Add-Type -LiteralPath "$packageFolderPath\lib\net46\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

# Dot source visitor example
.\"VisitorExample.ps1"

Summary

This introductory article showed T-SQL script DOM library methods to parse and analyze T-SQL code. I hope this helped you gain an understanding of the library’s capabilities and get started for various needs. I’ll follow up with articles discussing more complex use cases, including leveraging visitors to ensure adherence to naming conventions, identify problem T-SQL code constructs, T-SQL code formatting, and other custom tooling.

Improper SET Option Errors

SQL Server backwards compatibility SET options are hidden land mines that explode when one tries to use a feature that requires proper session settings, such as a filtered index, indexed view, etc. The QUOTED_IDENTIFIER, ANSI_NULLS, and ANSI_PADDING settings are especially problematic. These are persisted as meta-data with view, stored procedure, function, trigger, and column definitions and, since persisted settings override current session settings, a nasty runtime error like “…failed because the following SET options have incorrect settings: ‘QUOTED_IDENTIFIER…’” occurs when a persisted setting is set to OFF even though the current session settings are set properly. “Sticky” OFF settings are a common problem and often accidental rather than because the setting is required by T-SQL code that doesn’t follow ISO SQL standards.

This article reviews QUOTED_IDENTIFIER, ANSI_NULLS, and ANSI_PADDING settings, settings persisted as database object meta-data, and considerations with SQL Server tools. I’ll discuss how to identify and remediate problem objects and considerations to ensure proper ON settings going forward.

Background
Microsoft SQL Server, along with its Sybase ancestor, is a bit long in the tooth nowadays. The original code base was developed decades ago before ISO (previously ANSI) SQL standards were formalized. As SQL Server evolved to respect ISO SQL standards, SET options were introduced to avoid breaking existing applications that expected non-ISO behavior. This allows legacy T-SQL code to run on newer SQL Server versions without changes while supporting ISO SQL standards for new development. The SQL Server product team goes through great effort, albeit sometimes to a fault, to provide backwards compatibility so as not to block upgrades.

Some SQL Server features require ISO SQL standard settings plus other session settings to be set properly in order to be used and avoid runtime errors. Features that require these settings include:

  • Filtered Indexes
  • Indexed Views
  • Indexes on computed columns
  • XML indexes
  • Query notifications (a.k.a. SqlDependency)

The below session settings, sometimes called the “magic 7 settings”, must be set properly when using these features. Otherwise, the above indexes will not be used or a runtime error will be raised when data are modified:

  • QUOTED_IDENTIFIER ON
  • ANSI_NULLS ON
  • ANSI_PADDING ON
  • ANSI_WARNINGS ON
  • ARITHABORT ON
  • CONCAT_NULL_YIELDS_NULL ON
  • NUMERIC_ROUNDABORT OFF

Modern Microsoft SQL Server drivers (e.g. ODBC, OLE DB, SqlClient, JDBC) connect with all these session settings properly set by default. The lone exception is that ARITHABORT OFF is set by default in the context of a database in the SQL Server 2000 compatibility level (80). ARITHABORT OFF will not be an issue for most since the now-unsupported SQL Server 2008 R2 was the last version to include the SQL Server 2000 database compatibility level. ARITHABORT is set ON by default in SQL Server 2012 and later versions.
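
As a quick sanity check, the current session’s values for the seven settings can be inspected with the built-in SESSIONPROPERTY function. This is only a minimal sketch and the column aliases are arbitrary. Each column returns 1 when the option is ON and 0 when it is OFF, so a properly configured session shows 1 everywhere except NUMERIC_ROUNDABORT.

```sql
-- Inspect the "magic 7" settings for the current session (1 = ON, 0 = OFF)
SELECT
      SESSIONPROPERTY('QUOTED_IDENTIFIER') AS QuotedIdentifier
    , SESSIONPROPERTY('ANSI_NULLS') AS AnsiNulls
    , SESSIONPROPERTY('ANSI_PADDING') AS AnsiPadding
    , SESSIONPROPERTY('ANSI_WARNINGS') AS AnsiWarnings
    , SESSIONPROPERTY('ARITHABORT') AS ArithAbort
    , SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS ConcatNullYieldsNull
    , SESSIONPROPERTY('NUMERIC_ROUNDABORT') AS NumericRoundAbort; -- should be 0
```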

QUOTED_IDENTIFIER, ANSI_NULLS, and ANSI_PADDING Session Settings
Given these session settings are initially set properly to ON by default, an OFF setting is a result of one or more of the following:

• An explicit T-SQL SET OFF statement after the connection is made
• Non-default OFF setting specified by API connection (e.g. connection string keyword, DSN property, or set in app code programmatically)
• Overridden by a persisted object meta-data setting

Improper SET OFF statements most often leak into T-SQL code due to inattention to detail. The same is true for OFF API connection settings and persisted meta-data settings. T-SQL code that adheres to ISO SQL standard single-quotes literal enclosures and IS NULL/IS NOT NULL comparison operators can run with these settings either ON or OFF with the same outcome. Consequently, there’s no reason not to use the proper ON default settings when these ISO SQL practices are followed. One can simply remove unintentional T-SQL SET OFF statements, fix API settings, and correct persisted meta-data (discussed later) in order to future proof code so that it can run with or without features that require the proper settings.

Legacy T-SQL code that requires OFF settings due to non-ISO compliant constructs must be remediated before the settings are changed to ON and the aforementioned features can be leveraged. Below is a summary of these settings and considerations for changing to use the ON setting.

QUOTED_IDENTIFIER
QUOTED_IDENTIFIER OFF is required only when T-SQL code uses double quotes instead of single quotes to enclose literals. It’s typically a trivial effort to fix non-conformant code to use single quotes instead of double quotes with a search/replace. A caveat is that single quotes embedded within literals must be escaped with two consecutive single quotes. The code will then run regardless of the QUOTED_IDENTIFIER setting and follow ISO SQL standards. Only minimal testing is usually needed after remediation since the setting is evaluated at parse time; parsing errors will occur immediately if double quotes are used to enclose literals. An exception is dynamic SQL where errors won’t be found until one tries to execute an invalid dynamic SQL statement containing double-quote literal enclosures.
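
To illustrate the remediation, here is a small sketch using a made-up literal. The commented-out statement shows the non-conformant form that parses only with QUOTED_IDENTIFIER OFF; the statement after it is the ISO SQL form, with the embedded single quote doubled, that parses with either setting.

```sql
SET QUOTED_IDENTIFIER ON;
GO
-- Non-conformant literal enclosure (parses only with QUOTED_IDENTIFIER OFF):
--   SELECT "O'Brien" AS LastName;

-- ISO SQL form: single-quote enclosure with the embedded quote escaped by doubling it
SELECT 'O''Brien' AS LastName;
GO
```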

The current session QUOTED_IDENTIFIER setting is persisted as meta-data when a view, stored procedure, function, or trigger is created or altered. To change the persisted OFF setting to ON, recreate the object from a session with QUOTED_IDENTIFIER and ANSI_NULLS ON. Be mindful to ensure the session setting is ON when executing DDL scripts to prevent improper settings going forward. See the considerations topic later in this article for gotchas with SQL Server tools.

ANSI_NULLS
ANSI_NULLS OFF has long been deprecated. The OFF setting allows code to test for NULL values using equality/inequality predicates instead of the ISO SQL standard “IS NULL” and “IS NOT NULL” operators. For example, with the ANSI_NULLS OFF setting, the predicate “ColumnName = NULL” will evaluate to TRUE instead of UNKNOWN for rows where the column value is NULL. Such code should be changed to “ColumnName IS NULL” to follow ISO SQL standards and provide the same behavior regardless of the session ANSI_NULLS setting. Changes made for ANSI_NULLS compliance necessitate more extensive testing because they alter runtime behavior rather than producing parse-time errors as with QUOTED_IDENTIFIER.
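
The difference can be seen with a throwaway temp table. This is a sketch only and the table name #NullDemo is made up for illustration. With ANSI_NULLS ON the equality predicate never evaluates to TRUE, while the IS NULL predicate finds the NULL row under either setting.

```sql
CREATE TABLE #NullDemo (ColumnName int NULL);
INSERT INTO #NullDemo (ColumnName) VALUES (1), (NULL);
GO
SET ANSI_NULLS ON;
GO
-- Evaluates to UNKNOWN for every row with ANSI_NULLS ON, so no rows are returned
SELECT ColumnName FROM #NullDemo WHERE ColumnName = NULL;

-- ISO SQL form: returns the NULL row regardless of the ANSI_NULLS setting
SELECT ColumnName FROM #NullDemo WHERE ColumnName IS NULL;
GO
DROP TABLE #NullDemo;
GO
```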

Like QUOTED_IDENTIFIER, the current session ANSI_NULLS setting is persisted as meta-data when a view, stored procedure, function, or trigger is created or altered. Recreate the object from a session with QUOTED_IDENTIFIER and ANSI_NULLS ON to change the persisted OFF setting to ON and take care to ensure the setting is ON when executing DDL scripts.
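
A sketch of correcting the persisted settings for a stored procedure follows. The procedure name dbo.usp_Example and its body are hypothetical placeholders; the actual object definition would be used unchanged. The final query reads the persisted values back from the sys.sql_modules catalog view.

```sql
-- Session settings in effect for the DDL batch are persisted with the object
SET QUOTED_IDENTIFIER ON;
SET ANSI_NULLS ON;
GO
-- Hypothetical procedure: re-run the existing definition unchanged
ALTER PROCEDURE dbo.usp_Example
AS
SELECT 1 AS ExampleColumn;
GO
-- Verify the persisted settings (1 = ON)
SELECT uses_quoted_identifier, uses_ansi_nulls
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.usp_Example');
GO
```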

ANSI_PADDING
ANSI_PADDING OFF has also been deprecated for quite some time and the SQL Server documentation specifically calls out “ANSI_PADDING should always be set to on.” In summary, a column-level ANSI_PADDING OFF setting causes nullable fixed-length char(n) and binary(n) columns to behave like variable-length varchar(n) and varbinary(n) columns. Furthermore, SQL Server automatically trims trailing blank characters from character data and trailing binary zeros from binary data and stores the values as variable length instead of storing the provided value as-is during inserts and updates. Varchar(n)/varbinary(n) columns with ANSI_PADDING OFF are similarly trimmed. Note that it is the persisted ANSI_PADDING column meta-data setting that determines the storage and trimming behavior, not the current session ANSI_PADDING setting. The session ANSI_PADDING must still be ON when using features that require proper settings.

The current session ANSI_PADDING setting is persisted as column-level meta-data when tables are created and when new columns are added to existing tables. This setting affects only char(n) NULL, binary(n) NULL, and varchar(n)/varbinary(n) columns regardless of nullability. The setting doesn’t apply to varchar(MAX), varbinary(MAX), char(n) NOT NULL, binary(n) NOT NULL, and other data types.
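
The column-level behavior can be sketched with two temp tables created under different session settings. The table and column names are made up for illustration. Both inserts run from an ANSI_PADDING ON session, so any difference in stored length comes from the persisted column setting rather than the session.

```sql
SET ANSI_PADDING OFF;
GO
CREATE TABLE #PaddingOff (CharValue varchar(10) NULL); -- column persisted with ANSI_PADDING OFF
GO
SET ANSI_PADDING ON;
GO
CREATE TABLE #PaddingOn (CharValue varchar(10) NULL);  -- column persisted with ANSI_PADDING ON
GO
-- Same value with three trailing blanks inserted into both tables
INSERT INTO #PaddingOff (CharValue) VALUES ('abc   ');
INSERT INTO #PaddingOn (CharValue) VALUES ('abc   ');

-- Trailing blanks are trimmed in the OFF column and stored as-is in the ON column
SELECT DATALENGTH(CharValue) AS PaddingOffLength FROM #PaddingOff;
SELECT DATALENGTH(CharValue) AS PaddingOnLength FROM #PaddingOn;
GO
DROP TABLE #PaddingOff;
DROP TABLE #PaddingOn;
GO
```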

Since SQL Server comparison operators ignore trailing blanks when comparing strings as well as trailing binary zeros when comparing binary values, there isn’t usually an impact from a T-SQL perspective with changing ANSI_PADDING from OFF to ON (aside from storage when the provided values aren’t already trimmed). However, application code might consider trailing blanks and trailing binary zeros, resulting in differences when comparing trimmed and non-trimmed values. Testing is needed depending on how data are used in the app code.

To change a persisted column ANSI_PADDING setting from OFF to ON, execute ALTER TABLE…ALTER COLUMN from an ANSI_PADDING ON session, specifying the same definition as the existing column. Note that this technique will only change ANSI_PADDING from OFF to ON. Altering an existing ANSI_PADDING ON column from an ANSI_PADDING OFF session will not change the persisted setting.
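
A sketch of this technique is below. The table dbo.ExampleTable and column ExampleColumn are hypothetical, and the data type, length, and nullability in the ALTER COLUMN clause must match the existing column definition.

```sql
SET ANSI_PADDING ON;
GO
-- Hypothetical table/column: restate the existing definition exactly
ALTER TABLE dbo.ExampleTable ALTER COLUMN ExampleColumn varchar(50) NULL;
GO
-- Verify the persisted column setting (1 = ANSI_PADDING ON)
SELECT c.name AS ColumnName, c.is_ansi_padded
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.ExampleTable')
    AND c.name = N'ExampleColumn';
GO
```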

Considerations to Ensure Proper Settings
All “magic 7 settings” are set properly by default with current drivers so one might think ensuring proper settings is easy. This is largely true for application code but, sadly, not with SQL Server tools due to backward compatibility behavior and inattention to detail when using them.

SQL Server Agent uses ODBC (which sets all “magic 7 settings” properly) but then explicitly sets QUOTED_IDENTIFIER OFF after connecting for backwards compatibility. The implication is one needs to explicitly add a SET QUOTED_IDENTIFIER ON statement to T-SQL scripts executed by SQL Server Agent T-SQL job steps. This is optional when executing stored procedures because the sticky QUOTED_IDENTIFIER ON setting will override the session setting.
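
For example, a T-SQL job step script might begin as sketched below. The table, columns, and values are hypothetical stand-ins for a statement that touches a table with, say, a filtered index.

```sql
-- First statement of the SQL Server Agent T-SQL job step
SET QUOTED_IDENTIFIER ON;

-- Hypothetical data modification that requires proper session settings
UPDATE dbo.ExampleTable
SET ExampleColumn = 'new value'
WHERE ExampleKey = 1;
```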

SQLCMD similarly uses ODBC and explicitly sets QUOTED_IDENTIFIER OFF. This is a common cause of inadvertent persisted QUOTED_IDENTIFIER OFF leaking into databases as meta-data when SQLCMD is used to deploy database changes. One must specify the SQLCMD -I (uppercase eye) argument to opt-in for QUOTED_IDENTIFIER ON. Additionally, deployment scripts should either explicitly include SET QUOTED_IDENTIFIER ON, SET ANSI_NULLS ON, and SET ANSI_PADDING ON statements or omit these set statements entirely so session settings are used. Avoid including SET OFF statements in scripts for these options.

BCP also uses ODBC but, as you might have guessed, explicitly sets QUOTED_IDENTIFIER OFF too. One needs to opt-in for QUOTED_IDENTIFIER ON by specifying the -q (lower case queue) BCP argument to avoid runtime errors.

SSMS, which uses SqlClient, is nicer in that it sets all options properly by default and doesn’t set QUOTED_IDENTIFIER OFF behind your back. But be aware that SSMS will honor customized SET options specified for the current window (Query–>Query Options–>ANSI) and new windows (Tools–>Options–>Query Execution–>SQL Server–>ANSI). Make sure the “magic 7 settings” are properly set in SSMS options.

Be mindful that SET statements executed in an SSMS query window change the session settings for the lifetime of the query window connection. Take care when executing DDL scripts in a reused query window and, when in doubt, check current session settings using DBCC USEROPTIONS to verify proper settings.

SSMS (and SMO) scripting tools have a terrible habit of including an extraneous SET ANSI_PADDING OFF at the end of CREATE TABLE scripts. Either remove the statement after scripting or set the ANSI PADDING generation option to False (Tools–>Options–>SQL Server Object Explorer–>Scripting–>Generate SET ANSI PADDING commands). This will help avoid unintentionally creating ANSI_PADDING OFF columns.

How to Identify Improper Persisted Settings
It’s common to find wrong settings in existing databases for the reasons mentioned earlier. The catalog view queries below will identify objects with problem persisted OFF settings.

--stored procedures, views, functions, triggers with QUOTED_IDENTIFIER or ANSI_NULLS OFF
 SELECT
       OBJECT_SCHEMA_NAME(o.object_id) AS SchemaName
     , OBJECT_NAME(o.object_id) AS ObjectName
     , o.type_desc AS ObjectType
 FROM sys.objects AS o
 WHERE
     0 IN(
           OBJECTPROPERTY(o.object_id, 'ExecIsQuotedIdentOn')
         , OBJECTPROPERTY(o.object_id, 'ExecIsAnsiNullsOn')
 )
 ORDER BY
       SchemaName
     , ObjectName;

 --columns with ANSI_PADDING OFF
 SELECT
       OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName
     , OBJECT_NAME(t.object_id) AS ObjectName
     , c.name AS ColumnName
 FROM sys.tables AS t
 JOIN sys.columns AS c ON c.object_id = t.object_id
 JOIN sys.types AS ty ON ty.system_type_id = c.system_type_id AND ty.user_type_id = c.user_type_id
 WHERE
     c.is_ansi_padded = 0
     AND (
         (ty.name IN ('varbinary','varchar') AND c.max_length <> -1)
         OR (ty.name IN ('binary','char') AND c.is_nullable = 1)
 );

Summary
Attention to connection settings will facilitate using SQL Server features.

Performance testing with DBCC DROPCLEANBUFFERS

Executing DBCC DROPCLEANBUFFERS is a common practice when unit testing SQL Server performance on an isolated test instance. This allows one to evaluate different candidates for query, stored procedure, and index tuning based on execution times in a worst-case cold buffer cache scenario and provides better test repeatability by leveling the playing field before each test. However, clearing cache in this way has considerations one should be aware of.

An important detail sometimes overlooked is that one must first execute a CHECKPOINT command in the context of the database(s) to be tested before executing DBCC DROPCLEANBUFFERS. DBCC DROPCLEANBUFFERS only frees pages that are not dirty (cached version same as on disk version) so modified pages will remain in cache when CHECKPOINT isn’t first executed. Overlooking the CHECKPOINT can result in non-repeatable test timings. One should always run CHECKPOINT before DBCC DROPCLEANBUFFERS.
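
The sequence described above can be sketched as follows for an isolated test instance. The database name ExampleDatabase is a hypothetical placeholder for the database under test.

```sql
USE ExampleDatabase; -- hypothetical test database
GO
-- Write dirty pages to disk so they become clean and eligible for removal
CHECKPOINT;
GO
-- Remove clean pages from the buffer cache (test instances only)
DBCC DROPCLEANBUFFERS;
GO
```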

One can make the argument that DBCC DROPCLEANBUFFERS might not be particularly valuable for testing. First, the storage engine in SQL Server Enterprise Edition (or Developer Edition, which is often used when testing) behaves differently with a cold cache versus a warm one. With a warm cache, a page not already in cache (e.g. index seek by primary key) will be fetched from disk using a single 8K page IO request as one expects. However, when the cache isn’t fully warmed up (Buffer Manager’s Target Pages not yet met), the entire 64K extent (8 contiguous 8K pages) is read for the single page request regardless of whether the adjacent pages are actually needed by the query. This has the benefit of warming the cache much more quickly than would otherwise occur, but given that the normal steady state of a production SQL Server is a warm cache, testing with a cold cache isn’t a fair comparison of different plans. More data than normal will be transferred from storage so timings may not be indicative of actual performance.

The storage engine also behaves differently during scans when data are not already cached regardless of the SQL Server edition. During sequential scans, read-ahead prefetches multiple extents from storage at a time so that data is in cache by the time it is actually needed by the query. This greatly reduces the time needed for large scans because fewer IOPS are required and sequential access reduces costly seek time against spinning media. Furthermore, Enterprise and Developer editions perform read-ahead reads more aggressively than lesser editions, up to 4MB (512 pages) in a single scatter-gather IO in later SQL Server versions.

The implication with cold cache performance testing is that both full extent reads and read-ahead prefetches are much more likely to occur such that test timings of different execution plans are not fairly comparable. These timings will overemphasize hardware (storage) performance rather than query performance as intended. Given hardware differences on a test system and that cold cache is not the typical production state, cold cache testing isn’t a good indicator of the query performance and resource usage one will experience in a production system.

I recommend using logical reads as a primary performance measure when evaluating query and index tuning candidates. Logical reads is a count of the number of pages touched by the query regardless of whether data was read from storage or already cached, making it a better comparison indicator of data access resource utilization. The number of logical reads can be determined by running the query or procedure with SET STATISTICS IO ON and will be consistent regardless of whether physical IO was needed or not. Query times may be used as a secondary measure by running the query more than once, discarding the results of the first run, and taking the average of subsequent executions. This is not to say these logical read measurements and timings will predict actual production performance, but they will allow one to more accurately evaluate resource usage of different execution plans.
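
A minimal sketch of gathering these measures for one tuning candidate follows. The query, table, and column are hypothetical placeholders. Logical reads and elapsed times are reported on the Messages tab in SSMS; adding SET STATISTICS TIME ON is my own addition here for capturing the secondary timing measure.

```sql
SET STATISTICS IO ON;   -- reports logical reads per table
SET STATISTICS TIME ON; -- reports CPU and elapsed time
GO
-- Hypothetical candidate query: run several times and discard the first timing
SELECT COUNT(*)
FROM dbo.ExampleTable
WHERE ExampleColumn = 'value';
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```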