Trace Flag 4199 Query Optimiser HotFixes

Background & History Of Trace Flag 4199

To summarise what Microsoft have documented here – Starting with SQL Server 2005 SP3 CU6, SQL Server 2008 SP1 CU7 & SQL Server 2008 R2 RTM, Trace Flag 4199 has been used as a ‘bucket’ to combine several individual query optimiser hotfixes under a single trace flag.

The query optimiser hotfixes contained under Trace Flag 4199 are intentionally not enabled by default.  This means when upgrading from SQL Server 2008 R2 to SQL Server 2012 for example, new query optimiser logic is not enabled.   The reason behind this according to the article linked above is to prevent plan changes that could cause query performance regressions.  This makes sense for highly optimised environments where application critical queries are tuned and rely on specific execution plans and any change in query optimiser logic could potentially cause unexpected / unwanted query regressions.

Conversely, you could argue that the majority of SQL Server environments probably aren’t highly optimised, don’t rely on specific execution plans and are potentially missing out on the benefit of the various query optimiser hotfixes due to this default behaviour.  With that in mind, the argument could be to have the query optimiser hotfixes enabled by default with the behaviour of Trace Flag 4199 reversed to disable the various query optimiser hotfixes if any serious regressions were encountered.

Trace Flag 4199 can be enabled at various scopes; query, session, globally.  The T-SQL for each of those scopes is as follows and see my previous blog post for a more comprehensive look at enabling trace flags in general.

-- Query Scope
SELECT ColumnName FROM TableName
OPTION ( QUERYTRACEON 4199 );
GO

-- Session Scope
DBCC TRACEON (4199);
GO

-- Global Scope
DBCC TRACEON (4199, -1);
GO

Trace Flag 4199 SQL Server 2016 Changes

Summarising again what Microsoft have documented well here, all fixes under Trace Flag 4199 to date will be enabled by default in SQL Server 2016 for databases at the latest compatibility level 130.  Going forward, Trace Flag 4199 will be reused as a bucket for future query optimiser hotfixes.

Database Scoped Configurations, one of the great new features in SQL Server 2016, gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance-level.  In particular, the QUERY_OPTIMIZER_HOTFIXES database scoped configuration allows you to enable query optimiser hotfixes at the database scope where in previous versions of SQL Server you were restricted to enabling query optimiser hotfixes at the query, session or global scope using the logical equivalent Trace Flag 4199.  The T-SQL for this is:

USE [MyDatabase];
GO
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
GO

When To Use Trace Flag 4199 (Prior To SQL Server 2016)

New SQL Server / New Application:  When new SQL Server Instances are installed for new applications, there’s no risk of plan regressions and enabling Trace Flag 4199 globally merits consideration.

SQL Server Upgrade / Existing Application:  When upgrading a SQL Server Instance for an existing application, this will be down to the SQL Server DBAs preference.  Your stance might be that plan regressions are expected anyway as a side effect of the upgrade and you’d rather tackle this with TF 4199 enabled.  My advice would be, prior to upgrading, test your specific application workload under the latest database compatibility level for the version you’re upgrading to, with and without TF 4199 enabled and check for any plan regressions using the methodology outlined here.

Existing SQL Server / Existing Application:   The decision to enable TF 4199 shouldn’t be taken lightly as there’s always the possibility of plan regressions so thorough testing should be done to evaluate if this trace flag makes improvements to your specific application workload or specific queries.  Bear in mind that when enabled globally, all databases on the SQL Server instance will be affected.  TF 4199 could be a useful tool at the query level for the queries that might benefit from new query optimiser logic and whose performance could do with improvement.

Trace Flag 4199 Workload Testing

For the purpose of this demo, I will use HammerDB to generate and populate a database named ‘tpcc’ that I will then run TPC-C (OLTP) workloads against.  The tpcc database will be created using the ‘original schema’ and I will use the exact same HammerDB configuration options detailed in my previous blog post.  Benchmarks will be taken against the tpcc database with and without Trace Flag 4199 enabled globally to show the potential impact this trace flag can have on a workload:

  1. Benchmark 1: SQL Server 2008 R2 SP3 (Database Compatibility Level 100)
  2. Benchmark 2: SQL Server 2012 SP2 (Database Compatibility Level 110)
  3. Benchmark 3: SQL Server 2014 SP2 (Database Compatibility Level 120)

My demo environment consists of three Windows Server 2012 R2 VMs each having a different version of SQL Server installed as above.   Each instance of SQL Server is configured to use a maximum of 8GB of memory with MAXDOP set to 4, CTOP set to 25 and the tpcc database using the simple recovery model.  I will create the tpcc database on the SQL Server 2008 R2 SP3 VM first and immediately backup the database which will be copied to the other VMs and will serve as the starting point for each workload test.

I will then run another set of tests using the same strategy but against a tpcc database generated by HammerDB using the updated schema.

The metrics I will gather are HammerDB’s TPM (Transactions Per Minute) and NOPM (New Orders Per Minute) along with PerfMon counters ‘Batches Requests/Sec’ and ‘% Processor Time’.  The results are as follows:

Original HammerDB Schemaoriginalschemaresults

Updated HammerDB Schemaupdatedschemaresults

As you can see from the results above, there were no serious regressions across workloads when TF 4199 was enabled and there were no massive improvements either.  This just reinforces the need to test your specific application workload or specific queries with TF 4199 as there’s no guarantee it will improve query performance.

Trace Flag 4199 In Action

For this demo, I’m going to restore a copy of AdventureWorks2012 onto my SQL Server 2016 RTM instance and leave the database at compatibility level 110.  I’m then going to issue a query that will expose a known bug and we’ll take a look into that.

First, restore AdventureWorks2012, ensure the database compatibility level is at 110 and then issue the query that exposes the known bug and be sure to capture the actual execution plan.

ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 110
GO

SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
);
GO

query110_notf

query110_notf_mjproperties

Looking at the execution plan above and the properties of the ‘Right Anti Semi Join’ logical operator, we can see an inaccurate cardinality estimation.  The estimated number of rows is 1, but the actual number of rows returned is 15124.  Use the below T-SQL to insert the query stats for this query into a temporary table for later analysis:

SELECT  DB_NAME(qt.dbid) AS [Database],
		'No Trace Flags Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

We know that inaccurate cardinality estimations can lead to inefficient plan choices which can in turn result in degraded performance.  Let’s see if Trace Flag 4199 can help us out here.  First, flush the plan cache and then re-execute the query but this time using Trace Flag 4199:

DBCC FREEPROCCACHE;
GO
SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
)
OPTION (QUERYTRACEON 4199);
GO

query110_tf4199

query110_tf4199_mjproperties

Looking at the properties of the ‘Right Anti Semi Join’ logical operator, we can see that under Trace Flag 4199, the query optimiser estimated 15124 rows which was exactly right.  Now that the query optimiser is able to accurately estimate cardinality, it has compiled a different execution plan from that of the first execution without TF 4199.

Again, execute the below T-SQL to insert the query stats for this query into a temporary table for later analysis:

INSERT INTO #TempQueryStats
SELECT  DB_NAME(qt.dbid) AS [Database],
		'TF 4199 Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

As stated earlier, “All fixes under Trace Flag 4199 to date will be enabled by default in SQL Server 2016 for databases at the latest compatibility level 130”.  Let’s test that out.  First, set the database compatibility level to 130, flush the plan cache and re-execute the same query but with no trace flags this time:

ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 130
GO

DBCC FREEPROCCACHE
GO

SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
);
GO

query130_notf

query130_notf_mjproperties

With the database compatibility level set to 130, the query optimiser makes an accurate cardinality estimation and compiles the same execution plan that was seen when the database compatibility level was 110 and when using Trace Flag 4199.  There’s the proof that in SQL Server 2016 the query optimiser hotfixes previously collected under TF 4199 are enabled by default for databases at the latest compatibility level.

Use the below T-SQL to insert the query stats into the same temporary table:

INSERT INTO #TempQueryStats
SELECT  DB_NAME(qt.dbid) AS [Database],
		'No Trace Flags Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

Execute this T-SQL to take a look at the query stats for the various query executions:

SELECT * FROM #TempQueryStats
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);

query_stats

The worst performer was the first execution when the AdventureWorks2012 database compatibility level was 110 and TF 4199 was not enabled.  Under this compatibility level, the query optimiser will use the legacy CE to compile execution plans which can be seen in the output above – ‘CE_Version’ = 70.  The estimate skew is -22184.35425 but note that this is the root level skew referring to the far left / final operator in the execution plan.  We know that the skew originated in the leaf / intermediate levels of the execution plan, specifically the ‘Right Anti Semi Join’ logical operator, where the query optimiser was unable to accurately estimate the cardinality.  In this case, the leaf-level skew flowed up to the root and drove the optimiser to choose an inefficient plan which can be seen from the query stats as average logical reads and average CPU time are multiple times higher compared to the other executions.

When Trace Flag 4199 was enabled, the query optimiser was able to make an accurate cardinality estimation and a more efficient plan was chosen.  This is reflected in the query stats above.  The same behaviour was observed in the final execution when the database compatibility level was raised to 130 and the query was compiled using the new CE ( ‘CE_Version’ = 130 ).

Resources & Further Reading

SQL_Sasquatch ( @sql_sasquatch )has a great series on this trace flag which can be found here.

I also recommend this post by Microsoft which shows how they used TF 4199 to solve a QO bug in SQL Server 2014 SP1.

Paul White ( @SQL_Kiwi ) has a great answer to this StackExchange question regarding the anti-semi join bug.

 

Advertisements
This entry was posted in Statistics, CE & QO. Bookmark the permalink.

One Response to Trace Flag 4199 Query Optimiser HotFixes

  1. Pingback: Query Optimizer Hotfixes – Curated SQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s