Update Multiple Statistics On The Same Table Concurrently – Locks & Trace Flag 7471

Default Behaviour

By default, only one statistic object (index statistic or column statistic) on a single table can be updated at a time.  This is down to the locks acquired by the database engine during the statistic update operation.  For all automatic and manual statistic update operations, SQL Server acquires ‘Sch-S: SCHEMA’ and ‘X: UPDSTATS’ locks:

  • Sch-S: SCHEMA: The engine uses schema stability (Sch-S) locks when compiling and executing queries.  Sch-S locks do not block any transactional locks, including exclusive (X) locks.  However, concurrent DDL operations, and concurrent DML operations that acquire Sch-M locks, cannot be performed on the table.  In the context of the statistic operation, the schema stability lock ensures that the table itself or indexes on the table are not dropped while the statistic is being updated.
  • X: UPDSTATS: The exclusive (X) lock acquired by the statistics operation is on the UPDSTATS resource of the table – a resource relevant only to statistics operations that exists to limit statistic operations on a single table to one at a time.  Automatic statistics operations do not wait to acquire locks on this resource; they abort if another operation already holds an exclusive lock on it.  Conversely, manual statistics operations will wait for the resource.

Let’s take a look at this in action.  For this demo, I’m using the AdventureWorks2012 database and Jonathan Kehayias’ scripts to create the Sales.SalesOrderDetailEnlarged table.  In two separate SSMS windows, I will issue UPDATE STATISTICS commands against two different statistic objects belonging to the Sales.SalesOrderDetailEnlarged table.  I will then take a look at the locks and blocking caused by the concurrent statistics operations using Adam Machanic’s sp_whoisactive and the sys.dm_tran_locks DMV:

-- SELECT @@SPID = 56
UPDATE STATISTICS Sales.SalesOrderDetailEnlarged
(PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID)
WITH FULLSCAN;

-- SELECT @@SPID = 57
UPDATE STATISTICS Sales.SalesOrderDetailEnlarged
(IX_SalesOrderDetailEnlarged_ProductID)
WITH FULLSCAN;

Once the update statistics commands have been issued, execute the sp_WhoIsActive stored procedure:

EXEC master.dbo.sp_WhoIsActive
	@get_locks = 1,
	@output_column_list = '[session_id],[status],[blocking_session_id], [wait_info],[locks],[sql_text]';

[Image: sp_WhoIsActive output showing session 57 blocked by session 56]

The output above shows that the second update statistic operation issued by session 57 is being blocked by session 56.  Session 57 will only complete once the first update statistics operation issued by session 56 completes and releases the exclusive (X) lock it holds.

Querying sys.dm_tran_locks reveals similar information:

SELECT 	request_session_id, resource_type,
		resource_subtype, resource_description,
		request_status, request_mode
FROM sys.dm_tran_locks
WHERE request_session_id IN (56,57)
AND request_owner_type != 'SHARED_TRANSACTION_WORKSPACE'
ORDER BY request_session_id

[Image: sys.dm_tran_locks output showing the X lock on the table's UPDSTATS resource]

The output above again shows that the exclusive (X) lock taken on the OBJECT.UPDSTATS resource by session 56 is blocking session 57, which is also attempting to acquire an exclusive (X) lock on the same resource.

Behaviour Under Trace Flag 7471

Running multiple UPDATE STATISTICS commands for different statistics on a single table concurrently has been available under global Trace Flag 7471 since SQL Server 2014 SP1 CU6 and SQL Server 2016 CU1.  Microsoft have documented this trace flag here and here.

Let’s enable the trace flag:

DBCC TRACEON (7471,-1);
GO
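
To confirm the flag is now active globally, check DBCC TRACESTATUS:

-- Check the status of trace flag 7471 across the instance
DBCC TRACESTATUS (7471, -1);
GO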

Now, re-issue the same update statistics commands and take a look at the locks/blocks using the same methods as before:

[Image: sp_WhoIsActive output showing no blocking under Trace Flag 7471]

The sp_WhoIsActive stored procedure output above shows that there is no blocking now.

[Image: sys.dm_tran_locks output under Trace Flag 7471]

Looking at sys.dm_tran_locks we can see the locking behaviour has changed under Trace Flag 7471.  Previously, an exclusive (X) lock was acquired on the OBJECT.UPDSTATS resource; now, under Trace Flag 7471, an update (U) lock is acquired on the METADATA.STATS resource for the specific statistic object being updated.  This can be seen in the rows highlighted in red in the image above and in the image below, where the resource_description column shows the update (U) locks taken on the separate statistic metadata resources: stats_id = 1 and stats_id = 3.

[Image: resource_description column showing U locks for stats_id = 1 and stats_id = 3]

This change in locking behaviour allows concurrent executions of UPDATE STATISTICS against the same table.

Conclusion

Trace Flag 7471 could be leveraged to potentially reduce the overall execution time of update statistics maintenance jobs as shown by Jonathan Kehayias.  It’s probably only practical for VLDBs running on capable hardware and with a sufficient I/O subsystem though.

Microsoft’s recommended best practice is to only use TF 7471 during dedicated maintenance windows because it increases the possibility of deadlocks; a sketch of scoping the flag to a maintenance job follows the list below.  There are other factors to consider too:

  • Trace Flag 7471 is a global trace flag meaning all databases on the SQL Server instance will be affected.
  • Information regarding this trace flag is sparse, both in terms of official documentation and unofficial documentation (blog posts etc.), suggesting that it isn’t widely used.
  • Updating multiple statistics on a single table concurrently can be very resource intensive. Updating four separate statistics with a full scan on the same table concurrently will still require four scans of the table.
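
Following that recommendation, a rough sketch of enabling TF 7471 only for the duration of the maintenance window, with the actual maintenance commands left as a placeholder:

-- Enable TF 7471 for the duration of the maintenance window only
DBCC TRACEON (7471, -1);
GO

-- ... run the concurrent UPDATE STATISTICS jobs / maintenance here ...

-- Disable the flag once the statistics maintenance completes
DBCC TRACEOFF (7471, -1);
GO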

 


Statistics – Single Threaded & Parallel Operations

With default database settings, SQL Server automatically creates and updates statistics to use during query optimisation in order to build efficient query execution plans.  Let’s take a look at how these statistics operations are processed internally and determine which operations run in parallel (across multiple threads) and which operations run in serial (single-threaded).

Sampled Statistics

During query compilation, if the optimiser decides that cardinality estimates would be improved by having statistical information about the data in a column it will attempt to create the necessary column statistic during this process.  The column statistic creation is done over a random sample of the pages to keep compilation time to a minimum.

Similarly, the query optimiser looks at how many changes have occurred for existing statistic objects as part of query optimisation. If the number of rows that have changed exceeds certain thresholds, SQL Server will, by default, update the statistic synchronously and then optimise the query.  The statistic update in this scenario is again done over a random sample of selected pages and will only be triggered when/if the threshold has been reached AND the index/column is touched by the query.  This is a Recompilation Threshold, not a Statistics Threshold.
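
As a rough way of seeing how close each statistic is to the classic recompilation threshold (commonly cited as 500 + 20% of table rows for tables with more than 500 rows, the pre-2016 behaviour without trace flag 2371), something like the following can be run; the threshold expression is an approximation of the documented thresholds rather than the engine's exact logic:

SELECT  OBJECT_NAME(s.object_id) AS [table], s.name AS stats_name,
        sp.rows, sp.modification_counter,
        -- Classic threshold: 500 + 20% of rows for tables with more than 500 rows
        CASE WHEN sp.rows > 500 THEN 500 + (0.20 * sp.rows) ELSE 500 END AS approx_threshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;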

The default sample size is calculated using a non-linear algorithm where the sampling rate decreases as the table size increases.  The default sampling rate algorithm is as follows:

  • If the table < 8MB then the statistic is gathered using a fullscan.
  • If the table > 8MB then the statistic is gathered using a non-linear algorithm where the sampling rate reduces as the number of rows in the table increases.

The sampling algorithm does not randomly sample rows, it actually samples all of the rows on “randomly” selected pages as highlighted by Conor Cunningham.  When statistics are created and updated, the optimiser must choose an access path to gather the statistics.  Where possible, for sampled statistics SQL Server will try to find an access path that is NOT physically sorted on the leading statistics key column in order to provide a more random sample and ultimately, more accurate statistics.  For statistics gathered with a full scan, the sort order of the access paths is not significant for statistical accuracy, so the lowest cost access path is chosen.

Sometimes the default sampling algorithm doesn’t capture enough interesting points in the data and the resulting histogram doesn’t accurately represent the distribution of data in the column.  In these cases, a higher sample rate can be manually/explicitly specified with the UPDATE STATISTICS command to produce a better histogram.
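
For example (the table and statistic names here are just placeholders), a sample rate can be specified either as a percentage or as a number of rows:

-- Sample a fixed percentage of the table
UPDATE STATISTICS dbo.SomeTable SomeStatisticName
WITH SAMPLE 25 PERCENT;

-- Or sample a fixed number of rows
UPDATE STATISTICS dbo.SomeTable SomeStatisticName
WITH SAMPLE 500000 ROWS;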

Sampled Statistics Demo

Restore the AdventureWorks2012 DB onto a SQL Server 2012 SP3 VM and then use Jonathan Kehayias’ scripts to create the Sales.SalesOrderHeaderEnlarged table.  Use the below T-SQL to take a look at the statistics on this table:

SELECT
sp.last_updated, OBJECT_NAME(so.object_id) AS [table], stat.stats_id,
stat.name as stats_name, STUFF((SELECT ', ' + cols.name
FROM sys.stats_columns AS statcols
JOIN sys.columns AS cols ON statcols.column_id=cols.column_id
	AND statcols.object_id=cols.object_id
WHERE statcols.stats_id = stat.stats_id
	AND statcols.object_id=stat.object_id
ORDER BY statcols.stats_column_id
FOR XML PATH(''), TYPE ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') as stat_cols,
sp.modification_counter, sp.rows, sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so ON stat.object_id=so.object_id
JOIN sys.schemas as sc ON so.schema_id=sc.schema_id
WHERE
	sc.name = 'Sales' -- Schema Name
	AND so.name = 'SalesOrderHeaderEnlarged' -- Table Name
ORDER BY 1 DESC OPTION (RECOMPILE)
GO

[Image: sys.dm_db_stats_properties output for the statistics on Sales.SalesOrderHeaderEnlarged]

Configure an Extended Events session to capture the following events: auto_stats, query_post_execution_showplan, sp_statement_completed and sql_statement_completed.  Add a filter on the database name to capture only events for the AdventureWorks2012 database and then start the XE session:

CREATE EVENT SESSION [StatsCapture] ON SERVER
ADD EVENT sqlserver.auto_stats(
    WHERE ([sqlserver].[database_name]=N'AdventureWorks2012')),
ADD EVENT sqlserver.query_post_execution_showplan(
    WHERE ([sqlserver].[database_name]=N'AdventureWorks2012')),
ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1),collect_statement=(1)
    WHERE ([sqlserver].[equal_i_sql_unicode_string]([sqlserver].[database_name],N'AdventureWorks2012'))),
ADD EVENT sqlserver.sql_statement_completed(
    WHERE ([sqlserver].[database_name]=N'AdventureWorks2012'))
ADD TARGET package0.event_file(SET filename=N'C:\Temp\StatsCapture.xel',max_file_size=(256))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=5 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
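
Start the session, and note that once events have been captured the event file target can also be read back with T-SQL instead of the SSMS viewer (the file path assumes the C:\Temp target defined above):

-- Start the XE session
ALTER EVENT SESSION [StatsCapture] ON SERVER STATE = START;
GO

-- Read the captured events back from the event file target
SELECT  xed.event_xml.value('(event/@name)[1]', 'nvarchar(60)')      AS event_name,
        xed.event_xml.value('(event/@timestamp)[1]', 'datetime2(3)') AS event_time,
        xed.event_xml
FROM
(
    SELECT CAST(event_data AS XML) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'C:\Temp\StatsCapture*.xel', NULL, NULL, NULL)
) AS xed;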

Issue the below T-SQL query to initiate the auto-creation of a column statistic:

SELECT * FROM Sales.SalesOrderHeaderEnlarged
WHERE Status <> 5;
GO

There are no existing column statistics for the status column so we would expect the optimiser to automatically create a column statistic in order to get an accurate cardinality estimate.  We would also expect sampled statistics to be gathered to keep compilation time to a minimum.  Reviewing the trace, there should be a number of entries:

[Image: Extended Events trace showing the auto_stats and statement-completed events for the auto-created statistic]

The first two events query_post_execution_showplan & sp_statement_completed are as follows:

[Image: query_post_execution_showplan event showing the serial execution plan for the internal StatMan query]

SELECT StatMan([SC0], [SB0000])
FROM
(
	SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000] FROM
		(SELECT [Status] AS [SC0] FROM [Sales].[SalesOrderHeaderEnlarged] TABLESAMPLE SYSTEM (6.223095e+000 PERCENT) WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
	ORDER BY [SC0], [SB0000]
) AS _MS_UPDSTATS_TBL  OPTION (MAXDOP 1)

These first two events show how the auto-creation of the column statistic on the status column is processed internally.  The first event, query_post_execution_showplan, is the graphical rendering of the execution plan that is used to generate the column statistic.  The second event, sp_statement_completed, shows the actual internal query executed to create the column statistic.  A few things to note:

  • The internal query uses the TABLESAMPLE clause, ‘TABLESAMPLE SYSTEM (6.223095e+000 PERCENT)’, which is responsible for the sampling and is documented here.
  • SQL Server also generates an ‘OPTION (MAXDOP 1)’ query hint on the internal query. The graphical rendering above shows the serial execution plan that was generated.
  • The StatMan function referenced in the internal query is a special internal aggregate function used for statistics operations and is represented by the stream aggregate operator in the graphical execution plan. The details of StatMan are undocumented, but Conor Cunningham provides some information here.

The third event, auto_stats, also lets us know that a statistic object has been automatically created.  The fourth and fifth events represent the original SELECT query submitted.  Note the order of the events here – the statistic creation is initiated and the query optimiser waits for the statistics before compiling and executing the user-submitted query.  This is the default behaviour.

Re-running the same T-SQL query against sys.dm_db_stats_properties, the new column statistic will now be visible:

SELECT
sp.last_updated, OBJECT_NAME(so.object_id) AS [table], stat.stats_id,
stat.name as stats_name, STUFF((SELECT ', ' + cols.name
FROM sys.stats_columns AS statcols
JOIN sys.columns AS cols ON statcols.column_id=cols.column_id
	AND statcols.object_id=cols.object_id
WHERE statcols.stats_id = stat.stats_id
	AND statcols.object_id=stat.object_id
ORDER BY statcols.stats_column_id
FOR XML PATH(''), TYPE ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') as stat_cols,
sp.modification_counter, sp.rows, sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so ON stat.object_id=so.object_id
JOIN sys.schemas as sc ON so.schema_id=sc.schema_id
WHERE
	sc.name = 'Sales' -- Schema Name
	AND so.name = 'SalesOrderHeaderEnlarged' -- Table Name
ORDER BY 1 DESC OPTION (RECOMPILE)
GO

[Image: sys.dm_db_stats_properties showing the new column statistic with 79803 of 1258600 rows sampled]

79803 rows were sampled out of a total of 1258600 rows.  This further confirms that sampling was used when the column statistic was automatically created.  The sampling rate can be calculated as:

Sampling Rate = (Sampled Rows / Total Rows) * 100
Sampling Rate = (79803 / 1258600) * 100 = 6.341

The 6.341 sampling rate derived from the above calculation is not far away from the value specified in the ‘TABLESAMPLE SYSTEM (6.223095e+000 PERCENT)’ clause of the internal StatMan query.

Now that we have a column statistic for the status column, we’re going to invalidate the statistic and then re-issue the same query to this time initiate an auto-update of statistics.  Run the below T-SQL:

UPDATE STATISTICS Sales.SalesOrderHeaderEnlarged
_WA_Sys_00000006_24B26D99 WITH SAMPLE 0 ROWS;

Updating statistics with a sample of zero rows basically wipes out all statistical information and results in an empty statistic object, as documented here.  We can confirm this using DBCC SHOW_STATISTICS:

DBCC SHOW_STATISTICS ('Sales.SalesOrderHeaderEnlarged', _WA_Sys_00000006_24B26D99)
WITH STAT_HEADER;

[Image: DBCC SHOW_STATISTICS stat header showing an empty statistic object]

Now, issue a non-updating update.  The T-SQL below updates the status column in a single row of the table, which increments the column modification counter.  The combination of an empty statistics object and an incremented modification counter simulates creating statistics on an empty table and then adding a row, which will initiate a statistic update as per the recompilation thresholds described earlier.

BEGIN TRANSACTION;
    UPDATE TOP (1)
        Sales.SalesOrderHeaderEnlarged
    SET Status = Status;
ROLLBACK TRANSACTION;

This strategy of updating the statistic object with ‘SAMPLE 0 ROWS’ and then issuing a non-updating update for one value of that column is a very useful way to quickly invalidate statistics on very large tables when you want to trigger an auto-update of specific statistics.

Restart the XE session and re-execute the query:

DBCC FREEPROCCACHE;

SELECT * FROM Sales.SalesOrderHeaderEnlarged
WHERE Status <> 5;
GO

[Image: Extended Events trace showing the auto-update of the column statistic during query optimisation]

Reviewing the trace information, the order of events clearly shows that the column statistic is updated during query optimisation.  The first query_post_execution_showplan event and the sp_statement_completed event show the same graphical execution plan and the same internal query as before, from which we can deduce that the automatic update of column statistics also uses default sampling.  We can further deduce that gathering sampled statistics, during both auto-creation and auto-update operations, is a single-threaded operation.

What if statistics are updated manually, specifying a higher sample percentage than the default sample rate would generate?  This can be useful when the default sampling algorithm doesn’t capture enough interesting points in the data, the resulting histogram doesn’t accurately represent the distribution of data in the column, and you don’t want to pay the full cost of updating statistics using a full scan:

UPDATE STATISTICS Sales.SalesOrderHeaderEnlarged
_WA_Sys_00000006_24B26D99 WITH SAMPLE 50 PERCENT;

[Image: Extended Events trace for the manual update with SAMPLE 50 PERCENT]

[Image: serial execution plan for the StatMan query at a 50 percent sample]

SELECT StatMan([SC0], [SB0000])
FROM
(
	SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000]  FROM
		(SELECT [Status] AS [SC0] FROM [Sales].[SalesOrderHeaderEnlarged] TABLESAMPLE SYSTEM (4.999828e+001 PERCENT) WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
	ORDER BY [SC0], [SB0000]
)
AS _MS_UPDSTATS_TBL  OPTION (MAXDOP 1)

Reviewing the XE session information above, it is clear that the gathering of sampled statistics for a manual update using the SAMPLE clause is also a single-threaded operation.

FULLSCAN Statistics

SQL Server implicitly creates an associated statistic object when an index is created.  Similarly, when an index is rebuilt, the associated index statistic is updated.  The index statistic created/updated as a by-product of the index creation/rebuild operation is gathered with a full scan, as the index operations involve reading all of the pages into memory anyway.  This is an optimisation: the index operations are costly, so SQL Server performs the statistic creation/update as part of this process to save having to re-read those pages later.  Column statistics can also be manually created/updated with a full scan.

The gathering of statistics using FULLSCAN has been able to run in parallel since SQL Server 2005.

FULLSCAN Statistics Demo

Restart the XE session and issue the below T-SQL command to manually update the column statistic with a full scan:

UPDATE STATISTICS Sales.SalesOrderHeaderEnlarged
_WA_Sys_00000006_24B26D99 WITH FULLSCAN;

Reviewing the trace file, the query_post_execution_showplan event below shows a parallel plan was generated.  Notice also the higher CPU and reads for this full scan statistic operation compared to the sampled statistic operation in the previous demo, which is obviously to be expected.

[Image: Extended Events trace for the FULLSCAN statistic update]

[Image: query_post_execution_showplan event showing the parallel execution plan for the FULLSCAN StatMan query]

The sp_statement_completed event shows the internal query:

SELECT StatMan([SC0])
FROM
(
	SELECT TOP 100 PERCENT [Status] AS [SC0] FROM [Sales].[SalesOrderHeaderEnlarged] WITH (READUNCOMMITTED)
	ORDER BY [SC0]
)
AS _MS_UPDSTATS_TBL  OPTION (MAXDOP 16)

The ‘OPTION (MAXDOP 16)’ query hint cannot be manually overridden.  If you’d like to be able to manually specify a MAXDOP value, as you can for index rebuilds and database consistency checks, then be sure to upvote this connect item.  Resource Governor can be used to reduce the parallelism if required, but it is an Enterprise Edition only feature and probably overkill for this use case, as vented by Aaron Bertrand here.
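
For completeness, a rough sketch of that Resource Governor approach might look like the following (the workload group, classifier function, login name and MAX_DOP value are all hypothetical and would need adapting):

USE master;
GO

-- Hypothetical workload group capping DOP for a dedicated maintenance login
CREATE WORKLOAD GROUP StatsMaintenance WITH (MAX_DOP = 4);
GO

-- Classifier function routing that login into the workload group
CREATE FUNCTION dbo.fn_RGClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @workload_group sysname = N'default';
    IF SUSER_SNAME() = N'StatsMaintenanceLogin'
        SET @workload_group = N'StatsMaintenance';
    RETURN @workload_group;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_RGClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO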

Run the query against sys.dm_db_stats_properties again to check how many rows were sampled:

[Image: sys.dm_db_stats_properties showing rows_sampled equal to rows after the FULLSCAN update]

As expected, every row was sampled for the _WA_Sys_00000006_24B26D99 column statistic as the statistics were gathered using a full scan.

Now, drop the column statistic:

DROP STATISTICS Sales.SalesOrderHeaderEnlarged._WA_Sys_00000006_24B26D99;

Create an index on the status column and re-run the query against sys.dm_db_stats_properties to confirm that an index statistic was created as a by-product of the index creation, and that all the rows were sampled since the index operation had to process all of the rows anyway:

CREATE INDEX IX_SalesOrderHeaderEnlarged_Status
ON Sales.SalesOrderHeaderEnlarged (Status)

[Image: sys.dm_db_stats_properties showing the new index statistic gathered with a full sample]

Rebuilding the index has the same effect:

ALTER INDEX IX_SalesOrderHeaderEnlarged_Status
ON Sales.SalesOrderHeaderEnlarged REBUILD;

[Image: sys.dm_db_stats_properties after the index rebuild showing the statistic updated with a full sample]

Drop the index now to clean up:

DROP INDEX Sales.SalesOrderHeaderEnlarged.IX_SalesOrderHeaderEnlarged_Status;

Parallel Sampled Statistics In SQL Server 2016

“Starting with SQL Server 2016, sampling of data to build statistics is done in parallel, when using compatibility level 130, to improve the performance of statistics collection. The query optimiser will use parallel sample statistics whenever a table size exceeds a certain threshold.”

As per the previous demos, prior to SQL Server 2016 the gathering of sampled statistics is a serial, single-threaded operation.  Only statistic operations gathered using a full scan would qualify for parallelism.  Now, in SQL Server 2016, all automatic and manual statistic operations qualify for parallelism.

Parallel Sampled Statistics In SQL Server 2016 Demo

Backup the AdventureWorks2012 DB, restore it to a SQL Server 2016 instance and set the database compatibility level to 130.  As before, configure an Extended Events session to capture the same events.  Issue the below T-SQL query to initiate the auto-creation of a column statistic:

SELECT * FROM Sales.SalesOrderHeaderEnlarged
WHERE Status <> 5;
GO

There are no existing column statistics for the status column so we would expect the optimiser to automatically create a column statistic in order to get an accurate cardinality estimate.  We would also expect sampled statistics to be gathered to keep compilation time to a minimum.  Since we are on a SQL Server 2016 instance and the database is at the latest compatibility level, we would also expect the sampled statistics to qualify for parallelism.  Let’s verify this by reviewing the trace file:

[Image: Extended Events trace on SQL Server 2016 showing the auto-creation of the column statistic]

The first two events query_post_execution_showplan & sp_statement_completed are as follows:

[Image: query_post_execution_showplan event showing a parallel execution plan for the sampled StatMan query with dop = 4]

SELECT StatMan([SC0], [SB0000])
FROM
(
	SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000] FROM
		(SELECT [Status] AS [SC0] FROM [Sales].[SalesOrderHeaderEnlarged] TABLESAMPLE SYSTEM (6.223095e+000 PERCENT) WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
	ORDER BY [SC0], [SB0000]
)
AS _MS_UPDSTATS_TBL  OPTION (MAXDOP 4)

These first two events show how the auto-creation of the column statistic on the status column is processed internally.  The first event, query_post_execution_showplan, is the graphical rendering of the execution plan that is used to generate the column statistic and it clearly shows parallel operators in the execution plan.  Furthermore, the dop column in the Extended Events trace file, highlighted in red above, shows that the degree of parallelism was four.

The second event, sp_statement_completed, shows the actual internal query executed to create the column statistic.  In SQL Server 2016 SP1, SQL Server generates an ‘OPTION (MAXDOP 4)’ query hint on the internal query, which respects the ‘Max Degree Of Parallelism’ (MAXDOP) instance-level setting.  With the introduction of ‘Database Scoped Configurations’ in SQL Server 2016, MAXDOP can be set at the database level and the parallel plan generated by the statistic operation would respect that setting if set.
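
For reference, the database scoped MAXDOP setting mentioned above is configured as follows (the value of 4 is just an example):

-- SQL Server 2016+: MAXDOP at the database scope, which the statistic
-- operation's parallel plan would respect if configured
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
GO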

The third and fourth auto_stats events also let us know that a statistic object has been automatically created and loaded.  Looking at sys.dm_db_stats_properties we can again verify that the auto-creation of the column statistic used default sampling:

[Image: sys.dm_db_stats_properties showing default sampling for the auto-created statistic]

Now that we have a column statistic for the status column and proved that the auto-creation of the column statistic used default sampling and ran in parallel, we’re going to invalidate the statistic and then re-issue the same query to initiate an auto-update of statistics.  Run the below T-SQL:

BEGIN TRANSACTION;
    UPDATE TOP (314655)
        Sales.SalesOrderHeaderEnlarged
    SET Status = Status;
ROLLBACK TRANSACTION;

Restart the XE session and re-execute the query:

DBCC FREEPROCCACHE;

SELECT * FROM Sales.SalesOrderHeaderEnlarged
WHERE Status <> 5;
GO

[Image: Extended Events trace showing the parallel auto-update of the column statistic]

Reviewing the trace information, the order of events clearly shows that the column statistic is updated during query optimisation.  Looking at the second event, query_post_execution_showplan, the dop column again contains a value of four, meaning that the statistic update operation ran in parallel.  Looking at sys.dm_db_stats_properties we can again verify that the auto-update of the column statistic used default sampling:

[Image: sys.dm_db_stats_properties showing default sampling for the auto-updated statistic]

What if statistics are updated manually specifying a higher sample percentage than the default sample rate would generate?

UPDATE STATISTICS Sales.SalesOrderHeaderEnlarged
_WA_Sys_00000006_24B26D99 WITH SAMPLE 50 PERCENT;

[Image: Extended Events trace showing the parallel manual update with SAMPLE 50 PERCENT]

This also qualifies for parallelism.  With this information, we can deduce that gathering sampled statistics, during the auto-creation, auto-updating and manual updating of statistics, qualifies for parallelism in SQL Server 2016.

Conclusion

Prior to SQL Server 2016, the gathering of sampled statistics is a serial, single-threaded operation; only statistic operations gathered using a full scan qualify for parallelism.  In SQL Server 2016, all automatic and manual statistic operations qualify for parallelism.

Resources

https://sqlperformance.com/2016/07/sql-statistics/statistics-maxdop

https://technet.microsoft.com/en-us/library/cc966419.aspx?f=255&MSPPError=-2147217396#XSLTsection129121120120

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/05/23/query-optimizer-additions-in-sql-server/

http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx

http://dba.stackexchange.com/questions/34883/how-does-sampling-work-when-updating-statistics


Sync Vs Async Statistics: The Old Debate

Statistic objects in SQL Server are light-weight objects containing statistical information about the distribution of values in one or more columns.  Statistics are used to calculate cardinality estimates during optimisation allowing the query optimiser to generate efficient query execution plans.

Since SQL Server 2005, statistic objects can be updated either synchronously (the default) or asynchronously.  This setting is configured at the database level only.  The ‘Auto Update Statistics’ database option controls whether statistics are updated automatically or not.  The ‘Auto Update Statistics Asynchronously’ database option controls whether statistics are automatically updated synchronously or asynchronously (the T-SQL equivalents are shown after the list below):

  • For the synchronous update of statistics; ‘Auto Update Statistics’ must be set to ‘True’ and ‘Auto Update Statistics Asynchronously’ must be set to ‘False’.
  • For the asynchronous update of statistics; ‘Auto Update Statistics’ and ‘Auto Update Statistics Asynchronously’ must both be set to ‘True’.
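
Using a placeholder database name, the T-SQL for the two configurations is:

-- Synchronous auto-update of statistics (the default)
ALTER DATABASE [MyDatabase] SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE [MyDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC OFF;
GO

-- Asynchronous auto-update of statistics
ALTER DATABASE [MyDatabase] SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE [MyDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;
GO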

[Image: SSMS database properties showing the Auto Update Statistics and Auto Update Statistics Asynchronously options]
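
The current settings for every database on the instance can also be checked via sys.databases:

SELECT  name,
        is_auto_update_stats_on,
        is_auto_update_stats_async_on
FROM sys.databases
ORDER BY name;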

Synchronous Vs Asynchronous – The Differences

The query optimiser looks at how many changes have occurred for a given statistic object as part of query optimisation. If the number of rows that have changed exceeds certain thresholds, SQL Server will either update the statistic object synchronously or asynchronously (depending on the database settings), then optimise the query.

Note that the statistics don’t auto-update when the rows are changed. It’s only when a query is being compiled and the statistics would actually be used to make an optimisation decision that the update is kicked off.

Synchronous:  If the query optimiser identifies out-of-date statistics prior to compiling a query or before executing a cached query plan, a statistic update is initiated and the query optimiser waits for updated statistics before compiling and executing the query.  This effectively causes query execution to stall while the statistic object is automatically updated synchronously, not only for the session that issued the initial query, but also for any other sessions that have issued queries dependent on the same statistic object.  Depending on the size of the object the statistic is created on, your underlying hardware and I/O subsystem, the statistic update operation could take anywhere from milliseconds to minutes.

Synchronous statistic updates execute under the context of the principal who submitted the query that initiated the statistic update.

Asynchronous:  With asynchronous statistics updates, queries compile with existing statistics even if they are considered out-of-date.  This could cause the query optimiser to choose a suboptimal query plan based on the stale statistics, affecting currently running queries; only queries that compile after the asynchronous update has completed will benefit from the updated statistics.

Asynchronous statistic updates are handled in the background by a system thread under the context of sa.  You can check if there are asynchronous statistic updates currently executing using the sys.dm_exec_background_job_queue DMV:

SELECT  time_queued, des.session_id, des.login_name,
        OBJECT_NAME(object_id1) AS [object],
        ss.name as [statistic_name], sc.name AS [statistic_columns],
        auto_created, user_created, no_recompute, has_filter, is_temporary, is_incremental
FROM    sys.dm_exec_background_job_queue bq
JOIN    sys.stats ss ON ss.object_id = bq.object_id1
                        AND ss.stats_id = bq.object_id2
JOIN    sys.stats_columns stc ON stc.object_id = ss.object_id
                                 AND stc.stats_id = ss.stats_id
JOIN    sys.columns sc ON sc.column_id = stc.column_id
                          AND sc.object_id = stc.object_id
JOIN	 sys.dm_exec_sessions des ON des.session_id = bq.session_id
WHERE   in_progress = 1
OPTION(RECOMPILE);

Official & Industry Recommendations

Microsoft:  Microsoft’s official recommendations are:

Synchronous:  Consider using synchronous statistics when you perform operations that change the distribution of data, such as truncating a table or performing a bulk update of a large percentage of the rows. If you do not manually update the statistics after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data.

Asynchronous:  Consider using asynchronous statistics to achieve more predictable query response times for the following scenarios:

  • Your application frequently executes the same query, similar queries, or similar cached query plans. Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics. This avoids delaying some queries and not others.
  • Your application has experienced client request timeouts caused by one or more queries waiting for updated statistics. In some cases, waiting for synchronous statistics could cause applications with aggressive timeouts to fail.

Brent Ozar:  Brent’s popular sp_Blitz stored procedure checks for databases that have automatic asynchronous statistics updates enabled and recommends “setting it back to the default (disabled) unless you’ve got evidence that delays caused by statistics updates are causing you problems.”

Glenn Berry (SQLSkills):  Glenn states here that he believes asynchronous statistic updates are beneficial for most workloads.

Jos de Bruijn:  The new Microsoft sample database WideWorldImporters is configured to use the asynchronous update of statistics.  When I asked Jos if there was any specific reason for this, his response was “enabling async statistics is best practice in most cases”.

My Two Cents & Conclusion

I think the best approach is proactive management of the statistics on your largest and most important tables.  Generally speaking, statistic updates are usually a pretty quick operation and only potentially become problematic on very large tables (hundreds of millions of rows).

Consider using synchronous auto-update of statistics when:

  • The data distribution and density of your columns changes frequently, statistics become stale quickly and you have evidence that this affects query performance negatively.

Consider using asynchronous auto-update of statistics when:

  • The synchronous updating of statistics on very large tables is impacting the system, taking a long time and, in extreme cases, causing unpredictable query performance or even query timeouts.
  • The data distribution and density of your columns is fairly static and query execution plans do not change after an update to statistics.

With that said, it would seem that asynchronous statistics are better suited to OLTP environments and synchronous statistics are better suited to OLAP environments.  As synchronous statistics are the default though, people are reluctant to change this setting without good reason.  I’ve worked exclusively in OLTP environments over the last few years and have never seen asynchronous statistics rolled out as the default.  I have personally been bitten by synchronous statistic updates on large tables causing query timeouts, which I resolved by switching to asynchronous statistic updates.

Asynchronous statistic updates may not be well suited for data warehouses / OLAP environments where large scale data modification is common through various ETL processes.  The proactive SQL Server DBA would manually include statistic update logic at the end of their ETL process and use the synchronous update of statistics database setting as a safety measure.
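
A final statistics step in such an ETL process might look something like this (the table names are purely illustrative):

-- Illustrative post-ETL statistics maintenance step
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;
UPDATE STATISTICS dbo.FactOrderLines WITH SAMPLE 25 PERCENT;
GO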

If you are considering switching to asynchronous statistic updates in your OLTP environments, my advice would be to use tools like Distributed Replay or RML Utilities to capture your actual workload, which you could then replay in a test environment under the different settings.


Trace Flag 4199 Query Optimiser HotFixes

Background & History Of Trace Flag 4199

To summarise what Microsoft have documented here – Starting with SQL Server 2005 SP3 CU6, SQL Server 2008 SP1 CU7 & SQL Server 2008 R2 RTM, Trace Flag 4199 has been used as a ‘bucket’ to combine several individual query optimiser hotfixes under a single trace flag.

The query optimiser hotfixes contained under Trace Flag 4199 are intentionally not enabled by default.  This means that when upgrading from SQL Server 2008 R2 to SQL Server 2012, for example, new query optimiser logic is not enabled.  The reason behind this, according to the article linked above, is to prevent plan changes that could cause query performance regressions.  This makes sense for highly optimised environments where application-critical queries are tuned and rely on specific execution plans, and where any change in query optimiser logic could cause unexpected or unwanted query regressions.

Conversely, you could argue that the majority of SQL Server environments probably aren’t highly optimised, don’t rely on specific execution plans and are potentially missing out on the benefit of the various query optimiser hotfixes due to this default behaviour.  With that in mind, the argument could be to have the query optimiser hotfixes enabled by default with the behaviour of Trace Flag 4199 reversed to disable the various query optimiser hotfixes if any serious regressions were encountered.

Trace Flag 4199 can be enabled at various scopes: query, session and global.  The T-SQL for each of those scopes is as follows; see my previous blog post for a more comprehensive look at enabling trace flags in general.

-- Query Scope
SELECT ColumnName FROM TableName
OPTION ( QUERYTRACEON 4199 );
GO

-- Session Scope
DBCC TRACEON (4199);
GO

-- Global Scope
DBCC TRACEON (4199, -1);
GO

Trace Flag 4199 SQL Server 2016 Changes

Summarising again what Microsoft have documented well here, all fixes under Trace Flag 4199 to date will be enabled by default in SQL Server 2016 for databases at the latest compatibility level 130.  Going forward, Trace Flag 4199 will be reused as a bucket for future query optimiser hotfixes.

Database Scoped Configurations, one of the great new features in SQL Server 2016, gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance-level.  In particular, the QUERY_OPTIMIZER_HOTFIXES database scoped configuration allows you to enable query optimiser hotfixes at the database scope where in previous versions of SQL Server you were restricted to enabling query optimiser hotfixes at the query, session or global scope using the logical equivalent Trace Flag 4199.  The T-SQL for this is:

USE [MyDatabase];
GO
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
GO

When To Use Trace Flag 4199 (Prior To SQL Server 2016)

New SQL Server / New Application:  When new SQL Server Instances are installed for new applications, there’s no risk of plan regressions and enabling Trace Flag 4199 globally merits consideration.

SQL Server Upgrade / Existing Application:  When upgrading a SQL Server Instance for an existing application, this will come down to the SQL Server DBA’s preference.  Your stance might be that plan regressions are expected anyway as a side effect of the upgrade and you’d rather tackle this with TF 4199 enabled.  My advice would be, prior to upgrading, test your specific application workload under the latest database compatibility level for the version you’re upgrading to, with and without TF 4199 enabled, and check for any plan regressions using the methodology outlined here.

Existing SQL Server / Existing Application:   The decision to enable TF 4199 shouldn’t be taken lightly as there’s always the possibility of plan regressions so thorough testing should be done to evaluate if this trace flag makes improvements to your specific application workload or specific queries.  Bear in mind that when enabled globally, all databases on the SQL Server instance will be affected.  TF 4199 could be a useful tool at the query level for the queries that might benefit from new query optimiser logic and whose performance could do with improvement.

Trace Flag 4199 Workload Testing

For the purpose of this demo, I will use HammerDB to generate and populate a database named ‘tpcc’ that I will then run TPC-C (OLTP) workloads against.  The tpcc database will be created using the ‘original schema’ and I will use the exact same HammerDB configuration options detailed in my previous blog post.  Benchmarks will be taken against the tpcc database with and without Trace Flag 4199 enabled globally to show the potential impact this trace flag can have on a workload:

  1. Benchmark 1: SQL Server 2008 R2 SP3 (Database Compatibility Level 100)
  2. Benchmark 2: SQL Server 2012 SP2 (Database Compatibility Level 110)
  3. Benchmark 3: SQL Server 2014 SP2 (Database Compatibility Level 120)

My demo environment consists of three Windows Server 2012 R2 VMs each having a different version of SQL Server installed as above.   Each instance of SQL Server is configured to use a maximum of 8GB of memory with MAXDOP set to 4, CTOP set to 25 and the tpcc database using the simple recovery model.  I will create the tpcc database on the SQL Server 2008 R2 SP3 VM first and immediately backup the database which will be copied to the other VMs and will serve as the starting point for each workload test.

I will then run another set of tests using the same strategy but against a tpcc database generated by HammerDB using the updated schema.

The metrics I will gather are HammerDB’s TPM (Transactions Per Minute) and NOPM (New Orders Per Minute) along with the PerfMon counters ‘Batch Requests/sec’ and ‘% Processor Time’.  The results are as follows:

Original HammerDB Schema

[Image: benchmark results for the original schema]

Updated HammerDB Schema

[Image: benchmark results for the updated schema]

As you can see from the results above, there were no serious regressions across workloads when TF 4199 was enabled and there were no massive improvements either.  This just reinforces the need to test your specific application workload or specific queries with TF 4199 as there’s no guarantee it will improve query performance.

Trace Flag 4199 In Action

For this demo, I’m going to restore a copy of AdventureWorks2012 onto my SQL Server 2016 RTM instance and leave the database at compatibility level 110.  I’m then going to issue a query that will expose a known bug and we’ll take a look into that.

First, restore AdventureWorks2012 and ensure the database compatibility level is 110.  Then issue the query that exposes the known bug, making sure to capture the actual execution plan.

ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 110
GO

SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
);
GO

[Image: execution plan at compatibility level 110 without trace flags]

[Image: Right Anti Semi Join properties showing an estimate of 1 row against 15124 actual rows]

Looking at the execution plan above and the properties of the ‘Right Anti Semi Join’ logical operator, we can see an inaccurate cardinality estimation.  The estimated number of rows is 1, but the actual number of rows returned is 15124.  Use the below T-SQL to insert the query stats for this query into a temporary table for later analysis:

SELECT  DB_NAME(qt.dbid) AS [Database],
		'No Trace Flags Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

We know that inaccurate cardinality estimations can lead to inefficient plan choices which can in turn result in degraded performance.  Let’s see if Trace Flag 4199 can help us out here.  First, flush the plan cache and then re-execute the query but this time using Trace Flag 4199:

DBCC FREEPROCCACHE;
GO
SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
)
OPTION (QUERYTRACEON 4199);
GO

[Image: execution plan at compatibility level 110 with Trace Flag 4199]

[Image: Right Anti Semi Join properties under TF 4199 showing an accurate estimate of 15124 rows]

Looking at the properties of the ‘Right Anti Semi Join’ logical operator, we can see that under Trace Flag 4199, the query optimiser estimated 15124 rows which was exactly right.  Now that the query optimiser is able to accurately estimate cardinality, it has compiled a different execution plan from that of the first execution without TF 4199.

Again, execute the below T-SQL to insert the query stats for this query into a temporary table for later analysis:

INSERT INTO #TempQueryStats
SELECT  DB_NAME(qt.dbid) AS [Database],
		'TF 4199 Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

As stated earlier, “All fixes under Trace Flag 4199 to date will be enabled by default in SQL Server 2016 for databases at the latest compatibility level 130”.  Let’s test that out.  First, set the database compatibility level to 130, flush the plan cache and re-execute the same query but with no trace flags this time:

ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 130
GO

DBCC FREEPROCCACHE
GO

SELECT SC.CustomerID, SC.StoreID FROM Sales.Customer SC
INNER JOIN Sales.SalesOrderHeader SOH ON SOH.CustomerID = SC.CustomerID
WHERE SOH.ShipMethodID = 1
AND SC.CustomerID NOT IN
(
        SELECT SC2.CustomerID FROM Sales.Customer SC2
        WHERE SC2.TerritoryID = 4 AND SC2.CustomerID = SC.CustomerID
);
GO

[Image: execution plan at compatibility level 130 without trace flags]

[Image: Right Anti Semi Join properties at compatibility level 130 showing an accurate estimate]

With the database compatibility level set to 130, the query optimiser makes an accurate cardinality estimation and compiles the same execution plan that was seen when the database compatibility level was 110 and when using Trace Flag 4199.  There’s the proof that in SQL Server 2016 the query optimiser hotfixes previously collected under TF 4199 are enabled by default for databases at the latest compatibility level.

Use the below T-SQL to insert the query stats into the same temporary table:

INSERT INTO #TempQueryStats
SELECT  DB_NAME(qt.dbid) AS [Database],
		'No Trace Flags Enabled' AS [Trace Flags],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempQueryStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text LIKE 'SELECT SC%'
AND   DB_NAME(qt.dbid) = 'AdventureWorks2012'
GO

Execute this T-SQL to take a look at the query stats for the various query executions:

SELECT * FROM #TempQueryStats
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);

[Image: #TempQueryStats output comparing the three executions]

The worst performer was the first execution when the AdventureWorks2012 database compatibility level was 110 and TF 4199 was not enabled.  Under this compatibility level, the query optimiser will use the legacy CE to compile execution plans which can be seen in the output above – ‘CE_Version’ = 70.  The estimate skew is -22184.35425 but note that this is the root level skew referring to the far left / final operator in the execution plan.  We know that the skew originated in the leaf / intermediate levels of the execution plan, specifically the ‘Right Anti Semi Join’ logical operator, where the query optimiser was unable to accurately estimate the cardinality.  In this case, the leaf-level skew flowed up to the root and drove the optimiser to choose an inefficient plan which can be seen from the query stats as average logical reads and average CPU time are multiple times higher compared to the other executions.

When Trace Flag 4199 was enabled, the query optimiser was able to make an accurate cardinality estimation and a more efficient plan was chosen.  This is reflected in the query stats above.  The same behaviour was observed in the final execution when the database compatibility level was raised to 130 and the query was compiled using the new CE ( ‘CE_Version’ = 130 ).

Resources & Further Reading

SQL_Sasquatch (@sql_sasquatch) has a great series on this trace flag which can be found here.

I also recommend this post by Microsoft which shows how they used TF 4199 to solve a QO bug in SQL Server 2014 SP1.

Paul White ( @SQL_Kiwi ) has a great answer to this StackExchange question regarding the anti-semi join bug.

 


A Guide On Forcing The Legacy CE

Trace Flag 9481

In my previous blog post I showed that when the database compatibility level is 120 or higher, trace flag 9481 can force the query optimiser to use the legacy CE (cardinality estimator model version 70) when compiling an execution plan.  This trace flag can be enabled at various scopes and allows us to influence the CE behavior depending on the granularity required:

  • Query Scope:  The QUERYTRACEON query hint specified in the OPTION clause of a query enables you to use trace flags as hints at the query level and enables certain plan-affecting changes in the query optimiser.  Query trace flags are active for the context of a specific query.
    USE WideWorldImporters;
    GO
    SELECT PurchaseOrderID, OrderDate
    FROM Purchasing.PurchaseOrders
    WHERE OrderDate = '2016-05-31'
    OPTION ( QUERYTRACEON 9481 );
    

    [Image: execution plan properties showing CardinalityEstimationModelVersion 70]

  • Session Scope: DBCC TRACEON can be used to enable specific trace flags at the session level.  Session trace flags are active for a connection and only directly affect that specific connection:
    DBCC FREEPROCCACHE;
    GO
    DBCC TRACEON (9481);
    GO
    SELECT PurchaseOrderID, OrderDate
    FROM Purchasing.PurchaseOrders
    WHERE OrderDate = '2016-05-31';
    GO
    DBCC TRACEOFF (9481);
    GO
    
  • Global Scope: Global trace flags are set at the server level and affect every connection to the SQL Server instance.  Global trace flags can be enabled using the following methods:
  1. DBCC TRACEON: Using the -1 flag with DBCC TRACEON enables the trace flag globally.  Note that this doesn’t persist through restarts and would need to be executed again.
    DBCC FREEPROCCACHE;
    GO
    DBCC TRACEON (9481, -1);
    GO
    SELECT PurchaseOrderID, OrderDate
    FROM Purchasing.PurchaseOrders
    WHERE OrderDate = '2016-05-31';
    GO
    DBCC TRACEOFF (9481, -1);
    GO
    
  2. Startup Parameter:  In SQL Server Configuration Manager, open the properties of your SQL Server instance and add -T9481 under the ‘Startup Parameters’ tab.  This requires a restart of the SQL Server instance to take effect.  (Note that the ‘Startup Parameters’ tab was introduced in SQL Server 2012; in previous versions of SQL Server, startup parameters were added under the ‘Advanced’ tab.)
    [Image: SQL Server Configuration Manager startup parameters tab with -T9481 added]

  3. Registry:  It is possible to enable trace flags by editing the registry hive but you don’t want to go there, trust me.

When To Use Trace Flag 9481

Query Scope:  You’ve moved (migrated/upgraded) to SQL Server 2014 / 2016, your databases are at compatibility level 120 / 130 and using the new CE, your workload is performing well overall but there are a few regressions where a small number of queries actually perform worse.  Use Trace Flag 9481 on a per query basis as a temporary measure until you can tune / rewrite the query so it performs well without the hint.

Session Scope:  You’re testing regression of specific stored procedures and don’t want to manually edit the stored procedure code to include a query hint.  Bear in mind that if there isn’t already a cached plan, an execution plan will be compiled using the legacy CE and all subsequent executions of the stored procedure will use that same plan, even if they do not have the trace flag enabled.
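
If you need to clear that cached plan once testing is finished, marking the procedure for recompilation is a simple option (the procedure name here is hypothetical):

-- Force the next execution of the procedure to compile a fresh plan
EXEC sp_recompile N'dbo.usp_MyTestProcedure';
GO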

Global Scope:  You plan on migrating all databases of a specific SQL Server instance to SQL Server 2014 / SQL Server 2016, you want to leverage new engine features, but you haven’t yet tested your workload against the new CE and want to continue using the legacy CE.  With your databases set to the latest compatibility level and Trace Flag 9481 enabled globally, you can continue to use the legacy CE across the board.

LEGACY_CARDINALITY_ESTIMATION Database Scoped Configuration

We’ve covered how Trace Flag 9481 can be utilised in SQL Server 2014 / 2016 to force the legacy cardinality estimation model and the different scenarios where you might use this trace flag.  We’ve seen that TF 9481 can be enabled at various scopes: query, session and global.  But what if you want to enable the legacy CE at the database scope?

In SQL Server 2014, you were limited to using the database compatibility level to control the CE version at the database scope.  Setting the compatibility level to 110 or lower will force the legacy CE (cardinality estimation model version 70).

[Image: SSMS database properties showing the compatibility level drop-down]

The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.

This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations, which gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance level.  In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independently of the database compatibility level.  This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE in the event that the latest CE causes severe query regressions across your workload.

The database scoped configurations are exposed via the following mechanisms:

GUI

Via the options tab of the Database Properties window as follows:

[Image: Database Properties window showing the Database Scoped Configurations options]

T-SQL

A new T-SQL statement ALTER DATABASE SCOPED CONFIGURATION has been introduced to set these new database level configurations for a particular database:

ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;
GO

DMV

There is a new DMV sys.database_scoped_configurations which exposes the configuration values for the current database:

SELECT configuration_id, name, value, value_for_secondary
FROM sys.database_scoped_configurations OPTION (RECOMPILE);
GO

[Image: sys.database_scoped_configurations output]

FORCE_LEGACY_CARDINALITY_ESTIMATION Query Hint

I’ve shown how Trace Flag 9481 can be used with QUERYTRACEON to enable the trace flag at the query level.  There are a few issues with this though:

  1. Executing a query with the QUERYTRACEON option requires membership in the sysadmin fixed server role.
  2. Trace flag numbers are not intuitive.  Anyone not familiar with a particular trace flag number will not know the behaviour it exhibits.

SQL Server 2016 SP1 introduced a new class of query hints under ‘USE HINT’ which lets you influence the query optimiser without elevated credentials or without being a member of the sysadmin server role.  Additionally, the new hint options are descriptive to allow for easier understanding of the behaviour they exhibit:

USE WideWorldImporters;
GO
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-05-31'
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

This is the logical equivalent of using OPTION (QUERYTRACEON 9481).
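
As an aside, the full list of hints accepted by USE HINT on a given build can be pulled from a DMV that was introduced alongside the feature:

SELECT name
FROM sys.dm_exec_valid_use_hints
ORDER BY name;
GO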

Find Execution Plans Using the Legacy CE

Use the following query to look at what cardinality estimator model version is being used by the execution plans in your plan cache:

SELECT  DB_NAME(qt.dbid) AS [Database], cp.objtype AS [Cached_Plan_Type],
        cp.usecounts AS [Use_Counts],
        qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
        qs.last_rows AS [Last_Rows],
        CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
        qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
        CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
        CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
        SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
        qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
        INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text NOT LIKE '%sys.%'
AND   DB_NAME(qt.dbid) = 'User_Database'
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);
GO

Troubleshooting Query Regressions Caused By The New Cardinality Estimator

In this blog post I’ll show methods that can be used to identify and troubleshoot query plan regressions caused by the new cardinality estimator.  I’ll show how this was a very manual task in SQL Server 2014 and how the process was made much easier in SQL Server 2016 with the introduction of Query Store.

For the purpose of this demo, I will use HammerDB to generate and populate a database that I will then run TPC-C (OLTP) workloads against.  Benchmarks will be taken against the HammerDB generated database under the different database compatibility levels to show the effects the different cardinality estimator model versions have on the workload:

  1. Benchmark 1: Database Compatibility Level = 110 (Legacy CE)
  2. Benchmark 2: Database Compatibility Level = 120 (New SQL Server 2014 CE)
  3. Benchmark 3: Database Compatibility Level = 130 (New SQL Server 2016 CE)

Trace Flag 4199, which enables various query optimiser hotfixes, could potentially affect the results of these workload tests and will not be enabled in this demo.

Using HammerDB in this manner to generate a synthetic workload is a very contrived example.  In reality you should be using tools like RML Utilities or Distributed Replay to test what effect the new cardinality estimator has against your specific application workload.

My demo environment consists of a Windows Server 2012 R2 VM with a single installation of SQL Server 2016 RTM configured to use a maximum of 8GB of memory, with MAXDOP set to 4, the cost threshold for parallelism (CTOP) set to 25 and the tpcc database using the simple recovery model.
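
For reference, those instance and database settings could be scripted roughly as follows (a sketch based on the values above; the ALTER DATABASE statement applies once the tpcc database has been built below):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 8192;
EXEC sp_configure 'max degree of parallelism', 4;
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;
GO
-- Run after the tpcc database has been created by HammerDB
ALTER DATABASE [tpcc] SET RECOVERY SIMPLE;
GO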

Configuring HammerDB & PerfMon

HammerDB:  Launch HammerDB and select ‘MSSQL Server’ from the benchmark options.  Expand ‘TPC-C’, navigate to ‘Schema Build’ and double click ‘Options’.  This is the template for the database which will be created and populated.  For this demo I have used the options below.  To create the ‘tpcc’ database, double click ‘Build’.

tpcc_database

Backup the newly created ‘tpcc’ database which will serve as the starting point for each benchmark test:

BACKUP DATABASE [tpcc] TO  DISK = N'C:\Temp\tpcc.bak'
WITH COPY_ONLY, COMPRESSION,  STATS = 10, CHECKSUM
GO

Prepare the workload driver options as below by double clicking on ‘Options’ under ‘Driver Script’.  Then, double click ‘Load’ which should populate the ‘Script Editor’ window with 387 lines of code.

workload-options

Next, configure the virtual users as below by double clicking ‘Options’ under ‘Virtual User’.  Create the virtual users by double clicking ‘Create’.

virtualuser-options

PerfMon:  Configure PerfMon to capture ‘Batch Requests/sec’ and ‘% Processor Time’.  Set a sample interval of one second for a duration of 300 seconds (5 minutes).

perfmon-counters

perfmon-properties

Tracking Query Regressions The Old Way

SQL Server 2012 Legacy CE Benchmark Test

Preparation:  Set the database compatibility level to 110 so the query optimiser will use the legacy CE model when compiling queries for this database.  Flush the plan cache and buffer pool prior to each workload test to ensure a consistent starting point.

ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 110
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO

 

With PerfMon configured and the HammerDB workload defined, it’s time to start the test.  Start the PerfMon counters, wait 30 seconds and then start the HammerDB workload by double clicking ‘Run’ under ‘Virtual Users’ or by selecting the green arrow icon highlighted below:

runvirtualusers

Analysing Results:  Under the ‘Virtual User Output’ tab, as shown below, you can see the results: 30875 TPM (Transactions Per Minute) and 6666 NOPM (New Orders Per Minute).

legacy-ce-tpm

Looking at the PerfMon graph, you can see that Average Batch Requests/sec was ~40 and Average % Processor Time was ~15%.

legacyceperfmon

This workload test against the legacy cardinality estimator will serve as the baseline which we will use for comparison after running the same HammerDB workload against the newer cardinality estimator model versions.

hammerdb_workloadresults1

Use the below T-SQL to filter the contents of the plan cache and insert query stat information into a temporary table named #TempCEStats for later analysis.

--INSERT INTO #TempCEStats
SELECT  DB_NAME(qt.dbid) AS [Database], cp.objtype AS [Cached_Plan_Type],
        cp.usecounts AS [Use_Counts],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
		qs.last_rows AS [Last_Rows],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
		qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
		CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
		CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
		SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
		qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
INTO #TempCEStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
		INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text NOT LIKE '%sys.%'
AND   DB_NAME(qt.dbid) = 'tpcc'
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);
GO

SQL Server 2014 CE Model Version 120 Benchmark Test

Preparation:  Restore the tpcc database from the initial backup.  Set the database compatibility level to 120 so the query optimiser will use the new cardinality estimator model version 120 when compiling queries for this database.  Flush the plan cache and buffer pool prior to each workload test to ensure a consistent starting point.

 

RESTORE DATABASE [tpcc] FROM  DISK = N'C:\Temp\tpcc.bak' WITH  FILE = 1,  NOUNLOAD,  REPLACE,  STATS = 5
GO
ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 120
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO

Clear the PerfMon screen, re-start the PerfMon counters, wait 30 seconds and then start the HammerDB workload once again.  Once the workload is complete, run the below T-SQL once again to insert the contents of the plan cache into the same temporary table for later analysis:

INSERT INTO #TempCEStats
SELECT  DB_NAME(qt.dbid) AS [Database], cp.objtype AS [Cached_Plan_Type],
        cp.usecounts AS [Use_Counts],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
		qs.last_rows AS [Last_Rows],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
		qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
		CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
		CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
		SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
		qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempCEStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
		INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text NOT LIKE '%sys.%'
AND   DB_NAME(qt.dbid) = 'tpcc'
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);
GO

Analysing Results:  A quick glance at the PerfMon graph below shows that CPU usage has increased significantly and batch requests/sec has decreased.

newce_sql2014_perfmon

Plug the PerfMon and HammerDB metrics into the table for a better comparison:

hammerdb_workloadresults2

It’s clear that the HammerDB synthetic workload performs better under the legacy cardinality estimator as opposed to the new cardinality estimator model version 120.  What tools are available to us for analysing how individual queries or stored procedures perform?  It’s possible to use Extended Events or Profiler, but for this demo we are going to use the various DMOs and, fortunately, we’ve already been inserting query statistics into a temporary table, #TempCEStats.  Since we know that there was a significant increase in CPU usage during the second workload test, issue the below query against the temporary table to filter on this metric:

SELECT * FROM #TempCEStats
ORDER BY [Avg_CPU_Time(µs)] DESC

 

query-stats

It is instantly apparent that the most resource-intensive query was the same query across both workload tests; note that the query hash is consistent too.  It is also apparent that this query performs worse under the new cardinality estimator model version 120.  To investigate and understand why this particular query behaves differently under the different cardinality estimators we’ll need to look at the actual query and the execution plans.

Looking at the information in #TempCEStats and the execution plans, the problematic query below belongs to the SLEV stored procedure.  The execution plans are as follows:

ce70_plan

ce120_plan

This excerpt, taken from my previous blog post, is very relevant here:
“Good cardinality estimation (row count expectations at each node of the logical tree) is vital; if these numbers are wrong, all later decisions are affected. Cardinality estimates are a major factor in deciding which physical operator algorithms are used, the overall plan shape (join orders etc) and ultimately they determine the final query plan that executes. Given these critical plan choices, when the cardinality estimation process contains a significantly skewed assumption, this can lead to an inefficient plan choice. This can, in turn, result in degraded performance.”

The different assumption models and algorithms used by the new cardinality estimator model will lead to different cardinality estimates in certain scenarios and this is apparent here in the form of a completely different execution plan shape.  Under the legacy CE, a serial plan with an index seek and RID lookup using a nested loop join was used.  Under the new SQL Server 2014 CE, a parallel plan with a table scan and hash match join was used.

SQL Server 2014 Potential Solutions:  In my previous blog post I showed how Trace Flag 9481 can be used in SQL Server 2014 & SQL Server 2016 at various scopes to force the legacy CE.  In our scenario, the HammerDB synthetic workload performs much the same under the different cardinality estimators with the exception of the SLEV stored procedure.  The recommendation in this scenario then would be to keep using the new SQL Server 2014 CE but force the legacy CE for the one specific query regression as a temporary measure until the stored procedure can be tuned / rewritten so it performs well without the hint.  This can be done by setting Trace Flag 9481 at the individual statement level within the stored procedure assuming we have the permission to edit the stored procedure code.

Let’s modify the stored procedure and re-run the HammerDB workload:

RESTORE DATABASE [tpcc] FROM  DISK = N'C:\Temp\tpcc.bak' WITH  FILE = 1,  NOUNLOAD,  REPLACE,  STATS = 5
GO
USE [tpcc]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[SLEV]
@st_w_id int,
@st_d_id int,
@threshold int
AS
BEGIN
DECLARE
@st_o_id int,
@stock_count int
BEGIN TRANSACTION
BEGIN TRY
SELECT @st_o_id = DISTRICT.D_NEXT_O_ID FROM dbo.DISTRICT WHERE DISTRICT.D_W_ID = @st_w_id AND DISTRICT.D_ID = @st_d_id
SELECT @stock_count = count_big(DISTINCT STOCK.S_I_ID) FROM dbo.ORDER_LINE, dbo.STOCK
WHERE ORDER_LINE.OL_W_ID = @st_w_id AND ORDER_LINE.OL_D_ID = @st_d_id AND (ORDER_LINE.OL_O_ID < @st_o_id) AND
ORDER_LINE.OL_O_ID >= (@st_o_id - 20) AND STOCK.S_W_ID = @st_w_id AND STOCK.S_I_ID = ORDER_LINE.OL_I_ID AND STOCK.S_QUANTITY < @threshold OPTION (QUERYTRACEON 9481)
SELECT	@st_o_id as N'@st_o_id', @stock_count as N'@stock_count'
END TRY
BEGIN CATCH
SELECT
ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
COMMIT TRANSACTION;
END
GO

ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 120
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO

Analysing Results:

newce_sql2014_tf9481_perfmon

hammerdb_workloadresults3

The results clearly show the effect that setting Trace Flag 9481 on the problematic statement in the SLEV stored procedure has had on workload performance.  CPU usage is back to what it was under the legacy cardinality estimator, and TPM, NOPM and batch requests/sec have all almost doubled.  To confirm that the problematic statement is using the legacy cardinality estimator, open the query execution plan XML and search for CardinalityEstimationModelVersion=”70″, or scroll down the execution plan XML to the relevant statement and you will find this attribute.

In the scenario where you can’t modify the stored procedure, you could use a plan guide to elicit the same effect.
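
A minimal sketch of such a plan guide is shown below.  The guide name is arbitrary, and the @stmt value must match the statement text inside dbo.SLEV exactly (i.e. the SELECT shown above without the OPTION clause), otherwise the guide will not be matched:

EXEC sp_create_plan_guide
    @name = N'PG_SLEV_LegacyCE',   -- hypothetical guide name
    @stmt = N'SELECT @stock_count = count_big(DISTINCT STOCK.S_I_ID) FROM dbo.ORDER_LINE, dbo.STOCK
WHERE ORDER_LINE.OL_W_ID = @st_w_id AND ORDER_LINE.OL_D_ID = @st_d_id AND (ORDER_LINE.OL_O_ID < @st_o_id) AND
ORDER_LINE.OL_O_ID >= (@st_o_id - 20) AND STOCK.S_W_ID = @st_w_id AND STOCK.S_I_ID = ORDER_LINE.OL_I_ID AND STOCK.S_QUANTITY < @threshold',
    @type = N'OBJECT',
    @module_or_batch = N'dbo.SLEV',
    @params = NULL,
    @hints = N'OPTION (QUERYTRACEON 9481)';
GO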

SQL Server 2016 CE Model Version 130 Benchmark Test

To complete the picture, let’s run the same workload test against the new SQL Server 2016 CE and collect the same metrics.

Preparation:  Restore the tpcc database from the backup taken earlier prior to the first workload test.  Set the database compatibility level to 130 so the query optimiser will use the new SQL Server 2016 cardinality estimator model version 130 when compiling queries for this database.  Flush the plan cache and buffer pool prior to each workload test to ensure a consistent starting point.

RESTORE DATABASE [tpcc] FROM  DISK = N'C:\Temp\tpcc.bak' WITH  FILE = 1,  NOUNLOAD,  REPLACE,  STATS = 5
GO
ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 130
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO

Clear the PerfMon screen, re-start the PerfMon counters, wait 30 seconds and then start the HammerDB workload once again.  Once the workload is complete, run the below T-SQL once again to insert the contents of the plan cache into the same temporary table for later analysis:

INSERT INTO #TempCEStats
SELECT  DB_NAME(qt.dbid) AS [Database], cp.objtype AS [Cached_Plan_Type],
        cp.usecounts AS [Use_Counts],
		qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
		qs.last_rows AS [Last_Rows],
		CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
		qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
		CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
		CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
		SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
		qs.query_hash AS [QueryHash],
        qp.query_plan AS [QueryPlan]
--INTO #TempCEStats
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
        CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
		INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE   qt.text NOT LIKE '%sys.%'
AND   DB_NAME(qt.dbid) = 'tpcc'
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);
GO

Analysing Results: 

newce_sql2016_perfmon

Putting the metrics into the table below, we can see that the HammerDB workload as a whole performs slightly better under the SQL Server 2016 CE as opposed to the SQL Server 2014 CE.

hammerdb_workloadresults4

The evidence above is reinforced when querying the #TempCEStats temporary table as the single problematic query also performs slightly better under the SQL Server 2016 CE as opposed to the SQL Server 2014 CE, but still not as performant as under the Legacy CE.  Looking at the execution plan it’s unsurprising to see the same parallel plan with a table scan and hash match join.

query-stats_final

The next step now would be to test the HammerDB workload against the SQL Server 2016 CE with trace flag 9481 added to the problematic statement within the SLEV stored procedure but I’ll leave that as an exercise for the reader.

Tracking Query Regressions The New Way (Feat. Query Store)

For this demo, the setup and approach will be slightly different.  The scenario this time is that the tpcc database will be upgraded to the latest database compatibility level 130 but we want to continue using the legacy CE until given the go ahead to switch to the SQL Server 2016 CE.  In SQL Server 2016, this can be achieved using the following methods:

  1. Trace Flag: Executing ‘DBCC TRACEON (9481, -1);’ will enable this trace flag globally and will affect all databases on the SQL Server instance.
  2. Database Scoped Configuration: Executing ‘ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;’ will only affect the database it is executed under.

Using the database scoped configuration method is the better option and is what we’ll use in this demo.

Preparation:  Restore the tpcc database from the initial backup.  Set the database compatibility level to 130, turn on the Legacy CE database scoped configuration so the query optimiser will use the legacy CE when compiling queries for this database.  Enable query store as it is not active for new databases by default.  Flush the plan cache and buffer pool.

RESTORE DATABASE [tpcc] FROM DISK = N'C:\Temp\tpcc.bak' WITH  FILE = 1,  NOUNLOAD,  REPLACE,  STATS = 5
GO
ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 130
GO
USE [tpcc]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON
GO
ALTER DATABASE [tpcc] SET QUERY_STORE = ON
GO
ALTER DATABASE [tpcc] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, DATA_FLUSH_INTERVAL_SECONDS = 900, INTERVAL_LENGTH_MINUTES = 10)
GO
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO

Run the HammerDB workload and put the results into the table below.  Next, turn off the Legacy CE database scoped configuration with the below T-SQL.  Run another HammerDB workload and enter the results into the table below.

USE [tpcc]
GO
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF
GO

hammerdb_workloadresults5

Analysing Results:  Looking at the HammerDB metrics above, we can see once again the throughput of our HammerDB workload has decreased significantly when switching to the new SQL Server 2016 CE.  Instead of trawling through the various DMOs, let’s take a look at Query Store.  Expand the tpcc database in object explorer, navigate to ‘Query Store’ and open ‘Top Resource Consuming Queries’.  Filtering on Total CPU Time or Total Duration in the top left pane we can clearly see the same problematic query.

querystore_1
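
If you prefer T-SQL over the GUI, a rough equivalent of the ‘Top Resource Consuming Queries’ report can be put together from the Query Store catalog views (a sketch, ordering by average CPU time):

USE [tpcc];
GO
SELECT TOP (10)
       q.query_id,
       p.plan_id,
       qt.query_sql_text,
       SUM(rs.count_executions) AS [Executions],
       AVG(rs.avg_cpu_time) AS [Avg_CPU_Time(µs)],
       AVG(rs.avg_duration) AS [Avg_Duration(µs)]
FROM sys.query_store_query_text AS qt
     JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
     JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
     JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, p.plan_id, qt.query_sql_text
ORDER BY AVG(rs.avg_cpu_time) DESC;
GO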

Looking at the top right pane, we can see there are three execution plans for this query and it’s clear that the execution plan depicted by the yellow circle at the top right performs the worst.  To get more detailed information on the performance of the individual execution plans, select the ‘View Plan Summary In A Grid Format’ button highlighted in red below.  Selecting the individual plan_ids will show the graphical execution plan in the bottom pane.  The execution plan with plan_id 49 is the most efficient plan and is the one we will force using Query Store.

querystore_2

Switching back to the chart format, I will select plan_id 90 and plan_id 49 and then compare the execution plans side by side.  Looking at the execution plans and plan properties below, we can see that the poorly performing plan at the top, plan_id 90, is a parallel plan with a table scan and hash match join, compiled using the SQL Server 2016 CE.  The more performant plan at the bottom, plan_id 49, is a serial plan with an index seek and RID lookup using a nested loop join, compiled using the Legacy CE.  The query regression is obvious here.  Query Store has made it very easy for us to visually identify that this particular query performs better under the Legacy CE.

querystore_3

querystore_4

To force plan_id 49, select the plan, click the ‘Force Plan’ button highlighted below and accept the pop-up confirmation box.

querystore_5
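
The same forcing can be scripted with T-SQL rather than the GUI.  A sketch, looking up the owning query_id for plan_id 49 first (the query_id value used below is a placeholder and will differ per environment):

USE [tpcc];
GO
-- Find the query_id that owns the plan we want to force
SELECT plan_id, query_id
FROM sys.query_store_plan
WHERE plan_id = 49;
GO
-- Force the plan for that query (substitute the query_id returned above)
EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 49;
GO
-- To remove the forced plan later:
-- EXEC sp_query_store_unforce_plan @query_id = 1, @plan_id = 49;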

Kick off another HammerDB workload test and enter the results into the table below:

hammerdb_workloadresults6

Using Query Store to force the good plan has resulted in the best workload throughput.  The PerfMon graph for this series of tests would look like this:

perfmon_querystoretest

Conclusion

Query Store provides insight on query plan choice and performance.  It makes this information easily accessible and it simplifies performance troubleshooting by helping you quickly find performance differences caused by query plan changes.  Prior to Query Store, troubleshooting query regressions was more challenging and relied on manual analysis of the plan cache and various DMVs.

Query Store enables the database administrator to directly affect the performance of queries by forcing Execution Plans for specific queries.  Prior to Query Store, this type of manual intervention was achieved through the use of plan guides and trace flags which required modifying code and was a much more involved process.

The key points of this blog post were to show how useful the Query Store can be in SQL Server DBA’s tool belt and to highlight the importance of testing applications workloads against the new cardinality estimator models prior to upgrading to SQL Server 2014 or SQL Server 2016.

Resources & Further Reading

I highly recommend reading @SQL_JGood’s excellent posts on this topic.  I built upon his examples and added what I have learned recently out in the real world.

Enrico Van De Laar ( @evdlaar ) has a great series on Query Store on the Simple-Talk site for anyone wanting a deep look into this feature.


Statistics & Cardinality Estimator Model Variations

This blog post covers some of the fundamentals of statistics in SQL Server, the cardinality estimation models across different versions of SQL Server, how cardinality estimates are derived in different scenarios and variations in estimates under the different cardinality estimation models.

SQL Server versions 7.0 – 2012 used the same cardinality estimation model, version 70.  In SQL Server 2014, a redesigned, revamped cardinality estimator model, version 120, was introduced.  Further improvements were made in SQL Server 2016, which uses cardinality estimation model 130.  Notice how the CE (Cardinality Estimation) model versions tie in nicely with the database compatibility levels.  Confirming the CE version used for a given query can be accomplished by checking the CardinalityEstimationModelVersion attribute in an execution plan.

The demos/examples in this post use Microsoft’s new sample database WideWorldImporters, and the sample code is provided so the reader can follow along and see the differences in cardinality estimates under the different models for themselves.

Statistics Terminology & Definitions

Statistic objects in SQL Server are light-weight objects containing statistical information about the distribution of values in one or more columns.  Statistics are used to calculate cardinality estimates during optimisation allowing the query optimiser to generate efficient query execution plans.

Density is a measure of the uniqueness of the data in a column(s). It is a ratio that shows just how many unique values there are within a given column, or set of columns. Highly dense columns aren’t very unique. Density is calculated as:
Density = 1 / ( # distinct values for column(s) )
Density = SELECT 1.0 / COUNT( DISTINCT [ColumnName] ) FROM [TableName];
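
As a quick sanity check of the formula, the following (run against the Purchasing.PurchaseOrders table used in the demos below) should line up with the ‘All density’ value that DBCC SHOW_STATISTICS reports for the OrderDate column:

USE WideWorldImporters;
GO
SELECT 1.0 / COUNT(DISTINCT OrderDate) AS [OrderDate_Density]
FROM Purchasing.PurchaseOrders;
GO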

Selectivity is a measure of the uniqueness of a given predicate. It is a number between 0 and 1, representing the fraction of rows in the table that pass the predicate. For example, if a predicate selects 12 rows from a table of 100 rows, the selectivity is (12/100) = 0.12. When a column is referred to as “highly selective” that usually means values aren’t repeating all that often, if at all. Selectivity for a filter predicate can be calculated as:
Selectivity = (# rows that pass the predicate) / (# rows in the table)
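
Again as a concrete sketch, using the same table and the predicate value used later in this post:

SELECT (SELECT COUNT(*)
        FROM Purchasing.PurchaseOrders
        WHERE OrderDate = '2016-05-31') * 1.0 / COUNT(*) AS [Selectivity]
FROM Purchasing.PurchaseOrders;
GO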

Cardinality is the number of rows returned by a query operator. Each operator in a query plan has an estimated cardinality (the number of rows the optimiser guessed that the operator would return) and an actual cardinality (the number of rows that the operator returned in actuality). You can see both by running a query with “SET STATISTICS PROFILE ON” or by capturing the actual execution plan.
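
For example, to compare estimated and actual cardinality for a simple query against the same table:

SET STATISTICS PROFILE ON;
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-05-31';
SET STATISTICS PROFILE OFF;
-- Compare the EstimateRows and Rows columns in the additional result set returned.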

Data Skew is the uneven distribution of data within a column.

Statistical Model Assumptions

The query optimiser needs to perform well across several workload types such as OLTP, relational data warehouses (OLAP) and hybrid workloads. In order to strike a balance given typical SQL Server customer usage scenarios and the vast potential for variations in data distribution, volume and query patterns, the query optimiser has to make certain assumptions which may or may not reflect the actual state of any given database design and data distribution. The core assumption models are:

  • Independence: Data distributions on different columns are independent unless correlation information is available (think multi-column statistics).
  • Uniformity: Within each statistics object histogram step, distinct values are evenly spread and each value has the same frequency.
  • Inclusion: For filter predicates involving a column-equals-constant expression, the constant is assumed to actually exist for the associated column.
  • Containment: For a join predicate involving an equijoin for two tables, it is assumed that distinct join column values from one side of the join will exist on the other side of the join. In addition, the smaller range of distinct values is assumed to be contained in the larger range. In the presence of additional non-join filter predicates the legacy CE assumes some level of correlation. This implied correlation is called “simple containment”.

A few of those assumptions changed in the new  SQL Server 2014/2016 CE, namely:

  • Independence becomes Correlation: In absence of existing multi-column statistics, the legacy CE views the distribution of data contained across different columns as uncorrelated with one another. This assumption of independence often does not reflect the reality of a typical SQL Server database schema, where implied correlations do actually exist. The new CE uses an increased correlation assumption for multiple predicates and an exponential back off algorithm to derive cardinality estimates.
  • Simple Join Containment becomes Base Join Containment: Under the legacy CE, the assumption is that non-join predicates are somehow correlated which is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), and so this can translate into a reduced row estimate for the join. At a high level, the new CE derives the join selectivity from base-table histograms without scaling down using the associated filter predicates. Instead the new CE computes join selectivity using base-table histograms before applying the selectivity of non-join filters.

The Importance of Accurate Cardinality Estimation

Good cardinality estimation (row count expectations at each node of the logical tree) is vital; if these numbers are wrong, all later decisions are affected. Cardinality estimates are a major factor in deciding which physical operator algorithms are used, the overall plan shape (join orders etc) and ultimately they determine the final query plan that executes. Given these critical plan choices, when the cardinality estimation process contains a significantly skewed assumption, this can lead to an inefficient plan choice. This can, in turn, result in degraded performance.

Under estimating rows can lead to:

  • Memory spills to disk, for example, where not enough memory was requested for sort or hash operations.
  • The selection of a serial plan when parallelism would have been more optimal.
  • Inappropriate join strategies.
  • Inefficient index selection and navigation strategies.

Inversely, over estimating rows can lead to:

  • Inflated memory grants.
  • Selection of a parallel plan when a serial plan might be more optimal.
  • Inappropriate join strategy selection.
  • Inefficient index navigation strategies (scan versus seek).

Improving the accuracy of cardinality estimates can improve the quality of the query execution plan and, as a result, improve the performance of the query.

Statistics Objects in SQL Server

Cardinality estimates are row count estimates calculated for each operator within a query execution plan. Estimates are calculated using input from statistics associated with objects referenced in the query. A statistics object has three main areas of information associated with it: the header, density vector and histogram.

  • Header: The header includes information such as the last time statistics were updated and the number of sampled rows.
  • Density Vector: The density vector measures the uniqueness of a column or set of columns, with lower density values indicating a higher uniqueness.
  • Histogram: The histogram represents a column’s data distribution and frequency of occurrence for distinct values. Histograms are limited to 200 contiguous steps, with steps representing noteworthy boundary values.

Using the WideWorldImporters database, create an index on the orderdate column, then take a look at the statistic object created as a by-product of the index creation:

CREATE INDEX NC_IX_Purchasing_PurchaseOrders_OrderDate
ON Purchasing.PurchaseOrders (OrderDate)

DBCC SHOW_STATISTICS(N'Purchasing.PurchaseOrders', NC_IX_Purchasing_PurchaseOrders_OrderDate)
--WITH STAT_HEADER, DENSITY_VECTOR, HISTOGRAM
GO

1-dbcc-show_statistics

Note that in the statistics header, ‘Rows’ is equal to ‘Rows Sampled’, telling us that this statistic was created/updated using a full scan, which is to be expected since statistics created as a by-product of index creation/rebuilding use a full scan.  Also note that the density vector contains a second row for the combination of the OrderDate and PurchaseOrderID columns despite the nonclustered index explicitly being created only on the OrderDate column.  This is because the clustered index key columns are included in all nonclustered indexes.  Finally, note that the histogram only represents data distribution for the first column in the density vector, which is the first named, left-most column of the index.

Querying Statistics With T-SQL

EXEC [sp_helpindex] 'schema.object';
GO

EXEC [sp_helpstats] 'schema.object', 'all';
GO

SELECT
sp.last_updated,
stat.stats_id,
stat.name as stats_name,
STUFF((SELECT ', ' + cols.name
FROM sys.stats_columns AS statcols
JOIN sys.columns AS cols ON
statcols.column_id=cols.column_id
AND statcols.object_id=cols.object_id
WHERE statcols.stats_id = stat.stats_id and
statcols.object_id=stat.object_id
ORDER BY statcols.stats_column_id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') as stat_cols,
sp.modification_counter,
sp.rows,
sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so on
stat.object_id=so.object_id
JOIN sys.schemas as sc on
so.schema_id=sc.schema_id
WHERE
sc.name= 'Schema' -- Schema Name
and so.name='Table' -- Table Name
ORDER BY 1 DESC OPTION (RECOMPILE)
GO

DBCC SHOW_STATISTICS(N'schema.object', statistic_name)
--WITH STAT_HEADER, DENSITY_VECTOR, HISTOGRAM
GO 

The above T-SQL is useful when looking at statistics.

Find the Statistics Used to Compile an Execution Plan

Trace Flags: There are a number of undocumented trace flags that will allow you to determine which statistics objects were used by the query optimiser to produce an execution plan.  The below trace flags are useful when troubleshooting cardinality estimate issues:

  • 3604: Redirects trace output to the client
  • 9481: Forces the legacy CE model used by SQL Server 7.0 – 2012
  • 2312: Forces the new SQL Server 2014 CE
  • 2363: New CE: shows statistical computational information used by the new CE
  • 9292: Legacy CE: shows statistics objects which are considered ‘interesting’ by the query optimiser when compiling
  • 9204: Legacy CE: shows the ‘interesting’ statistics which end up being fully loaded and used by the query optimiser

For the upcoming examples in this blog post, the trace flags are used in the below manner:

-- New CE stats report
SELECT column_name FROM table_name
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );

-- Legacy CE stats report
SELECT column_name FROM table_name
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 );

Extended Events: The query_optimizer_estimate_cardinality XEvent fires when the Query Optimiser estimates cardinality on a relational expression.  This is outwith the scope of this blog, but see this post by Pedro Lopes and this post by Joe Sack for more details.

Mining The Plan Cache With TF 8666: If you really want to get your hands dirty, you can use the undocumented trace flag 8666 which exposes hidden internal information that is not normally available within an execution plan.  This trace flag exposes the ‘InternalInfo’ element in the execution plan XML which shows the statistic objects that were used by a query.  The below T-SQL, a modified version of the script in Fabio Amorim’s post, can be used to mine the plan cache for this information:

DBCC TRACEON(8666)
GO
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p)
SELECT	DB_NAME(qt.dbid) AS [Database], cp.objtype AS [CachedPlan_ObjectType],
		cp.usecounts AS [Use_Counts],
		TableName.XMLCol.value('@FieldValue','NVarChar(500)') AS [TableName],
		StatsUsed.XMLCol.value('@FieldValue','NVarChar(500)') AS [StatsName],
		qt.text AS [SQL_Statement],
		qp.query_plan AS [QueryPlan]
FROM sys.dm_exec_cached_plans cp
		CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
		CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
		CROSS APPLY query_plan.nodes('//p:Field[@FieldName="wszTable"]') TableName(XMLCol)
		CROSS APPLY query_plan.nodes('//p:Field[@FieldName="wszStatName"]') StatsUsed(XMLCol)
WHERE	qt.text NOT LIKE '%sys.%'
--AND	DB_NAME(qt.dbid) = 'UserDatabase'
 GO
 DBCC TRACEOFF(8666)
 GO

If you were considering dropping auto-created column statistics, this technique could be used to measure which statistic objects are being used and which ones aren’t.  This trace flag is also discussed by Dave Ballantyne in this excellent post.

Local Variables Cardinality Estimates

The optimiser can sniff the values of parameters and constants but with variables it cannot. This is because optimisation is done at the batch level and at the point when a batch is compiled the variables have not been defined or given values. This leads to different row estimations when using variables as opposed to parameters or constants. In the absence of a known value the optimiser can’t use the histogram to estimate row count, instead it has to resort to the other information in the statistics object to derive cardinality estimates. Depending on the type of predicate and other factors, the estimate will be derived differently:

Equality Predicate (WHERE Parameter = @Value)
If the filtered column is unique (that is, it has a primary key or unique constraint defined on it, created implicitly through index creation or explicitly otherwise), the query optimiser knows that the values are unique and there can’t be more than one match, so it estimates one row. If the column is non-unique, the query optimiser uses the density vector information to derive an estimate:
Estimate = Density Vector x Number Of Rows In Table
Estimate = SELECT [AllDensity_DensityVector] * [TableCardinality]

-- (Ctrl + M) Include Actual Execution Plan

-- Local Variable (New CE Equality)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = @OrderDate
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- SELECT (2074 * 0.0009398496) = 1.94925

--Local Variable (Legacy CE Equality)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = @OrderDate
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 );
-- Legacy Ce Estimate -- SELECT (2074 * 0.0009398496) = 1.94925

2-local-variable-est-rows

The same estimate is derived under both the legacy CE (CE Model 70) and the new CE (CE Model 130) and the same execution plan shape is generated too.  Looking back at the earlier screenshot of the statistics histogram, the last step in the histogram is ‘2016-05-31’ which has an EQ_ROWS value of 2.  Since the query optimiser can’t use the histogram in this scenario, it uses the density vector to calculate an estimate of 1.94925 rows which in this case is very close to the actual 2 rows returned.  By looking at the messages tab, you will see some information on statistics and decisions made by the query optimiser generated via the various trace flags.

InEquality Predicate (WHERE Parameter >, >=, <, <= @Value)
Estimate = 30% of Number Of Rows In Table
Estimate = SELECT [TableCardinality] * 0.3

--Local Variable (New CE InEquality)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate > @OrderDate
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT (2074 * 0.3) = 622.2

--Local Variable (Legacy CE InEquality)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate > @OrderDate
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (2074 * 0.3) = 622.2

BETWEEN
When using the BETWEEN operator with local variables the query optimiser will make estimations differently depending on the CE model:

  • Legacy CE: A hard-coded guess of 9% of the table’s cardinality is used.
    Estimate = SELECT [TableCardinality] * 0.09
  • New CE: The guess is computed using exponential backoff, applied to the separate >= and <= comparisons that BETWEEN is shorthand for. The guess for >= and <= is 30% (0.3) each, so the overall guess is 0.3 * SQRT(0.3) = 0.164317 (* table cardinality), meaning it estimates 16.4317% of the table’s cardinality.
    Estimate = SELECT [TableCardinality] * 0.164317
--Local Variable (New CE BETWEEN)
DECLARE @OrderDate1 date = '2016-05-28'
DECLARE @OrderDate2 date = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate BETWEEN @OrderDate1 AND @OrderDate2
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT (2074 * 0.164317) = 340.793

--Local Variable (Legacy CE BETWEEN)
DECLARE @OrderDate1 date = '2016-05-28'
DECLARE @OrderDate2 date = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate BETWEEN @OrderDate1 AND @OrderDate2
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (2074 * 0.09) = 186.66

LIKE
When using the LIKE operator with local variables the query optimiser will make estimations differently depending on the CE model:

  • Legacy CE: Uses a quirky, complex algorithm typical of the legacy CE:
    Estimate = C * (G / M * LOG(D))
    C = Table Cardinality (number of rows as shown in the statistics header)
    G = The standard 9% guess
    M = A factor of 6 (a magic number)
    D = Average length rounded down to integer (average data length taken from the statistics density vector). LOG(D) is omitted if D is between 1 & 2. If D is less than 1 (including for missing or NULL statistics) then D = FLOOR(0.5 * maximum column byte length)
    Estimate = SELECT [TableCardinality] * (0.09 / 6 * LOG(D))
  • New CE: The query optimiser will estimate 9% of the table’s cardinality.
    Estimate = SELECT [TableCardinality] * 0.09
-- Local Variable (New CE LIKE)
DECLARE @SupplierRef AS NVARCHAR(25) = N'BC%'
SELECT PurchaseOrderID, OrderDate, SupplierReference
FROM Purchasing.PurchaseOrders
WHERE SupplierReference LIKE @SupplierRef
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT (2074 * 0.09) = 186.66

--Local Variable (Legacy CE LIKE)
DECLARE @SupplierRef AS NVARCHAR(25) = N'BC%'
SELECT PurchaseOrderID, OrderDate, SupplierReference
FROM Purchasing.PurchaseOrders
WHERE SupplierReference LIKE @SupplierRef
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 );
-- Legacy Ce Estimate -- Estimate = SELECT 2074 * (0.09 / 6 * LOG(14)) = 82.1011
-- Note that the optimiser will automatically generate a column statistic on SupplierReference
-- for this query using the legacy CE and not the query above that used the new CE.  This is
-- because the legacy CE uses information from the density vector, the new CE does not.

Selectivity Guesses (Missing Statistics)

In the absence of statistics, indexes with their associated statistics, or constraints, the optimiser has to resort to heuristics and essentially guesses the number of estimated rows. Based on the form of the predicate, you’ll get different guess types, typically fixed percentages of the table cardinality:

Equality Predicates
In the absence of statistics, when using an equality predicate (= operator) estimates are derived differently under the different CEs:

  • Legacy CE: Uses an estimate of C^0.75 (an exponent of three quarters) where C is the cardinality.
    Estimate = SELECT POWER( [TableCardinality],.75 )
  • New CE: Uses an estimate of C^0.5 (the square root) where C is the cardinality.
    Estimate = SELECT POWER( [TableCardinality],.5 )
ALTER DATABASE WideWorldImporters SET AUTO_CREATE_STATISTICS OFF
GO

SELECT * INTO Purchasing.PurchaseOrders_NoStats
FROM Purchasing.PurchaseOrders

-- Selectivity Guess (New CE Equality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders_NoStats
WHERE OrderDate = '2016-05-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate = Estimate = SELECT POWER(2074.00,.5) = 45.54

-- Selectivity Guess (Legacy CE Equality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders_NoStats
WHERE OrderDate = '2016-05-31'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT POWER(2074.00,.75) = 307.33

In the example above, the automatic creation of statistics is disabled and a new table is created to simulate a scenario where the query optimiser has to resort to a selectivity guess.  There will be ‘Columns With No Statistics’ warnings in the execution plan.  For the first query, labelled ‘– Selectivity Guess (New CE Equality)’, the selectivity guess can be seen in the messages tab:

3-selectivity-guess

Inequality Predicates
In the absence of statistics, when using an inequality predicate (>, >=, <, <= operators) the query optimiser will estimate 30% of the table’s cardinality; this is the same across all versions of the CE.
Estimate = SELECT [TableCardinality] * 0.3

-- Selectivity Guess (New CE InEquality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders_NoStats
WHERE OrderDate > '2016-05-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT (2074 * 0.3) = 622.2

-- Selectivity Guess (Legacy CE InEquality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders_NoStats
WHERE OrderDate > '2016-05-31'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (2074 * 0.3) = 622.2

BETWEEN
In the absence of statistics, when using the BETWEEN operator the query optimiser will estimate 9% of the table’s cardinality; this is the same across all versions of the CE.
Estimate = SELECT [TableCardinality] * 0.09

Note that the query optimiser internally evaluates BETWEEN as two inequality operators, i.e. ‘WHERE FilterPredicate BETWEEN Value1 AND Value2’ is evaluated as ‘WHERE FilterPredicate >= Value1 AND FilterPredicate <= Value2’. So two inequality operators explicitly specified together as a conjunction will result in the same estimate of 9%, as shown in the sketch below.
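
Following the pattern of the earlier examples, a sketch for this case (run while the Purchasing.PurchaseOrders_NoStats table created above still exists) would look like this:

-- Selectivity Guess (BETWEEN - no statistics; same 9% guess under both CEs)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders_NoStats
WHERE OrderDate BETWEEN '2016-05-28' AND '2016-05-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- Estimate = SELECT (2074 * 0.09) = 186.66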

LIKE
In the absence of statistics, when using the LIKE operator, estimates are derived differently under the different CEs and depending on the use of the LIKE operator:

  • Legacy CE: Estimates vary depending on the use of the LIKE operator and the number of wildcards. For a single leading or trailing wildcard ( LIKE ‘%A’; LIKE ‘A%’; ) the estimate will be 26.9616% of the table cardinality.
    Estimate = SELECT [TableCardinality] * 0.269616
    For a leading and trailing wildcard ( LIKE ‘%A%’; ) the estimate will be 53.9232% of the table cardinality.
    Estimate = SELECT [TableCardinality] * 0.539232
  • New CE: The query optimiser will estimate 9% of the table’s cardinality regardless of the use of the LIKE operator and the number of wildcards:
    Estimate = SELECT [TableCardinality] * 0.09

This blog post by Joe Sack covers more scenarios.

-- Selectivity Guess (New CE LIKE)
SELECT PurchaseOrderID, OrderDate, SupplierReference
FROM Purchasing.PurchaseOrders_NoStats
WHERE SupplierReference LIKE N'B%'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT (2074 * 0.09) = 186.66

-- Selectivity Guess (Legacy CE LIKE)
SELECT PurchaseOrderID, OrderDate, SupplierReference
FROM Purchasing.PurchaseOrders_NoStats
WHERE SupplierReference LIKE N'B%'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (2074 * 0.269616) = 559.183

DROP TABLE Purchasing.PurchaseOrders_NoStats

ALTER DATABASE WideWorldImporters SET AUTO_CREATE_STATISTICS ON
GO

Multiple Predicate Cardinality Estimates

By default, during query optimisation, SQL Server automatically creates single-column statistics on all predicate columns (if not already available). SQL Server will not automatically create multi-column statistics; they can be created manually, but the histogram is still only built over the first named column.

In the absence of multi-column statistics, the cardinality estimates are derived as follows:

Legacy CE: The cardinality estimator uses the independency assumption model where SQL Server generally assumes that values of different attributes in a table are distributed completely independently of each other.

  • AND Selectivity: Using the independency assumption, three predicates connected by AND (known as a conjunction) with selectivities S1, S2 and S3 result in a combined selectivity of: (S1 * S2 * S3)
    Estimate = Table Cardinality * (S1 * S2 * S3)
    Estimate = SELECT ( SELECT COUNT(*) FROM [TableName] ) * ( S1 * S2 * S3 )
  • OR Selectivity: Two predicates connected by OR (a disjunction) with selectivities S1 and S2, results in a combined selectivity of: (S1 + S2) - (S1 * S2)
    Estimate = Table Cardinality (TotalRows) * ( (S1 + S2) - (S1 * S2) )
    Estimate = SELECT ( SELECT COUNT(*) FROM [TableName] ) * ( (S1 + S2) - (S1 * S2) )

New CE: The new cardinality estimator uses an exponential backoff formula for both conjunctive (AND) and disjunctive (OR) predicates, though the formula used in the disjunctive (OR) case has not yet been documented (officially or otherwise). The new approach to conjunctive predicates is to use exponential backoff: given a table with cardinality C, and predicate selectivities S1, S2, S3, S4 where S1 is the most selective and S4 the least:
Estimate = Table Cardinality * S1 * SQRT(S2) * SQRT(SQRT(S3)) * SQRT(SQRT(SQRT(S4)))
Estimate = SELECT ( SELECT COUNT(*) FROM [TableName] ) * S1 * SQRT(S2) * SQRT(SQRT(S3)) * SQRT(SQRT(SQRT(S4)))

-- Manually create single column statistics
CREATE STATISTICS AC_Region ON Application.Countries (Region) WITH FULLSCAN
CREATE STATISTICS AC_SubRegion ON Application.Countries (SubRegion) WITH FULLSCAN

-- Multiple Predicate in absence of multi-column statistics (New CE Conjunction)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
AND Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- Region Selectivity = SELECT(35.00 / 190) = 0.184210 (S2)
-- SubRegion Selectivity = SELECT(13.00 / 190) = 0.068421 (S1 Most Selective)
-- New Ce Estimate -- Estimate = SELECT ( 190 ) * 0.068421 * SQRT(0.184210) = 5.5795

-- Multiple Predicate in absence of multi-column statistics (Legacy CE Conjunction)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
AND Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (190) * (0.068421 * 0.184210) = 2.39474

-- Multiple Predicate in absence of multi-column statistics (New CE Disjunction)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
OR Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = 40.3966
-- Exponential backoff formula for disjunctive(OR) predicates is undocumented
-- Region Selectivity = SELECT(35.00 / 190) = 0.184210 (S2)
-- SubRegion Selectivity = SELECT(13.00 / 190) = 0.068421 (S1 Most Selective)

-- Multiple Predicate in absence of multi-column statistics (Legacy CE Disjunction)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
OR Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT (190) * ((0.068421 + 0.184210) - (0.068421 * 0.184210)) = 45.6053

In the presence of multi-column statistics, the cardinality estimates are calculated under the different models as follows:

  • SQL Server 7 – 2012 (Legacy CE): The legacy cardinality estimator will take advantage of multi-column statistics and takes the ‘all density’ value for the combination of columns and multiplies it by the total number of rows in the table.
  • SQL Server 2014: The new CE introduced in SQL Server 2014 RTM does not make use of multi-column statistics and uses exponential backoff with the individual single column statistics.
  • SQL Server 2016: Further improvements were made in SQL Server 2016 and the CE model version 130 takes advantage of multi-column statistics again, using the ‘all density’ value for the combination of columns and multiplying it by the total number of rows in the table.

Although multi-column statistics (and filtered statistics, for that matter) are not used by the SQL Server 2014 CE, you can still leverage them by forcing the legacy CE using trace flags and query hints.

-- Manually create a multi-column statistic
CREATE STATISTICS AC_Region_Subregion ON Application.Countries (Region, Subregion) WITH FULLSCAN

-- Multiple Predicate (New CE Conjunction With MultiColumn Stats)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
AND Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT 0.04545455 * 190 = 8.63636 

-- Multiple Predicate (Legacy CE Conjunction With MultiColumn Stats)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
AND Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT 0.04545455 * 190 = 8.63636

ALTER DATABASE [WideWorldImporters] SET COMPATIBILITY_LEVEL = 120
GO

-- Multiple Predicate (New CE (SQL Server 2014 only) Conjunction With MultiColumn Stats)
SELECT CountryName, Region, Subregion
FROM Application.Countries
WHERE Region = 'Americas'
AND Subregion = 'Caribbean'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- SQL Server 2014 CE does not make use of multi-column stats
-- Region Selectivity = SELECT(35.00 / 190) = 0.184210 (S2)
-- SubRegion Selectivity = SELECT(13.00 / 190) = 0.068421 (S1 Most Selective)
-- New Ce Estimate -- Estimate = SELECT ( 190 ) * 0.068421 * SQRT(0.184210) = 5.5795 

DROP STATISTICS Application.Countries.AC_Region
DROP STATISTICS Application.Countries.AC_SubRegion
DROP STATISTICS Application.Countries.AC_Region_Subregion

ALTER DATABASE [WideWorldImporters] SET COMPATIBILITY_LEVEL = 130
GO

Scaling Cardinality Estimates

All of the demos/examples so far have used statistic objects generated with a full scan where no data is being updated and no new rows are being inserted, so the statistics can be considered fresh, accurate and up to date.  What happens when the data isn’t static?  Are the cardinality estimates affected?

Statistics Up To Date
When there is a direct hit in the statistics histogram for an equality predicate, the query optimiser can use the EQ_ROWS (equality rows) value to get a perfect estimation, assuming the statistics are up to date. When the equality predicate falls within the entire histogram range but doesn’t result in a direct histogram step hit, the query optimiser can use the AVG_RANGE_ROWS (average range rows) value to derive an accurate estimation. This is the same across all versions of the CE.

Statistics Out Of Date – Literal or Parameter Equality Predicate
Both CEs are aware of outdated statistics and scale the estimation according to actual total number of rows in the table. When there is a direct histogram step hit, the below formula is used where the equality rows from the statistics histogram is multiplied by the current number of rows in the table then divided by the number of rows as shown in the statistics header:

Estimate = (EQ_ROWS * Actual Current Rows) / Statistics Total Rows

When the equality predicate is within the histogram range but doesn’t result in a direct histogram step hit, the query optimiser will use the AVG_RANGE_ROWS value from the statistics object to derive an estimate.  The behaviour is slightly different under the different CE models.  Under the legacy CE, the estimate does not scale and is based solely on the AVG_RANGE_ROWS value from the statistics object.  Under the new CE, the estimate is again scaled using the same principle:

Legacy CE Estimate = AVG_RANGE_ROWS
New CE Estimate = (AVG_RANGE_ROWS * Actual Current Rows) / Statistics Total Rows

-- All of the previous examples so far have had up to date, full scan statistics, but what happens when
-- new rows are inserted, do the cardinality estimates scale?

-- Insert rows, some within the current histogram range, some outside the current histogram range, but not
-- enough rows to breach the threshold for triggering an automatic update of statistics.
-- Turn off actual execution plans before doing the INSERT operation
INSERT INTO [Purchasing].[PurchaseOrders]
           ([SupplierID],[OrderDate],[DeliveryMethodID],[ContactPersonID],[ExpectedDeliveryDate],[SupplierReference]
           ,[IsOrderFinalized],[Comments],[InternalComments],[LastEditedBy])
     VALUES (7, '2016-05-31', 2, 2, '2016-06-09', 'BC0280982', 1, NULL, NULL, 7);
GO 163
INSERT INTO [Purchasing].[PurchaseOrders]
           ([SupplierID],[OrderDate],[DeliveryMethodID],[ContactPersonID],[ExpectedDeliveryDate],[SupplierReference]
           ,[IsOrderFinalized],[Comments],[InternalComments],[LastEditedBy])
     VALUES (7, '2016-11-08', 2, 2, '2016-11-20', 'BC0280982', 1, NULL, NULL, 7);
GO 163

DBCC SHOW_STATISTICS(N'Purchasing.PurchaseOrders', NC_IX_Purchasing_PurchaseOrders_OrderDate)
--WITH STAT_HEADER, DENSITY_VECTOR, HISTOGRAM
GO

-- Scaling Estimate (New CE Equality - Direct Histogram Step Hit)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-05-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- Estimate = (EQ_ROWS * Actual Current Rows) / Statistics Total Rows
-- New Ce Estimate -- Estimate = SELECT( 2.00 * 2400 ) / 2074 = 2.31437

-- Scaling Estimate (Legacy CE Equality - Direct Histogram Step Hit)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-05-31'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 );
-- Legacy Ce Estimate -- Estimate = SELECT( 2.00 * 2400 ) / 2074 = 2.31437

-- Scaling Estimate (New CE Equality - Non Direct Histogram Step Hit)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2013-01-03'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT( 5.00 * 2400 ) / 2074 = 5.78592

-- Scaling Estimate (Legacy CE Equality - Non Direct Histogram Step Hit)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2013-01-03'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 );
-- Legacy Ce Estimate = 5
-- The estimate does not scale and is based on the AVG_RANGE_ROWS value from the statistics object.

Statistics Out Of Date – Local Variable Equality Predicate
The values assigned to local variables are not known to the optimiser at compile time, so it can’t use the histogram and has to resort to the density vector to derive cardinality estimates.  In this scenario, where the statistics are out of date, the estimate does not scale under the legacy CE:

Estimate = SELECT [AllDensity_DensityVector] * [StatisticsHeaderTotalRows]

Under the new CE, the estimation is again scaled and uses the actual number of rows in the table:

Estimate = SELECT [AllDensity_DensityVector] * [ActualTotalRows]

-- Scaling Estimate (New CE Equality - Local Variable)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate =  @OrderDate
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT 0.0009398496 * 2400 = 2.25564
-- The estimate is scaled and is based on the tables current cardinality

-- Scaling Estimate (Legacy CE Equality - Local Variable)
DECLARE @OrderDate date
SET @OrderDate = '2016-05-31'
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate =  @OrderDate
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate -- Estimate = SELECT 0.0009398496 * 2074 = 1.94925
-- The estimate does not scale and is based on the total rows in the statistic object

Searching For Off-Histogram Values (Selectivity Guesses)

The inclusion assumption dictates that we query for data that does exist and as such the cardinality estimator will (almost) never estimate zero rows.  Even if a query selects from a table that has been truncated, the query optimiser will still estimate one row despite the table being completely empty as pointed out by Kendra Little here.

Legacy CE: In the absence of an actual histogram step, when a filter predicate falls outside the histogram range the cardinality estimator assumes the value actually exists and will estimate one row.  This is true regardless of the predicate ( =, >, <, >=, <=, BETWEEN ); as long as the filter predicate falls outside the histogram range, the estimate will be one row.

-- Searching For Off-Histogram Values (Legacy CE Equality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-11-08'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate = 1

New CE (SQL Server 2014 & 2016): In the absence of an actual histogram step, when an equality filter predicate falls outside the histogram range the cardinality estimator will use the density vector information.  The estimate is calculated using the ‘all density’ value from the density vector multiplied by the total number of rows from the statistics header; the estimate is not scaled:

Estimate = All Density x Statistics Header Total Rows
Estimate = SELECT [AllDensity_DensityVector] * [StatisticsHeaderTotalRows]

-- Searching For Off-Histogram Values (New CE Equality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-11-08'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- New Ce Estimate -- Estimate = SELECT 0.0009398496 * 2074 = 1.94925

The new CE simply ignores literals that are beyond the statistics histogram, and for inequality predicates it estimates 30% of the total number of modifications to the leading statistics column since the last statistics update.  In other words, it makes a 30% guess over the added rows, which can be calculated as follows:

Estimate = SELECT (rowmodctr * 0.3) FROM sys.sysindexes WHERE name = 'IndexName'

For the BETWEEN logical operator, a 9% guess over the added rows is made:

Estimate = SELECT (rowmodctr * 0.09) FROM sys.sysindexes WHERE name = 'IndexName'

-- Searching For Off-Histogram Values (New CE InEquality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate > '2016-05-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- Estimate = SELECT (rowmodctr * 0.3) FROM sys.sysindexes WHERE name = 'NC_IX_Purchasing_PurchaseOrders_OrderDate'
-- New Ce Estimate = 97.8 

-- Searching For Off-Histogram Values (Legacy CE InEquality)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate > '2016-05-31'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
-- Legacy Ce Estimate = 1
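
To illustrate the 9% guess, the sketch below uses a BETWEEN predicate that falls entirely beyond the histogram; the date range is my own illustrative choice, and the exact estimate depends on the current modification counter (326 rows were inserted above):

-- Searching For Off-Histogram Values (New CE BETWEEN - 9% guess over the added rows)
-- Illustrative sketch: the date range is hypothetical and lies entirely beyond the histogram
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate BETWEEN '2016-11-01' AND '2016-12-31'
OPTION ( QUERYTRACEON 3604, QUERYTRACEON 2363 );
-- Estimate = SELECT (rowmodctr * 0.09) FROM sys.sysindexes WHERE name = 'NC_IX_Purchasing_PurchaseOrders_OrderDate'
-- With the 326 rows inserted earlier, that would be SELECT 326 * 0.09 = 29.34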

Ascending Key Problem

Ascending keys are a common data pattern found in most databases, usually implemented using identity columns, real-time timestamp columns, etc.  Insert operations append new values to ascending columns, and when the number of rows added is too small to trigger a statistics update, the most recently added range is not modelled by the histogram.  This can lead to inaccurate cardinality estimates, as demonstrated in the previous section ‘Searching For Off-Histogram Values (Selectivity Guesses)’.

SQL Server can detect when the leading column of a statistics object is ascending and can mark it as such.  A statistics object on an ascending column is branded as ‘Ascending’ after three statistics updates using full scan where, for each update, more than 90% of the newly added rows have a value greater than the current maximum value in the statistics histogram.  If older data is inserted, breaking the ascending sequence, the column is branded as ‘Stationary’.

A statistic’s branding is hidden by default but can be viewed using trace flag 2388.  With this trace flag turned on, the output of DBCC SHOW_STATISTICS changes and displays a new column called ‘Leading column type’ which contains the column branding.
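
A minimal sketch of viewing the branding, assuming the OrderDate statistic used earlier (TF 2388 is enabled for the session, the statistic is inspected, then the flag is turned off again):

-- View the statistic branding ('Leading column type') using trace flag 2388
DBCC TRACEON (2388);
DBCC SHOW_STATISTICS (N'Purchasing.PurchaseOrders', NC_IX_Purchasing_PurchaseOrders_OrderDate);
DBCC TRACEOFF (2388);
-- With TF 2388 on, the output shows the statistics update history,
-- including the 'Leading column type' column (Ascending / Stationary / Unknown / NULL)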

By default, the query optimiser keeps the information about the branding of statistics, but doesn’t make use of it. The optimiser won’t choose a different plan based on whether the column is ascending or not. To change this, there are several trace flags available:

  • TF 2389: Helps the Query Optimiser to come up with better estimations and correct execution plans only for columns branded as Ascending.
  • TF 2390: Instructs SQL Server to perform on-the-fly calculation for statistics objects for columns marked as Ascending or Unknown.
  • TF 4139: Works with all three leading column types: Ascending, Unknown and Stationary.  However, it is only available from SQL Server 2012 SP1 CU10 or SP2 CU1 onwards.

These trace flags are not required with the new CE in SQL Server 2014 as this behaviour is enabled by default.
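
Under the legacy CE these trace flags can also be applied per query with QUERYTRACEON rather than globally; a rough sketch using the OrderDate query pattern from above:

-- Legacy CE with the ascending key fixes applied at the query level
-- (TF 9481 forces the legacy CE, TF 2389/2390 enable the on-the-fly histogram amendment)
SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate > '2016-05-31'
OPTION ( QUERYTRACEON 9481, QUERYTRACEON 2389, QUERYTRACEON 2390 );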

Identifying Cardinality Estimate Issues

Execution Plans: The estimated plan will not help you diagnose cardinality estimation issues, but the actual plan will, and this can be captured using any of the following (a minimal example follows the list):

  • SET STATISTICS XML ON;
  • SET STATISTICS PROFILE ON; — (Deprecated)
  • Graphical Showplan (“Include Actual Execution Plan”)
  • Extended Events ( query_post_execution_showplan ) — (Beware of Overhead)
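
As a minimal example of the first option, wrapping a statement in SET STATISTICS XML returns the actual (post-execution) plan, including actual versus estimated row counts, alongside the result set:

-- Capture the actual execution plan as XML for a single statement
SET STATISTICS XML ON;

SELECT PurchaseOrderID, OrderDate
FROM Purchasing.PurchaseOrders
WHERE OrderDate = '2016-05-31';

SET STATISTICS XML OFF;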

Query Profiles DMV: The sys.dm_exec_query_profiles DMV (introduced in SQL Server 2014) shows the number of rows processed by the various operators at that point in time.  Look for actual row counts that are already much higher than the estimated row count, which indicates an under-estimate.  Over-estimates are trickier to spot using this DMV because a lower-than-estimated row count might simply mean that not all the rows have flowed through the operators yet.

SELECT [session_id], [node_id], [physical_operator_name],
[estimate_row_count], [row_count]
FROM [sys].[dm_exec_query_profiles]
ORDER BY [node_id];

Root Level Skews via Query Stats: The sys.dm_exec_query_stats DMV can be cross applied to sys.dm_exec_sql_text and sys.dm_exec_query_plan to show the estimated number of rows (shredded from the query plan XML), the last rows (the number of rows returned by the most recent execution of the query) and the query text.  This can be used to identify root-level skews, but it will not show leaf-level skews that are not propagated up through the tree to the root level.

SELECT DB_NAME(qt.dbid) AS [Database], cp.objtype AS [Cached_Plan_Type], cp.usecounts AS [Use_Counts],
qp.query_plan.value('(//@CardinalityEstimationModelVersion)[1]','INT') AS [CE_Version],
CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) AS [Estimated_Rows],
qs.last_rows AS [Last_Rows],
CONVERT(float, (qp.query_plan.value('(//@EstimateRows)[1]', 'varchar(128)'))) - (qs.last_rows) AS [Estimate_Skew],
qs.total_logical_reads / qs.execution_count AS [Avg_Logical_Reads],
CAST((qs.total_elapsed_time ) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_Execution_Time(µs)],
CAST((qs.total_worker_time) / qs.execution_count AS DECIMAL(28, 2)) AS [Avg_CPU_Time(µs)],
SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2 ) + 1) AS [Query_Text],
qs.query_hash AS [QueryHash], qp.query_plan AS [QueryPlan]
FROM sys.dm_exec_cached_plans cp WITH (NOLOCK)
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) qt
INNER JOIN sys.dm_exec_query_stats qs ON qs.plan_handle = cp.plan_handle
WHERE qt.text NOT LIKE '%sys.%'
AND DB_NAME(qt.dbid) = '[Database_Name]'
ORDER BY [Avg_CPU_Time(µs)] DESC OPTION(RECOMPILE);
GO

Troubleshooting Questions

When troubleshooting cardinality estimate skews, there are several troubleshooting questions that should be asked in order to get to the root cause:

  • Are statistics missing? Is auto-creation of statistics disabled? Missing statistics warnings in query execution plans?
  • Are my statistics stale? Are the statistics stale and no longer reflect realistic distributions? Is auto-update of statistics disabled? Is ‘No Recompute’ enabled for specific statistics? Are some tables large enough that automatic statistics updates simply aren’t happening frequently enough?
  • Is the sampling adequate? If the statistics are sampled, would better quality statistics be produced with a full scan, or a higher sampling percentage?
  • Would multi-column statistics help? If multiple predicates are involved in the query and they happen to be correlated, would multi-column statistics help with the estimates?
  • Is this a parameter sniffing issue? This affects objects that take input parameters, such as stored procedures.
  • Are table variables and/or MSTVF’s causing problems? If there are significant underestimates, look at the use of table variables or multi-statement table valued functions, which both use fixed cardinality and don’t have statistics associated with them.
  • Are only the most recent rows being queried? Is this a typical ascending key problem, where the most recent data most likely isn’t reflected in the histogram?
  • Data type conversion issues? Some data type conversions are known to block proper estimation.
  • Are linked servers involved? Linked servers can be associated with skews if your linked server credentials don’t have high enough permissions to pull statistics information. This was something fixed starting in SQL Server 2012 SP1.
  • Are there Intra-Table Column Comparisons? Cardinality estimation issues can occur when comparing columns within the same table. This is a known issue. If you have to do so, you can improve the cardinality estimates of the column comparisons by using computed columns instead or by re-writing the query to use self-joins or common table expressions.

Missing Statistics: Is the ‘Auto Create Statistics’ database setting set to false? If so, why? Verify that the database is not read-only. If the database is read-only, the query optimiser cannot save statistics (unless it’s a secondary read-only replica in an Availability Group in which case temporary statistics can be created in tempdb).
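
The relevant database-level settings and the read-only state can be checked in sys.databases; a quick sketch:

-- Check the statistics-related database settings and the read-only state
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on,
       is_read_only
FROM sys.databases
WHERE name = N'WideWorldImporters';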

Stale Statistics: Stale statistics doesn’t necessarily mean old statistics, or statistics that haven’t been updated in a while, but rather statistics that no longer accurately represent the current data distribution.

With default database settings, the query optimiser looks at how many changes have occurred for a given column statistic as part of query optimisation.  If the number of rows that have changed exceeds certain thresholds, SQL Server updates the statistic and then optimises the query.  After all, why optimise a query on bad data estimates?

The query optimiser determines whether to update statistics based on changes to column modification counters (colmodctrs) and uses the below thresholds in SQL Server 2005 – 2014.

  • If the table has zero rows and one or more rows are added.
  • If the table has between 1-500 rows and 500 rows have changed.
  • If the table has 500+ rows and 500 rows + 20% of the total rows have changed.

Note that the statistics don’t auto-update when the rows are changed. It’s only when a query is being compiled and the statistics would actually be used to make an optimisation decision that the update is kicked off.

For very large tables (millions of rows) the default thresholds for automatic statistics updates may not be triggered frequently enough, which can lead to performance problems.  SQL Server 2008 R2 Service Pack 1 and later versions introduced trace flag 2371, which you can enable to change this default behaviour.  Trace flag 2371 makes the formula more dynamic for large tables and only applies to tables with more than 25,000 rows.  It uses a decreasing, dynamic statistics update threshold, calculated as the square root of 1,000 multiplied by the table cardinality, which can be expressed as RT = SQRT(1000 * Table Cardinality).  Simply put, under this trace flag, the higher the number of rows in a table, the lower the threshold to trigger a statistics update becomes.  For example, with the trace flag activated, a statistics update will be triggered on a table with 1 billion rows when roughly 1 million changes occur.  Without the trace flag, the same table would need roughly 200 million changes before a statistics update is triggered.  SQL Server 2016 uses this improved algorithm automatically.
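
The numbers in that example can be verified with a quick calculation; the sketch below computes the dynamic threshold for a 1 billion row table under TF 2371 and compares it to the classic 500 + 20% formula:

-- Dynamic statistics update threshold under TF 2371 for a 1 billion row table
SELECT SQRT(1000 * 1000000000.0) AS [TF2371_Threshold];     -- = 1,000,000 changes
-- Classic threshold (500 rows + 20% of total rows) for the same table
SELECT 500 + (0.20 * 1000000000.0) AS [Classic_Threshold];  -- = 200,000,500 changes (~200 million)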

Are statistics stale because they are set to not recompute?

  • When creating or altering an index, there is an option ‘WITH (STATISTICS_NORECOMPUTE = { ON | OFF })’ which specifies whether distribution statistics are recomputed.  The default is OFF, which enables automatic statistics updates for the statistic objects associated with that index.  If set to ON, out-of-date statistics are not automatically recomputed, which may prevent the query optimiser from picking optimal execution plans.
  • When creating or updating statistics, there is a ‘WITH NORECOMPUTE’ option that, when specified, completes the current statistics operation but disables future automatic updates, effectively disabling the automatic statistics update option for the specified statistic.  Statistics flagged in this way can be identified as shown in the sketch below.
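
A minimal sketch, using the no_recompute column of sys.stats and assuming the Purchasing.PurchaseOrders table used throughout:

-- Find statistics on a table that have automatic updates disabled (NORECOMPUTE)
SELECT OBJECT_NAME([s].[object_id]) AS [Table],
       [s].[name] AS [Statistic],
       [s].[no_recompute]
FROM [sys].[stats] AS [s]
WHERE [s].[object_id] = OBJECT_ID(N'Purchasing.PurchaseOrders')
AND [s].[no_recompute] = 1;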

The below query can be used to check for stale statistics on a specific object:

SELECT
OBJECT_NAME([sp].[object_id]) AS "Table",
[sp].[stats_id] AS "Statistic ID",
[s].[name] AS "Statistic",
[sp].[last_updated] AS "Last Updated",
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS "Modifications"
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'schema.object');

Sampling: Often sampling is good enough, but sometimes you may need a full scan for your statistics generation in order to capture all the interesting points in your data.  Histograms are not lossless: sometimes distinct values aren’t represented by their own steps, or some frequencies are inaccurate because they are an average across very high and very low values, and this can cause issues with estimates.

The automatic creation, or automatic updating, of statistics triggered during query optimisation uses default sampling.  Statistics created as a by-product of index creation use a full scan.  The default sample size is calculated using a non-linear algorithm where the sample size decreases as the table size gets larger.  If the default sample does not create a good histogram, it is possible to manually update statistics with a higher sampling rate (see the example after the list below):

  • FULLSCAN: Scans every row to create the best histogram possible.
    UPDATE STATISTICS WITH FULLSCAN
  • SAMPLE PERCENT: Specify the percentage of data or number of rows to scan.
    UPDATE STATISTICS WITH SAMPLE 25 PERCENT
    UPDATE STATISTICS WITH SAMPLE 75 PERCENT
  • RESAMPLE: Inherits the previous sample size
    UPDATE STATISTICS WITH RESAMPLE
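
The options above need a target table (and optionally a statistic name); a sketch of the full syntax against the table used in the earlier demos:

-- Rebuild a single statistic with a full scan
UPDATE STATISTICS Purchasing.PurchaseOrders (NC_IX_Purchasing_PurchaseOrders_OrderDate) WITH FULLSCAN;

-- Update all statistics on the table with a 25 percent sample
UPDATE STATISTICS Purchasing.PurchaseOrders WITH SAMPLE 25 PERCENT;

-- Re-use whatever sample rate each statistic was last gathered with
UPDATE STATISTICS Purchasing.PurchaseOrders WITH RESAMPLE;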

Table Variables & MSTVFs: Table variables do not have statistics objects associated with them and the query optimiser uses a hard-coded estimate of one row. Consider using alternatives like temporary tables and/or permanent staging tables.

The same applies to MSTVFs, where the query optimiser uses a hard-coded estimate of one row, although the new SQL Server 2014 CE uses a fixed guess of 100 rows.  Consider using inline table-valued functions instead.
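
A rough sketch of these fixed guesses in action: load the same rows into a table variable and a temporary table, run with the actual execution plan enabled, and compare the estimated row counts on the two scans (the one-row guess for the table variable versus a statistics-based estimate for the temporary table):

-- Table variable: no statistics, fixed estimate of 1 row (without OPTION (RECOMPILE))
DECLARE @Orders TABLE (PurchaseOrderID int, OrderDate date);
INSERT INTO @Orders (PurchaseOrderID, OrderDate)
SELECT PurchaseOrderID, OrderDate FROM Purchasing.PurchaseOrders;

SELECT COUNT(*) FROM @Orders;      -- estimated rows for the table scan = 1

-- Temporary table: statistics are created, so the estimate reflects the real row count
SELECT PurchaseOrderID, OrderDate
INTO #Orders
FROM Purchasing.PurchaseOrders;

SELECT COUNT(*) FROM #Orders;      -- estimate based on the temporary table's statistics
DROP TABLE #Orders;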

Data Type Conversions: Mismatched data type usage within join or filter predicates can affect cardinality estimates and the associated plan selection. To avoid cardinality estimate issues caused by data type mismatches, use identical data types in search and join conditions.
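
As an illustrative sketch of the kind of mismatch to avoid (the table and column here are hypothetical): comparing a varchar column to an nvarchar value forces an implicit conversion of the column side, which can hurt the cardinality estimate:

-- Hypothetical sketch: #Refs.Ref is varchar
CREATE TABLE #Refs (Ref varchar(20) NOT NULL);
INSERT INTO #Refs (Ref) VALUES ('BC0280982'), ('BC0280983');

DECLARE @Ref nvarchar(20) = N'BC0280982';
SELECT Ref FROM #Refs WHERE Ref = @Ref;     -- CONVERT_IMPLICIT on the column side

DECLARE @Ref2 varchar(20) = 'BC0280982';
SELECT Ref FROM #Refs WHERE Ref = @Ref2;    -- matching data types, no conversion

DROP TABLE #Refs;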

Resources

For anyone wanting to learn more on this topic, I would highly recommend Joe Sack’s ‘Common Query Tuning Problems and Solutions’ Pluralsight course which gave me the inspiration for this blog post.  Joe also has a great blog over at SQLSkills.  I’d also recommend reading anything by Paul White, who helped me out with the quirky, complex algorithm used by the legacy CE in the local variable and LIKE operator example.
