Thursday, June 20, 2019

SQL interceptors

SQL interceptors are a way to apply filtering by tenant for securing multi-tenant applications. Here is a good read on it http://xabikos.com/2014/11/18/Create-a-multitenant-application-with-Entity-Framework-Code-First-Part-2/

However, you need to be careful with it since they modify your query at runtime. 
Specifically, DbExpressionBuilder.Bind(databaseExpression) in the interceptor causes a random variable to be created which creates a random query text for the same query on every reinitialize which is generally a recycle on the VM where the application is running.

This puts unnecessary unnecessary pressure on QDS (Query Data Store).
Also, if you force a query plan for a specific query hash, a new query hash will be generated the next time so the forced plan will not work.

To fix this, make sure to bind it with a specific variable name like DbExpressionBuilder.BindAs(databaseExpression, "Filter")

Troubleshooting SQL Azure issues

Query Performance Insights

The most common place to look for SQL Azure issues is Query Performance Insights on Azure portal https://docs.microsoft.com/en-us/azure/sql-database/sql-database-query-performance

Troubleshoot SQL query timeouts

There are various Data Management Views (DMV’s) created by SQL Azure team https://docs.microsoft.com/en-us/azure/sql-database/sql-database-monitoring-with-dmvs
I tweaked them a little below:


Find query hashes which are timing out


Look for Aborted_Execution_Count column.
-- Top 15 CPU consuming queries by query hash
-- note that a query  hash can have many query id if not parameterized or not parameterized properly
-- it grabs a sample query text by min
WITH AggregatedCPU AS (SELECT q.query_hash, p.query_plan_hash, SUM(count_executions * avg_cpu_time / 1000.0) AS total_cpu_millisec, SUM(count_executions * avg_cpu_time / 1000.0)/ SUM(count_executions) AS avg_cpu_millisec, MAX(rs.max_cpu_time / 1000.00) AS max_cpu_millisec, MAX(max_logical_io_reads) max_logical_reads, COUNT(DISTINCT p.plan_id) AS number_of_distinct_plans, COUNT(DISTINCT p.query_id) AS number_of_distinct_query_ids, SUM(CASE WHEN rs.execution_type_desc='Aborted' THEN count_executions ELSE 0 END) AS Aborted_Execution_Count, SUM(CASE WHEN rs.execution_type_desc='Regular' THEN count_executions ELSE 0 END) AS Regular_Execution_Count, SUM(CASE WHEN rs.execution_type_desc='Exception' THEN count_executions ELSE 0 END) AS Exception_Execution_Count, SUM(count_executions) AS total_executions, MIN(qt.query_sql_text) AS sampled_query_text
                       FROM sys.query_store_query_text AS qt
                            JOIN sys.query_store_query AS q ON qt.query_text_id=q.query_text_id
                            JOIN sys.query_store_plan AS p ON q.query_id=p.query_id
                            JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id=p.plan_id
                            JOIN sys.query_store_runtime_stats_interval AS rsi ON rsi.runtime_stats_interval_id=rs.runtime_stats_interval_id
                       WHERE rs.execution_type_desc IN ('Regular', 'Aborted', 'Exception')AND rsi.start_time>=DATEADD(HOUR, -24, GETUTCDATE())
                       GROUP BY q.query_hash, p.query_plan_hash), OrderedCPU AS (SELECT query_hash, query_plan_hash, total_cpu_millisec, avg_cpu_millisec, max_cpu_millisec, max_logical_reads, number_of_distinct_plans, number_of_distinct_query_ids, total_executions, Aborted_Execution_Count, Regular_Execution_Count, Exception_Execution_Count, sampled_query_text, ROW_NUMBER() OVER (ORDER BY total_cpu_millisec DESC, query_hash ASC) AS RN
                                                              FROM AggregatedCPU)
SELECT OD.query_hash, OD.query_plan_hash, OD.total_cpu_millisec, OD.avg_cpu_millisec, OD.max_cpu_millisec, OD.max_logical_reads, OD.number_of_distinct_plans, OD.number_of_distinct_query_ids, OD.total_executions, OD.Aborted_Execution_Count, OD.Regular_Execution_Count, OD.Exception_Execution_Count, OD.sampled_query_text, OD.RN
FROM OrderedCPU AS OD
WHERE OD.RN<=15
ORDER BY total_cpu_millisec DESC;


Compare query plans for timing out query hash


Look for avg_cpu_millisec column. The query plan with lower value is better. If there is only one query plan, then you need to change your query to use the right indexes.
-- Top 15 CPU consuming queries by query hash
-- note that a query  hash can have many query id if not parameterized or not parameterized properly
-- it grabs a sample query text by min
WITH AggregatedCPU AS (SELECT q.query_hash, p.query_plan_hash, SUM(count_executions * avg_cpu_time / 1000.0) AS total_cpu_millisec, SUM(count_executions * avg_cpu_time / 1000.0)/ SUM(count_executions) AS avg_cpu_millisec, MAX(rs.max_cpu_time / 1000.00) AS max_cpu_millisec, MAX(max_logical_io_reads) max_logical_reads, COUNT(DISTINCT p.plan_id) AS number_of_distinct_plans, COUNT(DISTINCT p.query_id) AS number_of_distinct_query_ids, SUM(CASE WHEN rs.execution_type_desc='Aborted' THEN count_executions ELSE 0 END) AS Aborted_Execution_Count, SUM(CASE WHEN rs.execution_type_desc='Regular' THEN count_executions ELSE 0 END) AS Regular_Execution_Count, SUM(CASE WHEN rs.execution_type_desc='Exception' THEN count_executions ELSE 0 END) AS Exception_Execution_Count, SUM(count_executions) AS total_executions, MIN(qt.query_sql_text) AS sampled_query_text
                       FROM sys.query_store_query_text AS qt
                            JOIN sys.query_store_query AS q ON qt.query_text_id=q.query_text_id
                            JOIN sys.query_store_plan AS p ON q.query_id=p.query_id
                            JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id=p.plan_id
                            JOIN sys.query_store_runtime_stats_interval AS rsi ON rsi.runtime_stats_interval_id=rs.runtime_stats_interval_id
                       WHERE rs.execution_type_desc IN ('Regular', 'Aborted', 'Exception')AND rsi.start_time>=DATEADD(HOUR, -24, GETUTCDATE())
                       GROUP BY q.query_hash, p.query_plan_hash), OrderedCPU AS (SELECT query_hash, query_plan_hash, total_cpu_millisec, avg_cpu_millisec, max_cpu_millisec, max_logical_reads, number_of_distinct_plans, number_of_distinct_query_ids, total_executions, Aborted_Execution_Count, Regular_Execution_Count, Exception_Execution_Count, sampled_query_text, ROW_NUMBER() OVER (ORDER BY total_cpu_millisec DESC, query_hash ASC) AS RN
                                                              FROM AggregatedCPU)
SELECT OD.query_hash, OD.query_plan_hash, OD.total_cpu_millisec, OD.avg_cpu_millisec, OD.max_cpu_millisec, OD.max_logical_reads, OD.number_of_distinct_plans, OD.number_of_distinct_query_ids, OD.total_executions, OD.Aborted_Execution_Count, OD.Regular_Execution_Count, OD.Exception_Execution_Count, OD.sampled_query_text, OD.RN
FROM OrderedCPU AS OD
WHERE OD.query_hash= query_hash
ORDER BY total_cpu_millisec DESC;


Analyze the query plan


Once you have the query plan hash for which the query is timing out, you can view and analyze it by running this query
select qsq.query_hash
 ,qsp.query_plan_hash
 ,qsq.query_id
 ,qsp.plan_id
 ,qsq.query_text_id
 ,qsq.is_internal_query
 ,qsrts.first_execution_time
 ,qsrts.last_execution_time
 ,qsqt.query_sql_text
 ,CAST(qsp.query_plan AS XML)
 ,qsrts.count_executions
from
 [sys].[query_store_runtime_stats] qsrts
 inner join [sys].[query_store_plan] qsp on qsrts.plan_id = qsp.plan_id
 inner join [sys].[query_store_query] qsq on qsp.query_id = qsq.query_id
 inner join [sys].[query_store_query_text] qsqt on qsq.query_text_id = qsqt.query_text_id
where  qsp.query_plan_hash = query_plan_hash


Force a query plan


If you have multiple query plans for a query, where one query plan is performing far better than the other, you can force that query plan. You need to analyze both query plans before you decide a force a specific plan so that it will work for both large and small volume of data.
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-query-store-force-plan-transact-sql?view=sql-server-2017

select * from sys.query_store_plan where is_forced_plan=1

EXEC sp_query_store_force_plan query_id, plan_id;