1. Check your wait statistics
SQL Server tracks every wait in the form of wait statistics (see the CAT team's white paper on waits and queues), and you can use them to direct your tuning effort. For example, if you see high PAGEIOLATCH waits, you can be pretty sure that your bottleneck has to do with I/O. And if you see many LCK_XX type waits, you're looking at a blocking issue. In that case, you would spend your time more wisely by investigating the cause of the blocking instead of looking at I/O.
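As a starting point, you can query the sys.dm_os_wait_stats DMV to see which wait types dominate on your instance. A minimal sketch; the list of benign wait types filtered out below is a small illustrative sample, not an exhaustive one:

```sql
-- Top waits on the instance, excluding a few common benign wait types
-- (the exclusion list is illustrative only, not exhaustive).
SELECT TOP (10)
    [wait_type],
    [waiting_tasks_count],
    [wait_time_ms],
    [wait_time_ms] - [signal_wait_time_ms] AS [resource_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (N'SLEEP_TASK', N'BROKER_TASK_STOP',
                          N'LAZYWRITER_SLEEP', N'CHECKPOINT_QUEUE')
ORDER BY [wait_time_ms] DESC;
```

Note that these counters are cumulative since the last restart (or since they were last cleared), so interpret them over a known time window.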
2. Locate I/O bottlenecks
- Check whether you see high PAGEIOLATCH waits or WRITELOG waits in your wait statistics.
- Use the DMF sys.dm_io_virtual_file_stats() to locate any areas in which you have excessive physical I/O or excessive stalls on that I/O. These issues can occur at the database level or even at the file level.
- Use the trusty PerfMon counters. At a minimum, use the Avg. Disk sec/Read and Avg. Disk sec/Write counters to see the latency of the reads and writes. On an OLTP system, you would ideally want log file latency to be just a few ms and data file latency to be less than 10ms. Remember that these are ideal values; your system might tolerate higher latency and still be fine. Also keep in mind that when the storage subsystem can't keep up with the current demand, the root cause is often not the storage hardware at all but queries that are issuing far more I/O than they need to.
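To see where the I/O stalls are concentrated, you can derive per-file latency from sys.dm_io_virtual_file_stats(). A sketch; passing NULL, NULL returns all files in all databases, and NULLIF guards against dividing by zero for files with no I/O yet:

```sql
-- Per-file read/write latency derived from cumulative stall and I/O counts.
SELECT
    DB_NAME(vfs.[database_id]) AS [Database],
    mf.[physical_name] AS [File],
    vfs.[num_of_reads],
    vfs.[num_of_writes],
    vfs.[io_stall_read_ms]  / NULLIF(vfs.[num_of_reads], 0)  AS [Avg Read Latency ms],
    vfs.[io_stall_write_ms] / NULLIF(vfs.[num_of_writes], 0) AS [Avg Write Latency ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.[database_id] = mf.[database_id]
   AND vfs.[file_id] = mf.[file_id]
ORDER BY vfs.[io_stall_read_ms] + vfs.[io_stall_write_ms] DESC;
```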
When you find that you have many physical I/O bottlenecks occurring, your first instinct should be to find the queries that are causing all the physical I/O, and then try to tune them before you add more hardware. One performance aspect that you should never ignore is high latency for log writes. If you start to hold up the writing to the log file, all further DML operations can quickly become impeded, and they’ll remain so until you alleviate the bottleneck. High latency in log writes is a sure way to hinder performance in the database. For a better understanding of storage issues and I/O subsystems in a SQL Server environment, I recommend that you read the Microsoft articles "Storage Top 10 Practices" and "SQL Server Best Practices Article."
3. Identify slow-running queries using the following query:
SELECT
    COALESCE(DB_NAME(t.[dbid]), 'Unknown') AS [DB Name],
    ecp.objtype AS [Object Type],
    t.[text] AS [Adhoc Batch or Object Call],
    SUBSTRING(t.[text], (qs.[statement_start_offset]/2) + 1,
        ((CASE qs.[statement_end_offset]
              WHEN -1 THEN DATALENGTH(t.[text])
              ELSE qs.[statement_end_offset]
          END - qs.[statement_start_offset])/2) + 1) AS [Executed Statement],
    qs.[execution_count] AS [Counts],
    qs.[total_worker_time] AS [Total Worker Time],
    (qs.[total_worker_time] / qs.[execution_count]) AS [Avg Worker Time],
    qs.[total_physical_reads] AS [Total Physical Reads],
    (qs.[total_physical_reads] / qs.[execution_count]) AS [Avg Physical Reads],
    qs.[total_logical_writes] AS [Total Logical Writes],
    (qs.[total_logical_writes] / qs.[execution_count]) AS [Avg Logical Writes],
    qs.[total_logical_reads] AS [Total Logical Reads],
    (qs.[total_logical_reads] / qs.[execution_count]) AS [Avg Logical Reads],
    qs.[total_clr_time] AS [Total CLR Time],
    (qs.[total_clr_time] / qs.[execution_count]) AS [Avg CLR Time],
    qs.[total_elapsed_time] AS [Total Elapsed Time],
    (qs.[total_elapsed_time] / qs.[execution_count]) AS [Avg Elapsed Time],
    qs.[last_execution_time] AS [Last Exec Time],
    qs.[creation_time] AS [Creation Time]
FROM sys.dm_exec_query_stats AS qs
JOIN sys.dm_exec_cached_plans AS ecp ON qs.plan_handle = ecp.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
-- Uncomment one ORDER BY at a time, depending on which resource you're investigating:
-- ORDER BY [Total Worker Time] DESC
-- ORDER BY [Total Physical Reads] DESC
-- ORDER BY [Total Logical Writes] DESC
-- ORDER BY [Total Logical Reads] DESC
-- ORDER BY [Total CLR Time] DESC
-- ORDER BY [Total Elapsed Time] DESC
ORDER BY [Counts] DESC
The sys.dm_exec_query_stats DMV contains aggregate performance statistics for each cached query plan in the SQL Server instance. The query above lets you order the results in several ways, depending on which type of resource usage you want to concentrate on. The key is to weigh the number of times a statement was executed against its average resource usage, to determine which statements would have the largest impact on performance if they were better optimized. I will caution you not to put too much weight on the total elapsed time or overall duration of a statement, because other factors, such as blocking, can influence duration. But by using this query, you should be able to quickly identify the top offending statements in your system and then prioritise them so that you can tune them as efficiently as possible.
4. Plan to reuse
If possible, write your queries so that their execution plans can be reused. The following query can help identify plan reuse:

SELECT b.[cacheobjtype], b.[objtype], b.[usecounts],
       a.[dbid], a.[objectid], b.[size_in_bytes], a.[text]
FROM sys.dm_exec_cached_plans AS b
CROSS APPLY sys.dm_exec_sql_text(b.[plan_handle]) AS a
ORDER BY [usecounts] DESC

This query sorts all the plans in the procedure cache in descending order of use counts. The usecounts column is incremented every time a plan is reused, so it lets you easily identify which plans are being reused. You can also order the plans by the text column to find statements that have many similar entries, each with a use count of one; those are statements you call often but that never reuse an existing plan. After you've identified these statements, you can prioritise which parts of the application to work on first to get the biggest bang for your buck in terms of plan reuse. Don't underestimate how seriously a lack of plan reuse can affect performance as the transaction rate increases.
5. Monitor Index Usage
The sys.dm_db_index_operational_stats() DMF is a widely underutilized source of information. It can provide valuable insight into your index usage. By using this DMF, you can determine not only which indexes are used but also how they're used: for example, whether queries are scanning the index or seeking into it. It will even tell you the time spent on operations such as latching and locking. Have you ever asked yourself, "Is this index being used?" or "Which indexes am I using for a given table?" We've all asked these questions at one time or another, and this DMF can help you get a handle on your index usage.
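The sys.dm_db_index_operational_stats() DMF mentioned above can be queried along these lines; a sketch, where passing DB_ID() and NULLs returns all indexes in the current database:

```sql
-- Scans vs. singleton lookups and lock/latch waits per index
-- in the current database.
SELECT
    OBJECT_NAME(ios.[object_id]) AS [Table],
    i.[name] AS [Index],
    ios.[range_scan_count],
    ios.[singleton_lookup_count],
    ios.[row_lock_wait_in_ms],
    ios.[page_latch_wait_in_ms],
    ios.[page_io_latch_wait_in_ms]
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
    ON ios.[object_id] = i.[object_id]
   AND ios.[index_id] = i.[index_id]
ORDER BY ios.[range_scan_count] DESC;
```

A high range_scan_count relative to singleton_lookup_count suggests the index is being scanned rather than sought, which is often a tuning opportunity.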
6. Separate Data and Log Files
One of the most basic but often disregarded rules for good performance is to separate the data and log files onto separate physical drive arrays whenever possible. This is especially true with DAS, but it also applies to a SAN. The key principle is to separate the mostly random access of the data files from the sequential writes to the transaction log. One caveat particular to a SAN environment is that even though you're presented with different drive letters or LUNs, you can't be sure that they represent different physical drive arrays. Often, these apparent drives are carved from the same larger disk array, which defeats the purpose of separating them in the first place. So make sure that you know what you're really getting when you ask for your storage on a SAN. You'll be amazed at how much difference this can make as your transaction volume increases.
7. Use separate Staging Databases
People frequently import data into a table that is real but temporary in nature. Then, they proceed to manipulate the data by performing extensive updates to prepare it for one of the final production tables. The problem with this is that most production databases are in full recovery mode. This means that practically all activity is fully logged in the transaction log, and these operations can be expensive.
By using a separate staging database within the SQL Server instance that's in simple recovery mode, you can achieve several performance benefits. One is that you can often get a minimally logged load instead of a fully logged load. This can dramatically speed up the import process to begin with. With a minimally logged load, the amount of data that's logged to the transaction log is very small in relation to what it would be for a fully logged load, so you have a much lighter burden on the server as a whole.
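A minimal sketch of the idea; the StagingDB name, table, and file path are hypothetical, and TABLOCK on a heap is one of the prerequisites for a minimally logged bulk load:

```sql
-- Hypothetical staging database kept in SIMPLE recovery for cheap reloads.
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;
GO
USE StagingDB;
GO
-- TABLOCK on a heap is one condition for minimal logging during bulk loads.
BULK INSERT dbo.StagingTable                     -- hypothetical target table
FROM 'C:\imports\feed.csv'                       -- hypothetical source file
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```

Because the staging database is expendable, simple recovery costs you nothing in recoverability while sparing the production log the churn of the transformation work.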
8. Pay attention to log files and reduce auto grow
Too many people underestimate the importance of the transaction log file to performance. The most common mistake is not leaving enough free space in the transaction log for normal operations to proceed without forcing an auto-grow. Growing the log file can be extremely time-consuming, and it can force DML operations to wait until the growth is complete before they can proceed.
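Rather than relying on auto-grow, pre-size the log and give it a fixed, modest growth increment so growth events are rare and predictable. A sketch; the database and file names and the sizes are illustrative:

```sql
-- Pre-size the transaction log and set a fixed growth increment
-- (the names and sizes here are examples, not recommendations).
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_log, SIZE = 8GB, FILEGROWTH = 512MB);
```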
9. Change max Memory Limit
There have been improvements in the 64-bit versions of SQL Server regarding memory allocation and sharing with the OS and other applications, but I've yet to see where leaving the MAX Memory setting at the default is ideal in real life. Even though your host server might be dedicated to SQL Server, there are always other applications or parts of the OS that require memory from time to time or even all the time. Do yourself a favor and set the MAX memory setting to at least 1 to 2GB less than the total amount of memory on the server if this is a single instance. If you have multiple instances, or if you're in a multi-instance cluster, you also have to account for that. How much memory you leave depends on what else you have running and how much memory it requires to operate efficiently. But you can adjust this value up or down over time as you see fit.
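Assuming a dedicated single instance on a server with 32GB of RAM, leaving about 4GB for the OS might look like this sketch:

```sql
-- Example: cap SQL Server at 28GB on a 32GB single-instance server.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 28672;  -- 28GB expressed in MB
RECONFIGURE;
```

The setting takes effect without a restart, so you can adjust it over time as you observe memory pressure on the host.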
10. Don't shrink Data Files
Shrinking data files has been a generally bad practice for a long time, and it can hurt performance in one of three ways. The shrink operation itself can be very painful to begin with. Because it can cause a lot of fragmentation, your subsequent queries might suffer as a result. And if you don't have Instant File Initialization turned on, the subsequent file growth can also hinder performance and potentially cause timeouts. Although there are times when shrinking a file might be necessary, make sure that you know the impact before you try it. For more details, see Paul Randal's blog post "Why you should not shrink your data files."
11. Create a customised maintenance plan
Indexes become fragmented over time, which leads to performance degradation. Statistics become out-of-date, which leads to bad query plans and poor performance. I/O subsystems can get corrupted, and there's the ever-present need for backups.
You can tackle all these problems by having a comprehensive maintenance plan tailored for your databases. A customized plan is far better than a generic plan that doesn't adequately address your needs.
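As one illustrative piece of such a plan, you can drive index maintenance from actual fragmentation levels instead of blindly rebuilding everything. A sketch; the 5% and 30% thresholds below are common rules of thumb, not hard rules:

```sql
-- Suggest REORGANIZE for lightly fragmented indexes and REBUILD
-- for heavily fragmented ones (5%/30% are rule-of-thumb thresholds).
SELECT
    OBJECT_NAME(ips.[object_id]) AS [Table],
    i.[name] AS [Index],
    ips.[avg_fragmentation_in_percent],
    CASE
        WHEN ips.[avg_fragmentation_in_percent] > 30 THEN 'REBUILD'
        WHEN ips.[avg_fragmentation_in_percent] > 5  THEN 'REORGANIZE'
        ELSE 'NONE'
    END AS [Suggested Action]
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.[object_id] = i.[object_id]
   AND ips.[index_id] = i.[index_id]
WHERE ips.[avg_fragmentation_in_percent] > 5;
```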