2014-03-02

SQL Server performance counters

This is some notes on a basic performance baseline. The baseline is created by some Performance Counters.
In general I am interested in these key resources
  • CPU
  • Memory
  • Storage
  • Network
These resources I look at from both Windows Server and SQL Server Database Engine.
A quick and broad collection can be done by using these Performance Counter objects at top level
  • Server
  • System
  • Processor
  • Memory
  • LogicalDisk
  • NetworkInterface
Use the <All instances> selection of instances to get all counters on each object. Define the collection with * (star) as this is less expensive.

The Collection can be automated using Logman. This I might look into later...

Data Collector

A User Defined Data Collector Set can be created in the Performance Monitor by a right-click on
Data Collector Sets \ User Defined
and select to create the data collector set manually.

Performance Counters

The sample interval is set to 5 seconds and the data are logged in a binary (.blg) file on a local drive. Set the maximum size of the blg-file to 1024 MB. This is to reduce the risk of the drive to be filled and to make it easier to handle for moving around and like. If you need logging for a longer period then set the collector to automatic restart. Keeping the file size low you are able to offload blg-files and analyse offline without stopping the collector.
To further reduce the risk of hurting the computer by the collector I usually put the blg-file(-s) on another drive than the system drive or drives holding active database files (mdf, ndf, ldf etc).

Never do a performance measure remote as you will create more preasure by the measure and then get a blurred result.
Also logging direct in a Performance Monitor graph might give you a performance penalty. Even if it is done local on the server.
Disk performance counters are on individual drives where user data (.mdf & .ndf), transaction log (.ldf) and tempdb data (.mdf & .ndf) are placed. If there are more drives installed for one of theese all drives should be included in the measure.
Windows Server
LogicalDisk \ % Disk Read Time (data; translog; tempdb; backup )
LogicalDisk \ % Disk Write Time (data; translog; tempdb; backup )
LogicalDisk \ Avg. Disk Queue Length ( data; translog; tempdb; backup )
LogicalDisk \ Avg. Disk Reads/sec ( data; translog; tempdb; backup ) - read throughput (IOPS)
LogicalDisk \ Avg. Disk Writes/sec ( data; translog; tempdb; backup ) - write throughput (IOPS)
LogicalDisk \ Avg. Disk sec/Read ( data; translog; tempdb; backup ) - latency on read
LogicalDisk \ Avg. Disk sec/Write ( data; translog; tempdb; backup ) - latency on write
LogicalDisk \ % Disk Idle Time (translog, tempdb, data, backup)
Logical Disk \ Split IO/sec (data, tempdb, backup) - Fragmentation indicator when one IO operration is split in multiple disk requests.
Memory \ Cache Bytes
Memory \ Available Mbytes
Memory \ Page Faults/sec
Memory \ Pages/sec
Memory \ Standby Cache Core Bytes
Memory \ Standby Cache Normal Priority Bytes
Memory \ Standby Cache Reserve Bytes
PhysicalDisk \ Avg. Disk Queue Length ( data; translog; tempdb; backup )
PhysicalDisk \ Avg. Disk sec/Read ( data; translog; tempdb; backup )
PhysicalDisk \ Avg. Disk sec/Write ( data; translog; tempdb; backup )
PhysicalDisk \ Disk Reads/sec ( data; translog; tempdb; backup )
PhysicalDisk \ Disk Writes/sec ( data; translog; tempdb; backup )
Processor \ % Privileged Time (All instances)
Processor \ % Processor Time (All instances)
NUMA \ total %
Network Interface \ Output Queue Length ( <physical interface> )
Paging File(_Total) \ % Usage
System \ Context Switches/sec
System \ Processor Queue Length

If you are looking for connectivity issues then you could add these counters:

TCPv4\Connections Established
TCPv4\Connection Failures
TCPv4\Connections Reset

Please take a look at the Microsoft documentation (link) for a description on the three counters.

Please notice that when you select <All instances> on Processor counters, you will get a total and not numbers on each core. You have to select the cores of the machine to get individual numbers on the cores.
This is also the case with the disk drives where each drive must be selected for collection.

Disk counters are nicely described in the article "Windows Performance Monitor Disk Counters Explained" (link).
SQL Server Database Engine
Also use the performance counters for Windows Server mentioned above.

Process(sqlservr) \ % Processor Time
Process(sqlservr) \ Page Faults/sec
Process(SQLAGENT) \ % Processor Time
Process(SQLAGENT) \ Page Faults/sec
MSSQL : Access Methods \ Forwarded Records/sec
MSSQL : Access Methods \ Full Scans/sec
MSSQL : Access Methods \ Index Searches/sec
MSSQL : Access Methods \ Page Splits/sec
MSSQL : Access Methods \ Table Lock Escalations/sec
MSSQL : Buffer Manager \ Buffer cache hit ratio
MSSQL : Buffer Manager \ Page life expectancy
MSSQL : Buffer Manager \ Page reads/sec
MSSQL : Buffer Manager \ Page writes/sec
MSSQL : General Statistics \ User Connections
MSSQL : Latches \ Average Latch Wait Time (ms)
MSSQL : Locks(_Total) \ Average Wait Time (ms)
MSSQL : Locks(_Total) \ Lock Timeouts/sec
MSSQL : Locks(_Total) \ Number of Deadlocks/sec
MSSQL : Locks(_Total) \ Locks Waits/sec
MSSQL : Memory Manager \ Total Server Memory (KB)
MSSQL : Memory Manager \ Target Server Memory (KB)
MSSQL : Plan Cache \ Cache Hit Ratio (_Total)
MSSQL : SQL Errors(_Total) \ *
MSSQL : SQL Statistics \ Batch Requests/sec
MSSQL : SQL Statistics \ SQL Compilations/sec
MSSQL : SQL Statistics \ SQL Re-Compilations/sec
MSSQL : Wait Statistics(Average wait time (ms)) \ Network IO waits
MSSQL : Wait Statistics(Average wait time (ms)) \ Page IO latch waits
SQL Server Integration Services
Also use the performance counters for Windows Server and SQL Server Database Engine mentioned above.

Process(MsDtsSrvr) \ % Processor Time
Process(MsDtsSrvr) \ Page Faults/sec
SQLServer : Databases \ Bulk Copy Rows/sec \ DW database (KB)
SQLServer : Databases \ Bulk Copy Throughput/sec \ DW database (KB)
SQLServer : SSIS Pipeline 12.0 \ Buffer Memory
SQLServer : SSIS Pipeline 12.0 \ Buffters in use
SQLServer : SSIS Pipeline 12.0 \ Buffers spooled
SQLServer : SSIS Pipeline 12.0 \ Rows read
SQLServer : SSIS Pipeline 12.0 \ Rows written
SQL Server Master Data Services
Also use the performance counters for Windows Server and SQL Server Database Engine mentioned above.

Process(???) \ % Processor Time
Process(???) \ Page Faults/sec
SQL Server Analysis Services
Also use the performance counters for Windows Server mentioned above.

Process(msmdsrv) \ % Processor Time
Process(msmdsrv) \ Page Faults/sec
Process(msmdsrv) \ Private Bytes
MSAS12 : Proc Aggregations \ Current Partitions
MSAS12 : Proc Aggregations \ Rows created/sec
MSAS12 : Proc Aggregations \ Temp file bytes written/sec
MSAS12 : Proc Indexes \ Current partitions
MSAS12 : Proc Indexes \ Rows/sec
MSAS12 : Processing \ Rows read/sec
MSAS12 : Processing \ Rows written/sec
MSAS12 : Threads \ Processing pool busy I/O job threads
MSAS12 : Threads \ Processing pool busy non-I/O threads
MSAS12 : Threads \ Processing pool idle I/O job threads
MSAS12 : Threads \ Processing pool idle non-I/O threads
MSAS12 : Threads \ Processing pool job queue length

Thomas Kejser, John Sirmon & Denny Lee: „Microsoft SQL Server 2008 R2 Analysis Services Operations Guide“.
SQL Server Reporting Services
Also use the performance counters for Windows Server and SQL Server Database Engine mentioned above.

Process(ReportingServicesService) \ % Processor Time
Process(ReportingServicesService) \ Page Faults/sec

Analysis

A quick view on one baseline measure series is easy done with a User Defined Report in Performance Monitor.
When you want to do some statistics a spreadsheet is a common tool.
To export the data from Performance Monitor to a spreadsheet you open the logged data in Performance Monitor and right-click in the graph area. Then you click Save Data As and save the data as tabulator seperated values (.tsv).
To import the data in a Excel spreadsheet you create a new sheet and select the tab Data. Click the item From Text in the panel, and do the proper selections. In Denmark where I live we use comma as decimal seperator and dot as thusand seperator. This is one of the details that could trick you. The first time you do a import you might not get it right, but I am sure you will get it right after a few tries.

Another way to post-process PerfMon files is to use Relog. This Tool can extract data, combine or split PerfMon files and export to database or other file formats. Using this Tool can enable automation on handling more complex or long-running PerfMon runs.

To analyse PerfMon data together with Windows Logs and other data you can use Performance Analysis of Logs (PAL), which is a free and open tool. It can generate some rather neat reports with impressive and usefull graphs. But be aware that it tend to aggregate numbers and then it can hide performance issues as these usually are in spike values and not in aggregated values.

When you have more sheets from several measure series you can do some delta compares and produce some nice diagrams for management. Most diagrams I create as X-Y diagrams, but the type depends on what you are looking for or trying to show.

Some basic statistics that are usefull are
  • Average = AVERAGE()
  • Median = MEDIAN()
  • Most Frequent Number = MODE.SNGL()
  • Standard Deviation = STDEV.S()
  • Min = MIN()
  • Max = MAX()
  • Percentile = PERCENTILE.EXC(). It is different from one organization to another if 95- or 98-percentile is preferred. Or another.
    Sometimes you have to look at the other end of the data. Then you should look at 5- or 2-percentile matching the high percentile.
    I recommend you to use the same (high) percentile as the rest of the organization.
The calculation I usually add in rows at the top of the sheet.

Discussion

Be aware when
(To Be Continued...)

Reference

Jonathan Kehayias: „The Accidental DBA (Day 21 of 30): Essential PerfMon counters“.
Microsoft Core Team blog: "Measuring Disk Latency with Windows Performance Monitor".
SmarterTools blog: "Windows Perfmon, Disk I/O and Possible Disk Issues".

History

2014-02-03  Blog post created with initial list of performance counters.
2015-08-06  Performance counters added: Table Lock Escalations/sec and Context Switches/sec.
2015-08-12  Performance counter on Logical Disk % Disk Time added and backup disk for all disk counters.
2017-05-15  Notes on Relog, Logman and PAL added.
2019-10-29  Counter on Disk Idle Time added on LogicalDisk.
2023-08-21  Percentile analysis added.

No comments: