A few weeks ago, while at the incredible SQL Saturday Madison, I had a great discussion with a few of the other speakers about SQL Server VM-level memory counters and their (lack of) validity on the actual memory usage properties of the SQL Server engine. A while back, I blogged about this but I want to revisit it here.
VM-level memory counters simply cannot be trusted for SQL Server workloads at this time. Period. VM memory counters are not application aware, and as a result, are not aware of the way that in-guest applications use memory.
Active memory counters at the VM layer measure the amount of memory that is touched in each sample interval. VM administrators and even some VM-level performance tools only look at this counter and tell you that it is perfectly fine to reduce the amount of memory assigned to these VMs. Nothing could be further from the truth. For example, this is a screenshot from vCenter Operations Manager on a dedicated SQL Server VM host cluster (which was perfectly fine, by the way).
SQL Server uses memory as a working set I/O buffer, and memory blocks in the SQL Server buffer pool are most commonly used as a read cache. Memory blocks that are repeatedly read only show up in the active counters if that block has been read from or written to in the last sample interval, which is usually 20 seconds. Therefore, SQL Servers that have an appropriate (or too high, but that’s a different story) amount of memory assigned to the VM will have most of the commonly accessed data in memory, but those blocks are rarely accessed in their entirety over that short of a sample window. The VM active memory counters will show a very low active value, but this SQL Server could actually be very busy.
Don’t take my word for it. Even VMware’s own documentation states this fact. Check the document “SQL Server on VMware Best Practices Guide” at VMware.com, Section 4.3.4.1, page 23.
The example of this in practice is a quick profile of the memory counter when I was loading some performance data into a database. The Active memory counter in my VMware lab grew and then shrunk.
But… I was not only NOT done loading data, I was actively running performance reports, all of which were constantly scanning all of the data that I had just loaded.
This is why I reserve all guest RAM for production SQL Server VM workloads as well. When under memory pressure, the host is going to look for the VMs with the least amount of active memory ‘usage’, and because of this discrepancy in view points, your SQL Servers are now prime targets for memory reclamation processes, which are guaranteed slow your SQL Server VMs down.
So… VM admins, do not use this counter as a means to reclaim RAM from these SQL Server VMs. It is not only inaccurate, it’s dangerous because it can result is destroying the performance of that SQL Server VM.
Instead, work with your DBA to fetch the memory usage information from both SQL Server and the Windows operating system inside the VM itself. WMI counters can show you signs of memory pressure inside the VM, and scripts such as Glenn Berry’s SQL Server diagnostic scripts contain methods to look inside SQL Server and see how memory is being managed.
These in-guest counters are the sources of your truth, not the VM-layer counters.