Saturday, January 12, 2008

SQL Server Resource Governor - Understanding Pool Usage

In an earlier posting, I discussed some of the basics of configuring the SQL Server 2008 Resource Governor. Since that posting, I've been spending a fair amount of time digging into the internals of the Resource Governor and understanding exactly how it works. It's been an interesting trip considering the limited amount of information available on the subject (let's hope that by RTM they improve this!), but I think I have finally uncovered enough information to make a useful post.

Resource Governor Architecture

As discussed in my earlier post, the Resource Governor is made up of Resource Groups and Resource Pools. When a user (or process) connects to SQL Server 2008, it is "classified" (either through a built-in classifier or a user-defined classifier function) and then assigned to a resource group based on that classification. One or more resource groups are then assigned to specific resource pools. The architecture looks like this:

Resource Governor Functional Components

There are 2 built-in resource groups, the Internal Group and the Default Group. The Internal group is used to execute certain system functions (such as the Lazywriter and Checkpoint Processes) and the Default group is used when the session does not have a defined classification. (For example, if you follow the steps in my earlier post and create a classifier function to assign the SQL Management Studio to a limited resource pool, all sessions that originate from an application other than management studio would be assigned to the Default group)

Resource Governor Misconceptions

One of the things that I believe will be a contentious issue when SQL Server 2008 hits the street is the way in which the Resource Governor affects connections. For example, if I configure a resource pool to limit the maximum CPU time of any process in the pool, I expect that the governor will honor that and stop anything that exceeds the value. In reality, what happens is an event is fired and the application developer is expected to capture the event and react accordingly. For example, say that you create a resource pool and you set the REQUEST_MAX_CPU_TIME_SEC = 5. What you would expect to happen is when a query reaches 5 seconds of CPU time it becomes terminated. In reality what happens is once the query exceeds the CPU time by 5 seconds (that is 5 seconds above the configured value, so in my example here it would be a minimum of 10 seconds of CPU time) a CPU Threshold Exceeded event is fired. You can look for these events using the SQL trace events that are part of the Performance Monitor tool.

Monitoring Resource Pools

You can use the Performance Monitor (the example that I am showing here is from Windows Server 2008 so it may be different than your environment) to monitor Resource Pool Statistics. Start Perfmon and add the counters from the SQL Server: Resource Pool Stats collection for the pool that you want to monitor as shown here:

image

Once you start the trace, and begin to see activity within the classified connection, you will see a screen that looks similar to the following:

image

You can use this data to determine how well the pool is working and get a good baseline for how the applications that use that pool are consuming resources.

6 comments:

Nils Loeber said...

Hi Ted, your articles on Katmai are quite helpful to me, so thank you!

Maybe you or some of the other readers of your blog can help me with a little problem I'm having with the Resource Governor. Maybe I'm just having a misconception of what it's supposed to do, but anyway, here goes:

I have a Resource Pool called OStress on ServerA, with one Workload Group of the same name. I limited the pool to 20% max CPU. Also, I gave the default pool a setting of 80% min CPU, so I figured that workloads in the OStress pool should not be able to get more than 20% CPU, even if there's not much else going on on the system.

But it doesn't seem to be working that way. Now, when I generate load using OStress.exe on WorkstationB, CPU usage goes up all the way to 100% on ServerA.

Using the SQLServer:Resource Pool and SQLServer:Workload Group performance counters I verified that my classification function works (it's basically modeled after your example). So, the Governor is indeed active, but it just doesn't seem to do its job.

Any ideas? Have you experiences this behaviour too? Thank you in advance!

Ted Malone said...

Hi Nils;

One thing to remember (and I don't think I did a very good job in explaining this in my post) is that the resource governor will only limit resources if they "need" to be limited. In other words, if the server is not constrained to begin with, the governor will not kick in, even if everything is configured correctly.

In your case, one way to test this would be to stress the server using a connection outside of the classification you have setup, and then run your test within the classification. You should see the governor kick in.

Hope this makes sense.

Nils Loeber said...

Hi Ted, many thanks for the quick reply :-)

If I understand you correctly, what you are suggesting is what I am already doing.

I have an OSTRESS process on a different system which is correctly classified into the OStress workload group. Additionally, I have Management Studio window open and run a query there, which should be handled by the default pool (shouldn't it?).

So there are two workloads competing for resources, ergo the resource governor should kick in, if I understand correctly.

Am I getting this right? If this is expected behaviour, then how do I get resource governor to kick in?

Once again, thanks for your suggestions!

Ted Malone said...

Hi Nils;

Sorry to take so long to respond this time...

Anyway, if I understand your situation correctly, you're taking up all resources with the one workload that is classified, and then want to see that workload limited when you connect with Mgmt Studio using the default classification, correct?

If so, try running a resource-intensive query in mgmt studio first, and then run your OSTRESS workload. Since the server will be under stress when you start that workload, it should be classified into the lesser resource pool. Use Profiler to determine if the classification has worked.. Also, if you're using the REQUEST_MAX_CPU_TIME_SEC and expecting a long-running query to stop, realize that it won't (at least in this CTP)

I am going to post some additional Resource Governor configuration steps and results in an upcoming entry. (When I get time back in the office to do so!)

Anonymous said...

I found this site using [url=http://google.com]google.com[/url] And i want to thank you for your work. You have done really very good site. Great work, great site! Thank you!

Sorry for offtopic

Anonymous said...

Who knows where to download XRumer 5.0 Palladium?
Help, please. All recommend this program to effectively advertise on the Internet, this is the best program!