Ordinary Oracle Joe

Just an ordinary DBA's thoughts

Baselines and Adaptive Thresholds on Production Systems #1

Posted by oakesgr on August 6, 2009

Ever since I attended the 10g performance tuning course I’ve wanted to use the Baselines and Adaptive Thresholds in anger. From some brief discussions I’ve had, it seems this feature has been underutilised, certainly within my organisation.

After a quick chat with the application support team I work with, we decided to use a derivative settlements database as our guinea pig.

My early (probably naive) hope for the use of this feature is something along the lines of:

  1. Set a rolling 7 day window baseline
  2. Set the adaptive thresholds so that anything above normal usage + X% (X still to be decided!) will generate alerts
  3. Trap the alerts so that they not only get raised via our generic alerting system (this already takes them from Grid Control), but also get them to post to one of our internal chat channels.

Point 3 is quite important (in my mind) because of the way the DBA structure is set up in my company. We have a team in Singapore that works around the clock monitoring the generic alerts system. We also have regional ‘core’ teams that handle general day-to-day support issues. Finally, we have ‘aligned’ DBA teams that work more closely with the business. I’m one of those.

So basically I want these new alerts to be brought to the aligned DBAs’ notice (as well as the team in Singapore). The theory is that we will be notified of abnormal activity on the database ahead of the support team, giving us a little more time to examine any potential issues before they turn into high-pressure ‘fix this NOW!’ type situations.
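As a rough illustration of the fan-out described in point 3, here is a minimal Python sketch. It is not how Grid Control or any real chat system delivers alerts; the sink names and the `dispatch` helper are hypothetical, and each sink is simply a callable that accepts the alert text.

```python
def dispatch(alert, sinks):
    """Send one alert to every sink and return the names of sinks that failed.

    sinks: dict mapping a (hypothetical) destination name to a callable
    that takes the alert text. A failure in one sink does not stop the
    alert from reaching the others.
    """
    failed = []
    for name, send in sinks.items():
        try:
            send(alert)
        except Exception:
            failed.append(name)
    return failed
```

Collecting the failures rather than raising means a broken chat hook would not prevent the alert from reaching the generic alerting system.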

When creating the 7-day rolling baseline there are a number of options to consider. As the usage window of this database extends outside of EMEA (covering both the US and APAC time zones) there seems little point in using the Day and Night option. Therefore I’m just using Weekdays and Weekends.
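To make the Weekday/Weekend grouping concrete, here is a small Python sketch of the idea. It is only an illustration under my own assumptions, not Oracle’s implementation: the function names are made up, and it takes the peak value per group over the trailing window as the ‘normal’ figure, which is just one possible choice of statistic.

```python
from datetime import datetime, timedelta


def bucket(ts):
    """Return 'weekend' for Saturday/Sunday, otherwise 'weekday'."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def rolling_baseline(samples, now, window_days=7):
    """Peak metric value per bucket over the trailing window.

    samples: iterable of (datetime, float) observations.
    Observations older than window_days before `now` are ignored.
    """
    cutoff = now - timedelta(days=window_days)
    peaks = {}
    for ts, value in samples:
        if cutoff <= ts <= now:
            key = bucket(ts)
            peaks[key] = max(peaks.get(key, value), value)
    return peaks
```

Because the window trails `now`, each new day drops the oldest day’s samples from the comparison, which is what makes the baseline ‘rolling’.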

[Image: baseline options]

The other usage anomaly to note is a quarterly event known as the CDS roll. For the uninitiated, CDS stands for Credit Default Swap (http://en.wikipedia.org/wiki/Credit_default_swap) and the roll affects the whole banking sector. During this period of a few days each quarter, the usage profile on this database (and many others) can double, triple or just go off the scale, and it has caused any amount of havoc in the past. My plan is to create a static baseline during the next event and then switch to that baseline during all subsequent events. This isn’t due for a while though, so I’ll continue with the rolling window option for the moment.

The rolling baseline is now in place, and I’ve decided to start with thresholds of 120% and 150% of the baseline for warning and critical alerts respectively. This is simply a starting point and will probably be amended as we get more experience. I also expect these thresholds to differ for each database we implement baselines on.
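For anyone who prefers to see the arithmetic, here is a minimal Python sketch of how the 120% and 150% levels would classify a metric value against a baseline figure. The `classify` function and its parameter names are hypothetical, and what counts as the baseline figure (a peak, an average or something else) is left as an assumption.

```python
def classify(value, baseline, warning_pct=120.0, critical_pct=150.0):
    """Classify a metric value as 'ok', 'warning' or 'critical'.

    The value is expressed as a percentage of the baseline figure and
    compared against the two thresholds, critical first.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    pct = 100.0 * value / baseline
    if pct >= critical_pct:
        return "critical"
    if pct >= warning_pct:
        return "warning"
    return "ok"
```

With a baseline of 100, a reading of 119 stays quiet, 120 raises a warning and 150 raises a critical alert.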

I intend to post updates to this subject on a weekly or biweekly basis as I think it will take a while to get a mature solution in place.

Now, I’m off to talk to a UNIX SA that I know has already done something along the lines of alerting through chat channels.

 

Edit: Doug Burns wrote a great post about Baselines and Adaptive Thresholds, so in an attempt not to repeat everything he wrote I’ll try to angle this more as a ‘my experiences’ type of post as opposed to a ‘this is how you do it’ post.

http://oracledoug.com/serendipity/index.php?/archives/1496-Adaptive-Thresholds-in-10g-Part-1-Metric-Baselines.html

6 Responses to “Baselines and Adaptive Thresholds on Production Systems #1”

  1. Doug Burns said

    Easy, tiger! *I* do the baselines and adaptive thresholds stuff round here.

    Honestly, you encourage people and look what happens …

    • oakesgr said

      I don’t think you have the permissions Doug, but if you know a friendly dba he might be able to help you out ;-)

  2. Doug Burns said

    It’s alright, I need to do them at home anyway if there’s any chance of writing the 11g Adaptive Threshold posts.

    Oh, are you still only using 10g for these?

    Then again, Production system and all and you’ll have enough workload to get the best from them.

  3. mwidlake said

    I’ll be interested in how you get on with choosing an alerting level – getting that balance of being warned when things start to become interesting, as opposed to being swamped with constant alerts or never hearing a thing!

    • oakesgr said

      Hi Martin,

      you’re absolutely right. It’s all about getting that balance right. That’s what makes me think that this may be a feature that requires a database by database solution, as opposed to one size fits all.

      However, it’s early days for me on this feature so I’m trying to hold back any preconceived ideas at the moment.

      Cheers
      Graham
