Web Tortoise

2012-Nov-28

Configuring “Site is Slow” Performance Alerts

Response:

Hello!  This #WebTortoise post was written 2012-NOV-28 at 10:55 AM ET (about #WebTortoise).

Question & Answer

Question: I have various measurements continually recording the Response Time of my website. Now, though, I’d like to configure some Performance alerts to know if there is a Performance degradation, but I don’t know the exact settings to choose. So, how should I configure them?

Answer: First, notice this question is about Performance versus Availability. This is an important distinction because the alert settings would be configured differently for one versus the other.

Second, this question is looking for a “good enough” place to start. For example, if there is already a Response Time threshold set by Management, then the Webtortoise Story below may not be needed.

Now, regarding the question, the suggested answer is, “Consider using a Bayesian approach and study prior Rates of Change (explained in the Story below).” Then consider how sensitive the settings should be.

Fair warning, each measurement vendor implements its alert modules differently and the Story below is only one specific example. The principal answer still applies, though:

Study prior Rates of Change.

Story

In Webtortoise World, how to measure website Performance and how to alert if it degrades is continually discussed. Have these conversations a lot, particularly with the various Operations and Production folks who’d be receiving the alert emails (even in the middle of the night!).

Have all types of Availability alerts in place, but what if the site just slows (while still being technically available)? Maybe just need to tighten the settings a bit as the holiday season approaches? Maybe just getting a few too many alert emails and people are starting to ignore them? Maybe just this? Maybe just that? Well, without further ado…

Step 01. Have a test measurement in place and let it run for a few days or a few weeks (the larger the sample size the better). The idea here is we’ll be looking “back” at the data to help determine the “forward” setting of the Performance alert.

Step 02. Decide the alert attributes. In this Story, we’ll be alerting on the Full Webpage Response Time metric, comparing the delta between the latter hour and the former hour. If the Rate of Change from one hour to the next is above a certain threshold, then send an alert email.

As mentioned, each measurement vendor implements its alert modules differently. Please remember the attributes in this Story are only one specific example.

Step 03. Calculate the Rates of Change from one hour to the next. For example, if Response Time for the Midnight hour is 1,517 ms and if Response Time for the 01:00 AM hour is 1,503 ms, then the Rate of Change is 0.92% (1,517 minus 0.92% of 1,517 is about 1,503). This Excel sheet contains the formulas for calculating this Rate of Change. If Response Time for the 01:00 AM hour is 1,503 ms and if Response Time for the 02:00 AM hour is 1,532 ms, then the Rate of Change is 1.93% (1,503 plus 1.93% of 1,503 is about 1,532).

May have noticed the sign of the Rate of Change (positive or negative) is being discarded. For the purpose of this Story, that is okay.

However:

Note to all Performance measurement providers: Most can alert only on Response Time INCREASES. Consider adding the capability to alert on Response Time DECREASES as well, as they can be just as indicative of a problem.

Finish calculating the Rates of Change. In this Excel sheet, the Rates of Change are calculated for six weeks of test measurement data, by the hour (total of 1,008 hours). The formula in column D will always give a positive number (except when the Rate of Change is zero) and column D has been formatted to display a percentage.
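For anyone working outside of Excel, here is a minimal Python sketch of the Step 03 calculation. It assumes the Rate of Change is the absolute difference between two consecutive hours divided by the former hour, which matches the two worked examples in this Story. The function names are made up for illustration and are not taken from the Excel sheet.

```python
def rate_of_change(former_ms, latter_ms):
    """Unsigned Rate of Change, as a fraction of the former hour.

    Assumption: abs() stands in for however the Excel sheet discards the sign.
    """
    if former_ms <= 0:
        raise ValueError("former hour Response Time must be positive")
    return abs(latter_ms - former_ms) / former_ms


def rates_of_change(hourly_ms):
    """Rate of Change for each consecutive pair of hourly Response Times."""
    return [rate_of_change(a, b) for a, b in zip(hourly_ms, hourly_ms[1:])]


# The three hourly values quoted in Step 03 (Midnight, 01:00 AM, 02:00 AM).
hourly_ms = [1517, 1503, 1532]
for rate in rates_of_change(hourly_ms):
    print(f"{rate:.2%}")
# prints:
# 0.92%
# 1.93%
```

Note a list of N hourly values gives N minus 1 Rates of Change, since each rate needs a former hour and a latter hour.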

Step 04. Now use a Frequency Distribution on the Rates of Change (for a refresher on Frequency Distributions, consider reading Webtortoise: What the Frequency?) to answer the question(s), “How many Rates of Change were less than 1%? How many Rates of Change were between 1% and 2%? How many Rates of Change were between 2% and 3%?” And so on.
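A rough Python sketch of such a Frequency Distribution, assuming bins that are each 1% wide. The function name and the sample rates are hypothetical and are not values from the Excel sheet.

```python
import math
from collections import Counter


def frequency_distribution(rates, bin_width=0.01):
    """Count how many rates fall in each bin.

    Bin k covers rates from k * bin_width up to (but not including)
    (k + 1) * bin_width. Rates are fractions, so 0.0193 means 1.93%.
    """
    counts = Counter(math.floor(rate / bin_width) for rate in rates)
    return dict(sorted(counts.items()))


# Illustrative rates only (assumption), not data from the Story.
sample_rates = [0.0092, 0.0193, 0.004, 0.015, 0.027, 0.534]

for bin_index, count in frequency_distribution(sample_rates).items():
    low = bin_index * 0.01
    print(f"{low:.0%} to {low + 0.01:.0%}: {count}")
```

Because the rates are floating point numbers, a rate sitting exactly on a bin edge may land in either neighboring bin. For choosing a "good enough" threshold, that should not matter much.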

The Frequency Distribution will answer these questions and, in the same Excel sheet, can see most Rates of Change are between zero and twenty-ish percent. Now, given most Rates of Change, from one hour to the next, are less than 20%, should the alert threshold be set to less than 20%? …

Probably not.  Unless many alert emails are desired.

If the threshold setting is meant to alert only on the most egregious Performance degradations, then maybe set the alert threshold to 50% or greater. Looking again at the Frequency Distribution, can see a Rate of Change greater than 50% occurred eight times in the last six weeks. If the threshold setting is meant to alert on some other condition, then can look at the Frequency Distribution to get an idea of how sensitive the setting should be. At this point, consider other relevant items to determine how sensitive the threshold setting should be. Otherwise, the threshold setting will come down to making a choice and iterating.
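One way to try out a candidate threshold before committing to it is to count how many times the prior Rates of Change would have crossed it. Below is a small Python sketch of that idea, plus a check for a single pair of hours. The function names and sample rates are invented for illustration; a real alert module from a measurement vendor may well work differently.

```python
def count_exceedances(rates, threshold):
    """How many historical Rates of Change were above the threshold."""
    return sum(1 for rate in rates if rate > threshold)


def should_alert(former_ms, latter_ms, threshold):
    """True if the unsigned hour-over-hour Rate of Change is above the threshold."""
    rate = abs(latter_ms - former_ms) / former_ms
    return rate > threshold


# Illustrative history only (assumption), as fractions: 0.55 means 55%.
historical_rates = [0.03, 0.12, 0.08, 0.55, 0.19, 0.27, 0.61, 0.02]

for threshold in (0.20, 0.50):
    hits = count_exceedances(historical_rates, threshold)
    print(f"threshold {threshold:.0%}: would have alerted {hits} times")

# A jump from 1,500 ms to 2,400 ms is a 60% Rate of Change.
print(should_alert(1500, 2400, threshold=0.50))
```

Running the count for a handful of thresholds gives a feel for how many alert emails each setting would have produced, which is the "making a choice and iterating" part.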

Document Complete / OnLoad:

_The following is optional reading material._

Here’s the traditional, time-based line chart for the test measurement used in this post. It is for a 6-week period, by the hour, totaling 1,008 data points.

Download the Excel sheet here:  https://docs.google.com/open?id=0B9n5Sarv4oonZWJVMU9QTTlzSGM

Webtortoise Author on LinkedIn:  http://www.linkedin.com/in/leovasiliou

Webtortoise Author on Twitter:  https://twitter.com/Lvasiliou

#CatchpointUser #KeynoteUser #GomezUser #Webtortoise #Performance #WebPerformance

#ExcelStatistics #FrequencyDistribution
