# Web Tortoise

## 2012-Jun-28

### What the Frequency?

Filed under: Performance | leovasiliou @ 01:43 PM EDT

Response:

Hello! This was written 2012-JUN-28 at 01:26 PM ET.

Question: What is a frequency distribution?

Answer: A frequency distribution is a powerful, non-linear way to analyze your website performance data. It shows the number of times (frequency) a data value appears in each of a set of ranges or intervals (distribution). Consider these two basic examples:

Example One:

Take the following numbers: 1, 2, 4, 7 and 9. How many are between 1 and 5? How many are between 6 and 10? (Three numbers (1, 2 and 4) are between 1 and 5. Two numbers (7 and 9) are between 6 and 10.)
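Example One can be sketched in a few lines of Python. This is only an illustration; the helper name `count_in_ranges` is made up for this sketch, and the ranges are treated as inclusive on both ends, which is an assumption about how "between" is meant here.

```python
def count_in_ranges(values, ranges):
    """Count how many values fall inside each inclusive (low, high) range."""
    return {
        (low, high): sum(low <= value <= high for value in values)
        for low, high in ranges
    }


numbers = [1, 2, 4, 7, 9]
counts = count_in_ranges(numbers, [(1, 5), (6, 10)])

for (low, high), count in counts.items():
    print(f"{low}-{high}: {count}")
```
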

Example Two:

Take the following hypothetical test letter grades: A, B, A, B, A, C, B and D. How many of each letter grade are there? (There are three “A” grades, three “B” grades, one “C” and one “D”.)
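For categories such as letter grades, no ranges are needed; each distinct value is its own bucket. A minimal sketch using Python's standard `collections.Counter` (the variable names are just for illustration):

```python
from collections import Counter

grades = ["A", "B", "A", "B", "A", "C", "B", "D"]

# Counter tallies how many times each distinct value appears.
grade_counts = Counter(grades)

for grade in sorted(grade_counts):
    print(f"{grade}: {grade_counts[grade]}")
```
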

Now we will use Excel’s FREQUENCY function to distribute thousands of Webpage Response times. Fear not! Just see the attached Excel sheet, which shows how we came up with the frequency distribution chart.
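For readers who would rather script it, here is a rough Python sketch of what Excel's FREQUENCY does: each bin counts the values greater than the previous bin limit and less than or equal to its own limit, and one extra slot at the end counts everything above the highest limit. The function name `frequency`, the response times and the bin limits are all made up for illustration; they are not taken from the spreadsheet.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Mimic Excel's FREQUENCY: bins are upper limits, plus one overflow slot."""
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # bisect_left finds the first limit that is >= value, so a value
        # equal to a limit is counted in that limit's bin, as Excel does.
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical response times in seconds (illustrative only).
response_times = [0.8, 1.2, 1.9, 2.0, 2.4, 3.1, 3.3, 4.7, 6.5]
bin_limits = [1, 2, 3, 4, 5]

result = frequency(response_times, bin_limits)
labels = [f"<= {limit}s" for limit in bin_limits] + [f"> {bin_limits[-1]}s"]
for label, count in zip(labels, result):
    print(f"{label:>6}: {'#' * count} ({count})")
```

The printed rows give a crude text version of a frequency distribution chart, one bar per bin.
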

_The following is optional reading material._

#ExcelArrayFormula #ExcelStatistics #CatchpointUser #KeynoteUser #GomezUser #FrequencyDistribution

## 2012-May-03

### Half Full or Half Empty? Choosing a Statistical Calculation

Response:

Hello! This was written 2012-MAY-03 at 12:27 PM ET.

Question: In your life as a Keynote or Catchpoint user, suppose you have a day’s worth of website response time performance data. Should you average using the Arithmetic Mean or the Geometric Mean?

Answer: Assume you want to use a central-tendency calculation like a mean (we’ll talk about percentiles in another post). For positive data such as response times, the Geometric Mean is never higher than the Arithmetic Mean, and it is lower whenever the values are not all identical. So you might say the Geometric Mean is an “optimistic calculation” where the Arithmetic Mean is a “pessimistic calculation”. Follow the steps below, going from raw data to line chart, then decide for yourself when to use either of the two calculations.
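To make the two calculations concrete, here is a small sketch using Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The three response times are hypothetical values chosen to keep the arithmetic easy to follow; they are not from the data set discussed in this post.

```python
from statistics import geometric_mean, mean

# Hypothetical response times in seconds (illustrative only).
response_times = [1.0, 2.0, 4.0]

# Arithmetic Mean: add the values, divide by how many there are.
arithmetic = mean(response_times)

# Geometric Mean: multiply the values, take the n-th root of the product.
geometric = geometric_mean(response_times)

print(f"Arithmetic Mean: {arithmetic:.3f} s")
print(f"Geometric Mean:  {geometric:.3f} s")
```

Here the Arithmetic Mean is 7 / 3 and the Geometric Mean is the cube root of 8, so the “optimistic” figure comes out lower.
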

First, the below scatter plot shows a day’s worth of website response time data. In this case, there are about 2,880 data points in total, or about 120 for each of the 24 hours in the day. Notice the single spurious outlier in the 03:00 AM hour.

Second, the above scatter plot is then transformed into the below line graph. The blue line calculates the data using the Arithmetic Mean and the red line calculates the same set of data using the Geometric Mean. The single spurious outlier in the 03:00 AM hour caused a “spike” to appear in the line graph.

Fair warning: the perceived impact (a.k.a. “The Spike”) will depend on how many data are being averaged, because a single outlier carries less weight as the count grows. The point remains the same, though: the Arithmetic Mean is more subject to skew from spurious outliers than the Geometric Mean. So, depending on your situation, you may want to choose one or the other (or both to see the delta, which is another powerful way to analyze).
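A quick way to see both effects (the skew, and its dependence on sample count) is to average a flat series with one outlier dropped in. The sketch below is illustrative only: the helper `means_with_one_outlier`, the 2-second baseline and the 60-second outlier are assumptions made up for this example, not values from the charts.

```python
from statistics import geometric_mean, mean


def means_with_one_outlier(count, baseline=2.0, outlier=60.0):
    """Return (arithmetic, geometric) for count - 1 baseline samples plus one outlier."""
    samples = [baseline] * (count - 1) + [outlier]
    return mean(samples), geometric_mean(samples)


# Compare a small averaging window with a larger one (about an hour's worth).
results = {count: means_with_one_outlier(count) for count in (12, 120)}

for count, (arithmetic, geometric) in results.items():
    delta = arithmetic - geometric
    print(
        f"{count:>3} samples: arithmetic={arithmetic:.2f}s "
        f"geometric={geometric:.2f}s delta={delta:.2f}s"
    )
```

With fewer samples per average, the same outlier produces a taller spike in both lines, and in either case the Arithmetic Mean moves further from the baseline than the Geometric Mean does.
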