Web Tortoise

2013-Jan-31

Arithmetic Mean Versus Geometric Mean Versus Median

Response:

Hello! This #WebTortoise post was written 2013-JAN-31 at 09:06 PM ET (about #WebTortoise).

Main Points

#- An Arithmetic Mean will, for all intent and purpose in WebTortoise World, result in a higher value than its Geometric Mean counterpart. Relative to “faster is better” in web performance, might say an Arithmetic Mean is a pessimistic calculation.

#- A Geometric Mean will, for all intent and purpose in WebTortoise World, result in a lower value than its Arithmetic Mean counterpart. Relative to “faster is better” in web performance, might say a Geometric Mean is an optimistic calculation.

#- Define: What is a Percentile?

#- See, “How do I calculate the Geometric Mean in Excel”?

Story

Had an opportunity to discuss which statistical calculation should be used when looking at Performance charts. The discussion summary goes something like this.

First, assume consideration for a central-tendency calculation. Then:

If, in fact, looking for spurious outliers, consider plotting the Arithmetic Mean average.

Otherwise, consider plotting either the Geometric Mean or the Median, as they are very good central-tendency calculations.

To start, see this XY scatter plot taken from a day’s worth of synthetic test runs. In this Story, are using data from Catchpoint’s US node network (Thank you, Catchpoint), measuring @ 3,500 times a day (about 170 per hour). Intentionally chose this webpage as it contained a third-party ad network having particular host issues (the waterfall data was invaluable for troubleshooting, but that’s a Story for another day).

Eyeballing the chart, notice the thick band of majority data is less than 5,000 ms (right around 1,500 – 3,000 ms) with thinner pockets and bands throughout. Also notice around between 10:00 AM – 02:00 PM, there were no measurements higher than around 14,000 ms.

XY Scatter Plot

Second, will take the above XY scatter plot and draw a bar graph representing the middle 25th-75th percentile range (See, “What is a Percentile”). The idea here is to show a middle range (which might better represent overall Performance) versus just a single line (which can sometimes ‘lie’ or misrepresent).

Middle Range

Third, using the same data from the XY scatter plot, overlay line charts showing respective Arithmetic Mean, Geometric Mean and Median calculations.

Arithmetic Mean VS Geometric Mean VS Median

Critical thing to notice is the height of the Arithmetic Mean (Y axis) versus either the Geometric Mean or the Median. Notice how the Arithmetic Mean is, at times, either very near the upper limit of the middle range or, in some cases, even above the upper limit of the middle range! Now notice the Geometric Mean and Median are always comfortably between the middle range.

Other:

Notice the 12:00 AM and 07:00 AM hour’s Arithmetic Mean is above the Middle Range. Now, quickly glance back at the XY scatter plot to see the measurement data.

Notice the middle range for the 02:00 PM and 03:00 PM hours are smaller than other hours. Glancing back at the XY scatter plot, can see the thick band of measurement data is more tightly packed.

Last, want to give a fair warning when looking at these types of charts: The amount of the data will generally affect the height and patterns of the lines and bars. Do not be caught off guard if, for example, the Arithmetic Mean average is always above your middle range. This is a function of the amount of data.

Document Complete / OnLoad:

_The following is optional reading material._

Download Excel document: https://docs.google.com/file/d/0B9n5Sarv4oonaDZSZXNURzZrd00/edit?usp=sharing

LinkedIn: http://www.linkedin.com/in/leovasiliou

Twitter: @LvasiLiou

#CatchpointUser #KeynoteUser #GomezUser #Webtortoise #Performance #WebPerformance

#ExcelStatistics #ExcelXYScatter #ArithmeticMean #GeometricMean #Median

Blog at WordPress.com.