# Response:

Hello!  This Webtortoise post was written 2014-MAR-31 at 10:35 PM ET.

## Keep in Mind:

#- Credit to Lee Humphries over at http://www.thinkingapplied.com/means_folder/deceptive_means.htm for the idea-inspiring post.  Thank you, sir.

## Story:

Hello, Everyone.  In this Webtortoise Story, we are going to expand on the concepts of The Web Performance “Hockey Stick” Cumulative Distribution Function (“CDF”) Chart and explore “rates of change between the percentiles”.  Specifically, we want to see how the geometric rate of change between percentiles compares with the actual rate of change between percentiles.  We will do this by:

– constructing an actual CDF;

– constructing a geometric CDF; and

– charting the actual and the geometric CDFs together.

In this Excel sheet, take a look at the raw data sample for the Internet Retailer Top 10 sites (these were simple default home page loads, but the underlying measurement theory can apply to any performance metric).  In chart 01 (cell AD5), I have constructed the aggregate Hockey Stick CDF chart by calculating all the percentiles and then charting them as a line.

In chart 02 (cell AD32), I have constructed the aggregate Hockey Stick again, but using the geometric accumulation instead of the actual accumulation.
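The post doesn't spell out the formula behind the “geometric accumulation”, so here is a minimal Python sketch under one stated assumption: the geometric curve starts at the actual 1st percentile and grows by one constant ratio per percentile step, where that ratio is the geometric mean of the actual step-to-step ratios P(k+1)/P(k). The function name `geometric_cdf` is hypothetical.

```python
import math


def geometric_cdf(actual: list[float]) -> list[float]:
    """Rebuild a percentile curve that grows at one constant (geometric) rate.

    `actual` holds the actual percentile values in order, e.g. P1..P99.
    ASSUMPTION: the constant rate is the geometric mean of the step-to-step
    ratios actual[k + 1] / actual[k].
    """
    if len(actual) < 2 or min(actual) <= 0:
        raise ValueError("need at least two positive percentile values")
    steps = len(actual) - 1
    ratios = [b / a for a, b in zip(actual, actual[1:])]
    # Geometric mean of the ratios, computed in log space.
    rate = math.exp(sum(math.log(r) for r in ratios) / steps)
    # Start on the actual first percentile and apply the same ratio each step.
    return [actual[0] * rate ** k for k in range(len(actual))]
```

Because the product of the step ratios telescopes to P99/P1, the rebuilt curve always lands on the actual 99th percentile; only the path in between differs, which is what chart 03 puts side by side.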

In chart 03 (cell AD59), I have plotted the information from charts 01 and 02 together.

In chart 04 (cell AY119), you can now compare the delta between the “actual” and the “geometric” curves and use it as another data point for studying your web performance.  Note the scroll button in cell AX120, which scrolls through the IR Top 10.
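A sketch of the chart 04 comparison, assuming “delta” means actual minus geometric at each percentile and that both series run P1..P99 in order. Positive deltas mark percentiles where the actual curve sits above (is slower than) the constant-rate curve. Both function names are hypothetical.

```python
def percentile_deltas(actual: list[float], geometric: list[float]) -> list[float]:
    """Actual minus geometric at each percentile (both series in the same order)."""
    if len(actual) != len(geometric):
        raise ValueError("series must have the same length")
    return [a - g for a, g in zip(actual, geometric)]


def largest_gap(actual: list[float], geometric: list[float]) -> tuple[int, float]:
    """Return (percentile, delta) where the curves are furthest apart.

    ASSUMPTION: index 0 of each series is the 1st percentile.
    """
    deltas = percentile_deltas(actual, geometric)
    k = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
    return k + 1, deltas[k]
```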

#Analytics #CatchpointUser #ChartsAndDimensions #ChartsAndGraphs #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#ExcelHockeyStick #WebPerformanceHockeyStick #Percentile #GeometricMean

# Response:

Hello! This WebTortoise post was written 2014-FEB-28 04:59 PM ET.

## Keep in Mind:

#- Use a cumulative distribution function (CDF) for its exceptionally-powerful force rank, competitive benchmarking and diff’ing capabilities (note, in this post I have affectionately referred to the CDF as “The Hockey Stick” chart).

#- Use The Hockey Stick chart to compare fully-distributed performance data to other, fully-distributed performance data.

#- Credit to both Peltiertech.com and Chandoo.org for their logic on how to fill areas with color in Excel. Thank you, Gentlemen.

#- The actual Internet Retailer (“IR”) Top 20 competitive benchmark data in this post covers only the respective home pages; it benchmarks the [Fully Loaded] Webpage Response Time metric. Note that the theory in this post could apply to any performance data (e.g. a full business transaction) or to any metric (e.g. Wait Time, Load Time or any internal KPI).

#- Constructing The Hockey Stick CDF is an exceptionally expensive proposition, computationally.

## Story

Hello, Everyone. The idea of competitive benchmarking is not new. What is new here, however, is the proposed way of using the Hockey Stick CDF to do it.

I’ve found that most existing [web performance] competitive benchmarks do an “OK” job of comparing the summary statistics (e.g. the median or the average). But they don’t do a good enough job of showing the overall picture, nor of showing how far apart those overall pictures are.

In this Webtortoise Story:

01. We are going to calculate the individual IR Top 20 performance medians and place them along the aggregate IR Top 20 Hockey Stick CDF curve. This will, in effect, rank the median values from fastest to slowest.

02. We are going to calculate the Hockey Stick CDF of each individual IR Top 20 site and compare it to the calculated Hockey Stick CDF of the aggregate IR Top 20.  We will then calculate the net area between them and use that net area as the mechanism to rank from fastest to slowest.

03. We are going to see whether the two different calculations result in different ranks.

### Place the Individual Medians

I’ve run several tests against the IR Top 20 homepages.  Now pool those individual test data to create The Hockey Stick CDF (the aggregate curve): use a percentile function (e.g. in Excel, =PERCENTILE.EXC) to calculate the 1st through the 99th percentiles, then chart those percentiles as a line.  The chart below has 99 data points, one for each of the 1st through 99th percentiles that we just calculated.

Note you may follow along with this Excel spreadsheet (it was created in Excel 2013 (PC) and contains advanced formatting, which is unfortunately necessary for this particular post. In case everything does not display perfectly, I’ve placed a supplemental PPT here. The PPT contains various pictures/graphics, though it does not contain the formulas).

Now place the individual IR Top 20 medians along the curve.  In short, you want your individual median to be as far to the left as possible.  Consider how much better we can now see the disparity between first place and, e.g., last place.  Not only can we see the delta from top to bottom, but we can also see the delta from left to right.
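A minimal Python sketch of this placement, assuming each site's raw samples are available: pool every site's samples into the aggregate, then report the percent of pooled samples at or below each site's median (its X position on the aggregate curve). `place_medians` is a hypothetical name.

```python
import bisect
import statistics


def place_medians(samples_by_site: dict[str, list[float]]) -> list[tuple[str, float, float]]:
    """Return (site, median, aggregate percentile) rows, fastest first.

    The aggregate percentile is the percent of all pooled samples at or
    below the site's median, i.e. where that median sits on the X axis.
    """
    pooled = sorted(v for values in samples_by_site.values() for v in values)
    placed = []
    for site, values in samples_by_site.items():
        med = statistics.median(values)
        pct = 100.0 * bisect.bisect_right(pooled, med) / len(pooled)
        placed.append((site, med, pct))
    # Furthest left on the curve (lowest percentile) ranks first.
    return sorted(placed, key=lambda row: row[2])
```

Sorting by that position gives the same order as sorting by the raw medians; what the curve adds is the left-to-right distance between sites, in percentile terms.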

### Compare the Full Hockey Stick CDFs

Now compare each individual IR Top 20 CDF to the aggregate IR Top 20 CDF, and then fill the areas between them to [hopefully] give a better visual of just how far apart they are. At this point, look at PPT slide four onward and at the “Comparing Full Distributions” sheet of the Excel workbook. Right around cell AJ35 of that sheet, there is a scroll button you can use to scroll through the IR Top 20 and see the differences; this post shows only Costco (the “fastest”) and Macys (shown because its curve crosses over the aggregate a few times, which is a separate conversation on its own).

When the lines cross each other (perhaps several times), as with the Macys data, it becomes more apparent why we need to calculate the net area between the curves.  So, calculate all the “red” area, calculate all the “green” area, and take the difference of the two.  The result will be one of three things:  net faster, net slower or net zero.  By doing it this way, we are truly comparing the full distributions of the performance data.
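A sketch of the red/green calculation, assuming both curves are sampled at the same evenly spaced percentiles (e.g. P1..P99, so one step is 1 percentile) and assuming “red” is where the individual curve sits above (slower than) the aggregate and “green” is where it sits below. It applies the trapezoidal rule and splits any segment that crosses at the linearly interpolated crossing point. `red_green_areas` is a hypothetical name.

```python
def red_green_areas(individual, aggregate, step=1.0):
    """Return (red, green, net) areas between two percentile curves.

    ASSUMPTION: red = individual above aggregate (slower), green = below (faster).
    A positive net means net slower, negative means net faster, zero means net zero.
    """
    if len(individual) != len(aggregate):
        raise ValueError("curves must be sampled at the same percentiles")
    diffs = [i - a for i, a in zip(individual, aggregate)]
    red = green = 0.0
    for d0, d1 in zip(diffs, diffs[1:]):
        if d0 >= 0 and d1 >= 0:
            red += step * (d0 + d1) / 2
        elif d0 <= 0 and d1 <= 0:
            green -= step * (d0 + d1) / 2
        else:
            # The curves cross inside this step: split at the crossing point.
            t = d0 / (d0 - d1)
            first, second = step * t * d0 / 2, step * (1 - t) * d1 / 2
            red += max(first, 0.0) + max(second, 0.0)
            green -= min(first, 0.0) + min(second, 0.0)
    return red, green, red - green
```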

### Comparing the Ordinal Rank of the Two Different Calculations

There were only slight differences between the ranks produced by the two calculations. Or perhaps that should be phrased as a question instead of a statement: “Are the differences between the two rankings substantial or insubstantial?” Interesting. We can say, though, that at least the first and last places did not change. So perhaps that’s how we should approach it: do multiple calculations and compare the ranks. If an entity lands in the same slot under multiple methods, then we build confidence… right?
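One way to put a number on “substantial or insubstantial” is Spearman's rank correlation between the two orderings, a technique the post itself doesn't use and which is swapped in here: 1.0 means identical ranks, -1.0 means fully reversed. The sketch assumes each method yields a tie-free list of site names from fastest to slowest; both function names are hypothetical.

```python
def spearman_rho(order_a: list[str], order_b: list[str]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both rankings must contain the same items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items")
    rank_b = {item: pos for pos, item in enumerate(order_b)}
    d_squared = sum((pos - rank_b[item]) ** 2 for pos, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def unchanged_slots(order_a: list[str], order_b: list[str]) -> list[str]:
    """Items that land in the same slot under both methods."""
    return [a for a, b in zip(order_a, order_b) if a == b]
```

For instance, swapping one adjacent pair in a four-site ranking gives a rho of 0.8, and `unchanged_slots` reports the first and last sites as stable.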

Either way, good job to Costco.  Those are some amazing web performance numbers!

#Analytics #CatchpointUser #ChartsAndDimensions #ChartsAndGraphs #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#ExcelHockeyStick #WebPerformanceHockeyStick #Percentile #CompetitiveBenchmark

# Response:

Hello! This #WebTortoise post was written 2014-JAN-31 at 08:09 PM ET (about WebTortoise).

## Main Points:

#- Use a cumulative distribution function (CDF) for its exceptionally-powerful force rank and competitive benchmarking abilities (note, in this post I have affectionately referred to the CDF as “The Hockey Stick” chart).

#- When looking at The Hockey Stick chart, study where it tapers/turns.  This is an important attribute and starts us down the path of answering the question, “At what point do I need to be worried about those long tails… those spurious outliers?”  Or perhaps the better question is, “At what point are those spurious outliers not so spurious?”

#- This is the first in the series of The Hockey Stick posts.  Please read and comment/contact @Lvasiliou to shape the rest of the series.

#- Nothing is perfect (therefore, everything is imperfect).  Use different charts and graphs in different ways.

#- Meetups are great.

## Story

Hello, Everyone.  I wanted to formally introduce the Hockey Stick chart I’ve had the opportunity to show many, many people over the past several months.  Tons of great feedback has been gathered and I’m finally taking the time to write the series about which I’d been speaking.

Now, if I were telling a joke, the punch line would be the first main point:

Use a cumulative distribution function (CDF) for its exceptionally-powerful force rank and competitive benchmarking abilities

Technically, you could skip some of the background information I’ll be covering if you just wanted to remember the main point (kind of like “give a fish” vs “teach to fish”), but what fun is that?!  At any rate, let’s begin by saying, “It all starts with a scatter”.

### It All Starts With a Scatter

Appreciating the power of The Hockey Stick chart as a ranking or comparison tool starts with a basic review of some of the more popular, better-known chart types.  For the scatter below, I chose to use some Synthetic data, but it’s important to note The Hockey Stick chart can be used to rank or compare any performance data.

In the above scatter, we are showing one day’s worth of website response times: 3,552 individual data points across a 24-hour period.  It looks like there was some type of Pattern Change at around 07:00 AM lasting until around 04:30 PM.  And after 04:30 PM, we can clearly see some spurious outliers.

### Enter Our Summary Line Charts

NOTE: every chart in the remainder of this post was made from the data in the XY scatter above.

Let’s now take the above scatter and turn it into a line graph.  Using the Arithmetic Mean and Median calculations, we’ll come up with this:

We could also add several percentile calculations (reminder, the Median is a.k.a. the 50th Percentile):
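Both summary views boil down to bucketing the scatter's points by time and summarizing each bucket. A sketch, assuming each point is a (timestamp, milliseconds) pair and one-hour buckets (the post doesn't state its bucket size); percentiles come from Python's `statistics.quantiles` with `method="exclusive"`. `hourly_summary` is a hypothetical name.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def hourly_summary(points: list[tuple[datetime, float]], percentiles=(90, 95)) -> dict:
    """Group (timestamp, ms) points into one-hour buckets and summarize each bucket.

    Returns {bucket_start: {"mean": ..., "median": ..., "p90": ..., "p95": ...}}.
    Percentile keys are only added when a bucket holds at least two points.
    """
    buckets = defaultdict(list)
    for ts, ms in points:
        # Truncate each timestamp to the start of its hour.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(ms)

    summary = {}
    for hour, values in sorted(buckets.items()):
        row = {"mean": statistics.fmean(values), "median": statistics.median(values)}
        if len(values) >= 2:
            # 99 cut points: cuts[p - 1] is the p-th percentile.
            cuts = statistics.quantiles(values, n=100, method="exclusive")
            for p in percentiles:
                row[f"p{p}"] = cuts[p - 1]
        summary[hour] = row
    return summary
```

Each key of the result becomes one X position on the line chart, and each statistic becomes one line.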

### Enter the Frequency Distribution

Let’s now take the above scatter and turn it into the frequency distribution below:
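A minimal sketch of building a frequency distribution: count how many response times fall into each fixed-width bin. The 250 ms bin width is an assumption (the post doesn't state the width its chart uses), and `frequency_distribution` is a hypothetical name.

```python
from collections import Counter


def frequency_distribution(values_ms, bin_width=250):
    """Count values per bin; each key is the bin's lower edge in milliseconds."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(int(v // bin_width) * bin_width for v in values_ms)
    return dict(sorted(counts.items()))
```

Empty bins are simply absent from the result, so fill them with zero before charting if you want an evenly spaced X axis.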

Going from the scatter… to the summary time-based lines… to the above frequency distribution, we ask the question, “Why would we need to consider yet another chart/graph type?  This array of analytic assets gives me so much information already!”

My answer is, “Nothing is perfect (everything is imperfect), and I suggest The Hockey Stick chart does a better job than the charts we’ve seen so far when it comes to:

– Comparing full distributions to each other; and

– Correctly placing individual measures along an aggregate curve.”

### Enter The Hockey Stick

Let’s now take the above scatter and turn it into the cumulative distribution below (The Hockey Stick):

Reading the above Hockey Stick will go something like this:

– Twenty percent (X axis) of the 3,552 data points (recall this is how many data points there are across that 24-hour period) are at or below 1,603 milliseconds (Y axis).

– Forty percent (X axis) of the same 3,552 data points are at or below 1,930 milliseconds (Y axis).

– Sixty percent (X axis) of the same 3,552 data points are at or below 2,303 milliseconds (Y axis).

– Eighty percent (X axis) of the same 3,552 data points are at or below 3,081 milliseconds (Y axis).

If you notice that going from twenty percent to forty percent is “cumulative”, then you understand what a CDF is!  In other words, the forty percent of the data includes the previously mentioned twenty percent.
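The “cumulative” reading above is what an empirical CDF computes: the share of data points at or below a given value. The Hockey Stick shows the same relationship with the axes swapped (percent on X, milliseconds on Y). A minimal sketch (`make_ecdf` is a hypothetical name):

```python
import bisect


def make_ecdf(values):
    """Return a function giving the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def cdf(x):
        # bisect_right counts every value <= x in the sorted list.
        return bisect.bisect_right(ordered, x) / n

    return cdf
```

Because the count can only include more points as x grows, the result never goes down, which is the “forty percent includes the twenty percent” property.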

Said another way:

Envision that each of those 3,552 individual data points has been dropped into one of those lottery-number drawing thingies. Then we’d say:

– We have a twenty percent chance of pulling a data point that is at or below 1,603 milliseconds.

– We have a forty percent chance of pulling a data point that is at or below 1,930 milliseconds.

– We have a sixty percent chance of pulling a data point that is at or below 2,303 milliseconds.

– We have an eighty percent chance of pulling a data point that is at or below 3,081 milliseconds.
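The lottery analogy can be checked with a quick simulation: draw one data point at random (with replacement) many times and count how often the draw is at or below a threshold; the observed share converges on the CDF value. A sketch with a hypothetical function name and a fixed seed so runs are repeatable:

```python
import random


def draw_probability(values, threshold_ms, draws=100_000, seed=0):
    """Estimate P(draw <= threshold_ms) by repeatedly drawing one value at random."""
    rng = random.Random(seed)
    hits = sum(rng.choice(values) <= threshold_ms for _ in range(draws))
    return hits / draws
```

For example, with 100 made-up values 1 through 100, `draw_probability(values, 20)` comes out close to 0.2.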

In other words, the CDF, a.k.a. what I’ve affectionately called The Hockey Stick:

“describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x” [Wikipedia, CDF Article].

Or, if Mathwave.com’s definition is a little easier, then The Hockey Stick is:

“the probability that the variate takes on a value less than or equal to x” (http://www.mathwave.com/articles/distribution_graphs.html).

Question:  Why am I going through the trouble of explaining what a CDF is?

Answer:  Because the Wikipedia article made my head hurt and if you didn’t already know what a CDF is, then reading that article probably would not have changed that.  So I decided to try writing up the explanation this way, in hopes it was more understandable.

Question:  I ventured and read the Wikipedia article, and I’m curious why your percentages are along the X axis instead of the Y axis?

Answer:  This is the first comment/question most math and stat types ask me.  The short answer is because it’s easier to read this way (at least for me it is).  The longer answer (including graphics) is in the optional reading section toward the bottom of this post.

Question:  If I’m reading this right and the percent is along the X axis, then the Median is right in the middle, between the 40th and 60th percentiles?

Answer:  Yes, exactly.  The Median is right where you describe, and it says, “exactly half of the data are below me and exactly half of the data are above me”.

Question:  When am I going to get to ranking and comparing?

Answer:  We are getting there.  To give a little forecast, imagine The Hockey Stick is an aggregate performance curve of the Internet Retailer Top 20 (or Top whatever “X”).  Then imagine placing the performance of each individual IR Top 20 site along that aggregate curve.  For now, though, let’s discuss how we create The Hockey Stick.

### Creating the Hockey Stick

To create The Hockey Stick, have some raw data, calculate the percentiles and then chart those percentiles using a line.

01.  Have some raw data.

In this example, we are still sticking with the same 3,552 data points from the scatter (download the Excel sheet right here).

02.  Calculate the percentiles.

Use built-in functions like PERCENTILE.EXC in Excel to calculate the percentiles for the 1st percentile all the way to the 99th percentile.

— We’ll talk about trimming later, but in case you miss it, you can instead calculate, e.g., the 5th through the 96th percentile.  This is sometimes okay because, when you competitively benchmark, the individual performance typically will not be in those long tails of the aggregate distribution.  It can also make the chart more readable because the skew may be less.

— In Excel versions earlier than Excel 2010, the percentile functions are broken.

03.  Chart those percentile values with a line and, voila, you have your hockey stick!

In the provided Excel file, the PERCENTILE.EXC functions start in cell V103 and, in this case, run on the raw data in column F (F5:F3556 to be exact).  The chart data for this particular hockey stick is in cells V103:V201.
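For readers working outside Excel, here is a sketch of steps 02 and 03 that mirrors PERCENTILE.EXC as I understand it: sort the data, take the 1-based rank k × (n + 1), interpolate linearly between neighbors, and treat ranks outside 1..n as errors (Excel's #NUM!). The names `percentile_exc` and `hockey_stick` are hypothetical, and the `first`/`last` parameters cover the 5th-through-96th trimming option.

```python
import math


def percentile_exc(values, k):
    """Exclusive percentile in the style of Excel's PERCENTILE.EXC (0 < k < 1).

    Interpolates linearly at the 1-based rank k * (n + 1) of the sorted data.
    Ranks outside 1..n raise ValueError, where Excel would return #NUM!.
    """
    data = sorted(values)
    n = len(data)
    # Round away floating-point noise such as 0.07 * 100 == 7.000000000000001.
    rank = round(k * (n + 1), 9)
    if not 0 < k < 1 or not 1 <= rank <= n:
        raise ValueError(f"k={k} is out of range for {n} values")
    lower = math.floor(rank)
    frac = rank - lower
    if lower == n:
        return float(data[-1])
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


def hockey_stick(values, first=1, last=99):
    """Percentile values to chart as a line, from the first to the last percentile."""
    return [percentile_exc(values, p / 100) for p in range(first, last + 1)]
```

`hockey_stick(raw_ms)` yields the 99 values to chart as a line; `hockey_stick(raw_ms, 5, 96)` yields the trimmed version.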

### The First Post in This Series

I’m going to stop here for the first post in the series.  Please take the time to read, and contact me @Lvasiliou if you want to discuss, need help understanding or just want to geek out about charts and graphs in general.  Regarding the rest of the series, here are some general thoughts so far:

– Use The Hockey Sticks and force rank the performance of individuals along an aggregate curve

– Use The Hockey Sticks and compare full distributions to other full distributions

— compare individuals to other individuals

— compare individuals to groups

– Talk about some strategic use of color

_The following is optional reading material._

## Alternative Presentation

In the above hockey stick, we charted the 1st percentile all the way through the 99th percentile.  This is great because it shows us the performance of the entire distribution, but those last few percentiles really scrunch most of the graph.  So, as an alternative way to present and “spread out” the data:

– chart, as a line, the 5th through the 96th percentile ; and

– chart, as columns, the amount of change from the previous percentile for the full 1st percentile through the 99th percentile.

The Amount of Change:  The first percentile is 1,082 milliseconds; the second percentile is 1,153 milliseconds.  The amount of change between the first and second percentile is 71 milliseconds (1,153 – 1,082 = 71).  Do this for each pair of consecutive percentiles and then chart this series as columns.  In the Excel sheet, the “change from previous percentile” series starts in cell AK104.  This spreads the data out a little bit while still giving you an idea of just how bad those extremely long tails really are… in the same chart!
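A sketch of both series for this alternative presentation, assuming the input is the list of 1st through 99th percentile values in order (index 0 holds the 1st percentile). Both function names are hypothetical.

```python
def change_from_previous(percentile_values):
    """Differences between consecutive percentile values: P2 - P1, P3 - P2, ..."""
    return [b - a for a, b in zip(percentile_values, percentile_values[1:])]


def alternative_presentation(p1_to_p99):
    """Return (line series for P5..P96, column series of changes across P1..P99)."""
    if len(p1_to_p99) != 99:
        raise ValueError("expected the 1st through 99th percentile values")
    line = p1_to_p99[4:96]  # index 0 holds P1, so indices 4..95 hold P5..P96
    columns = change_from_previous(p1_to_p99)
    return line, columns
```

With the two values from the worked example, `change_from_previous([1082, 1153])` returns `[71]`.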