Web Tortoise

2013-Dec-30

WebTortoise – WebPerf SLA Graphs

Webtortoise - SLA Bullet Graph

Response:

Hello!  This #WebTortoise post was written 2013-DEC-30 at 02:25 PM ET (about WebTortoise).

Main Points:

#-  Always remember the difference between Availability and Performance.  This is especially important in SLA charting:  on “traditional” SLA graphs, want the calculated Performance lines to be *below* the SLA line, but want the calculated Availability lines to be *above* the SLA line.  Therefore, if Performance SLA and Availability SLA are charted ambiguously, the data may be misread.

#-  See also:  Search for, “Stephen Few Bullet Graph” for the Bullet Graph Design Specification; the Webtortoise’ified work in this post included some of Stephen’s ideas.

#-  Chart/Graph Name:  Bullet Graph

#-  Shows:  SLA; whether or not you “missed” or “met” the SLA (target).

Story

In this Webtortoise Story, will explore some of the nuances of SLA Charting (from a charting/visual perspective) and discuss some of the options to consider when deciding how to present SLA graphs.  Will make use of the Bullet Graph specification (as per Stephen Few).

First:  it all starts with an XY Scatter.  This graph shows one day’s worth (midnight to midnight) of website performance data, with time on the X axis and the Webpage Response (in milliseconds) on the Y axis.

Webtortoise - SLA Bullet Graph - 1

Now turn this scatter into a line(s) and get something like this:

Webtortoise - SLA Bullet Graph - 2

Then introduce an SLA target.  A Performance SLA of three seconds is hereby declared, resulting in something like this:

Webtortoise - SLA Bullet Graph - 3

As the above graph is a Performance SLA graph, want the calculated lines to be *below* the SLA line.  But here’s where things start to get tricky.  Notice that, depending on the calculation (e.g. the Median versus the 75th Percentile versus the 90th Percentile), the different calculated lines are above (missed) or below (met) the SLA line at different times!  So must further refine the SLA, and have chosen this:  Ninety percent of the Response Times less than three seconds (this SLA is chosen just for illustrative purposes; the actual SLA will differ in different circumstances).  Removing the calculated lines except for the 90th Percentile results in this:

Webtortoise - SLA Bullet Graph - 4

Now can say this:

– From the hours of Midnight to 06 AM, were meeting the SLA (the calculated line was below the SLA line).

– From the hours of 07 AM to 07 PM, were missing the SLA (the calculated line was above the SLA line).

– Then, from the hours of 08 PM to Midnight, were meeting the SLA.
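The hour-by-hour reading above can be sketched in a few lines of Python.  This is only an illustration, not how the original graphs were produced:  the function names, the nearest-rank percentile method and the sample Response Times are all assumptions.  It groups Response Times by hour, calculates a percentile per hour and compares it to the three-second SLA line.

```python
import math

SLA_MS = 3000  # the three-second Performance SLA used in this post


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method; other definitions exist)."""
    if not values:
        raise ValueError("need at least one Response Time")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_status_by_hour(samples, pct=90, sla_ms=SLA_MS):
    """samples: iterable of (hour_of_day, response_ms) pairs.

    Returns {hour: (calculated_value, "met" or "missed")}.  For a Performance
    SLA, "met" means the calculated value is *below* the SLA line.
    """
    by_hour = {}
    for hour, ms in samples:
        by_hour.setdefault(hour, []).append(ms)
    status = {}
    for hour in sorted(by_hour):
        value = percentile(by_hour[hour], pct)
        status[hour] = (value, "met" if value < sla_ms else "missed")
    return status


# Made-up sample data: a quiet hour (02) and a busy hour (09).
samples = [(2, ms) for ms in (900, 1100, 1200, 1500, 2800)]
samples += [(9, ms) for ms in (1200, 1800, 2500, 3400, 4000)]

for hour, (value, result) in sla_status_by_hour(samples).items():
    print(f"{hour:02d}:00  90th Percentile = {value} ms  SLA {result}")
```

With this made-up data, calling `sla_status_by_hour(samples, pct=50)` reports the busy hour as “met” while the 90th Percentile reports it as “missed”, which is exactly why the SLA had to name the calculation.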

Now, here’s where things get trickier.  As this is a *Performance* graph, want the calculated line (in this case, the 90th Percentile) to be *BELOW* the SLA line.  If, however, this were an *Availability* graph, would want the calculated line to be *ABOVE* the SLA line!

Have seen this *BELOW*/*ABOVE* distinction graphed with ambiguity in too many cases, so please make sure not to perpetuate it.

Traditional Performance SLA graph next to traditional Availability SLA graph (In both graphs, the solid, horizontal black line is the SLA target):

Webtortoise - SLA Bullet Graph - 5

In the above “left” graph, want the calculated line to be *below* the SLA line.  But in the “right” graph, want the calculation (in this case, the calculations are bars) to be above the SLA line.

Question:  If it was not known that one graph was Performance and the other Availability, then how would it be known whether the calculation needed to be above or below the target to be deemed an “SLA missed” or an “SLA met”?

Answer:  For this reason, do things like add colors (maybe red for “bad” and green for “good”, or maybe dark gray for “bad” and light gray for “good”).  Also for this reason, Stephen Few added some alternative designs to his original specification.  For reading about Bullet Graphs, I encourage readers to go check out the spec for themselves (do a search for, “Stephen Few Bullet Graph”).

Now, take some of the ideas from Stephen’s specification (specifically, the color shading) and make both graphs use the same mechanic.  In this case, the mechanic will be to consider the Performance SLA graph an “SLA met” if the calculated line is *above* the SLA line (just like with the Availability graph)!  This way, regardless of Availability or Performance, will be able to quickly see whether the SLA was missed or met!  Our SLA will remain the same as previously established (Ninety percent of the Response Times less than three seconds), but instead of the actual Response Time, the Y axis will show the Percent of Response Times under three seconds!
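A minimal Python sketch of this normalized mechanic follows.  The function names and sample numbers are made up for illustration, the 99 percent Availability target is hypothetical, and treating “at or above the target” as “met” is an assumption.  The point is that once Performance is expressed as the Percent of Response Times under three seconds, the same “higher is better” comparison works for both Performance and Availability.

```python
SLA_THRESHOLD_MS = 3000      # "less than three seconds"
SLA_TARGET_PERCENT = 90.0    # "Ninety percent of the Response Times"


def percent_under_threshold(response_times_ms, threshold_ms=SLA_THRESHOLD_MS):
    """Percent of Response Times strictly under the threshold."""
    if not response_times_ms:
        raise ValueError("need at least one Response Time")
    under = sum(1 for ms in response_times_ms if ms < threshold_ms)
    return 100.0 * under / len(response_times_ms)


def percent_available(checks):
    """Percent of checks that succeeded; each check is 1 (available) or 0."""
    if not checks:
        raise ValueError("need at least one check")
    return 100.0 * sum(checks) / len(checks)


def missed_or_met(percent, target_percent):
    """Same mechanic for both graphs: at or above the target is 'met'."""
    return "met" if percent >= target_percent else "missed"


# Made-up Response Times (ms) for two different hours.
fast_hour = [800, 950, 1200, 1400, 1700, 2100, 2300, 2600, 2900, 3500]
slow_hour = [900, 1500, 2200, 2800, 3100, 3300, 3600, 4000, 4200, 5000]

for name, times in (("fast hour", fast_hour), ("slow hour", slow_hour)):
    pct = percent_under_threshold(times)
    print(name, pct, missed_or_met(pct, SLA_TARGET_PERCENT))

# Availability uses the very same comparison (hypothetical 99% target).
availability = percent_available([1, 1, 1, 1, 0])
print("availability", availability, missed_or_met(availability, 99.0))
```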

Making these changes results in a graph like this:

Webtortoise - SLA Bullet Graph

When circulating this graph for review, was asked whether it was showing Availability or Performance.  The answer is:  Neither; it is showing a target and whether the target was “missed” or “met”.  Now, you could infer that “Three-Second…” in the title meant Performance.  This is true but, more importantly, it reinforces the need to normalize the mechanic of presenting either an Availability or Performance SLA graph (because people won’t necessarily read and internalize all of the graph attributes to correctly read what the author intends).

Last thought in closing:  it is not technically necessary to graph a target and whether the target was “missed” or “met”; could easily just type the word “missed” or “met”.  But, inevitably, some human comes along and starts asking questions like, “What was the SLA?” or “By how much was the SLA missed or met?”  And so on.

Document Complete / OnLoad:

_The following is optional reading material._

Variants for the Above, “Three-Second SLA:  Missed or Met” Graph

Webtortoise - SLA Bullet Graph - 7

Webtortoise - SLA Bullet Graph - 8

Webtortoise - SLA Bullet Graph - 9

LinkedIn: http://www.linkedin.com/in/leovasiliou

Twitter: @LvasiLiou

#Analytics #CatchpointUser #ChartsAndDimensions #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#BulletGraph #SLA #SLACharting #SLAMonitoring

2013-May-16

Don’t Forget About Availability

Response:

Hello! This #WebTortoise post was written 2013-MAY-16 at 11:34 AM ET (about #WebTortoise).

Main Points

#- Analyze Availability by various Dimensions, e.g. Hour of Day or Minute of Hour, to look for patterns.

#- Performance implies Availability. Performance may be measured if and only if Availability = 1 (your choices are either 1 or 0; either something’s available or something’s not available).

#- We monitor Availability; we measure Performance.

#- Don’t go it alone. When working to uncover patterns in your Availability and Performance data, will need the help of others in the Organization.

Story

I thought I’d break away from the normal second-and-third-person writing style of Webtortoise to write this more intimate, first-person post. Lately, I’ve been feeling bad for my buddy, Availability (In this Webtortoise Story, my buddy’s name is, “Availability”). You see, Availability’s cousin, Performance, has been getting all of the limelight. I mean, don’t get me wrong, Performance IS sleek and sexy while Availability IS binary and boring, but the only reason we’re able to talk about all these advancements in Performance is because of their JOINT efforts!

So much attention has been given to Performance lately that I am seeing more and more folks forget, or casually gloss over, Availability! The problem here is: Performance _implies_ Availability. That is, if it’s not available, then you cannot measure it [for Performance].

So please, help me spread the word and remind folks to never forget about their ol’ buddy and friend, “Availability”.

And now, your obligatory Webtortoise chart:

In this chart, we counted the number of Availability strikes (a.k.a. errors) for several days. Then we plotted the COUNT by Minute of Hour.

In this first chart, there is no special formatting. But can still see some high error counts.

Blog Post Availability by Minute of Hour - 1

In this second chart, have highlighted and called out the discovered pattern! At first, the guessed pattern was incorrect because was trying to find a *single* pattern. However, after pulling in some more people, was able to figure out there were *multiple* patterns.

Blog Post Availability by Minute of Hour - 2

In this specific case, these patterns were caused by *two separate* log shippings, across two different subsystems, affecting page load (i.e. the page was not available)! And had it not been for a Performance Management Program, may never have discovered these Patterns!
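For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the “COUNT by Minute of Hour” idea.  The record layout `(timestamp, available, response_ms)`, the function names and the sample checks are all assumptions for illustration; the original chart came from the Excel sheet linked below.  It also shows the “measure Performance only when Availability = 1” rule from the Main Points.

```python
from collections import Counter
from datetime import datetime


def strikes_by_minute_of_hour(checks):
    """Count Availability strikes (errors) by Minute of Hour, across all days.

    checks: iterable of (timestamp, available, response_ms), where available
    is 1 or 0 and response_ms is None when the page was not available.
    """
    counts = Counter()
    for timestamp, available, _response_ms in checks:
        if available == 0:
            counts[timestamp.minute] += 1
    return counts


def measurable_response_times(checks):
    """Performance may be measured if and only if Availability = 1."""
    return [ms for _timestamp, available, ms in checks if available == 1]


# Made-up checks spanning two days; strikes cluster at minute :15.
checks = [
    (datetime(2013, 5, 1, 9, 15), 0, None),
    (datetime(2013, 5, 1, 9, 30), 1, 1200),
    (datetime(2013, 5, 1, 10, 15), 0, None),
    (datetime(2013, 5, 1, 10, 45), 0, None),
    (datetime(2013, 5, 2, 11, 15), 0, None),
    (datetime(2013, 5, 2, 11, 50), 1, 900),
]

counts = strikes_by_minute_of_hour(checks)
for minute, count in sorted(counts.items()):
    print(f":{minute:02d} -> {count} strike(s)")
print("measurable Response Times:", measurable_response_times(checks))
```

A minute that stands out across many days (here, :15) is a hint to go ask others in the Organization what is scheduled at that time.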

Document Complete / OnLoad:

_The following is optional reading material._

Download Excel sheet here:  https://docs.google.com/file/d/0B9n5Sarv4oonTjR0Zk9oYzI1bGc/edit?usp=sharing

#CatchpointUser #KeynoteUser #Webtortoise #Performance #WebPerformance #SiteSpeed #ChartsAndDimensions #Availability

2012-Dec-20

WebTortoise Year in Review 2012

Filed under: Availability, Performance, Review | leovasiliou @ 03:37 PM EST

Response:

Hello! This #WebTortoise post was written 2012-DEC-20 at 12:53 PM ET (about #WebTortoise).

Main Points

#- Because saying it once sometimes just isn’t enough! Here’s the WebTortoise 2012 Year in Review.

Story

Once in a while, will have to retrain or refresh on a particular subject matter. This may be the result of an organizational change, may be the result of using something only occasionally or may be the result of any number of factors. In that vein, here are some select WebTortoise 2012 posts:

#- How do I calculate the geometric mean in Excel?

#- Excel: Use color to add value to your Performance charts.

Excel.Waterfall.Snip.2012-MAR-06-2114ET

#- Arithmetic Mean Average versus Geometric Mean Average: Knowing when to choose which calculation.

Comparing.Mean.Calculations

#- Excel Frequency Distribution: How many Response Times were between 0-1,000ms? How many Response Times were between 1,001-2,000ms? And so on?

Frequency Distribution

#- Excel Heat Map: Making it easier to find patterns in website Response Time. Applying Excel conditional formatting (red/yellow/green) to detect website’s “hot” times.

Heat.Map.Side.by.Side

#- Always consider the difference between Performance versus Availability when choosing your measurement instrument(s).

#- Check the overlay. Comparing the latter set of Response Times to the earlier set of Response Times. Was there a Pattern Change?

Blog.Post.Check.the.Overlay-3

#- The Excel Hockey Stick Chart: Looking at Response Times across the entire percentage range.

Excel Hockey Stick Chart RE Web Performance

#- Studying Prior Rates of Change to configure “Site is Slow” Performance alerts.
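Two of the refreshers above (Arithmetic Mean versus Geometric Mean, and the Frequency Distribution) can be sketched in Python as well.  This is only a stand-in for the Excel steps in the original posts:  the function name, the whole-millisecond assumption and the sample Response Times are made up for illustration.

```python
from collections import Counter
from statistics import geometric_mean, mean


def frequency_distribution(response_times_ms, bucket_ms=1000):
    """Count Response Times per bucket: 0-1,000ms, 1,001-2,000ms, and so on.

    Assumes whole-millisecond (integer) Response Times.
    """
    counts = Counter()
    for ms in response_times_ms:
        counts[max(0, (ms - 1) // bucket_ms)] += 1
    rows = []
    for index in sorted(counts):
        low = 0 if index == 0 else index * bucket_ms + 1
        high = (index + 1) * bucket_ms
        rows.append((f"{low:,}-{high:,}ms", counts[index]))
    return rows


# Made-up Response Times with one slow outlier.
response_times_ms = [400, 800, 1000, 1001, 1900, 2500, 12000]

# The outlier pulls the arithmetic mean up more than the geometric mean.
print("arithmetic mean:", round(mean(response_times_ms)))
print("geometric mean: ", round(geometric_mean(response_times_ms)))

for label, count in frequency_distribution(response_times_ms):
    print(label, count)
```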

Document Complete / OnLoad:

_The following is optional reading material._

#CatchpointUser #KeynoteUser #GomezUser #Webtortoise #Performance #WebPerformance

#ExcelStatistics #FrequencyDistribution
