# Response:

Hello!  This Webtortoise post was written 2014-MAR-31 at 10:35 PM ET.

## Keep in Mind:

#- Credit to Lee Humphries over at http://www.thinkingapplied.com/means_folder/deceptive_means.htm for the idea-inspiring post.  Thank you, sir.

## Story:

Hello, Everyone.  In this Webtortoise Story, we are going to expand on the concepts of The Web Performance “Hockey Stick” Cumulative Distribution Function (“CDF”) Chart and explore “rates of change between the percentiles”.  Specifically, we want to see how the geometric rate of change between percentiles compares with the actual rate of change between percentiles.  We will do this by:

– constructing an actual CDF ;

– constructing a geometric CDF ; and

–  then charting both the actual and the geometric .

In this Excel sheet, take a look at the raw data sample for the Internet Retailer Top 10 sites (these were simple default home page loads, but the underlying measurement theory can apply to any performance metric).  In chart 01 (cell AD5), I have constructed the aggregate Hockey Stick CDF chart by calculating all the percentiles and then charting them as a line.

In chart 02 (cell AD32), I have constructed the aggregate Hockey Stick again, but this time using the geometric accumulation instead of the actual accumulation.

In chart 03 (cell AD59), I have plotted the information from charts 01 and 02 together.

In chart 04 (cell AY119), we can now do things like compare the delta between the “actual” and the “geometric” and use it as another data point for studying your web performance.  Note the scroll button is at cell AX120, to scroll through the IR Top 10.
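The post does not spell out how the geometric curve is built, so here is a minimal Python sketch under one plausible reading, and it is only an assumption: the geometric curve starts and ends at the same values as the actual percentile curve, but grows by a constant ratio from one percentile to the next. The function names and the sample values are hypothetical (they are not the IR Top 10 data).

```python
import statistics

def actual_percentiles(samples):
    """1st through 99th percentiles, using the exclusive method."""
    return statistics.quantiles(samples, n=100, method="exclusive")

def geometric_curve(actual):
    """Assumed 'geometric' curve: same endpoints, constant ratio per step."""
    first, last = actual[0], actual[-1]
    steps = len(actual) - 1
    ratio = (last / first) ** (1 / steps)
    return [first * ratio ** i for i in range(len(actual))]

# Hypothetical response times in milliseconds (illustration only).
samples = [1000 + (i * 37) % 900 + (i % 13) ** 3 for i in range(500)]

actual = actual_percentiles(samples)
geometric = geometric_curve(actual)

# Delta between "actual" and "geometric" at each percentile.
delta = [a - g for a, g in zip(actual, geometric)]
```

If the workbook defines the geometric accumulation differently, only `geometric_curve` would need to change; the delta calculation stays the same.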

#Analytics #CatchpointUser #ChartsAndDimensions #ChartsAndGraphs #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#ExcelHockeyStick #WebPerformanceHockeyStick #Percentile #GeometricMean

# Response:

Hello! This WebTortoise post was written 2014-FEB-28 04:59 PM ET.

## Keep in Mind:

#- Use a cumulative distribution function (CDF) for its exceptionally powerful force rank, competitive benchmarking and diff’ing capabilities (Note: in this post I have affectionately referred to the CDF as “The Hockey Stick” chart).

#- Use The Hockey Stick chart to compare fully-distributed performance data to other, fully-distributed performance data.

#- Credit to both Peltiertech.com and Chandoo.org for their logic on how to fill areas with color in Excel. Thank you, Gentlemen.

#- The actual Internet Retailer (“IR”) Top 20 competitive benchmark data in this post covers only the respective home pages; it benchmarks the [Fully Loaded] Webpage Response Time metric. Note the theory in this post could apply to any performance data (e.g. a full business transaction) or to any metric (e.g. Wait Time, Load Time or any internal KPI).

#- Constructing The Hockey Stick CDF is an exceptionally expensive proposition, computationally.

## Story

Hello, Everyone. The idea of competitive benchmarking is not new. What is new here, however, is the proposed way of using the Hockey Stick CDF to do it.

I’ve found that most existing [web performance] competitive benchmarks do an “OK” job of comparing the summary statistics (e.g. the median or the average). But they don’t do a good enough job of showing the overall picture, nor of showing how far the overall pictures are from each other.

In this Webtortoise Story:

01. We are going to calculate the individual IR Top 20 performance medians and place them along the aggregate IR Top 20 Hockey Stick CDF curve. This will virtually rank the median values from fastest to slowest.

02. We are going to calculate the Hockey Stick CDF of each individual IR Top 20 site and compare it to the calculated Hockey Stick CDF of the aggregate IR Top 20.  We will then calculate the net area between them and use that net area number as the mechanism to rank from fastest to slowest.

03. We are going to see whether the two different calculations result in different ranks.

### Place the Individual Medians

I’ve run several tests against the IR Top 20 homepages.  Now use those individual test data to create The Hockey Stick CDF (aggregate curve): use your percentile functions (e.g. in Excel, use =PERCENTILE.EXC) to calculate the 1st through the 99th percentiles, then chart these respective percentiles as a line.  In this below chart, there are 99 chart data (one for each of the 1st through 99th percentiles that we just calculated).

Note you may follow along with this Excel spreadsheet. (This spreadsheet was created in Excel 2013 (PC) and contains advanced formatting. Unfortunately, the advanced formatting is necessary for this particular post. In case everything does not display perfectly, I’ve placed a supplemental PPT here. The PPT contains various pictures/graphics, though it does not contain the formulas.)

Now place the individual IR Top 20 medians along the curve.  The long and short is you want your individual median to be as far to the left as possible.  Consider now how much better we can see the disparity between first place and, e.g., last place.  Not only can we see the delta going up to down, but we can also see the delta going left to right.
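For those who prefer code to cell formulas, here is a minimal Python sketch of placing individual medians along an aggregate curve. It finds what percent of the aggregate data is at or below each site's median, then ranks by that position. The site names, the sample values and the `percentile_rank` helper are all hypothetical; this is not the IR Top 20 data or the workbook's method.

```python
import bisect
import statistics

def percentile_rank(aggregate_sorted, value):
    """Percent of aggregate samples that are less than or equal to value."""
    count = bisect.bisect_right(aggregate_sorted, value)
    return 100.0 * count / len(aggregate_sorted)

# Hypothetical per-site response times in milliseconds.
sites = {
    "site_a": [900, 950, 1000, 1100, 1300],
    "site_b": [1500, 1600, 1700, 1900, 2500],
    "site_c": [2000, 2200, 2400, 2600, 4000],
}

# The aggregate is every sample from every site, sorted.
aggregate = sorted(t for times in sites.values() for t in times)

# Where each site's median falls along the aggregate curve (X axis position).
placements = {
    name: percentile_rank(aggregate, statistics.median(times))
    for name, times in sites.items()
}

# Furthest to the left (smallest position) is fastest.
ranking = sorted(placements, key=placements.get)
```

Here `placements["site_a"]` is 20.0, because 3 of the 15 aggregate samples are at or below its median of 1,000 milliseconds.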

### Compare the Full Hockey Stick CDFs

Now compare the individual IR Top 20 CDF to the aggregate IR Top 20 CDF and then fill the areas between them to [hopefully] give a better visual of just how far apart they are. At this point, be looking at the PPT slide four and beyond, and be looking at “Comparing Full Distributions” sheet of the Excel workbook. Right around cell AJ35 of this “Comparing Full Distributions” sheet, there is a scroll button where you can scroll through the IR Top 20 and see the differences; this post is showing only Costco (the “fastest”) and Macys (shown because it crosses over a few times, and this is a separate conversation on its own).

When the lines cross each other (perhaps several times) as with the Macys data, it becomes more apparent as to why we need to calculate the net area between the comparison(s).  So, calculate all the “red” area…. calculate all the “green” area and take the difference of the two.  The result will be one of three things:  net faster, net slower or net zero.  By doing it this way, we are now truly comparing the full distribution of the performance data.
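The net area idea can be sketched in Python as follows. This is a rough sketch that treats the area as the sum of the per-percentile differences between the two curves (one unit wide per percentile), which is my assumption and may not be exactly how the workbook does it. The post also does not say which color means faster, so the code names the two areas by meaning instead. All sample values are hypothetical.

```python
import statistics

def hockey_stick(samples):
    """1st through 99th percentiles, using the exclusive method."""
    return statistics.quantiles(samples, n=100, method="exclusive")

def net_area(individual, aggregate):
    """Signed area between an individual curve and the aggregate curve.

    Negative means the individual is net faster than the aggregate,
    positive means net slower, zero means net zero.
    """
    ind = hockey_stick(individual)
    agg = hockey_stick(aggregate)
    # Area where the individual curve sits below the aggregate (faster).
    faster_area = sum(a - i for i, a in zip(ind, agg) if i < a)
    # Area where the individual curve sits above the aggregate (slower).
    slower_area = sum(i - a for i, a in zip(ind, agg) if i > a)
    return slower_area - faster_area

# Hypothetical data: the individual is 100 ms faster at every percentile.
individual = [1000 + i for i in range(200)]
aggregate = [1100 + i for i in range(200)]
result = net_area(individual, aggregate)  # about -9900 (99 percentiles x 100 ms)
```

Ranking by `net_area` from most negative to most positive orders the individuals from fastest to slowest, and curves that cross several times are handled because each percentile contributes to whichever area it belongs to.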

### Comparing the Ordinal Rank of the Two Different Calculations

There were only slight differences between ranking the two calculations. Or perhaps, that should be phrased as a question instead of a statement. E.g., “Are the differences between ranking the two calculations substantial or insubstantial”? … Interesting. We can say, though, at least the first and last place(s) did not change. So perhaps that’s how we should approach, by doing multiple calculations and seeing the rank that way? So if an entity was in the same slot in multiple methods, then we build confidence… right?

Either way, good job to Costco.  Those are some amazing web performance numbers!

#Analytics #CatchpointUser #ChartsAndDimensions #ChartsAndGraphs #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#ExcelHockeyStick #WebPerformanceHockeyStick #Percentile #CompetitiveBenchmark

# Response:

Hello! This #WebTortoise post was written 2014-JAN-31 at 08:09 PM ET (about WebTortoise).

## Main Points:

#- Use a cumulative distribution function (CDF) for its exceptionally powerful force rank and competitive benchmarking abilities (Note: in this post I have affectionately referred to the CDF as “The Hockey Stick” chart).

#— When looking at The Hockey Stick chart, study where it tapers/turns.  This is an important attribute and starts us down the path of answering the question, “At what point do I need to be worried about those long tails… those spurious outliers”?  Or perhaps the better question is, “At what point are those spurious outliers not so spurious”?

#- This is the first in the series of The Hockey Stick posts.  Please read and comment/contact @Lvasiliou to shape the rest of the series.

#- Nothing is perfect (therefore, everything is imperfect).  Use different charts and graphs in different ways.

#- Meetups are great.

## Story

Hello, Everyone.  I wanted to formally introduce the Hockey Stick chart I’ve had the opportunity to show many, many people over the past several months.  Tons of great feedback has been gathered and I’m finally taking the time to write the series about which I’d been speaking.

Now, if I were telling a joke, then the punch line is what was mentioned as the first main point:

Use a cumulative distribution function (CDF) for its exceptionally-powerful force rank and competitive benchmarking abilities

Technically, you could skip some of the background information I’ll be covering if you just wanted to remember the main point (kind of like “give a fish” vs “teach to fish”), but what fun is that?!  At any rate, let’s begin by saying, “It all starts with a scatter”.

### It All Starts With a Scatter

The power of using The Hockey Stick chart as a rank or comparison tool starts first with a basic review of some of the more popular, more known chart types.  In this below scatter, I chose to use some Synthetic data, but it’s important to note The Hockey Stick chart can be used to rank or compare any performance data.

In the above scatter, we are showing one day’s worth of website response times.  There are 3,552 individual data shown across a 24-hour period.  It looks like there was some type of Pattern Change at around 07:00 AM lasting until around 04:30 PM.  And after 04:30 PM, we can clearly see some spurious outliers.

### Enter Our Summary Line Charts

NOTE every chart in the remainder of this post was made from the above XY Scatter NOTE

Let’s now take the above scatter and turn it into a line graph.  Using the Arithmetic Mean and Median calculations, we’ll come up with this:

We could also add several percentile calculations (reminder, the Median is a.k.a. the 50th Percentile):

### Enter the Frequency Distribution

Let’s now take the above scatter and turn it into this below frequency distribution:

Going from the scatter… to the summary time-based lines… to this above frequency distribution, we ask the question, “Why else would we need to consider another chart/graph type?  This array of analytic assets gives me so much information already!”

My answer is, “Nothing is perfect (everything is imperfect)”, and I suggest The Hockey Stick chart does a better job than the charts we’ve seen so far when it comes to:

– Comparing full distributions to each other ; and

– Correctly placing individual measures along an aggregate curve.

### Enter The Hockey Stick

Let’s now take the above scatter and turn it into this below cumulative distribution (The Hockey Stick):

Reading the above Hockey Stick will go something like this:

– Twenty percent (X axis) of the 3,552 data (recall this is how many data across that 24-hour period) are below (or equal to) 1,603 milliseconds (Y axis).

– Forty percent (X axis) of the same 3,552 data are below (or equal to) 1,930 milliseconds (Y axis).

– Sixty percent (X axis) of the same 3,552 data are below (or equal to) 2,303 milliseconds (Y axis).

– Eighty percent (X axis) of the same 3,552 data are below (or equal to) 3,081 milliseconds (Y axis).

If you consider when we go from twenty percent to forty percent that it’s “cumulative”, then you now understand what a CDF is!  In other words, the forty percent of the data includes the previously mentioned twenty percent.

Said another way

If we envision that each of those individual 3,552 data has been dropped into one of those lottery number drawing thingies, we’d say:

– We have a twenty percent chance of pulling a data that is below 1,603 milliseconds.

– We have a forty percent chance of pulling a data that is below 1,930 milliseconds.

– We have a sixty percent chance of pulling a data that is below 2,303 milliseconds.

– We have an eighty percent chance of pulling a data that is below 3,081 milliseconds.

In other words, the CDF, a.k.a. what I’ve affectionately called The Hockey Stick:

“describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x” [Wikipedia, CDF Article].

Or, if Mathwave.com’s definition is a little easier, then The Hockey Stick is:

“the probability that the variate takes on a value less than or equal to x” (http://www.mathwave.com/articles/distribution_graphs.html).
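The lottery drawing reading and the “less than or equal to x” definitions say the same thing, and a few lines of Python can show it. This is only a sketch with made-up numbers (not the 3,552 data from the scatter), and `fraction_at_or_below` is a hypothetical helper name.

```python
import random

def fraction_at_or_below(samples, threshold):
    """Empirical CDF: share of samples less than or equal to threshold."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

# Hypothetical response times in ms: 1000, 1010, ..., 1990 (100 values).
samples = list(range(1000, 2000, 10))

# 20 of the 100 values are at or below 1190 ms.
share = fraction_at_or_below(samples, 1190)

# The lottery drawing: pull one value at random, many times over.
rng = random.Random(42)
draws = [rng.choice(samples) for _ in range(10_000)]
hit_rate = sum(1 for d in draws if d <= 1190) / len(draws)
```

`share` is exactly 0.2, and `hit_rate` lands close to 0.2 as well: the percent along the X axis of The Hockey Stick is the chance of drawing a value at or below the matching Y value.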

Question:  Why am I going through the trouble of explaining what a CDF is?

Answer:  Because the Wikipedia article made my head hurt and if you didn’t already know what a CDF is, then reading that article probably would not have changed that.  So I decided to try writing up the explanation this way, in hopes it was more understandable.

Question:  I ventured and read the Wikipedia article and I’m curious why your percent are along the X axis instead of the Y axis?

Answer:  This is the first comment/question most math and stat types ask me.  The short answer is because it’s easier to read this way (at least for me it is).  The longer answer (including graphics) is in the optional reading section toward the bottom of this post.

Question:  If I’m reading this right and the percent is along the X axis, then the Median is right in the middle, between the 40th and 60th percent?

Answer:  Yes, exactly.  The Median is right where you describe and says, “exactly half of the data are below me and exactly half of the data are above me”.

Question:  When am I going to get to ranking and comparing?

Answer:  We are getting there (To give a little forecast, though, imagine The Hockey Stick is an aggregate performance curve of the Internet Retailer top 20 (or top whatever “X”).  Then imagine we’ll place the performance of each respective, individual IR top 20 along the aggregate curve), but for now, let’s discuss how we create The Hockey Stick.

### Creating the Hockey Stick

To create The Hockey Stick, have some raw data, calculate the percentiles and then chart those percentiles using a line.

01.  Have some raw data.

In this example, we are still sticking with the same 3,552 data from the scatter (download the Excel sheet right here).

02.  Calculate the percentiles.

Use built-in functions like PERCENTILE.EXC in Excel to calculate the percentiles for the 1st percentile all the way to the 99th percentile.

— We’ll talk about trimming later, but in case you miss it, you can also calculate instead e.g. the 5th percentile to the 96th percentile.  Doing this is sometimes okay because when you competitive benchmark, the individual performance will not be in those long tails of the aggregate distribution.  It also sometimes makes the chart more readable because the skew may be less.

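— In Excel versions earlier than Excel 2010, PERCENTILE.EXC is not available; those versions have only the older PERCENTILE function, which uses the inclusive method, so results near the tails can differ.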

03.  Chart those percentile values with a line and, voila, you have your hockey stick!

In the provided Excel file, the PERCENTILE.EXC functions start in cell V103 and, in this case, we are running on the raw data in column F (F5:F3556 to be exact).  The chart data for this particular hockey stick is from cells V103:V201.
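If you would rather build The Hockey Stick outside of Excel, here is a minimal Python sketch of the same three steps. To my understanding, `statistics.quantiles` with `method="exclusive"` positions each percentile at k × (n + 1), which is the rule Excel documents for PERCENTILE.EXC, but treat that equivalence as an assumption and spot check a few values against your own sheet. The sample data is made up; it only stands in for the 3,552 values in F5:F3556.

```python
import statistics

def hockey_stick(samples):
    """Return the 1st through 99th percentiles of samples (99 values).

    Uses the exclusive method, assumed here to match PERCENTILE.EXC.
    """
    return statistics.quantiles(samples, n=100, method="exclusive")

# 01. Have some raw data (hypothetical response times in milliseconds).
samples = [1000 + (i * 7919) % 2500 for i in range(3552)]

# 02. Calculate the percentiles.
curve = hockey_stick(samples)

# 03. Chart them as a line; here we just print a few points to read off.
for pct in (20, 40, 60, 80):
    print(f"{pct}th percentile: {curve[pct - 1]:.0f} ms")
```

Feed `curve` to any charting tool as a single line series, with the percentile number (1 to 99) on the X axis and the value on the Y axis.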

### The First Post in This Series

I’m going to stop here for the first post in the series.  May I ask, please take the time to read and contact me @Lvasiliou if you want to discuss, need help understanding or just want to geek out about charts or graphs in general.  Regarding the rest of the series’ posts, here are some general thoughts so far:

– Use The Hockey Sticks and force rank the performance of individuals along an aggregate curve

– Use The Hockey Sticks and compare full distributions to other full distributions

— compare individuals to other individuals

— compare individuals to groups

– Talk about some strategic use of color

_The following is optional reading material._

## Alternative Presentation

In the above hockey stick, we charted the 1st percentile all the way through the 99th percentile.  This is great because it shows us the performance of the entire distribution, but those last few percent are really scrunching most of the graph.  So, as an alternative way to present and “spread out” the data:

– chart, as a line, the 5th through the 96th percentile ; and

– chart, as columns, the amount of change from the previous percentile for the full 1st percentile through the 99th percentile.

The Amount of Change:  The first percentile is 1,082 milliseconds; the second percentile is 1,153 milliseconds.  The amount of change between the first and second percentile is 71 milliseconds (1,153 – 1,082 = 71).  Do this for each change and then chart this series as bars.  In the Excel sheet, the “change from previous percentile” series starts in cell AK104.  What this allows you to do is spread the data out a little bit, but still give you an idea of just how bad those extremely long tails really are… in the same chart!
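The amount-of-change series is just the difference between each percentile and the one before it. A minimal Python sketch, where the first two values come from the example above and the remaining values are made up for illustration:

```python
def change_from_previous(percentiles):
    """Difference between each percentile value and the previous one."""
    return [curr - prev for prev, curr in zip(percentiles, percentiles[1:])]

# 1,082 and 1,153 are the 1st and 2nd percentiles from the post;
# the last two values are hypothetical.
percentiles = [1082, 1153, 1201, 1240]

changes = change_from_previous(percentiles)
print(changes[0])  # 1153 - 1082 = 71
```

The result has one fewer entry than the input, because the first percentile has no previous value to compare against.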

#Analytics #CatchpointUser #ChartsAndDimensions #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis
#ExcelHockeyStick #WebPerformanceHockeyStick #Percentile

# Response:

Hello! This #WebTortoise post was written 2013-NOV-27 at 02:28 PM ET (about #WebTortoise).

## Main Points

#- Happy Thanksgiving!

#- Do not be afraid to “play” with your charts and see what you come up with.

#- Chart/Graph Name: Web Performance Heat map ; Shows: Performance, Availability and/or Reliability

## Story

The other day, I found this document entitled, “What Makes a Visualization Memorable?” (you can read the document here) and I had the thought to take one of their examples and “play” with it.

Citation note: Borkin MA, Vo AA, Bylinskii Z, Isola P, Sunkavalli S, Oliva A, Pfister H. What Makes a Visualization Memorable?. IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2013). 2013.

Now, this document goes on to present findings on “what makes a visualization memorable” and is a good standalone read. Additionally, though, the document is offered as part of a larger debate where (in my own words) the debate is basically:

Does *only* the chart data need to be presented in order to be understood (i.e. no “chart junk” [Tufte])?

-OR-

Does “chart junk” cause us to expend more cognitive effort… thus resulting in more/better understanding (and consequently more retention and memorability) !?

The document states (and I restate here) it is, “… a first step toward…” the larger debate at hand, but I recommend giving it a read because it is interesting by itself.

Now, from Webtortoise’s perspective, I thought I’d pluck one of their chart examples and make it into a Web Performance chart. After looking at the plucked chart, you should see why I chose to use it at this particular time of year.

First, their chart.

The Heat Map:

Second (and changing gears to actual Web Performance data), here’s an XY Scatter chart of one day’s worth of Fully Loaded Webpage Response Time data (which we’ll be turning into a Heat Map).

XY Scatter:

Third, turn the XY Scatter into a Heat Map with Time still on the X axis and Percentile Values on the Y axis (to do this, just use Excel’s built-in conditional formatting). Note in this case, the Percentile values are from 0% – 100% (min to max) by 5’s.

Webtortoise Web Performance Heat Map A:

Webtortoise Web Performance Heat Map B:

Webtortoise Web Performance Heat Map C:

Side by Side:

The question is, “Which Heat Map more accurately depicts the XY Scatter” (and what’s different about them)?

### The differences between A, B and C.

Heat Map A:

Heat Map A had the formatting applied across both axes. In other words, this one shows “the heat” across all of the data. If you look at the Excel file, notice the formatting is applied to cells =\$X\$27:\$AU\$47.

Heat Map B:

Heat Map B had the formatting applied across only one axis (in this case, the Hour of Day). In other words, this one shows “the heat” within a given hour. If you look at the Excel file, notice the formatting is applied to each respective column (e.g. =\$AA\$50:\$AA\$70).

Heat Map C:

Heat Map C had the formatting applied across only one axis (in this case, the Percentile). In other words, this one shows “the heat” within a given percentile. If you look at the Excel file, notice the formatting is applied to each respective row (e.g. =\$X\$75:\$AU\$75).
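The only difference between A, B and C is the range over which “the heat” is scaled. Here is a rough Python sketch of that idea. It assumes the heat is a simple linear min-to-max scale over whatever range the rule is applied to, which is a simplification of Excel's color scales, and the grid values and function names are hypothetical.

```python
def min_max(values):
    """Scale values linearly so the minimum is 0.0 and the maximum is 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def heat_all(grid):
    """Heat Map A: one scale across the whole grid."""
    width = len(grid[0])
    flat = min_max([v for row in grid for v in row])
    return [flat[r * width:(r + 1) * width] for r in range(len(grid))]

def heat_by_column(grid):
    """Heat Map B: a separate scale within each column (each hour)."""
    columns = [min_max(list(col)) for col in zip(*grid)]
    return [list(row) for row in zip(*columns)]

def heat_by_row(grid):
    """Heat Map C: a separate scale within each row (each percentile)."""
    return [min_max(row) for row in grid]

# Rows are percentiles, columns are hours; hypothetical values in ms.
grid = [
    [1000, 1200, 1100],
    [1500, 2500, 1600],
    [2000, 6000, 2100],
]

heat_a = heat_all(grid)
heat_b = heat_by_column(grid)
heat_c = heat_by_row(grid)
```

In `heat_a`, only the single 6,000 ms cell reaches full heat. In `heat_b`, every hour has its own hottest cell, and in `heat_c`, every percentile does, which is why the three maps can look so different for the same data.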

Having explained the difference, which Heat Map do YOU think more accurately reflects the XY Scatter?

This Webtortoise post was written a little open-ended, to leave room for a little debate on which Heat Map you’d use in a given situation. We didn’t give much thought to WHY we’d use a Heat Map in the first place (at the core of general “readability/understandability” debate), though. In other words, if the XY Scatter is serving the purpose, why use something else? Hint, because we’re wondering if the Heat Map is more memorable than the XY Scatter.

The answer lies in something I’ve written about before: Do not be afraid to “play” with your charts and see what you come up with. Do any of these Heat Maps do as good a job as the XY Scatter to show:

– between the hours of @ 06 AM to 06 PM, there were clearly some abnormal response times (pretend I clearly had the same time axis label on the Heat Maps as I did on the XY Scatter)?

– after @ 06 PM, there was clearly still some volatility, spurious outliers (that probably still needed attention)?

I don’t know; you tell me. But now that we know some of the ways to construct Heat Maps, we’ll be able to use them in other situations (which would not have happened if we didn’t “play” with them in the first place).

_The following is optional reading material._

#CatchpointUser #ChartsAndDimensions #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO #DataVis

#ExcelHeatMap #ExcelConditionalFormatting

# Response:

Hello! This #WebTortoise post was written 2013-JUL-31 at 06:44 PM ET (about #WebTortoise).

## Main Points

#- Big Data doesn’t have to be the biggest; it has to be just Big Enough.

#- Sampling versus not sampling can affect your information both negatively and positively. For example, on one end of the spectrum, not sampling at all has the effect of missing transient blips or subtle pattern changes. Whereas, on the other end, sampling at an extremely low rate has the effect of being noisy, choppy or volatile.

#- Always remember Performance versus Availability. For example, the rate for your passive Performance data may be different than the rate for your active Synthetic data.

#- Nothing is perfect. Therefore, everything is imperfect.

## Story

In this Webtortoise story, we are going to look at the impact of sampling as it pertains to web performance measurements. I started with a week’s worth of data (no particular reason for a week’s worth; I have just settled on that as a default time period), totaling @ 42K test samples (@ 250 per hour).

In this below chart 1, we’re looking at both a Median and Arithmetic Mean Average Performance chart calculated using all of the 250 Synthetic Test Samples per hour. Nice and smooth… Can see some fluctuation during peak vs non-peak… Arithmetic Mean versus Median is not too large of a delta… All in all, not a bad looking chart.

Now compare with this below chart 2, where we are randomly selecting 50 test samples per hour from the same data set to plot.

Now compare with this below chart 3, where we are randomly selecting 10 test samples per hour from the same data set to plot.

Last, now compare with this below chart 4, where we’ve applied a basic data smoother to the “10 samples per hour” chart.

Then put chart 1 and chart 4 side-by-side! If the chart titles were removed, would you be able to pick the one at 250 test samples per hour versus at 10 test samples per hour?
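The four charts can be sketched in Python as follows. The post does not say which “basic data smoother” was used, so the centered moving average here is my assumption, as are the function names and the generated data (a hypothetical week of 168 hours with 250 samples per hour, not the real measurements).

```python
import random
import statistics

def hourly_medians(samples_by_hour, per_hour, rng):
    """Median per hour, from a random subset of per_hour samples."""
    return [
        statistics.median(rng.sample(hour, per_hour))
        for hour in samples_by_hour
    ]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def fake_hour(hour_index, rng, count=250):
    """Hypothetical response times (ms), slightly slower during peak hours."""
    peak = 300 if 9 <= hour_index % 24 < 18 else 0
    return [rng.gauss(2000 + peak, 400) for _ in range(count)]

rng = random.Random(7)
samples_by_hour = [fake_hour(h, rng) for h in range(168)]

chart_1 = hourly_medians(samples_by_hour, 250, rng)  # all samples
chart_2 = hourly_medians(samples_by_hour, 50, rng)   # 50 per hour
chart_3 = hourly_medians(samples_by_hour, 10, rng)   # 10 per hour
chart_4 = moving_average(chart_3, window=5)          # smoothed 10 per hour
```

Plotting `chart_1` next to `chart_4` is the side-by-side comparison described above. A wider `window` smooths more, at the cost of hiding short-lived blips.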

_The following is optional reading material._

#CatchpointUser #ChartsAndDimensions #KeynoteUser #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO

#Sampling #DataSmoothing #Statistics

# Response:

Hello! This #WebTortoise post was written 2013-JUN-12 at 02:45 PM ET (about #WebTortoise).

## Main Points

#- Use panel charts for certain web performance data to make reading them easier.
#- You will need to manually calculate trend lines for panel chart data, as the Excel built-in trend line function(s) ‘spans panels’.
#- Excel Line Chart, Excel Panel Chart

## Story

In Webtortoise World, we are constantly looking at Performance and Availability data by various dimensions. We can look at data by Host (Performance of Host1 vs Host2 vs etc), by ISP (Performance of Verizon vs L3 vs etc), by Geography (Performance of East Coast vs West Coast vs etc) and so on. The problem is that a chart can be tough to read and understand when there are a lot of chart data on it.

Enter panel charts.

Panel charts take multiple chart data and split them into separate ‘panels’, while still being on a single chart (the illusion is of multiple charts, though). These panel charts are generally shown side-by-side and have the benefit of using the same axes (by default).

A non-panel chart:

This first chart is a fairly typical chart type. It shows respective median Response Times, for a 24-hour period (2013-JUN-06 to be exact), by ISP. It could be Synthetic data, could be RUM data or could be data from any other instrumental ‘ruler’. It is not important that this Breakdown is by ISP (the Breakdown could be anything really). What IS important is to concede how tough it is to read the individual measurements. Who’s the worst performer? Who’s the best performer? Are they all following the same trend?

The panel chart:

I have taken the above line chart and made it into this below panel chart, with each of the respective ISP’s Performance data in its own panel. With [intentionally] no additional formatting, we can more easily read the information (no additional formatting was applied so as to compare only the layout change; the purpose is to convey the value of the panel chart based on its own merit).

The panel chart with additional formatting:

One thing I would never attempt with the first chart is adding a trend line for each series! I shudder to think how much of a hot mess that’d have been! With the panel chart, though, adding trend lines makes the chart even more valuable (in this case, I added 2nd-order Polynomial trend lines, as we might expect a Performance pattern to be similar to the peak traffic pattern). Note that adding trend lines to each panel means manually calculating the trend line for each series (see the article link in the optional section below), as Excel’s built-in trend line capability ‘spans panels’.
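For the curious, here is a minimal Python sketch of what “manually calculating” a 2nd-order polynomial trend line per series can look like: an ordinary least-squares fit of y = a·x² + b·x + c, done separately for each panel. This is my own illustration, not the method from the linked article, and the ISP series below are hypothetical.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def quadratic_trend(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [
        [s[4], s[3], s[2]],
        [s[3], s[2], s[1]],
        [s[2], s[1], s[0]],
    ]
    a, b, c = solve_3x3(normal, [t[2], t[1], t[0]])
    return a, b, c

# Hypothetical hourly medians (ms) for two panels.
hours = list(range(24))
panels = {
    "ISP 1": [1500 + 40 * h - 1.5 * h * h for h in hours],
    "ISP 2": [2200 - 10 * h + 0.2 * h * h for h in hours],
}

# One trend line per panel, so nothing 'spans panels'.
trends = {}
for name, ys in panels.items():
    a, b, c = quadratic_trend(hours, ys)
    trends[name] = [a * h * h + b * h + c for h in hours]
```

Each list in `trends` can then be charted as its own series inside the matching panel.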

Let’s go back and answer some of those initial questions:

Q: Who’s the worst performer?
A: ISP 7 is clearly the worst performer.

Q: Who’s the best performer?
A: ISPs 4 and 6 are neck-and-neck for ‘the best performer’.

Q: Are they all following the same trend?
A: No! While each other ISP has a slight Performance degradation during peak traffic, ISP 5 is actually trending down!

_The following is optional reading material._

Manually Calculate Trendlines in Excel http://www.slideshare.net/ksatyamahesh/computing-trendline-values-in-excel

PeltierTech Article on Panel Charts http://peltiertech.com/Excel/ChartsHowTo/PanelChart1.html

#CatchpointUser #ChartsAndDimensions #KeynoteUser #Performance #SiteSpeed #WebPerformance #Webtortoise #WebPerf #WPO
#ExcelManuallyCalculatingTrendline #ExcelPanelChart #ExcelTrellisChart #SmallMultiples

# Response:

Hello! This #WebTortoise post was written 2013-APR-30 at 09:35 AM ET (about #WebTortoise).

## Main Points

#- Here’s to the statisticians of the world!

## Story

– Why did the statistician cross the road?
— He wasn’t sure.

– A statistician can have his head in an oven and his feet in ice, and he will say that on the average he feels fine (http://math.bnu.edu.cn/~chj/Statjokes.htm).

– A new government 10 year survey costing \$3,000,000,000 revealed 3/4 of the people in America make up 75% of the population (http://www.ahajokes.com/m027.html).

– According to recent surveys, 51% of the people are in the majority (http://www.ahajokes.com/m027.html).

– Statistics play an important role in genetics. For instance, statistics prove that numbers of offspring is an inherited trait. If your parents didn’t have any kids, odds are you won’t either (One passed by Gary Ramseyer, taken from http://stats.stackexchange.com/questions/1337/statistics-jokes).

– Final Exam: A statistics major was completely hung over the day of his final exam. It was a true/false test, so he decided to flip a coin for the answers. The statistics professor watched the student the entire two hours as he was flipping the coin… writing the answer… flipping the coin… writing the answer. At the end of the two hours, everyone else had left the final except for the one student. The professor walks up to his desk and interrupts the student, saying, “Listen, I have seen that you did not study for this statistics test, you didn’t even open the exam. If you are just flipping a coin for your answer, what is taking you so long?”
The student replies bitterly (as he is still flipping the coin), “Shhh! I am checking my answers!” (http://math.bnu.edu.cn/~chj/Statjokes.htm)

– Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein, taken from http://www.workjoke.com/statisticians-jokes.html).