Last week, while in the thick of a seemingly dominant series against the Pittsburgh Penguins, it occurred to me that the Bruins looked pretty good... a little TOO good. Taking a gander at their underlying numbers, it turned out that they were in fact benefitting from more than their fair share of luck, riding a league-high PDO and having a number of players shooting the lights out well above their normal rates. I offered some warnings, suggested some fixes to mitigate the damage and closed the book, hoping regression wouldn't hit. So far so good. But there's a bit more to the story...
With a sticktap to Arik Parnass at EOTP for getting the thought process rolling, I'm following up last week's topic with a more thorough examination of just how lucky we've been in the context of the current playoffs and seven years of playoffs past, in a study of how fortune and skill inform playoff success.
---------------------------------------------------------------
First, let's start off with a little thieving from baseball sabermetrics. In 2006, baseball stat-man Tom Tango looked at luck vs talent across the four major sports and then some (sorry NASCAR fans). Head on over to his article for the full spiel, but quite simply he compared the spread of standings points that would be earned by pure chance, as if by flipping a coin, against the spread of points actually observed, to derive how many games it takes for talent to overtake luck. Essentially, he offered a mathematical formula for capturing the likelihood a team is benefiting from skill rather than chance. We'll come back to this in just a moment.
I'll also be writing on the backs of two other articles well worth your time: this piece by Phil Birnbaum, jumping off from and updating Tango's work, and this incredibly detailed study by Sharks blogger SnarkSD for NHL Numbers, taking a similar approach to PDO.
It's going to get a little heavy with the maths down below, but I had to muddle through a lot of this myself not having any schooling in such statistical theory, so maybe my novice point of view can help make this a little less clinical for those of you who just want at the meat of the findings. To aid this, I'll include a warning when the heavy shit is coming and a note of where to come back for findings so you can choose your own adventure.
---------------------------------------------------------------
SETTING THE STAGE
First up, we'll need to define a couple of terms that are going to inevitably come up as part of the discussion. Simple algebra lesson incoming!
(YEAH, YOU CAN SKIP THIS, ARITHMETICPHOBES, BUT IT'S HANDY TO KNOW FOR THE RESULTS)
The first is Variance, which is a measure of how spread out a data set is, calculated as the average of the squared differences from the mean. In PDO terms, we're averaging the (squared) distance from 1.
Second is Standard Deviation, which is simply the square root of Variance and tells you the typical spread of your data around the mean. PDO that falls within 1 Standard Deviation of the mean in either direction is "normal" and highly probable. Outside of that, we start to question whether it can be maintained.
Next up is Z Score, which tells you how many Standard Deviations a value is from the mean. It's calculated by subtracting the mean from your stat of interest and dividing the result by the Standard Deviation.
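For the code-inclined, here's a minimal Python sketch of those three definitions. The function names and the five PDO values are made up purely for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: the average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def std_dev(values):
    """Standard Deviation: the square root of the variance."""
    return math.sqrt(variance(values))

def z_score(value, values):
    """How many Standard Deviations `value` sits from the mean of `values`."""
    return (value - mean(values)) / std_dev(values)

# Hypothetical team PDOs, for illustration only
pdos = [0.98, 0.99, 1.00, 1.01, 1.02]
print(round(std_dev(pdos), 4))        # 0.0141
print(round(z_score(1.02, pdos), 2))  # 1.41
```

A 1.02 PDO in this toy group sits about 1.4 SD above the mean: unusual, but still inside the "normal-ish" band described above.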
---------------------------------------------------------------
SCOUTING OUT "TALENT"
Ok, still with us? No? Damn...
Ah well, back we go to Tom Tango's work on luck. His piece hypothesized that you could separate the talent contributing to wins from the chance through simulations decided by pure luck - coin flip hockey, and no I'm not talking about John Tortorella's all-defense coaching strategy.
(SLIGHT MATH COMING, YOU'RE PROBABLY OK)
Tango showed that if you subtract the coin-flip Variance from the actual Observed Variance, what remains is the spread that falls outside the bounds of pure luck: a Variance of Talent, if you will. In hockey terms, he took the variance in standings points you'd expect from 50/50 chance, subtracted it from the variance of the points that actually happened, and came up with a number that, in essence, lies beyond the amount of spread accounted for by luck. He found that, back when ties were a thing for his data set, it would take 36 games played before Var(Chance) and Var(Talent) evened out. In other words, before 36 games the results are far too subject to randomness to see the cream rise to the top.
The Birnbaum piece updates Tango by adjusting his findings for the reality of the three-point game wrought by the infernal shootout. In essence, this modification raised the number of games necessary to overtake randomness to approximately 70.
To provide a tangible example of the concept, you've probably noticed that since the advent of the skills competition it's hard for teams who fall behind early in the season to catch up. This is why. The points system allows too much randomness for skill to exert itself and propel a team that fell back ahead of those with less to speak of in the talent department. Also note how some teams that probably shouldn't have made the playoffs in this 48-game shortened season failed to fail out of the running; this is why. Chance still had a heavy hand in overall success, and the between-lockout changes to the game made matters much worse. Or, if you want to be a Pollyanna, they increased parity.
In the playoffs we fortunately have a binary point outcome: win or go home. So we should see this threshold of games decrease under such circumstances - but will it be enough to render the tiny number of games to be played a valid sample to limit the influence of randomness?
(OK, HIGH MATH ALERT, SKIM TO THE NEXT BOLD FOR RESULTS)
Let's look. Per Tango:
Here is one way to figure out the var(true) for any league.
Step 1 - Take a sufficiently large number of teams (preferably all with the same number of games).
Step 2 - Figure out each team’s winning percentage.
Step 3 - Figure out the standard deviation of that winning percentage.
I just did it quick, and I took the last few years in the NFL, and the SD is .19, which makes var(observed) = .19^2
Step 4 - Figure out the random standard deviation. That’s easy: sqrt(.5*.5/16)
16 is the number of games for each team.
So, var(random) = .125^2
Solve for:
var(obs) = var(true) + var(rand)
var(true), in this case, is .143^2
Knowing that var(true) is .143, to get an "r" of .50, you need var(rand) to also be .143. For that to happen, the number of games played equals 12. That is sqrt(.5*.5/12)= .144
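For anyone who wants to follow along at home, here's a minimal Python sketch of Tango's recipe that reproduces his NFL example. The function names are my own (hypothetical), and p = .5 assumes every game is a coin flip with no ties.

```python
import math

def random_sd(games, p=0.5):
    """SD of winning percentage from coin flips alone: sqrt(p(1-p)/games)."""
    return math.sqrt(p * (1 - p) / games)

def true_sd(observed_sd, games, p=0.5):
    """Tango: var(obs) = var(true) + var(rand), so var(true) = var(obs) - var(rand)."""
    var_true = observed_sd ** 2 - random_sd(games, p) ** 2
    if var_true <= 0:
        raise ValueError("observed spread is no wider than chance alone")
    return math.sqrt(var_true)

def games_to_even(sd_true, p=0.5):
    """Games played at which var(rand) equals var(true), i.e. an r of .50."""
    return p * (1 - p) / sd_true ** 2

sd_t = true_sd(0.19, 16)            # Tango's NFL example: SD .19 over 16 games
print(round(sd_t, 3))               # 0.143
print(round(games_to_even(sd_t)))   # 12
```

The last function is just Tango's final step solved for games: set p(1-p)/games equal to var(true) and rearrange.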
I've collected the past 8 post-seasons' data here and found the Variance Observed over 988 playoff games to come to .327^2.
As for Variance Chance (Rand), that's the square root of (.5*.5/(988/16)) for a Variance of .034^2. Virtually a full season of data for every squad.
Adding these gives us .1101 for a Variance (True).
(IF YOU'RE SKIPPING THE MATH COME BACK HERE)
The Variance Chance (Rand) equals the Variance (True) at 20.3 games played.
In other words, it takes nearly three full rounds of games before the influence of talent equals that of chance. So basically the playoffs, in spite of a very small sample, do permit the cream to rise, just not until the last two teams are in action. That belief that anything can happen if your team just squeaks into the playoffs? Kinda true.
So as predicted, the number of games for talent to take over is less than in the regular season - even less than the old win-loss-tie format - thanks to the lack of a third point for losing really late in a game or sucking at shootouts. However, it's still scarcely enough to prevent a team from riding luck all the way to the Final. Nor is it completely out of the question for a team to ride it straight to a trophy. A big one.
Now that we've looked at winning percentage, let's do the same with PDO, shall we?
---------------------------------------------------------------
PDO AND THE BOUNDARIES OF CHANCE
As we examined earlier in the year, PDO is an effective metric for assessing at a glance which teams are playing above or below their true level and are being thrashed by the waves of fortune. Before we jump in the deep end, let's do a couple of laps in the kiddie pool. For a refresher, PDO is simply the sum of save percentage and shooting percentage. Over the course of a season, barring empty netters, every goal that goes in raises one team's shooting percentage and lowers another team's save percentage, so at the end of the year, if you were to add it all up across the league, it would all come out to 1. PDO abhors entropy. As such, and as you'll see from the following chart from Hawerchuk at Arctic Ice Hockey, the luck in its component stats leads to a heavy tendency to regress toward the mean of 1.
FIG 1 (via assets.sbnation.com)
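Here's a minimal Python sketch of that bookkeeping, using a made-up three-team league that only plays itself (so league goals for equal goals against, and shots for equal shots against). The numbers and names are hypothetical and empty netters are ignored.

```python
# Hypothetical three-team league that only plays itself.
# Each entry: (goals_for, shots_for, goals_against, shots_against)
teams = {
    "Team A": (30, 300, 25, 310),
    "Team B": (25, 310, 28, 290),
    "Team C": (28, 290, 30, 300),
}

def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO = shooting percentage + save percentage."""
    sh_pct = goals_for / shots_for
    sv_pct = 1 - goals_against / shots_against
    return sh_pct + sv_pct

# Individual teams drift above or below 1
for name, row in teams.items():
    print(name, round(pdo(*row), 3))

# Sum each column across the league: every goal is one team's SH% and another's SV%
league_totals = [sum(col) for col in zip(*teams.values())]
league_pdo = pdo(*league_totals)
print(round(league_pdo, 6))  # 1.0
```

Individual teams land above or below 1, but the league as a whole can't: every goal scored is also a goal allowed.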
Turning to SnarkSD's writing for NHL Numbers, we find he rather compellingly graphs this tendency and also reveals how it behaves in smaller samples:
FIG 2 (via nhlnumbers.com)
We see here a chart showing Standard Deviations from the mean over several spans of games played. As the number of games rises toward a full 82-game season, the Standard Deviation shrinks, which corroborates Tango and Birnbaum's finding that Variance (Chance) falls as games played increase. Randomness clearly decreases as predictability (aka repeatability aka talent) rises.
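To see why the spread shrinks, here's a back-of-the-envelope Python sketch that treats every shot, for and against, as a weighted coin flip (a binomial approximation). The 30 shots per game and 8% scoring rate are assumptions picked for illustration, not figures from SnarkSD's work, and the function name is hypothetical.

```python
import math

def sd_chance_pdo(games, shots_per_game=30, goal_rate=0.08):
    """Luck-only SD of PDO over `games`, treating every shot as an independent
    coin flip that scores with probability `goal_rate` (binomial approximation)."""
    shots = games * shots_per_game
    var_sh = goal_rate * (1 - goal_rate) / shots  # chance spread in shooting %
    var_sv = goal_rate * (1 - goal_rate) / shots  # chance spread in save % (shots against)
    return math.sqrt(var_sh + var_sv)

for gp in (7, 20, 48, 82):
    print(gp, round(sd_chance_pdo(gp), 4))
```

Because the luck-only variance falls in proportion to 1/games, quadrupling the games played halves the luck-only SD, which is the funnel shape in the chart.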
Below in FIG 3 we have a graph that represents the boundaries of randomness and indicates the range within which a team is plausibly playing under the statistically likely influence of luck. The white within the cone represents two Standard Deviations of PDO, with the grey representing the range beyond 2 SD and the hard red line marking 3 SD. Essentially, the white falls within fairly normal expectations of PDO for a given number of games played. The grey is a "danger zone" of sorts, a low-probability range that can reflect either exceptional or craptastic play or a serious outlier of good or bad fortune.
This may run slightly counter to the logic you've heard applied to PDO in the past: that exceedingly high or exceedingly low values are always bound to regress toward the mean. While true in most instances, PDOs that exceed a certain number of Standard Deviations are far more likely to be the product of talent overtaking luck, of a team performing at an elevated, sustainable level. Even if it is on the negative side: elevated, sustainable suck. Teams that hit 3SD are, per Patrick, a 1 in 500 chance arising from a coin toss alone.
FIG 3 (via nhlnumbers.com)
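For a rough sense of how rare each band is, here's a Python sketch of the chance of landing beyond 1, 2 or 3 SD by luck alone. It assumes a plain normal distribution; the exact odds in FIG 3 depend on SnarkSD's own method, and the function names are hypothetical.

```python
import math

def two_sided_tail(z):
    """Chance a normally distributed value lands more than z SDs from the mean, either direction."""
    return math.erfc(z / math.sqrt(2))

def one_sided_tail(z):
    """Chance of landing more than z SDs on one particular side of the mean."""
    return two_sided_tail(z) / 2

for z in (1, 2, 3):
    p = two_sided_tail(z)
    print(f"beyond {z} SD either way: {p:.4f} (about 1 in {1 / p:.0f})")
```

Under that approximation, 2 SD either way is roughly a 1 in 22 event and 3 SD roughly 1 in 370 (about 1 in 740 in one direction only), so the 1 in 500 figure sits between the two-sided and one-sided numbers.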
You will see these probabilities play out in the tables below and in the master data spreadsheet: 2SD and beyond rarely occurs, and when it does it is typically on the negative side, something one might be able to anticipate from the matchup or seeding.
Once again, we note that the Standard Deviation in team PDO shrinks as the number of games increases, i.e. there's less outlying data on the fringes of the set. As a side note, for this reason it would be a mistake to compare playoff PDO, with a maximum of 28 games, one-to-one with regular season PDO, measured uniformly over 82 games. It is better to look at their Z-scores, the number of standard deviations from the mean, to get a sense of where each team landed in proportion to its population. Kind of like looking at Relative possession metrics.
---------------------------------------------------------------
PLAYOFF COMPLICATIONS
Again, I use the above graph in FIG 3 simply to illustrate, and would caution readers not to apply the findings below directly to it. Due to the uneven number of games played among our data, the Standard Deviation for our population is actually higher for a team with 20 games played than for those with 20 games played in the regular season. Those teams that got bounced early influence the overall SD, the spread of which we see is much wider at fewer GP. I in fact ran into a problem calculating SD Talent because of this, since the SD Chance winds up very strongly weighted by the teams knocked out early. To address this in my spreadsheet, I created a second set of SDs Chance and Talent I termed "SD Survivor," which restricts the population to teams that exceeded 7 GP, i.e. those who made it out of the first round. Both are available in the Excel doc. The SD Talent by Team is calculated from the Survivor number rather than the all-team figure.
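The "SD Survivor" idea boils down to a filter before taking the SD. Here's a minimal Python sketch; the eight teams, their GP and their PDOs are hypothetical numbers chosen only to show the mechanics, not the spreadsheet's data.

```python
import statistics

# Hypothetical (team, games_played, pdo) rows for one postseason; made-up numbers
playoff_rows = [
    ("Team A", 22, 1.030), ("Team B", 19, 1.012),
    ("Team C", 12, 1.005), ("Team D", 11, 0.995),
    ("Team E", 5, 1.045), ("Team F", 4, 0.950),
    ("Team G", 6, 0.970), ("Team H", 7, 1.020),
]

def pdo_sd(rows, min_gp=0):
    """Population SD of PDO, keeping only teams with strictly more than `min_gp` games."""
    pdos = [pdo for _team, gp, pdo in rows if gp > min_gp]
    return statistics.pstdev(pdos)

sd_all = pdo_sd(playoff_rows)
sd_survivor = pdo_sd(playoff_rows, min_gp=7)  # GP > 7 means the team won a first-round series
print(round(sd_all, 4), round(sd_survivor, 4))
```

The short, wild runs of first-round exits inflate the all-team SD; dropping them leaves the tighter survivor spread.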
Furthermore, given the lack of diversity in opponents, I would expect some further skews attributable to "skill," particularly on special teams given the minute sampling. I've not endeavored to account for external factors that might be implicit in facing the same opponent repeatedly, and have kept all data to all-strength for the sake of consistency with the source material the equations and concepts were derived from.
I would also anticipate that series would be far more prone to the influence of goaltending than the regular season, lending a further "survivorship bias" to the proceedings. Let's see if that pans out:
---------------------------------------------------------------
DATA PARTY 2013!!!1!!
Below is a table of this year's playoff combatants, showing the components of the PDO and the Z score to give a sense of where the figures fall along the plot.
TABLE 1
Team | SH% | Z SH% | SV% | Z SV% | PDO | Z PDO | SD Chance | SD Talent
Boston | | | | | | | |
Chicago | | | | | | | |
LA | | | | | | | |
Pittsburgh | | | | | | | |
San Jose | | | | | | | |
Detroit | | | | | | | |
Ottawa | | | | | | | |
NYR | | | | | | | |
Toronto | | | | | | | |
Washington | | | | | | | |
Anaheim | | | | | | | |
St Louis | | | | | | | |
NYI | | | | | | | |
Montreal | | | | | | | |
Minnesota | | | | | | | |
Vancouver | | | | | | | |
(sorry about the formatting, gang. Consider it festive decoration. For this year's data and the whole shebang in a more utilitarian form, have at it here)
I've also provided the SD for Chance and Talent. You'll note that as you go down the list, ordered by wins, the SD for Talent shrinks. This is depicted in FIG 2 as the SD distribution widening toward the left-hand side. SD Chance shrinks as the number of games played grows, as we discussed above by way of Tango's work with standings points.
- Boston, as noted, has the highest PDO on the scale, though it doesn't enter the danger zone. Even as the only positive Z score above 1, it falls well within the bounds of chance and should not be considered wholly sustainable. The mitigating factor of Rask's elevated career SV% will keep it from falling to 1, but it's likely to fall somewhat, as is our shooting percentage.
- The final four teams held the highest PDOs in the playoffs. A look at this year's chart might have you believe this is routinely the case, but a survey of prior years shows this is not a consistent trend. 20% of the teams bounced in the first round held PDOs above 1 and more held positive PDO Z scores.
- A favorite: note Montreal's abnormally large PDO Z Score, exceeding 2 Standard Deviations from the mean in the wrong direction. This falls within the grey uncertainty area in Patrick D's chart above and indicates either a failing on the part of talent or a credit to the goaltending and defense of Ottawa; their statistical performance lies outside that which is commonly considered luck. According to Z Scores, the goaltending sucked worse than the shooting, so this all points to CAAA-REY CAAA-REY CAAA-REY.
- Methinks Pittsburgh's shooting and New York's goaltending, both serious outlying Z scores, have something to do with one another.
- Evident somewhat in this post-season and further exhibited in prior years, the hypothesis about goaltender survivorship bias is flawed. There are numerous cases of teams with high SV% Z scores dropping out early, likely on the basis of tight-checking, low-scoring affairs.
- With a mean shooting percentage below 8%, we're experiencing the best year for goaltenders and/or worst for shooters in recent memory, dating back to at least the prior lockout. The trend line here indicates the current high is part of a continued trend that goes back a ways.
PAST YEAR FINDINGS
- What we discover is that there are very few teams whose PDO falls a sufficient number of Standard Deviations from the mean to attribute their success wholly to performance.
- The highest positive PDO Z Score belongs to 2010 runner-up Philadelphia. 2008 Pittsburgh, another Final also-ran, came in a close second. These two lie closest to the upper bounds in FIG 3.
- If we were to adjust our Z scores for PDO to only look at "survivors," PDO rises, but even stronger PDO outliers like 2011 Boston do not enter into the "danger zone."
- There is no consistent trend as to the variability of shooting percentage vs save percentage. I entered this survey with the preconceived notion that shooting percentage would be far more erratic, but the SD for SV% is higher in 2013, 2012, 2007 and 2006.
RECENT CHAMPS DATA
TABLE 2
YEAR | TEAM | SH% | Z SH% | SV% | Z SV% | PDO | Z PDO | SD Chance | SD Talent
2006 | CAR | 0.101 | 0.553 | 0.911 | 0.607 | 1.012 | 0.630 | 0.0153 | 0.0274
2007 | ANA | 0.090 | 0.612 | 0.922 | 0.504 | 1.012 | 0.001 | 0.0156 | 0.0036
2008 | DET | 0.089 | 0.224 | 0.921 | 0.971 | 1.011 | 0.731 | 0.0158 | 0.0076
2009 | PIT | 0.101 | 0.873 | 0.907 | 0.105 | 1.009 | 0.620 | 0.0142 | 0.0219
2010 | CHI | 0.111 | 1.125 | 0.904 | 0.246 | 1.016 | 1.155 | 0.0158 | 0.0034
2011 | BOS | 0.102 | 0.718 | 0.937 | 1.577 | 1.039 | 1.456 | 0.0139 | 0.0162
2012 | LA | 0.093 | 0.511 | 0.944 | 1.096 | 1.037 | 1.771 | 0.0159 | 0.0109
2013 | BOS | 0.086 | 0.696 | 0.943 | 1.246 | 1.029 | 1.286 | 0.0158 | 0.0153
2013 | CHI | 0.085 | 0.645 | 0.931 | 0.727 | 1.016 | 0.871 | 0.0164 | 0.0147
- CAR (2006): Well, that was a wacky season. A 10% team shooting percentage barely lands you half an SD from the mean. Poor goalies (note the dip right after the lockout).
- ANA (2007): Same PDO as last year's champ in a season where this was a bit more common. No strong outliers here, they're pretty straight up the middle.
- DET (2008): Same with Detroit. They got solid goaltending in the upper quarter of the playoff pack (a tad out of character) while posting the table-low SH Z Score. Well within the luck bounds, with nothing sticking out as a culprit beyond a slight uptick in SV%.
- PIT (2009): Given the Z Score for goaltending, I think we can toss out the myth of Fleury, as if we didn't already know it. Strange that Detroit, notorious for winning with mediocre netminding, was beaten at their own game. Held the lowest PDO and PDO Z score, coupled with a high SD Talent - they are further to the right on FIG 3. Not exhibiting a tremendous amount of luck in their run.
- CHI (2010): Shooting high in a year when that was unusual seems to fit nicely with the imbalance in SD Chance/Talent. It was enough to overcome decidedly mediocre goaltending. Lucky cup? We have a candidate! (trolololol)
- BOS (2011): Highest PDO of the recent champs - though not in relative terms within their playoff class and not approaching the grey area of 2 SD. Held the highest SV% in the playoffs on the back of a Conn Smythe performance from Tim Thomas, aided by a reasonably high SH%. Also it's thoroughly insane that a 10% team shooting percentage wasn't even in the 95th percentile!
- LA (2012): Cashing in their wretched regular season PDO with a well-timed regression, LA posted a PDO Z score that places them closest to the danger zone of any champ in recent memory, though still just within the boundaries of chance. A lights-out goaltending performance from Quick that he hasn't replicated since lends credibility to good fortune as a significant factor in this run.
- BOS/CHI (2013): TBD, though Boston's nearly even SD differential could indicate we're seeing close to their true level in spite of an elevated PDO. Goaltending is giving Boston the edge with a little room for regression, which is good, because it is highly probable it's on the way. The Bruins currently hold the highest SV% Z Score among the winners since... the 2011 Bruins.
- None of the above posted a PDO more than 3 SD from the mean, and only a couple flirted with 2SD. The past two winners, LA and BOS, have been the highest, and have been so on the strength of playoff-leading goaltending.
- Carolina, Pittsburgh and 2011 Boston were the only teams whose SD Talent overtook their luck within the bounds of the games played. Chance is still very much in play in the Final.
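That last check can be read straight off TABLE 2: keep the teams whose SD Talent exceeds their SD Chance. A minimal Python sketch; the list simply re-types the table's last two columns, and the variable names are my own.

```python
# (year, team, sd_chance, sd_talent), copied from TABLE 2
table2 = [
    (2006, "CAR", 0.0153, 0.0274),
    (2007, "ANA", 0.0156, 0.0036),
    (2008, "DET", 0.0158, 0.0076),
    (2009, "PIT", 0.0142, 0.0219),
    (2010, "CHI", 0.0158, 0.0034),
    (2011, "BOS", 0.0139, 0.0162),
    (2012, "LA", 0.0159, 0.0109),
    (2013, "BOS", 0.0158, 0.0153),
    (2013, "CHI", 0.0164, 0.0147),
]

# Teams where the talent spread outweighs the luck spread over the games they played
talent_over_luck = [(year, team) for year, team, chance, talent in table2 if talent > chance]
print(talent_over_luck)  # [(2006, 'CAR'), (2009, 'PIT'), (2011, 'BOS')]
```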
---------------------------------------------------------------
CONCLUSIONS
It's comforting and encouraging as a fan - no, as a human - to believe that your heroes are experiencing success on the back of sheer will and talent, the product of their own agency, rather than aided by any input of chance. Believing so makes the experience more meaningful and personally felt, the narrative more compelling. Sharing that experience is part of the attraction to sports, and in aligning our hopes with the exertion of a team's effort, we frequently mythologize fortune out of the picture. Hockey is not unique in this regard, but its history is littered with tales of players raising their game in the clutch.
What the above shows, in part, is that there is not much conclusive indication over the past several years that the championship is won in any overriding fashion by the mettle and valour of the victor. All fall well within the boundaries of statistical chance, with a couple of "weighted dice" in the form of elevated shooting or save percentages, some sustainable, others not, that clearly outpaced the norm. All in all, skill doesn't often win the day.
With a mere 20.3 games for the Variance of Talent to take over, the playoffs provide scarcely enough runway to eliminate pure randomness from their outcomes. Frankly though, this is part of the beauty of the sport: outside of exceedingly large samples, one really can't predict much about how events will unfold. The Presidents' Trophy may in fact be a better measure of a team's merits from a statistical point of view (at least if schedules were played in a balanced fashion), but I for one would far rather watch a team duke it out over seven or fewer games for Stanley Cup glory, even if that's an inconclusive way to evaluate the true talent level of a team. Tonight and beyond, there will be lucky bounces, ridiculously uncharacteristic performances and dreadfully timed falls from grace. And that's why we watch, whether you want to believe it's skill or chance. Ultimately both play a role, though never in the just, orderly manner we often strain to attribute to the game.
That's an awful lot of words to say, "screw probability, let's enjoy some hockey," but I for one am comforted to know that there simply is no knowing what lies ahead.