

The Numbers Game: Finally, Meaningful Data!

At a quarter of the way through the season, player roles and overall performance are beginning to come into focus. But why should we consider such a small sample useful?

Let's get to it, pardner
Paramount Pictures

Oh boy oh boy oh boy!


It's the moment we've all been waiting for.

That's right...

It's time for in-season numbers to MEAN SOMETHING!

Holy crap, it's like Christmas morning and unwrapping a brand spanking new TI-81!

We're 20 games in, the quarter point of the season, and we can finally start to evaluate teams and players with some certainty that our facts and figures aren't wacky aberrations from small samples!

So let's dig in and see how our overall performance should look going forward and how our team in action is constructed.



"But hold on, mister statsy pants," you might be saying. "Isn't 21 freaking games a small sample, and don't we disapprove of those, you cherry-picking jerk? Who are you to say when we should and shouldn't value a number?" Well, voices in my head,


but I've chosen to take a peek at the quarter-season mark for a reason.

(if you want to get to the meat of evaluating team possession, skip to the "WHAT NOW?" section)

Well, first up, let's see what happens to a team's Corsi as the season progresses. Looking at team percentage over the past couple seasons with the help of Extraskater's handy dandy fancystat gamelogs, we can identify that the standard deviation game-to-game for all teams in the league is 8%. So generally, the fluctuation is pretty subdued. Two standard deviations in either direction comprise the range that 95% of our population will fall into (that 95% is the confidence level, and the range itself is the confidence interval), so a hypothetical 50% team will largely fall within 16% on either side of the mean, giving us a confidence interval between 34% and 66%. There'll be a game or two here and there above and below that interval, but they're pretty rare outliers by definition. Here's a chart showing the distribution of 10,000 simulated single-game CorsiFor% results for a 50% team.
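For anyone who wants to rebuild that kind of distribution at home, here is a minimal sketch. It assumes single-game CorsiFor% is normally distributed around 50 with the 8% standard deviation quoted above. The normal shape, the fixed seed, and the function names are my assumptions for illustration, not necessarily how the original chart was generated.

```python
import random

MEAN_CF = 50.0    # hypothetical "true" CorsiFor% of the team
SD_CF = 8.0       # game-to-game standard deviation quoted in the text
N_GAMES = 10_000  # number of simulated games


def simulate_games(n_games, mean=MEAN_CF, sd=SD_CF, seed=1):
    """Draw single-game CorsiFor% values from a normal distribution (assumed shape)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_games)]


def share_within(games, low, high):
    """Fraction of games whose CorsiFor% lands in [low, high]."""
    return sum(low <= g <= high for g in games) / len(games)


games = simulate_games(N_GAMES)
# Two standard deviations either side of 50 is the 34-66 range from the text.
print(f"share of games between 34% and 66%: {share_within(games, 34, 66):.1%}")
```

Under those assumptions the printed share should come out close to 95%, with the handful of games outside the range being the rare outliers.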


Thanks to the low standard deviation, they'll mostly be turning up in a fairly tight bunch around the mean, with roughly half of all games landing in the 45-55 range. Given the probability of landing in that territory, as the season grinds on, more and more games will turn up in that range. While individual games will essentially follow this probability curve, the cumulative figure will gradually get closer to the true performance level of the team. The progression of our hypothetically average team towards its perfectly average mean, depicted below, starts off pretty elevated but tapers rapidly.


It's pretty common sense that the more games you're adding together, the less an individual game is going to impact the total. And hey! It just so happens that the leveling-off point starts to happen around the 20 game mark.
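The arithmetic behind that leveling-off can be sketched in a few lines. This is a rough approximation: it treats games as independent, treats the cumulative figure as a simple average of single-game percentages (real cumulative Corsi pools shot attempts, so games with more events weigh more), and uses two standard deviations as the 95% band. The function name is hypothetical.

```python
import math

SD_GAME = 8.0  # game-to-game standard deviation quoted in the text


def funnel_bounds(n_games, mean=50.0, sd=SD_GAME, z=2.0):
    """Approximate 95% band for the average of n_games single-game CorsiFor% values.

    The spread of an average shrinks with the square root of the number of
    games, which is what produces the funnel shape.
    """
    half_width = z * sd / math.sqrt(n_games)
    return mean - half_width, mean + half_width


for n in (1, 5, 10, 20, 40, 82):
    low, high = funnel_bounds(n)
    print(f"{n:>2} games: {low:.1f}% to {high:.1f}%")
```

Because the band narrows with the square root of games played, most of the tightening happens early: under these assumptions the half-width drops from 16 points after one game to about 3.6 points after 20, and the remaining 62 games only trim it to about 1.8.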

As Michael Parkatti at Boys on the Bus showed in greater detail in a study of individual player Corsi, the confidence interval narrows, forming a funnel closer to the mean as more games are played. Below he shows this progression for a hypothetical 50% player with a 95% confidence interval and boundaries of chance on either side.



In essence, the further along we get, the more certainty we have that the cumulative figure will fall in a smaller and smaller range. As in both of the above, the Bruins are bearing out the trends we're seeing, and we can see the leveling off with less erratic peaks and valleys as we hit the 20 game mark:


This behavior and the persistence of the statistic underlie why J Likens of Objective NHL found 20-game Corsi to be the best predictor of goal ratio and winning % within a season.



Per the above, we're going to stick to Corsi for the time being - you've probably heard FenwickClose bandied about as the best predictive measure of a team's performance, but given the subdivision of data into score conditions and omission of a class of shots (blocked shots), it's best reserved for full season data pools. We'll be looking exclusively at the better intra-season indicator, all-score 5v5 Corsi.

The Bruins over the past few games have begun to hover around the 52% mark, presently landing at 51.95% CorsiFor. At the moment, this places them 10th in the league and at their lowest point since the 08-09 season - the success of which rode on obscene shooting percentages. File this news under "merely ok." The team is neither in danger of getting dominated, nor themselves dominating, but the territorial advantage is still present. It should come as some kind of small consolation that we rank 4th in the East, and that behind a couple of teams who are going through some pretty serious PDO woes.

As with the overall team possession number, the relative reliability of our 20-games-in possession figures - as shown in the funnel plot above - means we can now clearly read intended player usage, where before individual game oddities and an imbalance in home-and-away matching advantage gave some screwy results. So let's take a gander at our usage charts and see who's getting the job done and how.


(as per usual, sourced from Greg Sinclair's usage chart app. As a refresher, Relative Quality of Competition is plotted vertically, zone starts horizontally, and the bubbles represent RelativeCorsi. The dot size is how positive or negative that Corsi figure is, with Blue indicating positive and Red negative)

It may stun and amaze you to hear, but Claude Julien doesn't change much season to season. Shocking I know.

The biggest alteration from past outings is that Kelly isn't being given the toughest of the toughs anymore. He's out there for the most defensive zone starts on the forward corps yet again, but isn't typically getting assignments against top lines - that's still falling to Bergeron, which is something Julien had been trying to avoid through Kelly's usage in the past.

Meanwhile, the third and Merlot lines are being given essentially equal QoC - at least close enough as to not make any appreciable difference. I'll spare you the Merlot line screed, but it's pretty self-evident that they're not holding up their end of the bargain, Shawn "Snipes" Thornton included. Out of the whole enchilada, only Reilly Smith has his head peeking above water.

The newly-Iggy'd 1st line is being used in precisely the same manner as last year, replete with the usual zone-start sheltering for the two wingers. I'd say this is working out on the scoreboard thus far, but from the indication of their very modest possession success and a quick glance at their PDO, they're getting by on high shooting percentages rather than outright overwhelming the competition. Judging from the totals of CorsiFor and Against events they seem content to trade chances - good thing Tuukka's on top of his game.

It will surprise nobody to hear that the Bergeron unit receives the top RelativeQoC by a good margin. Collectively, they're rocking their deployment, crushing their competition. Marchand's bounced around a bit - we'll leave the causes for his performance to another day - and earlier charts show that he'd maintained his possession strength in third line duty while Smith has suffered from his stint down a unit. Still, he does bring up the rear for the trio.

What should jump out to you most of all with this line's usage, though, is that Loui Eriksson is getting such offensive zone sheltering. For all the hype of his defensive acuity, he's being kept in the offensive zone to an extent that would make Nathan Horton blush. It should be mentioned that prior seasons in Dallas saw him middle of the pack in both QoC and starts, with third or fourth PK TOI, so he's never been the defensive horse he's often purported to be. Still, he's looking damn fine where he is. Which all raises the question: could he be as good as he's been with his current deployment if he were used in more defensive situations?



And this seems like a good moment to pull over for a sidebar about adjusting Corsi for zone starts, since the Bergeron line presents a plum case study. It's pretty intuitive stuff that a guy who doesn't have to slog through the defensive and neutral zones to get at the net is going to have an easier time generating offense and see less of the opponent's, but we are able to assess approximately how much impact that advantage gives a player. We could pretty well look at Bergeron's huge blue dot and Eriksson's huge blue dot and assume that, all things being equal, Bergeron's performing better, but by calculating the differences with zone-adjusted Corsi we can get a more nuanced picture of how much Bergeron is penalized by deployment and how much advantage his linemate gains.

Vic Ferrari at Irreverent Oilers Fans first observed that a strong majority of NHL goals are scored by the team with an offensive zone faceoff, and further extrapolated that there was a zonal impact on a player's Fenwick. From there, several bloggers devised means of quantifying the advantage gained by starting in the offensive zone. JaredL at Driving Play used a regression analysis (which you may recall us covering a use of in our recent post on PDO) of the league's zone starts to find the average adjustment for a given percentage. His plot yielded a slope of .18 - put simply, for every 10% shift away from 50% O-zone starts, his calculations found we should adjust a player's Corsi/60 by 1.8 events.

His simplified zone-start adjustment is calculated as follows (don't worry, equations won't bite):
Zone-Adjusted Corsi = Corsi/60 - (Ozone%-50)*.18
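
As a quick sanity check on the equation above, here is a minimal sketch of the rate-stat version exactly as written. The function name and the sample inputs are hypothetical, chosen only to show the direction and size of the adjustment. It does not reproduce the percentage-based figures in the table further down, since that conversion lives in the linked spreadsheet and isn't spelled out in the text.

```python
ZONE_START_SLOPE = 0.18  # Corsi/60 events per percentage point of O-zone starts


def zone_adjusted_corsi(corsi_per_60, ozone_pct, slope=ZONE_START_SLOPE):
    """Apply the simplified adjustment: Corsi/60 - (OZone% - 50) * 0.18."""
    return corsi_per_60 - (ozone_pct - 50.0) * slope


# Hypothetical players with the same raw Corsi/60 but opposite deployment.
sheltered = zone_adjusted_corsi(corsi_per_60=10.0, ozone_pct=60.0)
buried = zone_adjusted_corsi(corsi_per_60=10.0, ozone_pct=40.0)

print(f"sheltered (60% O-zone starts): {sheltered:.1f}")
print(f"buried (40% O-zone starts):    {buried:.1f}")
```

A player at 60% offensive zone starts gives back 1.8 events per 60, a player at 40% gains 1.8, and anyone at exactly 50% is left untouched.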

By applying this method, we can look at the Bergeron unit and get a clear picture of the impacts of their respective zone starts. If you'd like to use BehindTheNet's rate figures, you can calculate zone-adjustment just as shown in the Driving Play post. Below I've adapted JaredL's method to convert the figure from rate-stats to percentages in order to remain consistent in how we've been depicting our figures, and with how they show up on that newfangled stat site.

| Player | OZone% | CorsiFor% | Zone-Adjusted CorsiFor% |
|---|---|---|---|
| Loui Eriksson | 61.2% | 62.9% | 60.8% |
| Patrice Bergeron | 42.3% | 59.9% | 62.5% |
| Brad Marchand | 53.3% | 54.2% | 54.0% |

(Full spreadsheet available here)

As we see from the table, the impact isn't perhaps as extreme as one might assume, but it does indeed re-rank Bergeron ahead of Eriksson due to the inherent disadvantage of the former's deployment. This doesn't tell you that Eriksson isn't kicking ass and taking names - he is moreso than anyone NOT named Bergeron - just that it's fewer asses and names than his all-zone linemate.

Chris Kelly, by the way, sees the most significant benefit from zone-adjusting, moving from 45.7% to an almost good 49% zone-adjusted CorsiFor. (you're welcome, Ecozens)




On the blueline, there's a bit more volatility from past deployment, likely due to an influx of youth, but the breakdown of defensive usage is still pretty clear from one glance. In spite of considerable pair juggling, the intent of the deployment patterns is obvious: Chara and Boychuk get the top lines and d-zone usage, Seidenberg and Hamilton are out against the middle of the pack with Seidenberg offering some spot usage as a defensive zone specialist with McQuaid (as their blood red balloons suggest, the logic of this should be in question...). Lastly, McQuaid and Krug get the bottom of the barrel.

The significant year-over-year change here is that Seidenberg is drifting to the middle, having been replaced by Chara and Boychuk in the toughest minutes. It's quite encouraging to see Boychuk performing this well under the circumstances - not even having Chara as his all-the-time partner either. As for Seidenberg's movement, I for one am in favor of giving some guys with some break-out ability more starts in the defensive zone in his stead so that we don't spend as much time there after the faceoff. I'd doubt that is the intent though, given the newfound trust in McQuaid.

Torey Krug, standout story that he is, should be considered with the context depicted above. He's seeing the defensive zone less than any player on the roster, including Thornton, and is being offered comparably meager opponents. I'd say Claude's using a neophyte offensive defenseman the exact right way, but said deployment should also come with tempered expectations about his ability in more demanding situations. Presently Dougie Hamilton is outperforming him in tougher usage as shown above, and Krug has just one more secondary assist at 5v5 in spite of softer competition.

(And in another note on the youngsters, Bartkowski's bubble size and placement aren't as reliable as those of the rest of his team, having played in under half the games of his comrades. Expect him to move around a bit as the games grind on.)

Once again, if we consider zone adjusted Corsi, we should be bowing down and kissing the boots of Zdeno and Johnny, in spite of their modest offensive offerings. Chara sits at a zone-adjusted CF% of 56.4%, Boychuk above 54%. On the other end of the spectrum, Krug falls to nearly even CF% once his starts are factored in.


There's still plenty of hockey to be played, but with the amount behind us we're beginning to have a pretty firm - and statistically significant - idea of what we have on our hands and how those players are being put to the test. Obviously this isn't a holistic picture of the team, but the above does give a tidy quarter-mark glimpse at overall performance and individual usage - and how to better compare players within the context of their roles.