Is A Two-Goal Lead “The Worst Lead” In Hockey? A Statistical Analysis

Background

Hockey is a sport that carries many superstitions, from sitting in the same spot in the locker room to putting pads on in a certain order. One of those superstitions, which we will dig into here, is the belief that “a two-goal lead is the worst lead in hockey,” especially when heading to the locker room at the end of the 2nd period. This is more nuanced advice than “If you score more goals than the other team, you win 90% of the time” (yes, a coach actually said this). Still, the question remains: is it true?

A believer in this statement might argue, “A two-goal lead creates an unjustified sense of complacency. The other team may capitalize on this and score an easy goal, creating a momentum swing that lets them overtake the team that had the two-goal lead.” Let’s see how true this is!

How I Will Cover This

  • I will start by using historical game results to determine whether a two-goal lead is the worst lead in hockey.
  • I will then build a table of the conditional odds of suffering a comeback defeat. The table considers not just the lead but also the actual number of goals scored by each team and home-ice advantage.
  • I will follow this with statistical learning (machine learning, if you prefer) models that predict the odds of suffering a comeback defeat.
  • Finally, I will compare the conditional odds table and the statistical learning models on their ability to predict, from the game situation, whether the leader suffers a comeback defeat.

The Data

To address this question, we will use NHL data from 2010 to 2020 covering approximately 15,000 games. I limited the data set to games in which the lead at the end of the second period was no more than four goals. The data originally comes from https://www.kaggle.com/martinellis/nhl-game-data.

Question 1: Is a Two-Goal Lead the “Worst Lead in Hockey”?

According to this data, a two-goal lead is not the worst lead in hockey going into the third period. A one-goal lead is the worst. It is followed by two-, three-, and four-goal leads, and the risk of losing falls as the lead grows. A plot representing this trend can be seen below. The darker part of each bar represents the proportion of times a team with a given lead (on the x-axis) entering the third period ultimately lost the game. A wider bar (along the x-axis) means that situation happened more frequently, regardless of whether the leader ended up winning or losing.
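The breakdown behind this plot amounts to a group-by on the lead entering the third period. Below is a minimal sketch in Python. It assumes a hypothetical per-game record (`Game`, with made-up field names) rather than the Kaggle file's actual columns, and the toy data is invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical record of a game's score after two periods and its final outcome.
    leader_goals: int   # goals by the team leading after the 2nd period
    trailer_goals: int  # goals by the trailing team after the 2nd period
    leader_lost: bool   # True if the leading team went on to lose


def loss_rate_by_lead(games):
    """Return {lead: (fraction of leaders who lost, number of games)}."""
    counts = defaultdict(lambda: [0, 0])  # lead -> [losses, games]
    for g in games:
        lead = g.leader_goals - g.trailer_goals
        counts[lead][0] += int(g.leader_lost)
        counts[lead][1] += 1
    return {lead: (losses / n, n) for lead, (losses, n) in sorted(counts.items())}


# Toy data for illustration only (not the NHL results).
games = [Game(2, 1, True), Game(1, 0, False), Game(3, 1, False),
         Game(3, 1, True), Game(4, 1, False)]
print(loss_rate_by_lead(games))  # {1: (0.5, 2), 2: (0.5, 2), 3: (0.0, 1)}
```

In each tuple, the game count corresponds to a bar's width in the plot and the loss fraction to its darker portion.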

So What is The Worst Lead in Hockey?

Let’s start by simply considering the scoring combinations going into the 3rd period. We pair each one with the percentage of times the leading team ends up losing. I also include the percentage of games in this data set that were in that scoring situation at the end of the 2nd / start of the 3rd period. One dominant trend appears at the top of the data frame: among one-goal leads, the higher the total scoring, the higher the chance that the leader ends up losing. This trend breaks down a bit in the two-goal lead scenarios.
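A table like this one can be built by grouping on the exact score rather than the lead alone. Below is a minimal Python sketch, again assuming a hypothetical `Game` record and made-up toy data. It orders rows by lead and then by total goals, so one-goal leads come first with scoring increasing down the table.

```python
from collections import Counter
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical record; field names are illustrative, not the Kaggle columns.
    leader_goals: int
    trailer_goals: int
    leader_lost: bool


def score_table(games):
    """Map (leader_goals, trailer_goals) -> (loss rate, % of all games)."""
    totals = Counter((g.leader_goals, g.trailer_goals) for g in games)
    losses = Counter((g.leader_goals, g.trailer_goals) for g in games if g.leader_lost)
    n = len(games)
    # Sort by lead first, then by total goals in the game.
    order = sorted(totals, key=lambda s: (s[0] - s[1], s[0] + s[1]))
    return {s: (losses[s] / totals[s], 100 * totals[s] / n) for s in order}


games = [Game(1, 0, False), Game(1, 0, False), Game(3, 2, True), Game(3, 2, False)]
for score, (loss_rate, pct) in score_table(games).items():
    print(score, f"lost {loss_rate:.0%}", f"share of games {pct:.1f}%")
```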

If we add one more layer of complexity, home-ice advantage as an interaction with these scoring situations, we can further stratify the odds of the leader ending up the loser. As a general rule, being the away team appears to make a team that leads going into the 3rd period more likely to end up losing.
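Adding home ice as an interaction simply means that the grouping key now includes the venue. Each score/venue combination gets its own loss rate instead of the model adding a separate home-ice effect. Below is a minimal sketch under the same assumptions as before: a hypothetical `Game` record, with `leader_is_home` as an assumed field name, and invented toy data.

```python
from collections import Counter
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical record: score entering the 3rd period, venue, and outcome.
    leader_goals: int
    trailer_goals: int
    leader_is_home: bool
    leader_lost: bool


def loss_rates_with_venue(games):
    """Map (leader_goals, trailer_goals, 'home' or 'away') -> share of leaders who lost."""
    def key(g):
        return (g.leader_goals, g.trailer_goals, "home" if g.leader_is_home else "away")

    totals = Counter(key(g) for g in games)
    losses = Counter(key(g) for g in games if g.leader_lost)
    return {k: losses[k] / totals[k] for k in sorted(totals)}


games = [Game(2, 1, True, False), Game(2, 1, True, True),
         Game(2, 1, False, True), Game(2, 1, False, True)]
print(loss_rates_with_venue(games))
```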

Conditional Odds From Historical Games To Predict The Outcome of New Games

Some curious readers may take the next step and ask, “How much could we trust these probabilities to make decisions about game outcomes in the future?” Well, that’s what we’ll explore now. We will split the data into a train set of games before 2019-06-01 and a test set of games after 2019-06-01. The train set has 11,480 games and the test set has 3,752 games.
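A date-based split can be sketched in a few lines. Splitting chronologically, rather than randomly, mimics using past seasons to predict future ones. This sketch assumes a hypothetical `Game` record with a `game_date` field. It also assumes that games played on the cutoff date itself go to the test set, since the post only says "before" and "after".

```python
from datetime import date
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical record; only the date matters for the split.
    game_date: date
    leader_lost: bool


CUTOFF = date(2019, 6, 1)


def time_split(games, cutoff=CUTOFF):
    """Chronological split: games before `cutoff` train, the rest test.

    Games on the cutoff date go to the test set (an assumption).
    """
    train = [g for g in games if g.game_date < cutoff]
    test = [g for g in games if g.game_date >= cutoff]
    return train, test


games = [Game(date(2018, 11, 3), False), Game(date(2019, 5, 20), True),
         Game(date(2019, 10, 5), False)]
train, test = time_split(games)
print(len(train), len(test))  # 2 1
```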

We will compute the conditional probabilities (including whether the leader is home or away) on the train set and then transfer them to the test set. Each test game receives the train-set probability for games played under the same situational conditions.
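The "transfer" step is essentially a lookup table that is fit on the train set and applied to the test set. Below is a minimal sketch, assuming a hypothetical `Game` record and a situation key of score plus venue. The post does not say what happens when a test game's situation never occurred in the train set. Here such games fall back to the overall train loss rate, which is an assumption.

```python
from collections import Counter
from typing import NamedTuple


class Game(NamedTuple):
    # Hypothetical record: score entering the 3rd period, venue, and outcome.
    leader_goals: int
    trailer_goals: int
    leader_is_home: bool
    leader_lost: bool


def situation(g):
    # The conditioning variables: score entering the 3rd period plus venue.
    return (g.leader_goals, g.trailer_goals, g.leader_is_home)


def fit_odds_table(train):
    """Conditional comeback-loss rate for each situation seen in the train set."""
    totals = Counter(situation(g) for g in train)
    losses = Counter(situation(g) for g in train if g.leader_lost)
    return {s: losses[s] / totals[s] for s in totals}


def transfer_odds(table, test, fallback):
    """Look up each test game's situation; unseen situations get `fallback`."""
    return [table.get(situation(g), fallback) for g in test]


train = [Game(1, 0, True, False), Game(1, 0, True, True), Game(3, 1, False, False)]
test = [Game(1, 0, True, False), Game(5, 0, True, False)]
table = fit_odds_table(train)
overall = sum(g.leader_lost for g in train) / len(train)  # assumed fallback
print(transfer_odds(table, test, overall))  # [0.5, 0.3333333333333333]
```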

After we transfer the odds, there are a plethora of ways to evaluate how much better than random chance they are at predicting comebacks. I will briefly explain two of these methods here, which will then be used to evaluate the statistical learning models as well.

1. ROC-AUC: The receiver operating characteristic area under the curve (ROC-AUC) summarizes the true positive and false positive rates across all classification thresholds on the predicted probabilities. A ROC-AUC of 1 represents a perfect model, and a ROC-AUC of 0.5 is only as good as random chance; the bigger the ROC-AUC, the better the model. In real-world use cases, however, a perfect score should usually raise suspicion, because it may indicate data leakage. You want as large an AUC as is reasonably possible on the test data set. You also want the train and test AUCs to be as similar as possible, which shows that performance is consistent.
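The AUC can be computed without drawing the curve. It equals the probability that a randomly chosen comeback game receives a higher predicted probability than a randomly chosen non-comeback game, with ties counted as half. Below is a minimal pure-Python sketch of that pairwise (Mann-Whitney) formulation. `roc_auc` is a hypothetical helper name. In practice, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative).

    labels: 1 (or True) for a comeback loss, 0 otherwise.
    scores: predicted comeback probabilities. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4]))  # 0.875
```

A model that assigns every game the same probability scores exactly 0.5, which matches the "random chance" baseline described above.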

In the example below, the test set has an AUC of 0.712 and the train set has an AUC of 0.736. These values sit between random chance (0.5) and a perfect model (1.0). Given how similar the train and test performance are, the model seems relatively robust. Let’s wait to see how it compares to our statistical models later on.

2. Lift and Gains Plot and Table.

This may be a little more interpretable for the business-minded. Lift shows how much better our model is than random chance at identifying comebacks, moving from the games it rates most likely to be a comeback to those it rates least likely. The black line on the plot shows a cumulative lift of about 200 for the games the model predicts are most likely to be a comeback (far left side of the plot). This can be interpreted as roughly twofold better than random chance. The cumulative lift decreases toward the right as lower-probability predictions are included. Once all predictions are included, on the right side of the graph, performance is only as good as random chance. If you wanted to cash in on the twofold improvement, you would act only on the games predicted with the highest probabilities of being a comeback.

The red line (cumulative gains) shows the share of all comeback games captured as more predictions are included. About 80% of comeback wins are captured in the top 50% of predictions, ranked by predicted probability of a comeback. Random ordering would capture only 50% at that point. While this performance is good, is it good relative to statistical learning models we could build?
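Both lines can be computed from the same ranking. Sort games by predicted comeback probability, walk down the list in bins, and at each step record two values: the share of all comebacks captured so far (gains), and the comeback rate among included games relative to the overall rate (lift, with 100 meaning random chance). Below is a minimal sketch. The function name and the default of ten bins are assumptions, not taken from the post's tooling.

```python
def lift_and_gains(labels, scores, n_bins=10):
    """Cumulative gains and lift, ranking games from highest to lowest score.

    Returns a list of (percent of games included, percent of comebacks
    captured, cumulative lift in percent); lift is 100 at random-chance level.
    """
    if not any(labels):
        raise ValueError("need at least one positive label")
    # Sort outcomes by predicted comeback probability, most likely first.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    n, total_pos = len(ranked), sum(ranked)
    base_rate = total_pos / n
    rows = []
    for b in range(1, n_bins + 1):
        k = max(1, round(n * b / n_bins))  # number of top-ranked games included
        captured = sum(ranked[:k])
        gains = 100 * captured / total_pos
        lift = 100 * (captured / k) / base_rate
        rows.append((100 * b / n_bins, gains, lift))
    return rows


labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
for pct, gains, lift in lift_and_gains(labels, scores, n_bins=2):
    print(f"top {pct:.0f}%: captured {gains:.0f}% of comebacks, lift {lift:.0f}")
```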

Statistical Learning Models

Statistical learning models are more sophisticated than taking the average outcome and applying it to future games. They can find the most important relationships and interpolate to new combinations of conditions in ways that the conditional odds table cannot.

The first model I will use is logistic regression, which gives the probability that a given game will end in a comeback loss. I use “diff” to represent the lead, “lead goal” to represent how many goals the leading team has, and “leader” to represent whether the leading team was home or away.
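To show what "training" means here, below is a minimal pure-Python sketch that fits a logistic regression by batch gradient descent on the mean log-loss. Each row would hold `[diff, lead_goal, leader]`, with `leader` coded 1 for home and 0 for away (an assumed encoding), and the labels mark comeback losses. The function names, learning rate, and epoch count are illustrative assumptions. This sketch fits only the coefficients. For the p-values discussed next, a statistics package such as statsmodels (`statsmodels.api.Logit`) reports standard errors alongside the coefficients.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """w = [intercept, coef_1, ..., coef_k]; x = [feature_1, ..., feature_k]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Made-up rows of [diff, lead_goal, leader]; y = 1 marks a comeback loss.
X = [[1, 2, 0], [1, 3, 1], [2, 3, 0], [3, 4, 1], [1, 1, 1], [2, 2, 0]]
y = [1, 0, 0, 0, 0, 1]
w = fit_logistic(X, y)
print(predict_proba(w, [1, 2, 0]))
```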

We can see that all of these variables are statistically significant, with p-values below 0.05. Further, the lead and being the home team both have a negative association with suffering a comeback loss. The more goals you lead by, the less likely you are to lose, and being the home team makes you less likely to lose. However, all else being equal, the more goals the leading team has, the more likely it is to lose via a comeback.
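Written out, the model takes the standard logistic regression form below, using the post's variable names. Here p is the probability of a comeback loss, and leader is coded 1 for home (an assumed encoding). The inequalities restate the direction of each association described above; they are not reported coefficient values. Each coefficient multiplies the odds of a comeback loss by e^{β} per one-unit increase in its variable. So a negative coefficient means that a one-unit increase in that variable scales the odds by a factor below one.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{diff} + \beta_2\,\text{lead goal} + \beta_3\,\text{leader},
\qquad \beta_1 < 0,\quad \beta_2 > 0,\quad \beta_3 < 0

\frac{\operatorname{odds}(\text{diff}+1)}{\operatorname{odds}(\text{diff})} = e^{\beta_1} < 1
```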

Now that we’ve interpreted the model, let’s check the performance. The test AUC is 0.717 and the train AUC is 0.734. That is better than the conditional odds table (0.712 test, 0.736 train), but just barely.

The logistic regression’s lift is almost identical to that of the conditional odds table.

The logistic regression beat the conditional odds table in performance. Let’s see what a tree-based method can do. This method is quite intuitive but also powerful in its ability to capture interaction effects. If you haven’t seen trees used in this context before, think back to those “Should I have a cookie?” flow charts.

I fed the tree model the same information as the logistic regression. At its default settings, it found only two variables to be important: whether the lead is greater than or equal to 2, and, in an interaction, whether the leader is the home team.
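A split such as "lead ≥ 2" is what a greedy impurity criterion picks: it chooses the threshold that leaves the two child nodes as "pure" as possible. The post does not say which library or settings were used. The minimal sketch below assumes Gini impurity, the default classification criterion in common implementations such as R's rpart and scikit-learn's `DecisionTreeClassifier`, and searches for the best single threshold on one feature. The function names and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a binary node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) for the rule `value >= threshold`.

    Rows meeting the rule go left, the rest go right; threshold is None
    if no split lowers the impurity of the parent node.
    """
    n = len(values)
    best = (None, gini(labels))
    for t in sorted(set(values))[1:]:
        left = [y for v, y in zip(values, labels) if v >= t]
        right = [y for v, y in zip(values, labels) if v < t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Toy leads entering the 3rd period and whether the leader lost (made up).
leads = [1, 1, 1, 1, 2, 2, 3, 4]
lost = [1, 1, 0, 1, 0, 0, 0, 0]
print(best_threshold(leads, lost))  # (2, 0.1875)
```

A full tree repeats this search over every feature and recursively within each child node, stopping according to settings such as minimum node size or a complexity penalty.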

To use the tree, check your game’s situation against the rule at each node: if it meets the rule, go left; if not, go right. The top number on each node is the probability that the leading team will lose. The percentage below it is the share of observations in the data set that fall into that node.

Here we see slightly worse performance than the new leader, logistic regression.

Again, the tree provides slightly less lift than the logistic regression. Its lift curve is also less smooth, because the tree’s discrete splits produce only a handful of distinct predicted probabilities.

Current Best Model

In terms of performance, the logistic regression is the best model to predict comeback wins by the metrics considered here. However, the conditional odds table or tree model may be preferred by some based on their simplicity and ease of use.

Potential Improvements

  • Feature engineering: Features beyond the lead, absolute goals, and home ice could be added. A few off the top of my head are team rank differentials, the frequency of an opponent’s come-from-behind wins, average goal differentials, and travel time from home, and that list could go on.
  • Data cleaning: Further validation of this publicly available data could remove potentially misleading data points and further refine the models for real-world use.
  • Models: More sophisticated (though generally less interpretable) models such as XGBoost, random forests, RuleFit, SVMs, MARS, and rotation forests could be used in an attempt to boost performance.

Conclusion

  • A two-goal lead is not the worst lead in hockey; a one-goal lead is.
  • Being the home team and having a bigger lead both help a team avoid a comeback loss going into the third period.
  • At a given lead, more total goals are associated with higher odds of losing via a comeback.
  • Of the models tested here, logistic regression performs the best by AUC. It provides about a twofold improvement over random chance on the games it predicts are most likely to be comebacks.
