In a previous post, I calculated the regular season performance of NHL goalies. That calculation required the marginal value of a goal against. Here, I will go through a step-by-step process for estimating the marginal value of a goal against in the NHL. So here it is.
Step 1: Get the NHL Regular Season Standings Data. I got the NHL regular season standings data from Hockey Reference, since the data copies and pastes nicely into Microsoft Excel. I chose to include the 1995-1996 to 2011-2012 regular seasons (minus the 2004-2005 season, which was cancelled). As you do this, keep track of which season each row belongs to, as that will help if you are going to follow Step 2. My choice of which seasons to include is somewhat arbitrary; in our NHL goalies paper published in the Journal of Sports Economics we used a longer time period.
Step 2: Create Dummy Variables for each NHL Regular Season. Please note, you can skip this step and still estimate the marginal value of a goal against. I chose to do this because the overall fit of the model is substantially better, given that the NHL has changed the formula it uses to calculate standings points, the dependent variable in Step 3. In case you are wondering, a dummy variable is a variable that is equal to 1 if a condition is true and 0 otherwise. To create one, insert a column (I chose to put it to the right of the standings data) and give the dummy variable a title. I titled the dummy variable for the 2011-2012 season d2011. For d2011, I entered 1 in the rows of data for the 2011-2012 season and 0 in all the other rows. Then I repeated this process for each of the seasons, making sure that each dummy variable equals 1 only in the rows for its own season. If you set up the dummy variables in order, then the 1's in the spreadsheet should step downward and to the right from one season to the next.
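If you would rather build the dummy variables in code than in a spreadsheet, here is a minimal Python sketch of the same idea. It is not what I did (I used Excel); the function name `add_season_dummies`, the `season` field (the year in which the season starts), and the placeholder rows are all assumptions made up for illustration.

```python
def add_season_dummies(rows, seasons):
    """Return copies of rows with one 0/1 column (d<year>) per season."""
    out = []
    for row in rows:
        new_row = dict(row)
        for year in seasons:
            # 1 if this row belongs to the season starting in `year`, else 0.
            new_row[f"d{year}"] = 1 if row["season"] == year else 0
        out.append(new_row)
    return out


# Season start years 1995..2011, skipping the cancelled 2004-2005 season.
SEASONS = [year for year in range(1995, 2012) if year != 2004]

# Hypothetical placeholder rows, NOT real standings data.
rows = [
    {"team": "Team A", "season": 2011},
    {"team": "Team B", "season": 2010},
    {"team": "Team C", "season": 1995},
]

with_dummies = add_season_dummies(rows, SEASONS)
for row in with_dummies:
    flagged = [name for name, value in row.items()
               if name.startswith("d") and value == 1]
    print(row["team"], row["season"], flagged)
```

Each row ends up with exactly one dummy set to 1, which is the same pattern as the 1's stepping down and to the right in the spreadsheet.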
Step 3: Run a Linear Regression. I chose to do this in a statistical package called Stata. I also controlled for heteroskedasticity in the regression estimation by using robust standard errors. Here is the linear regression, where pts (standings points) is the dependent variable, gf = goals for, ga = goals against and dxxxx is a dummy variable for each season included:
pts = f(gf, ga, d2011, d2010, d2009, d2008, d2007, d2006, d2005, d2003, d2002, d2001, d2000, d1999, d1998, d1997, d1996, d1995)
After running the linear regression, I estimate that the value of a goal against in terms of NHL regular season standings points is equal to -0.340430. In other words, each additional goal allowed costs a team roughly a third of a standings point.
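For readers without Stata, the regression step can be sketched in plain Python. This is only an illustration under stated assumptions: the six rows and the "true" coefficients below are made up (they are not NHL data and will not reproduce the estimate above), the helper names `solve_linear_system` and `ols` are my own, and the sketch computes point estimates only, without the robust standard errors used in the Stata run. It also leaves one season out as the baseline, because an intercept plus a full set of season dummies is perfectly collinear; Stata's default is to drop one of the collinear variables for you.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up rows (season start year, gf, ga); NOT real standings data.
toy = [
    (2010, 250, 200), (2010, 220, 230), (2010, 200, 260),
    (2011, 240, 210), (2011, 210, 220), (2011, 230, 250),
]
# Made-up coefficients, used only to generate toy points without noise.
TRUE_CONST, TRUE_GF, TRUE_GA, TRUE_D2011 = 90.0, 0.25, -0.25, 2.0

x_rows, pts = [], []
for season, gf, ga in toy:
    d2011 = 1.0 if season == 2011 else 0.0  # 2010 is the omitted baseline
    x_rows.append([1.0, float(gf), float(ga), d2011])
    pts.append(TRUE_CONST + TRUE_GF * gf + TRUE_GA * ga + TRUE_D2011 * d2011)

const, b_gf, b_ga, b_d2011 = ols(x_rows, pts)
print(f"estimated ga coefficient: {b_ga:.4f}")
```

Because the toy points are generated without noise, the fitted ga coefficient should match the made-up value used to generate them. With real standings data, you would replace `toy` with the actual rows and add one dummy column per season other than the baseline.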