In the previous blog I talked about how consistent MLB batters were in back-to-back seasons. Here I want to present the step-by-step process to do this type of analysis.
Step 1: You have to have an algorithm to evaluate team performance that allows you to take the impact of various team statistics such that you can evaluate player performance. For MLB batters I use Runs Created and for NHL Goalies I use the marginal value of a goal against.
Step 2: You have to have an algorithm to evaluate player performance, whether it is Runs Created, Wins Above Average, WP48, QB Score or another metric. I will assume that you have a method to evaluate player performance.
(I assume that you have the team data from step 1 to calculate the marginal value of the team statistics and that you have used those marginal team values to calculate the player performance measure in step 2. Once this is done then we can get to step 3. If not - see the links above to replicate the analysis for either MLB batters or NHL goalies).
Step 3: I use a cut-off point for the players (but this is not required). If you use a cut-off point such as minimum AB's for MLB batters, sort the data and eliminate the data that does not meet your minimum requirements. Then re-sort the data by season and by your player performance measure from highest to lowest.
Step 4a: Calculate the order from 1 to n (n being the lowest player
performance number for that season). If the order is in column A, then in cell A2 type in a 1, and in cell A3 =A2+1, and copy the formula and paste down for each player in that season. Repeat for each season.
Step 4b: Calculate the rank for each player Calculate each quintile (range of 20%) for each season. In Excel I use a formula like this: =IF(A2/$A$438<=0.2,1,IF(A2/$A$438<=0.4,2,IF(A2/$A$438<=0.6,3,IF(A2/$A$438<=0.8,4,5)))), where A2 is the highest ranked player by number (i.e. 1) and A438 would be the last player for that season.
Step 5: Sort the data by player name and player year (lowest to highest).
Step 6: Create a formula such that only the same player will be listed if they met the player performance criteria in consecutive seasons. Thus for each variable you are interested in, I use the following formula in Excel in row 3: =IF($D4-$D3=1,IF($C3=$C4,B4,""),""), where column D is the column that has the season played and column C has the players name and column B has the players rank. Since you are looking at row 4, that should be the next season. If it is not, then the formula above reports a blank cell. Copy and paste for all the players.
Step 7: Calculate consistency (and near consistency) as follows: =IF(I3="","",IF(I3=B3,1,0)) which says that if the column with the next years rank (column I in this case) is blank, to leave it blank, otherwise to put a one if they are the equal and a zero if they are not equal. For near consistency I use the following formula: =IF(I3="","",IF(I3+1=B3,1,0)) if I want to know how many have increased their rank by one and =IF(I3="","",IF(I3-1=B3,1,0)) if I want to know how many have decreased their rank by one. Copy and paste for all the players.
Step 8: Finally I calculate the percentage of players who are consistent and near consistent and then you can find out how many are not even nearly consistent from one season to another.