What Factors Are Most Important to Team Success in the NFL? (Econometrics)
- Joe Bertolami
- Dec 23, 2020
By: Joe Bertolami & Edward Martini
1. Introduction:
From an economic perspective, the NFL and football overall are very intriguing to examine. There is a cost and a benefit to every single play that teams must consider throughout the course of a game. Our paper seeks to find the factors that contribute most to success for franchises in the NFL. To understand how success is obtained, it is important to look at how each individual game is won, as that is the ultimate goal for every team. Winning is accomplished by scoring more points than the opposing team, but many different variables lead to points being scored. Football is one of the more complicated sports and is often regarded as the most strategically complex sport in America.
There is certainly a “cause-and-effect” phenomenon in a football game: points scored on offense are “caused” by gaining yards up and down the field through both running and passing the football. Points can also be scored on defense or special teams, but this is much more difficult and uncommon. Even so, our model includes defensive variables such as turnovers and defensive/special teams’ touchdowns. Each team is unique because of the personnel on its roster. Having some prior knowledge of the NFL is helpful in conducting a study like this, but how much team strategies can differ is evident just from looking at the dataset. Teams with a more talented offensive line and running backs, such as the 2016 Cowboys, stand out in the dataset, as they scored many points that year while rushing for an abundance of yards. On the other hand, a team with a more talented quarterback and receivers will find more success in scoring points by passing the ball. Each team has its own strategy, but the goal remains consistent throughout the whole league: to score more points than the opposing team.
The question we seek to answer through econometrics is which factors are most important to team success in the NFL. The NFL has become increasingly data driven, especially over the past 10 years, which is the period our study focuses on. Teams are hiring more and more people for their data analytics departments, as analytics is proving to give a competitive edge. A 2020 article from the Washington Post does a deep dive into the world of NFL data and statistics, showing just how much more prevalent it is becoming across the league. One example the article points out is that teams in the 2019 season attempted more fourth-down conversions than in any other season in league history, at 595. This was a statistically driven decision, as teams used data and statistics to determine the expected value of each play, with the goal being to maximize points scored. In turn, this led to teams going for it more on fourth down. With that said, we strive to use our regression model to explain as much about team success as possible. With data becoming increasingly prevalent in all of sports, the findings from our model can have real significance for team decisions and strategy.
2. Data
The data we are primarily using to conduct our study is titled “NFL Team Stats 2002-2019” and is found on Kaggle. This dataset contains a multitude of statistics from thousands of NFL games, spanning 2002 to 2019. However, we have trimmed the dataset to include just the previous decade, so we will be looking at the 2009-2019 time frame (with a few games from 2020 as well). We want our research to be as current as possible, which is why we decided to focus on the previous decade in the NFL. This is a very meaningful dataset for what we are seeking to find, as it includes many variables that affect how many points are scored. Some of these variables include passing and rushing yards, penalties, time of possession, and of course, points. Point differential is our explained variable throughout our study, as this is the factor that determines wins or losses. With so many different factors affecting points scored, it is very interesting to look at what contributes most to this result.
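The trimming step and the explained variable can be sketched in a few lines. This is only an illustration under stated assumptions: the column names (`date`, `score_home`, `score_away`), the cutoff date, and the toy rows are hypothetical stand-ins, not the actual Kaggle file or the exact filter we applied.

```python
import csv
import io
from datetime import date

# Toy rows in a hypothetical layout; these are NOT real game results.
SAMPLE_CSV = """date,home,away,score_home,score_away
2005-10-02,AAA,BBB,17,20
2012-11-18,CCC,DDD,31,24
2019-12-29,EEE,FFF,10,13
"""

def load_score_diffs(csv_text: str, first_date: date) -> list:
    """Keep games on or after first_date and compute the home-team point differential."""
    games = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if date.fromisoformat(row["date"]) < first_date:
            continue  # drop games before the study window
        score_diff = int(row["score_home"]) - int(row["score_away"])
        games.append({"date": row["date"], "home": row["home"], "score_diff": score_diff})
    return games

# The cutoff below is an assumed value for illustration only.
games = load_score_diffs(SAMPLE_CSV, date(2009, 1, 1))
for game in games:
    print(game["date"], game["home"], game["score_diff"])
```

A positive `score_diff` means the home team won, which matches the convention of using the home team as the base.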
Table 1 shows the descriptive statistics of the variables we have included in our regression model. We include many variables because, as explained in the introduction, so much can affect a team’s point differential. As the table shows, mean passing yards are significantly greater than mean rushing yards. Teams also pass more often than they run on average, as shown by the att_home variable, which records passing attempts. Passing yards and rushing yards also differ in their extremes, as both the minimum and maximum values for passing yards are greater than those for rushing yards. This makes sense, as teams attempt more passes, which results in more yards gained. Teams are typically in the lead when they run the ball often, as this is a good way to run out the clock. Passing yards are also the most spread out variable and the least clustered around the mean, with a standard deviation of 78.75. Looking at other variables such as fourth down conversions and defensive/special teams’ touchdowns, it is important to note how rare these are. Defensive/special teams’ touchdowns have a minimum value of 0 and a mean value of less than 1. We feel that this variable, while rare, will have a strong positive correlation with winning.
A defensive touchdown in particular not only scores points but can completely change the momentum of a game. Fourth downs and fourth down conversions are another set of variables described in the table. As mentioned in the introduction, teams have been taking more risks by going for it on fourth down. While this can prove to be a successful strategy, the statistics suggest that it does not always work. Fourth down conversions have a mean value that is less than half of the mean value of fourth downs. The median value of fourth down conversions is 0, while the median for fourth downs is 1. So, while this can be a successful way to score points, it is not something that teams are always willing to risk. There are certain situations where going for it is advantageous, but it is often a better option to either punt on fourth down or attempt a field goal. These are the descriptive statistics that we feel are most relevant to our model, which is explained further in the next section. The full table of mean, median, standard deviation, and minimum/maximum values is displayed below:
Table 1: Descriptive Statistics

Empirical Discussion
The explained variable is point differential, using the home team as the base. The model attempts to find how the explanatory variables affect the home team’s point differential. The following equation is our statistical model:
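$$
\begin{aligned}
\text{score\_diff} ={}& \beta_0 + \beta_1\,\text{passing\_yards\_home} + \beta_2\,\text{comp\_perc} + \beta_3\,(\text{comp\_perc} \times \text{passing\_yards\_home}) \\
&+ \beta_4\,\text{att\_home} + \beta_5\,\text{att\_home}^2 + \beta_6\,\text{rushing\_yards\_home} \\
&+ \beta_7\,\text{rushing\_attempts\_home} + \beta_8\,\text{rushing\_attempts\_home}^2 \\
&+ \beta_9\,\text{fourth\_downs\_home} + \beta_{10}\,\text{fourth\_downs\_conv\_home} + \beta_{11}\,(\text{fourth\_downs\_home} \times \text{fourth\_downs\_conv\_home}) \\
&+ \beta_{12}\,\text{sacks\_home} + \beta_{13}\,\text{turnover\_diff} + \beta_{14}\,\text{def\_st\_td\_home} + \beta_{15}\,\text{penalties\_home} + \mu
\end{aligned}
$$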
Here we have two interaction terms: one for completion percentage and one for fourth down conversions. Completion percentage, that is, passing completions divided by passing attempts, is a good measure of passing efficiency. However, without an interaction, completion percentage would be interpreted as having the same effect on score differential even when passing yards are zero, which does not make sense. After interacting the variables, our new interpretation is that completion percentage’s effect on score differential depends on how many passing yards are thrown. Similarly, without an interaction, the effect of fourth down attempts would be interpreted as the same even if the offense converted none of them, which obviously does not make sense. With the interaction, the effect of the number of fourth downs attempted on score differential depends on how many were converted. Moreover, we used two quadratics: one for passing attempts and one for rushing attempts. A recent discussion among football analysts concerns the effectiveness of running or passing more as the game goes on. Running the ball more later in the game with a lead is an effective way to run out the clock and keep the lead, while passing is seen as a better way to get ahead early in the game or come back from a deficit. We include the quadratic terms to measure whether this is in fact the case in our sample.
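To make the interaction and quadratic terms concrete, the following is a minimal sketch of how the fifteen right-hand-side variables could be built for a single game. The function name, the dictionary keys (in particular `comp_home` for completions), and the toy numbers are assumptions for illustration; they are not taken from the dataset or from our estimation code.

```python
def build_regressors(game: dict) -> dict:
    """Return the 15 explanatory variables for one game, including the
    two interaction terms and the two squared terms from the model."""
    if game["att_home"] <= 0:
        raise ValueError("completion percentage is undefined with zero pass attempts")
    comp_perc = game["comp_home"] / game["att_home"]  # completions / attempts
    return {
        "passing_yards_home": game["passing_yards_home"],
        "comp_perc": comp_perc,
        # interaction: lets the effect of accuracy depend on yards thrown
        "comp_perc_x_passing_yards": comp_perc * game["passing_yards_home"],
        "att_home": game["att_home"],
        "att_home_sq": game["att_home"] ** 2,  # quadratic in passing attempts
        "rushing_yards_home": game["rushing_yards_home"],
        "rushing_attempts_home": game["rushing_attempts_home"],
        "rushing_attempts_home_sq": game["rushing_attempts_home"] ** 2,
        "fourth_downs_home": game["fourth_downs_home"],
        "fourth_downs_conv_home": game["fourth_downs_conv_home"],
        # interaction: lets the effect of attempts depend on conversions
        "fourth_downs_x_conv": game["fourth_downs_home"] * game["fourth_downs_conv_home"],
        "sacks_home": game["sacks_home"],
        "turnover_diff": game["turnover_diff"],
        "def_st_td_home": game["def_st_td_home"],
        "penalties_home": game["penalties_home"],
    }

# Toy game with made-up values, for illustration only.
toy_game = {
    "passing_yards_home": 250, "comp_home": 24, "att_home": 32,
    "rushing_yards_home": 110, "rushing_attempts_home": 28,
    "fourth_downs_home": 2, "fourth_downs_conv_home": 1,
    "sacks_home": 2, "turnover_diff": -1, "def_st_td_home": 0, "penalties_home": 6,
}
regressors = build_regressors(toy_game)
print(len(regressors), regressors["comp_perc"], regressors["comp_perc_x_passing_yards"])
```

Each key corresponds to one slope coefficient, β1 through β15, in the order they appear in the model equation.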
Regression Results

n = 2,946, R² = 0.6755, Adjusted R² = 0.6738
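As a quick consistency check on these fit statistics, the adjusted R² can be recomputed from the sample size and the R² of 0.6755. The sketch below assumes k = 15 slope coefficients (β1 through β15 in the model equation); the function name is ours.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Reported values: n = 2,946 and R^2 = 0.6755; k = 15 is our count of slope terms.
adj = adjusted_r_squared(0.6755, 2946, 15)
print(round(adj, 4))  # 0.6738
```

With so many observations relative to regressors, the adjustment barely moves the statistic, which is why the two values are so close.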
Clearly, we have a lot of relevant variables; in fact, all of them are significant at the 5% level based on their p-values. We will start with the most relevant variable, which appears to be turnover differential: the home team’s turnovers minus the away team’s turnovers. According to our model, an additional turnover decreases the point differential by 2.82 points, and it is the most significant variable, with a t-statistic of -30.56. Further, considering that the average score differential was only 2.23 points, losing the turnover battle, even by only one, will likely cost a team the game.
The next most significant variable was defensive and special teams’ touchdowns. According to our model, one more defensive or special teams’ touchdown increases the score differential by 2.38 points. Again, set against the average point differential, just one touchdown from the defense or special teams will likely be the difference in the game.
A surprising result was the negative coefficient on fourth downs. Our model suggests that going for it on fourth down decreases the point differential by 2.89 points for the average team. This is the partial effect evaluated at the average number of fourth down conversions; after reparametrizing the model, it has a t-statistic of -12.247. There is a problem with this result, which is discussed in the conclusion.
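The partial effect referred to here can be written out in the model’s notation. This is a sketch: the bar denotes the sample mean, and the centered interaction in the last line is the standard textbook way to reparametrize, which we assume matches the procedure used.

```latex
\frac{\partial\,\text{score\_diff}}{\partial\,\text{fourth\_downs\_home}}
  = \beta_9 + \beta_{11}\,\text{fourth\_downs\_conv\_home}
\quad\Longrightarrow\quad
\text{effect at the mean} = \beta_9 + \beta_{11}\,\overline{\text{fourth\_downs\_conv\_home}}

\text{Reparametrized interaction: }\;
\text{fourth\_downs\_home} \times \bigl(\text{fourth\_downs\_conv\_home} - \overline{\text{fourth\_downs\_conv\_home}}\bigr)
```

When the interaction is centered this way, the coefficient on fourth downs in the re-estimated model equals the effect at the mean, so its standard error and t-statistic can be read directly from the regression output.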
Another very significant variable was rushing attempts per game. Our model predicts that one more rushing attempt will increase score differential by 0.95 points, with a t-statistic of 7.9. However, since more rushing attempts should suggest that the team is winning, the variable we are more interested in is rushing attempts squared. The coefficient on rushing attempts squared is -0.007, which means the relationship is an inverted parabola. So, as expected, running more has a positive effect on score differential but at a decreasing rate. However, the turnaround point is 63 attempts, which is unrealistic in any football game, so in practice we can ignore it.
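For a quadratic of this form, the turnaround point is where the marginal effect reaches zero, that is, at minus the linear coefficient divided by twice the squared coefficient. The sketch below uses the rounded coefficients quoted above (0.95 and -0.007), which give roughly 68 attempts rather than 63; the gap is presumably due to rounding of the squared coefficient, since a value near -0.0075 would give about 63. The function names and the 30-attempt example are ours, for illustration only.

```python
def turning_point(linear_coef: float, squared_coef: float) -> float:
    """Attempts at which y = b1*x + b2*x^2 stops increasing: x* = -b1 / (2*b2)."""
    if squared_coef == 0:
        raise ValueError("no turning point without a squared term")
    return -linear_coef / (2.0 * squared_coef)

def marginal_effect(linear_coef: float, squared_coef: float, attempts: float) -> float:
    """Change in score differential from one more attempt: b1 + 2*b2*x."""
    return linear_coef + 2.0 * squared_coef * attempts

# Rounded coefficients as reported in the text.
b_linear, b_squared = 0.95, -0.007

print(round(turning_point(b_linear, b_squared), 1))        # 67.9
print(round(marginal_effect(b_linear, b_squared, 30), 2))  # 0.53
```

The second line shows why the 0.95 figure overstates the effect for a typical game: at an illustrative 30 attempts, one more rush is worth about half a point under these rounded coefficients.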
Another very interesting variable is penalties. According to our model, one more penalty is expected to decrease score differential by 0.45 points, with a t-statistic of -7.895. It is striking that penalties have such a big impact in our model. It only takes a few penalties to negatively affect the score differential, and there are 6.27 penalties a game on average. Therefore, it pays to have a well-disciplined football team that commits fewer penalties.
The remaining variables are passing yards, completion percentage, rushing yards, and sacks. Each passing yard is estimated to increase score differential by 0.0445. Completion percentage for the average team is estimated to increase score differential by 17.31 for every 1% increase and has a t-statistic of 8.76 after reparametrization. Every rushing yard is estimated to increase score differential by 0.0313 and has a t-statistic of 6.786. Finally, every sack is estimated to decrease score differential by 0.573 and has a t-statistic of -5.62.
Conclusion
In summary, the most interesting findings were the effects of turnover differential, special teams/defensive touchdowns, fourth downs, and penalties. These are areas of the game that are often overlooked, and according to our model they are very important to the outcome of a game. Other variables such as passing yards and rushing yards are also very important when trying to describe score differential, just not as much. Rushing attempts unsurprisingly had an increasing effect on score differential but at a decreasing rate. The turning point was at 63 attempts, which means that, practically speaking, rushing attempts have an increasing effect on score differential at a slowly decreasing rate. Further, fourth down attempts had a negative coefficient, which suggests that more fourth down attempts negatively affect score differential.
The implications of these findings would interest coaching staffs and general managers. Special teams, often called the third phase of the game, and penalties committed are two areas they would want to know can greatly affect score differential. Both are a direct result of the coaching staff keeping the team disciplined. Further, it would surprise no one that turnover differential is very important. However, it may surprise everyone that it is the variable with the most statistical significance in explaining score differential. This may lead teams to sign quarterbacks who throw fewer interceptions, even though they may throw for fewer yards and fewer touchdowns. Jameis Winston comes to mind: a very exciting quarterback who threw 33 touchdowns in 2019 but also threw 30 interceptions. Another interesting implication is that teams might not want to go for fourth down conversions as much. A lot of recent discussion around the NFL holds that teams should attempt fourth down conversions more often, which is the opposite of our conclusion. Also, completion percentage has the largest estimate, with its effect on score differential depending on how many yards are thrown (again, we used the average yards thrown per game). It is not very surprising that more accurate quarterbacks win more games, but the result certainly reinforces the principle.
There are some very significant problems with our data that need to be pointed out. First, there are many missing variables, such as field goals. This dataset has no field goal data, which is very problematic because many games are decided by field goals and they are one of the main ways of scoring points. We are also missing some defensive statistics, such as yards allowed and field goals allowed. In addition, we lack the ability to use time of possession in our regression, which we feel would have a big impact on winning games. Another big missing variable is average field position, which likely has an effect on score differential. Finally, there was no injury data in this regression. Injuries can have a huge negative impact on even the best teams, such as the San Francisco 49ers this year, who went from a Super Bowl appearance to a losing record. Without these important variables, the adjusted R² is 0.67, which means that we are only explaining 67% of the variability in score differential.
There are also some issues with the data we used for our regression. The fourth down attempts variable has a negative estimate. This may reflect a true effect, but we are missing the distance needed to gain the first down. Further, we are missing some in-game context. For instance, if a team is down by 20 points, is desperate to gain a first down, and repeatedly goes for fourth down conversions, those attempts will be associated with a losing score differential in our model. Moreover, completion percentage had a high standard error. After noticing this and computing a confidence interval, we can conclude that it is statistically significant; however, the interval is wide, which suggests the variable is imprecisely estimated. Lastly, there is a lot of collinearity between these variables, which makes it harder to determine whether each independent variable affects the dependent variable directly. Given that every variable is significant, it is not possible to capture the most realistic effect of each variable on point differential.
Next, we would choose to study the draft, free agency, and team worth in the NFL. So much value is placed on high draft picks, yet finding the best players in the draft seems to be random. For instance, two of the best quarterbacks in the league, Tom Brady and Russell Wilson, were 6th and 3rd round picks, respectively. Further, the same teams tend to stay toward the top of the draft, getting the best picks without winning more. In addition, Professor Donn Johnson suggested that free agency may be a lemons market where teams tend to overpay for less effective players looking for a new contract. It would also be very interesting to see how much winning percentage affects a team’s net worth. There are many interesting areas to study in an NFL that is becoming increasingly data oriented. More and more front offices are relying heavily on analytics when trying to put together the right team and strategy to win games.
Appendix
Data from Kaggle, NFL Team Stats 2002-2019 (ESPN) https://www.kaggle.com/cviaxmiwnptr/nfl-team-stats-20022019-espn
References
Fortier, Sam. “The NFL's Analytics Movement Has Finally Reached the Sport's Mainstream.” The Washington Post. WP Company, January 17, 2020. https://www.washingtonpost.com/sports/2020/01/16/nfls-analytics-movement-has-finally-reached-sports-mainstream/.