Player Analytics Fundamentals: Part 2 – Performance Metrics

I want to start the series with the topic of “Metric Development.”  I’m going to use the term “metric” but I could have just as easily used words like stats, measures or KPIs.  Metrics are the key to sports and other analytics functions since we need to be sure that we have the right performance standards in place before we try to optimize.  Let me say that one more time – METRIC DEVELOPMENT IS THE KEY.

The history of sports statistics has focused on so-called “box score” statistics such as hits, runs or RBIs in baseball.  These simple statistics have utility but also significant limitations.  For example, in baseball a key statistic is batting average.  Batting average is intuitively useful as it shows a player’s ability to get on base and to move other runners forward.  However, batting average is also limited as it neglects the difference between types of hits.  In a batting average calculation, a double or home run is of no greater value than a single.  It also neglects the value of walks.

These shortcomings motivated the development of statistics like OBPS (on-base plus slugging).  Measures like OBPS that are constructed from multiple statistics are appealing because they begin to capture the multiple contributions made by a player.  On the downside, these types of constructed statistics often have an arbitrary nature in terms of how component statistics are weighted.

The complexity of player contributions and the “arbitrary nature” of how simple statistics are weighted are illustrated by the formula for the NFL quarterback rating.

This equation combines completion percentage (COMP/ATT), yards per attempt (YARDS/ATT), touchdown rate (TD/ATT) and interception rate (INT/ATT) to arrive at a single statistic for a quarterback.  On the plus side, the metric includes data related to “accuracy” (completion percentage), “scale” (yards per attempt), “conversion” (TDs), and “failures” (interceptions).  We can debate if this is a sufficiently complete look at QBs (should we include sacks?) but it does cover multiple aspects of passing performance.  However, a common reaction to the formula is a question about where the weights come from.  Why is completion rate multiplied by 5 and touchdown rate multiplied by 20?
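For readers who want to experiment with the weights, here is a minimal Python sketch of the standard NFL passer rating calculation.  The function and argument names are my own choices for illustration.  Each of the four components is capped between 0 and 2.375 before the components are averaged and scaled.

```python
def clamp(x, lo=0.0, hi=2.375):
    """Cap a component at the limits used by the NFL passer rating."""
    return max(lo, min(hi, x))


def passer_rating(comp, att, yards, td, ints):
    """Standard NFL passer rating from raw passing totals.

    Argument names are illustrative: completions, attempts,
    passing yards, touchdown passes and interceptions.
    """
    if att <= 0:
        raise ValueError("att must be positive")
    a = clamp((comp / att - 0.3) * 5)       # accuracy: completion rate
    b = clamp((yards / att - 3) * 0.25)     # scale: yards per attempt
    c = clamp((td / att) * 20)              # conversion: touchdown rate
    d = clamp(2.375 - (ints / att) * 25)    # failures: interception rate
    return (a + b + c + d) / 6 * 100


# Example stat line (made up for illustration): 18 of 30 for 210 yards, 1 TD, 1 INT
print(round(passer_rating(comp=18, att=30, yards=210, td=1, ints=1), 1))
```

Changing the multipliers (5, 0.25, 20, 25) and re-ranking a set of quarterbacks is a quick way to see how sensitive the ordering is to the weighting scheme.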

Is it a great statistic?  One way to evaluate is via a quick check of the historical record.  Does the historical ranking jibe with our intuition?  Here is a link to historical rankings.

Every sport has examples of these kinds of “multi-attribute” constructed statistics.  Basketball has player efficiency metrics that involve weighting a player’s good events (points, rebounds, steals) and negative outcomes (turnovers, fouls, etc…).  The OBPS metric involves an implicit assumption that “on base percentage” and “slugging” are of equal value.
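The equal-weighting assumption is easy to make explicit.  The sketch below computes on-base percentage and slugging from their usual definitions and then combines them with weights that default to 1 and 1.  The stat line and the helper names are hypothetical and chosen only for illustration.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """On-base percentage: times on base divided by the usual denominator."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(singles, doubles, triples, hr, ab):
    """Slugging: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def weighted_obps(obp, slg, w_obp=1.0, w_slg=1.0):
    """Composite metric; the default weights reproduce plain OBP + SLG."""
    return w_obp * obp + w_slg * slg


# Hypothetical season line, for illustration only
obp = on_base_pct(h=150, bb=50, hbp=5, ab=500, sf=5)
slg = slugging_pct(singles=100, doubles=30, triples=5, hr=15, ab=500)

print(round(weighted_obps(obp, slg), 3))             # equal weights
print(round(weighted_obps(obp, slg, w_obp=1.5), 3))  # an alternative weighting
```

Any choice of weights other than 1 and 1 is itself an assumption that needs to be justified, which is exactly the issue with constructed statistics.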

One area I want to explore is how we should construct these types of performance metrics.  This is a discussion that involves some philosophy and some statistics.  We will take this piece by piece and also show a couple of applications along the way.

Analytics, Trump, Clinton and the Polls: Sports Analytics Series Part 5.1

Recent presidential elections (especially 2008 and 2012) have featured heavy use of analytics by candidates and pundits.  The Obama campaigns were credited with using micro targeting and advanced analytics to win elections. Analysts like Nate Silver were hailed as statistical gurus who could use polling data to predict outcomes.  In the lead-up to this year’s contest we heard a lot about the Clinton campaign’s analytical advantages, and the election forecasters became regular parts of election coverage.

Then Tuesday night happened.  The polls were wrong (by a little) and the advanced micro targeting techniques didn’t pay off (enough).

Why did the analytics fail?

First the polls and the election forecasts (I’ll get to the value of analytics next week).  As background, commentators tend to not truly understand polls.  This creates confusion because commentators frequently over- and misinterpret what polls are saying.  For example, whenever “margin of error” is mentioned they tend to get things wrong.  A poll’s margin of error is based on sample size.  The common journalistic error is to overlook that a collection of polls has a much larger combined sample size than any individual poll with a margin of error of 3% or 4%.  When looking at an average of many polls the “margin of error” is much smaller because the “poll of polls” has a much larger sample size.  This is a key point because when we think about the combined polls it is even more clear that something went wrong in 2016.
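To see how quickly the margin of error shrinks, here is a minimal sketch using the textbook formula for a proportion.  It assumes simple random samples and treats ten polls of 1,000 respondents each as one pooled sample of 10,000.  The sample sizes are illustrative, and real polls use weighting that this ignores.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


single_poll = margin_of_error(1000)        # one poll of 1,000 respondents
poll_of_polls = margin_of_error(10 * 1000) # ten such polls pooled together

print(f"single poll:   +/- {single_poll:.1%}")
print(f"poll of polls: +/- {poll_of_polls:.1%}")
```

Pooling only reduces random sampling error.  If every poll shares the same flawed assumption, the pooled estimate is precisely wrong rather than approximately right.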

Diagnosing what went wrong is complicated by two factors.  First, it should be noted that because every pollster does things differently we can’t make blanket statements or talk in absolutes.  Second, diagnosing the problem requires a deep understanding of the statistics and assumptions involved in polling.

In the 2016 election my suspicion is that two things went wrong.  As a starting point – we need to realize that polls include strong implicit assumptions about the nature of the underlying population and about voter passion (rather than preference).  When these assumptions don’t hold the polls will systematically fail.

First, most polls start with assumptions about the nature of the electorate.  In particular, there are assumptions about the base levels of Democrats, Republicans and Independents in the population.  Very often the difference between polls relates to these assumptions (LA Times versus ABC News).

The problem with assumptions about party affiliation in an election like 2016 is that the underlying coalitions of the two parties are in transition.  When I grew up the conventional wisdom was that the Republicans were the wealthy, the suburban professionals, and the free trading capitalists while the Democrats were the party of the working man and unions.  Obviously these coalitions have changed.  My conjecture is that pollsters didn’t sufficiently re-balance.  In the current environment it might make sense to place greater emphasis on demographics (race and income) when designing sampling segments.

The other issue is that more attention needs to be paid to avidity / engagement / passion (choose your own marketing buzzword).  Polls often differentiate between likely and registered voters.  This may have been insufficient in this election.  If Clinton’s likely voters were 80% likely to show up and Trump’s were 95% likely, then a small percentage lead in a preference poll isn’t going to hold up in an election.
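The arithmetic behind that claim can be sketched in a few lines.  The 80% and 95% turnout rates come from the example above.  The 52/48 preference split is a hypothetical number added for illustration, as is the function name.

```python
def expected_vote_share(pref_a, pref_b, turnout_a, turnout_b):
    """Convert stated preference into expected vote share, given turnout rates."""
    votes_a = pref_a * turnout_a
    votes_b = pref_b * turnout_b
    total = votes_a + votes_b
    return votes_a / total, votes_b / total


# Candidate A leads the preference poll 52-48, but A's supporters are
# 80% likely to vote while B's supporters are 95% likely to vote.
share_a, share_b = expected_vote_share(0.52, 0.48, 0.80, 0.95)

print(f"A: {share_a:.1%}  B: {share_b:.1%}")
```

Under these assumptions a four point lead in preference turns into a deficit once turnout is applied, which is why a preference measure alone can mislead.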

The story of the 2016 election should be something every analytics professional understands.  From the polling side the lesson is that we need to understand and question the underlying assumptions of our model and data.  As the world changes do our assumptions still hold?  Is our data still measuring what we hope it does?  Is a single dependent measure (preference versus avidity in this case) enough?

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the design and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision making styles is also common across categories.  It’s necessary to understand both the level of interest in using analytics and also the “technical level” of the decision makers.  Less experienced data scientists and statisticians have a tendency to use methods that are too complicated.  This can be a killer.  If the models are too complex they won’t be understood and then they won’t be used.  Linear regression with perhaps a few extensions (fixed effects, linear probability models) is usually the way to go.  Because sports organizations have less history in terms of using analytics, the issue of balancing complexity can be especially challenging.

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This is an important point of distinction between sports and non-sports industries and it is also an important issue for when we shift to discussing modeling in a couple of weeks.  When I use the term variables I am referencing individual elements of data.  For example, an element of data could be many different things such as a player’s weight or the number of shots taken or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations.  A large company may have millions of customer accounts.  There may, however, be relatively few explanatory variables.  The firm may have only transaction history variables and limited demographics.  Even in sports marketing a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points.  In this same example the team may have tens of thousands of season ticket holders.  If we think of this “information” as a database we would have a row for every customer account (thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings.  The issue in player selection applications is that there are relatively few player level observations.  In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data).  In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” type stats but now we have entered into the era of player tracking and biometrics.  Now we can generate player stats related to second-by-second movement or even detailed physiological data.  In sports ranging from MMA to soccer to basketball the number of variables has exploded.

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?

Decision Biases: Sports Analytics Series Part 4

One way to look at on-field analytics is that it is a search for decision biases.  Very often, sports analytics takes the perspective of challenging the conventional wisdom.  This can take the form of identifying key statistics for evaluating players.  For example, one (too) simple conclusion from “Moneyball” would be that people in baseball did not adequately appreciate the value of walks and on-base percentage.  The success of the A’s (again – way oversimplifying) was based on finding flaws in the conventional wisdom.

Examples of “challenges” to conventional wisdom are common in analyses of on-field decision making.  For example, in past decades the conventional wisdom was that it is a good idea to use a sacrifice bunt to move players into scoring position or that it is almost always a good idea to punt on fourth down.  I should note that even the term conventional wisdom is problematic as there have likely always been long-term disagreements about the right strategies to use at different points in a game.  Now, however, we are increasingly in a position to use data to determine the right or optimal strategies.

As we discussed last time, humans tend to be good at overall or holistic judgments while models are good at precise but narrow evaluations.  When the recommendations implied by the data or model are at odds with how decisions are made, there is often an opportunity for improvement.  Using data to find types of undervalued players or to find beneficial tactics represents an effort to correct human decision making biases.

This is an important point.  Analytics will almost never outperform human judgment when it comes to individuals.  What analytics are useful for is helping human decision makers self-correct.  When the model yields different insights than the person it’s time to drill down and determine why.  Maybe it’s a shortcoming of the model or maybe it’s a bias on the part of the general manager.

The term bias has a negative connotation.  But it shouldn’t for this discussion.  For this discussion a bias should just be viewed as a tendency to systematically make decisions based on less than perfect information.

The academic literature has investigated many types of biases.  Wikipedia provides a list of a large number of biases that might lead to decision errors.  This list even includes the sports inspired “hot-hand fallacy” which is described as a “belief that a person who has experienced success with a random event has a greater chance of further success in additional attempts.”  From a sports analytics perspective the question is whether the hot hand is a real thing or just a belief.  The analyst might be interested in developing a statistical test to assess whether a player on a hot streak is more likely to be successful on his next attempt.  This model would have implications for whether a coach should “feed” the hot hand.
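One very simple version of such a test compares a player’s success rate after a make with his success rate after a miss.  The sketch below does this with a two-proportion z statistic.  The shot sequence is made up, the function name is my own, and the approach ignores shot difficulty, defensive attention and other context that a serious analysis would need to control for.

```python
import math


def hot_hand_summary(shots):
    """Compare success rates after a make versus after a miss.

    shots: ordered sequence of 1 (make) and 0 (miss).
    Returns (rate_after_make, rate_after_miss, z_statistic).
    """
    after_make = [cur for prev, cur in zip(shots, shots[1:]) if prev == 1]
    after_miss = [cur for prev, cur in zip(shots, shots[1:]) if prev == 0]
    if not after_make or not after_miss:
        raise ValueError("need at least one attempt after a make and after a miss")

    n1, n0 = len(after_make), len(after_miss)
    p1 = sum(after_make) / n1
    p0 = sum(after_miss) / n0

    # Pooled two-proportion z statistic
    pooled = (sum(after_make) + sum(after_miss)) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    z = (p1 - p0) / se if se > 0 else 0.0
    return p1, p0, z


# Hypothetical shot log, for illustration only
shots = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
p_after_make, p_after_miss, z = hot_hand_summary(shots)
print(p_after_make, round(p_after_miss, 3), round(z, 2))
```

A large positive z would be consistent with a hot hand, while a value near zero suggests the streaks look like chance.  With only a handful of shots, as in this toy log, the statistic is far too noisy to support any conclusion.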

Academic work has also looked at the impact of factors like sunk costs on player decisions.  The idea behind “sunk costs” is that if costs have already been incurred then those costs should not impact current or future decision making.  In the case of player decisions “sunk costs” might be factors like salary or when the player was drafted.  Ideally, a team would use the players with the highest expected performance.  A tendency towards playing individuals based on the past would represent a bias.

Other academic work has investigated the idea of “status” bias.  In this case the notion is that referees might call a game differently depending on the players involved.  It’s probably obvious that this is the case.  Going old school for a moment, even the most fervent Bulls fans of the 90’s would have to admit that Craig Ehlo wouldn’t get the same calls as Michael Jordan.

In these cases, it is possible (though tricky) to look for biases in human decision making.  In the case of sunk costs investigators have used statistical models to examine the link between when a player was drafted and the decision to play an athlete (controlling for player performance).  If such a bias exists, then the analysis might be used to inform general managers of this trait.

In the case of advantageous calls for high profile players, an analysis might lead to a different type of conclusion.  If such a bias exists, then perhaps leagues should invest more heavily in using technology to monitor and correct referees’ decisions.

  • People suffer from a variety of decision biases. These biases are often the result of decision making heuristics or rules of thumb.
  • One use of statistical models is to help identify decision making biases.
  • The identification of widespread biases is potentially of great value as these biases can help identify imperfections in the market for players or improved game strategies.

Questioning the Value of Analytics: Sports Analytics Series Part 3

Continuing the discussion about organizational issues and challenges, a fundamental issue is understanding and balancing the relative strengths and weaknesses of human decision makers and mathematical models.  This is an important discussion because before diving into specific questions related to predicting player performance it’s worthwhile to first think about how modeling and statistics should fit into an overall structure for decision making.  The short answer is that analytics should serve as a complement to human insight. 

The “value” of analytics in sports has been the topic of debate.  A high profile example of this occurred between Charles Barkley and Daryl Morey.  Barkley has gone on record questioning the value of analytics.

“Analytics don’t work at all. It’s just some crap that people who were really smart made up to try to get in the game because they had no talent. Because they had no talent to be able to play, so smart guys wanted to fit in, so they made up a term called analytics.  Analytics don’t work.” 

The quote reflects an extreme perspective and it is legitimate to question whether Charles Barkley has the background to assess the value of analytics (or maybe he does, who knows?).  But, I do think that Barkley’s opinion does have significant merit.

In much of the popular press surrounding books like Moneyball or The Extra 2%, analytics often seem like a magic bullet.  The reality is that statistical models are better viewed as decision support aids.  Note that I am talking about the press rather than the books.

The fundamental issue is that models and statistics are incomplete.  They don’t tell the whole story.  A lot of analytics revolves around summarizing performance into statistics and then predicting how performance will evolve. Defining a player based on a single number is efficient but it can only capture a slice of the person’s strengths and weaknesses.  Predicting how human performance will evolve over time is a tenuous proposition.

What statistics and models are good at is quantifying objective relationships in the data.  For example, if we were interested in building a model of how quarterback performance translates from college to professional football we could estimate the mathematical relationship between touchdown passes at the college level and touchdown passes at the pro level.  A regression model would give us the numerical patterns in the data but such a model would likely have little predictive power since many other factors come into play.
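As a rough sketch of what such a regression looks like, the code below fits a one-variable least squares line and reports R-squared.  The five data points are entirely made up for illustration and do not describe real quarterbacks.  The function name is also my own.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: college TD passes per season vs. pro TD passes per season
college_td = [20, 25, 30, 35, 40]
pro_td = [10, 18, 14, 22, 20]

slope, intercept, r2 = simple_ols(college_td, pro_td)
print(round(slope, 2), round(intercept, 2), round(r2, 2))
```

The slope describes the average pattern in the sample.  The R-squared value is a reminder of how much variation the single predictor leaves unexplained.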

The question is whether the insights generated from analytics or the incremental forecasting power actually translate into something meaningful.  They can.  But the effects may be subtle and they may play out over years.  And remember we are not even considering the financial side of things.  If the best predictive models improve player evaluations by a couple of percent maybe it translates to your catcher having a 5% higher on base percentage or your quarterback having a passer rating that is 1 or 2 points higher.  These things matter.  But are they dwarfed by being able to throw 10 or 20 million more into signing a key player?

If the key to winning a championship is having a couple of superstars, then maybe analytics don’t matter much.  What matters is being able to manage the salary cap and attract the talent.  But maybe the goal is to make the playoffs in a resource or salary cap constrained environment.  Then maybe spending efficiently and generating a couple of extra wins is the objective.  In this case analytics can be a difference maker.

Understanding the Organization: Sports Analytics Series Part 2

The purpose of this series is to discuss the use of analytics in sports organizations (see part 1).  Rather than jump into a discussion of models, I want to start with something more fundamental.  I want to talk about how organizations work and how people make decisions.  Sophisticated statistics and detailed data are potentially of great value.  However, if the organization or the decision maker is not interested in or comfortable with advanced statistics then it really doesn’t matter if the analyses are of high quality.

Analytics efforts can fail to deliver optimal value for a variety of reasons in almost any industry.  The idea that we can use data to guide decisions is intuitively appealing.  It seems like more data can only create more understanding and therefore better decisions.  But going from this logic to improved decision making can be a difficult journey.

Difficulties can arise from a variety of sources.  The organization may lack commitment in terms of time and resources.  Individual decision makers may lack sufficient interest in, or understanding of, analytics.  Sometimes the issue can be the lack of vision as to what analytics is supposed to accomplish.  There can also be a disconnect between the problems to be solved and the skills of the analytics group.

These challenges can be particularly significant in the sports industry because there is often a lack of institutional history of using analytics.  Usually organizations have existing approaches and structures for decision making and the incorporation of new data structures or analytical techniques requires some sort of change.  In the earliest stages, the shift towards analytics involves moving into uncharted territory.  The decision maker is (implicitly) asked to alter how he operates and this change may be driven by information that is derived from unfamiliar techniques.

Several key concerns can be best illustrated by considering two categories of analyses.  The first category involves long-term projects for addressing repeated decisions.  For instance, a common repeated decision might be drafting players.  Since a team drafts every year it makes sense to assemble extensive data and to build high quality predictive models to support annual player evaluation.  This kind of organizational decision demands a consistent and committed approach.  But the important point is that this type of decision may require years of investments before a team can harvest significant value. 

It is also important to realize that with repeated tasks there will be an existing decision making structure in place.  The key is to think about how the “analytics” add to or complement this structure rather than thinking that “analytics” is a new or replacement system (we will discuss why this is true in detail soon).  The existing approach to scouting and drafting likely involves many people and multiple systems.  The analytics elements need to be integrated rather than imposed.

A second category of analyses is short-term, one-off projects.  These projects can be almost anything, ranging from questions about in-game strategies to very specific evaluations of player performance.  These projects primarily demand flexibility.  Someone in the organization may see or hear something that generates a question.  This question then gets tossed to the analytics group (or person) and a quick turn-around is required.

Since these questions can come from anywhere the analytics function may struggle with even having the right data or having the data in an accessible format.  Given the time sensitive nature of these requests there will likely be a need to use flawed data or imperfect methods.  The organization needs to be realistic about what is possible in the short-term and more critically the analysis needs to be understood at a level where the human decision maker can adjust for any shortcomings (and there are always shortcomings).  In other words, the decision maker needs to understand the limitations associated with a given analysis so that the analytics can inform rather than mislead.

The preceding two classes of problems highlight issues that arise when an organization starts on the path towards being more analytically driven.  In addition, there can also be problems caused by inexperienced analysts.  For example, many analysts (particularly those coming from academia) fail to grasp that problems are seldom solved through the creation of an ideal statistic or equation.  Decision making in organizations is often driven by short-term challenges (putting out fires).  Decision support capabilities need to be designed to support fast moving, dynamic organizations rather than perfectly and permanently solving well defined problems.

In the next entry, we will start to take a more in-depth look at how analytics and human decision making can work together.  We will talk about the relative merits of human decision making versus statistical models.  After that we will get into a more psychological topic – decision making biases.

Part 2 Key Takeaways…

  • The key decision makers need to be committed to and interested in analytics.
  • Sufficient investment in people and data is a necessary condition.
  • Many projects require a long-term commitment. It may be necessary to invest in multiyear database building efforts before value can be obtained.

How Much Do NFL Stadiums Matter?

[Table: MLB Ballpark factors]

Where a game takes place hugely impacts performance, without even taking home field advantage into account.  In the MLB, there are “ballpark factors” which provide data as to how much more or less likely an event (e.g. double, home run, etc.) is in a particular ballpark relative to the league average.  These “ballpark factors” are a concept that most die-hard baseball fans know very well, and something all fantasy baseball players should be familiar with (especially for daily sites, like FanDuel).  ESPN provides a table to all its readers like the one shown on the left. The table reads as follows: for every one run scored in a league average MLB park, 1.501 runs will be scored in Colorado and .825 runs will be scored in Seattle. These factors are not the be-all and end-all when it comes to explaining player performance, but it’s another predictive tool to add to your tool belt.

Two recent examples show its application quite nicely: the performance decline of Robinson Cano after he decided to move to Seattle this past season, and the reasoning (or lack thereof) behind the Mets free agent signing of Michael Cuddyer this offseason.  In his seven full seasons with the Yankees, Cano averaged 24 home runs in 160 games per year.  His first year in Seattle he hit 14 in 157 games. Using only ballpark factors we would have predicted that he would hit 16 – not bad for just one calculation.  Michael Cuddyer has played his last three seasons with Colorado, hitting .307 with 15 home runs in only 93 games per year.  Again using only ballpark factors, in 93 games next year he should hit .254 with 12 home runs.
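The basic mechanics of applying a park factor can be sketched as follows.  The run factors for Colorado (1.501) and Seattle (.825) are the ones quoted earlier.  The function names are hypothetical, and this is a simplification rather than the exact calculation behind the Cano and Cuddyer projections, which would use event-specific factors and account for the share of games played at home.

```python
def park_adjusted(stat_in_average_park, park_factor):
    """Scale a league-average-park total by a park factor."""
    return stat_in_average_park * park_factor


def translate_between_parks(stat, from_factor, to_factor):
    """Move a total observed in one park to another park via the league average."""
    return stat * to_factor / from_factor


COLORADO_RUNS = 1.501
SEATTLE_RUNS = 0.825

runs_in_colorado = park_adjusted(100, COLORADO_RUNS)
runs_in_seattle = translate_between_parks(runs_in_colorado, COLORADO_RUNS, SEATTLE_RUNS)

print(round(runs_in_colorado, 1), round(runs_in_seattle, 1))
```

In words: 100 runs in a league average park correspond to roughly 150 in Colorado, and those same 150 Colorado runs correspond to about 82.5 in Seattle.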

Being the fantasy sports aficionado that I am, I wanted to apply the same idea behind these ballpark factors to NFL data. However, much to my dismay, there was no NFL equivalent to be found. So, I decided to create NFL Stadium factors based on data from 2010 to 2013. The result is the table seen below.

[Table: NFL Stadium Factors (excludes SF & MN)]

Just as with MLB “ballpark factors”, the numbers in this table are just another piece in the puzzle of football analytics.  Unlike baseball, however, the NFL Stadium Factors are a slightly more effective tool on a team-by-team basis rather than for individual performance. For example, consider the trade rumors hovering around the weeks leading up to the NFL draft this past year. “Brady to Houston,” the headlines read. On the surface this looks like a no-doubter for the Texans, but has New England’s stadium been augmenting Tom Brady’s statistics over these many years? In fact, a quick look at the table on the left makes me wonder if the Patriot offensive juggernaut as a whole has benefited by playing in Foxboro.  New England’s passing attack has averaged 276 yards per game and 33 touchdowns in the air over the past five years. If they had been playing in Houston’s stadium over that time span, stadium factors suggest those numbers would have plummeted to 252 and 29.

It’s easy to take these numbers as they are and just plug them into your statistical analyses; however, considering the characteristics of given stadiums in order to understand why certain trends exist in the data is far more useful. The stadium characteristics I looked at are as follows: domes, turf fields, cold weather, noise, and altitude.

  • In stadiums with domes, you see a significant increase in field goals made and a decrease in rush yards gained. This is likely due to the absence of wind and other adverse weather effects, which negatively affect the passing game and field goals. Regardless of whether or not a team has a good rushing attack, if conditions don’t lend themselves to a game plan centered on throwing the ball, then more rush yards will inevitably be gained.
  • On turf fields, the number of successful field goals goes up substantially due to the more consistent footing for kickers. Kickers are much more prone to slipping on the less-secure grass footing of a natural surface.
  • The third characteristic, cold weather, is defined from a list of the 10 coldest and snowiest stadiums in the NFL. From that classification, I found that noticeably fewer points, rush touchdowns, and field goals occur in those stadiums. These outcomes are all fairly logical and can be explained by the unfavorable effects of the cold on the human body. In addition, as the temperature drops the football becomes less elastic. In combination with the dense, cold air inside and outside of the football, this makes field goals a much harder task.
  • In the NFL’s five loudest stadiums, noise was found to lead to fewer points scored per game, most likely because of two factors – communication and intensity. The louder a stadium gets, the harder it is for an offense to communicate certain blitz protections or other audibles. Secondly, the intensity of a loud crowd leads to more pressure and greater nervousness, which I believe more heavily impacts offensive performance.
  • Finally, altitude is directly and positively correlated with field goals made and rush touchdowns scored. The first part makes perfect sense – things fly farther in the thinner air of high altitude. The reason behind the second finding is a little more intricate. Altitude has powerful effects on lung capacity and conditioning levels, so defensive linemen (who aren’t in shape) tend to struggle in places like Mile High Stadium. Rushing touchdown data would specifically reveal this trend, because rushing touchdowns often occur at the end of long drives when those in charge of stopping the run (the defensive linemen) are exhausted.

This table is a great starting point for describing the effect that a certain location has on NFL performance. Although the explanations above are my personal opinions, the numbers can be explained logically, and when used in statistical analysis they should lead to improved results.

Michael Byman (@MichaelByman) is a senior at Emory’s Goizueta Business School.  He is a Sport Analytics Research Grant recipient & submarine college pitcher.