Decision Biases: Sports Analytics Series Part 4

One way to look at on-field analytics is that it is a search for decision biases.  Very often, sports analytics takes the perspective of challenging the conventional wisdom.  This can take the form of identifying key statistics for evaluating players.  For example, one (too) simple conclusion from “Moneyball” would be that people in baseball did not adequately value walks and on-base percentage.  The success of the A’s (again – way oversimplifying) was based on finding flaws in the conventional wisdom.

Examples of “challenges” to conventional wisdom are common in analyses of on-field decision making.  For example, in past decades the conventional wisdom was that it is a good idea to use a sacrifice bunt to move players into scoring position or that it is almost always a good idea to punt on fourth down.  I should note that even the term conventional wisdom is problematic as there have likely always been long-term disagreements about the right strategies to use at different points in a game.  Now, however, we are increasingly in a position to use data to determine the right or optimal strategies.

As we discussed last time, humans tend to be good at overall or holistic judgments while models are good at precise but narrow evaluations.  When the recommendations implied by the data or model are at odds with how decisions are made, there is often an opportunity for improvement.  Using data to find types of undervalued players or to find beneficial tactics represents an effort to correct human decision making biases.

This is an important point.  Analytics will almost never outperform human judgment when it comes to individuals.  What analytics are useful for is helping human decision makers self-correct.  When the model yields different insights than the person, it’s time to drill down and determine why.  Maybe it’s a shortcoming of the model or maybe it’s a bias on the part of the general manager.

The term bias has a negative connotation, but it shouldn’t for this discussion.  Here, a bias should simply be viewed as a tendency to systematically make decisions based on less than perfect information.

The academic literature has investigated many types of biases.  Wikipedia provides a list of a large number of biases that might lead to decision errors.  This list even includes the sports-inspired “hot-hand fallacy,” which is described as a “belief that a person who has experienced success with a random event has a greater chance of further success in additional attempts.”  From a sports analytics perspective, the question is whether the hot hand is a real phenomenon or just a belief.  The analyst might be interested in developing a statistical test to assess whether a player on a hot streak is more likely to be successful on his next attempt.  This model would have implications for whether a coach should “feed” the hot hand.
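As a deliberately simplified sketch of what such a test could look like, one option is a permutation test: measure how much better a player shoots immediately after a streak of makes, then shuffle the shot sequence many times to see whether a gap that large arises by chance.  The data below are synthetic and the function names are my own, not from any particular study or library:

```python
import numpy as np

rng = np.random.default_rng(0)

def streak_gap(shots, k=3):
    """P(make | previous k shots were all makes) minus the overall make rate."""
    shots = np.asarray(shots)
    after_streak = [shots[i] for i in range(k, len(shots)) if shots[i - k:i].all()]
    if not after_streak:
        return np.nan
    return float(np.mean(after_streak) - shots.mean())

def hot_hand_test(shots, k=3, n_perm=2000):
    """Permutation test: shuffle the shot order to build a null distribution
    for the streak gap, then see how often chance produces a gap as large."""
    observed = streak_gap(shots, k)
    null = np.array([streak_gap(rng.permutation(shots), k) for _ in range(n_perm)])
    null = null[~np.isnan(null)]
    p_value = float(np.mean(null >= observed))
    return observed, p_value

# A hypothetical season of shots from a 50% shooter with no true hot hand.
shots = rng.integers(0, 2, size=500)
gap, p = hot_hand_test(shots)
```

On truly streak-free data like this, the p-value should usually be unremarkable; consistently small p-values across many player-seasons would be evidence worth acting on.  A more careful analysis would also control for shot difficulty and for the subtle selection bias that arises when conditioning on streaks.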

Academic work has also looked at the impact of factors like sunk costs on player decisions.  The idea behind “sunk costs” is that if costs have already been incurred then those costs should not impact current or future decision making.  In the case of player decisions “sunk costs” might be factors like salary or when the player was drafted.  Ideally, a team would use the players with the highest expected performance.  A tendency towards playing individuals based on past costs rather than expected performance would represent a bias.

Other academic work has investigated the idea of “status” bias.  In this case the notion is that referees might call a game differently depending on the players involved.  It’s probably obvious that this is the case.  Going old school for a moment, even the most fervent Bulls fans of the 90’s would have to admit that Craig Ehlo wouldn’t get the same calls as Michael Jordan.

In these cases, it is possible (though tricky) to look for biases in human decision making.  In the case of sunk costs, investigators have used statistical models to examine the link between when a player was drafted and the decision to play an athlete (controlling for player performance).  If such a bias exists, then the analysis might be used to inform general managers of this trait.
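A minimal sketch of that kind of analysis might look like the following.  The data are simulated and the variable names are hypothetical; the point is only the structure: regress playing time on draft position while controlling for performance, and inspect the draft-position coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated player-season data (all hypothetical).
performance = rng.normal(0, 1, n)        # standardized current performance
draft_pick = rng.integers(1, 61, n)      # pick number; 1 = first overall
# Build minutes with a deliberate sunk-cost bias: earlier picks get extra
# minutes beyond what performance justifies.
minutes = 25 + 5 * performance - 0.08 * draft_pick + rng.normal(0, 3, n)

# OLS: minutes ~ intercept + performance + draft_pick
X = np.column_stack([np.ones(n), performance, draft_pick])
beta, _, _, _ = np.linalg.lstsq(X, minutes, rcond=None)

# beta[2] is the draft-pick effect after controlling for performance; in real
# data, a reliably negative estimate would be consistent with a sunk-cost bias.
```

In real work you would of course use actual player data, add more controls, and compute standard errors (for example with a regression package) before treating a coefficient as evidence of bias.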

In the case of advantageous calls for high profile players, an analysis might lead to a different type of conclusion. If such a bias exists, then perhaps leagues should invest more heavily in using technology to monitor and correct referees’ decisions.

Part 4 Key Takeaways…

  • People suffer from a variety of decision biases. These biases are often the result of decision-making heuristics or rules of thumb.
  • One use of statistical models is to help identify decision making biases.
  • The identification of widespread biases is potentially of great value as these biases can help identify imperfections in the market for players or improved game strategies.

A Quick Example of the Limitations of Analytics: Sports Analytics Series Part 3.1

In Part 3 we started to talk about the complementary role of human decision makers and models.  Before we get to the next topic – Decision Biases – I wanted to take a moment to present an example that helps illustrate the points being made in the last entry.

I’m going to make the point using an intentionally nontraditional example.  Part of the reason I’m using this example is that I think it’s worthwhile to think about what might be “questionable” in terms of the analysis.  So rather than look at some well-studied relationships in contexts like NFL quarterbacks or NBA players, I’m going to develop a model of Fullback performance in Major League Soccer.

To keep this simple, I’m going to try and figure out the relationship between a player’s Plus-Minus statistic and a few key performance variables.  I’m not going to provide a critique of Plus-Minus but I encourage everyone to think about the value of such a statistic in soccer in general and for the Fullback position in particular.  This is an important exercise for thinking about combining statistical analysis and human insight.  What is the right bottom line metric for a defensive player in a team sport?

The specific analysis is a simple regression model that quantifies the relationship between Plus-Minus and the following performance measures:

  • % of Defensive Ground Duels Won
  • % of Defensive Aerial Duels Won
  • Tackling Success Rate (%)
  • % of Successful Passes in the Opponent’s Half

This is obviously a very limited set of statistics.  One thing to think about is that even with multiple years of data, I probably don’t have very many observations.  This is a common problem.  In any league there are usually about 30 teams and maybe 5 players at any given position.  We can potentially capture massive amounts of data, but maybe we only have 150 observations a year.  Note that in the case of MLS fullbacks we have fewer than that.  This is important because it means that in sports contexts we need parsimonious models.  We can’t throw all of our data into the models because we don’t have enough observations.

The table below lists the regression output.  Basically, the output is saying that % Successful passes in the opponent’s half is the only statistic that is significantly and positively correlated with a Fullback’s Plus-Minus statistic.

Parameter Estimates

Variable                                  DF   Estimate      Std. Error   t Value   Pr > |t|
Intercept                                  1   -1.66764      0.41380       -4.03    <.0001
% Defensive Ground Duels Won               1   -0.00433      0.00314       -1.38    0.1692
% Defensive Aerial Duels Won               1   -0.00088542   0.00182       -0.49    0.6263
Tackling Success Percentage                1    0.39149      0.25846        1.51    0.1305
% Successful Passes in Opponent’s Half     1    0.02319      0.00480        4.83    <.0001

The more statistically oriented reader might ask how well this model actually fits the data.  What is the R-Square?  It is small.  The preceding model explains about 5% of the variation in Fullbacks’ Plus-Minus statistics.
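To make the gap between significance and fit concrete, here is a small simulation.  The data are synthetic and the numbers are hypothetical, chosen only to roughly mirror the situation above: a predictor with a genuinely nonzero relationship to Plus-Minus that still explains only a small share of its variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150   # roughly a league's worth of player-season observations

# Hypothetical % of successful passes in the opponent's half, and a Plus-Minus
# outcome where the true slope is small relative to the noise.
passing = rng.normal(60, 10, n)
plus_minus = -1.4 + 0.023 * passing + rng.normal(0, 1.0, n)

# Simple OLS fit: plus_minus ~ intercept + passing
X = np.column_stack([np.ones(n), passing])
beta, _, _, _ = np.linalg.lstsq(X, plus_minus, rcond=None)
fitted = X @ beta

# R-squared: share of outcome variance the model accounts for.
ss_res = np.sum((plus_minus - fitted) ** 2)
ss_tot = np.sum((plus_minus - plus_minus.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

A t-test on the slope would typically reject zero here, yet the R-squared sits in the single digits: the relationship is real, but it tells you very little about any individual player’s Plus-Minus.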

And that is the important point.  The model does its job in that it tells us there is a significant relationship between passing skill and goal differential.  But it is far from a complete picture.  The decision maker needs to understand what the model shows.  However, the decision maker also needs to understand what the model doesn’t reveal.  This model (and the vast majority of other models) is inherently limited.  Like I said last time – the model is a decision support tool, not something that makes the decision.

Admittedly I didn’t try to find a model that fits the data really well.  But I can tell you that in my experience in sports and really any context that involves predicting or explaining individual human behavior, the models usually only explain a small fraction of variance in performance data.

Questioning the Value of Analytics: Sports Analytics Series Part 3

Continuing the discussion about organizational issues and challenges, a fundamental issue is understanding and balancing the relative strengths and weaknesses of human decision makers and mathematical models.  This is an important discussion because before diving into specific questions related to predicting player performance it’s worthwhile to first think about how modeling and statistics should fit into an overall structure for decision making.  The short answer is that analytics should serve as a complement to human insight. 

The “value” of analytics in sports has been the topic of debate.  A high profile example of this occurred between Charles Barkley and Daryl Morey.  Barkley has gone on record questioning the value of analytics.

“Analytics don’t work at all. It’s just some crap that people who were really smart made up to try to get in the game because they had no talent. Because they had no talent to be able to play, so smart guys wanted to fit in, so they made up a term called analytics.  Analytics don’t work.” 

The quote reflects an extreme perspective, and it is legitimate to question whether Charles Barkley has the background to assess the value of analytics (or maybe he does, who knows?).  But I do think that Barkley’s opinion has significant merit.

In much of the popular press surrounding books like Moneyball or The Extra 2%, analytics often seem like a magic bullet.  The reality is that statistical models are better viewed as decision support aids.  Note that I am talking about the press rather than the books.

The fundamental issue is that models and statistics are incomplete.  They don’t tell the whole story.  A lot of analytics revolves around summarizing performance into statistics and then predicting how performance will evolve. Defining a player based on a single number is efficient but it can only capture a slice of the person’s strengths and weaknesses.  Predicting how human performance will evolve over time is a tenuous proposition.

What statistics and models are good at is quantifying objective relationships in the data.  For example, if we were interested in building a model of how quarterback performance translates from college to professional football we could estimate the mathematical relationship between touchdown passes at the college level and touchdown passes at the pro level.  A regression model would give us the numerical patterns in the data but such a model would likely have little predictive power since many other factors come into play.

The question is whether the insights generated from analytics or the incremental forecasting power actually translate into something meaningful.  They can.  But the effects may be subtle and they may play out over years.  And remember we are not even considering the financial side of things.  If the best predictive models improve player evaluations by a couple of percent maybe it translates to your catcher having a 5% higher on base percentage or your quarterback having a passer rating that is 1 or 2 points higher.  These things matter.  But are they dwarfed by being able to throw 10 or 20 million more into signing a key player?

If the key to winning a championship is having a couple of superstars, then maybe analytics don’t matter much.  What matters is being able to manage the salary cap and attract the talent.  But maybe the goal is to make the playoffs in a resource or salary cap constrained environment.  Then spending efficiently and generating a couple of extra wins is the objective.  In this case analytics can be a difference maker.

Understanding the Organization: Sports Analytics Series Part 2

The purpose of this series is to discuss the use of analytics in sports organizations (see part 1).  Rather than jump into a discussion of models, I want to start with something more fundamental.  I want to talk about how organizations work and how people make decisions.  Sophisticated statistics and detailed data are potentially of great value.  However, if the organization or the decision maker is not interested in or comfortable with advanced statistics then it really doesn’t matter if the analyses are of high quality.

Analytics efforts can fail to deliver optimal value for a variety of reasons in almost any industry.  The idea that we can use data to guide decisions is intuitively appealing.  It seems like more data can only create more understanding and therefore better decisions.  But going from this logic to improved decision making can be a difficult journey.

Difficulties can arise from a variety of sources.  The organization may lack commitment in terms of time and resources.  Individual decision makers may lack sufficient interest in, or understanding of, analytics.  Sometimes the issue can be the lack of vision as to what analytics is supposed to accomplish.  There can also be a disconnect between the problems to be solved and the skills of the analytics group.

These challenges can be particularly significant in the sports industry because there is often a lack of institutional history of using analytics.  Usually organizations have existing approaches and structures for decision making and the incorporation of new data structures or analytical techniques requires some sort of change.  In the earliest stages, the shift towards analytics involves moving into uncharted territory.  The decision maker is (implicitly) asked to alter how he operates and this change may be driven by information that is derived from unfamiliar techniques.

Several key concerns can be best illustrated by considering two categories of analyses.  The first category involves long-term projects for addressing repeated decisions.  For instance, a common repeated decision might be drafting players.  Since a team drafts every year it makes sense to assemble extensive data and to build high quality predictive models to support annual player evaluation.  This kind of organizational decision demands a consistent and committed approach.  But the important point is that this type of decision may require years of investments before a team can harvest significant value. 

It is also important to realize that with repeated tasks there will be an existing decision making structure in place.  The key is to think about how the “analytics” add to or complement this structure rather than thinking that “analytics” is a new or replacement system (we will discuss why this is true in detail soon).  The existing approach to scouting and drafting likely involves many people and multiple systems.  The analytics elements need to be integrated rather than imposed.

A second category of analyses consists of short-term, one-off projects.  These projects can be almost anything, ranging from questions about in-game strategy to very specific evaluations of player performance.  These projects primarily demand flexibility.  Someone in the organization may see or hear something that generates a question.  This question then gets tossed to the analytics group (or person) and a quick turn-around is required.

Since these questions can come from anywhere the analytics function may struggle with even having the right data or having the data in an accessible format.  Given the time sensitive nature of these requests there will likely be a need to use flawed data or imperfect methods.  The organization needs to be realistic about what is possible in the short-term and more critically the analysis needs to be understood at a level where the human decision maker can adjust for any shortcomings (and there are always shortcomings).  In other words, the decision maker needs to understand the limitations associated with a given analysis so that the analytics can inform rather than mislead.

The preceding two classes of problems highlight issues that arise when an organization starts on the path towards being more analytically driven.  In addition, there can also be problems caused by inexperienced analysts.  For example, many analysts (particularly those coming from academia) fail to grasp that problems are seldom solved through the creation of an ideal statistic or equation.  Decision making in organizations is often driven by short-term challenges (putting out fires).  Decision support capabilities need to be designed to support fast moving, dynamic organizations rather than perfectly and permanently solving well defined problems.

In the next entry, we will start to take a more in depth look at how analytics and human decision making can work together.  We will talk about the relative merits of human decision making versus statistical models.  After that we will get into a more psychological topic – decision-making biases.

Part 2 Key Takeaways…

  • The key decision makers need to be committed to and interested in analytics.
  • Sufficient investment in people and data is a necessary condition.
  • Many projects require a long-term commitment. It may be necessary to invest in multiyear database building efforts before value can be obtained.

A Short Course on Sports Analytics – Part 1

  1. Sports Analytics in Organizations

This fall the plan is to do something a little different with the blog.  Rather than data driven analyses of sports marketing topics, I want to spend some time talking about using analytics to support player and in-game decision making.  The “Moneyball” side of the sports analytics space.

The focus will mainly be at the level of the organization rather than at the level of specific research questions.  In other words, we will talk about providing effective analytics support within an organization, rather than presenting a series of analyses.  My hope is that this evolves to being something of a web based course on using analytics to drive decisions in sports.

I’ve spent a lot of time over the past few decades working on analytics projects (across multiple industries) and I’ve developed opinions about what firms do right and where mistakes are made.  Over the last few years, I’ve thought a lot about how analytics can be used by sports organizations.  Specifically, about how lessons from other industries can be applied, and instances where sports are just different.

The history of statistical analysis in sports goes way back, but obviously exploded with the publication of Moneyball.  A huge number of sports fans would love to be a General Manager but few people have the athletic ability to gain entry as a former player.  Using statistics to find ways to win is (maybe?) a more accessible route.

But this route is not without its complications.  Using stats and data to win games is an intriguing and challenging intellectual task.  What data should be collected?  How should the data be analyzed?  How should the analysis be included in the decision making structure?  These are all challenging questions that go beyond what a fan with some data can accomplish.

What I’m going to do in this series is talk about how to approach analytics from both a conceptual level and an operational level.  Conceptually, I will cover how humans make decisions in organizations.  At the operational level, we will discuss what types of analyses should be pursued.

What I won’t do in this series is talk about specific models.  At least not very much.  I may drop in a couple of analyses.  This limitation is done with purpose.  It’s my feeling that the sports analytics space is littered with too many isolated projects and analyses.  The goal here is to provide a structure for building an analytics function and some general guidance on how to approach several broad classes of analyses.

What will this series include?  Some of the content will be based on whatever becomes top of mind or based on the response I get from readers.  But some things will definitely appear.  There will be material about how analytics can best complement a human decision maker.  I will also talk about how lessons from other industries can be helpful in the sports context.  There are more similarities than differences between sports and “standard” businesses.  But there are some important differences.

We will also talk about models and statistical analysis.  But this will be done in broad terms.  What I mean is that we will discuss classes of analysis rather than specific studies.  For example, we will discuss player selection analyses but the emphasis will be on how to approach the problem rather than the creation of a particular forecasting model.  There are a variety of ways to analyze players.  We can use simple models like linear regression or more complex models that yield probabilities.  We can also forgo the stats and use raw data to look for player comparisons.  We will discuss the implementation challenges and benefits of each approach.

This series is a work in progress.  I have a number of entries planned but I’m very open to questions.  Shoot me an email and I’ll be happy to respond in future entries or privately (time permitting).

Next: Understanding the Organization