Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the design and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision-making styles is also common across categories.  It’s necessary to understand both the level of interest in using analytics and the “technical level” of the decision makers.  Less experienced data scientists and statisticians have a tendency to use overly complicated methods.  This can be a killer.  If the models are too complex they won’t be understood, and then they won’t be used.  Linear regression, with perhaps a few extensions (fixed effects, linear probability models), is usually the way to go.  Because sports organizations have less history of using analytics, the issue of balancing complexity can be especially challenging.
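To make the "linear probability model" idea concrete, here is a minimal sketch.  It is nothing more than ordinary linear regression applied to a yes/no outcome, so the slope reads directly as a change in probability.  The data, the variable names and the helper fit_simple_ols are all made up for illustration; they do not come from any real team.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years an account has held season tickets,
# and whether the account renewed (1) or not (0).
years_held = [1, 1, 2, 3, 4, 5, 6, 8]
renewed = [0, 0, 1, 0, 1, 1, 1, 1]

# Regressing a 0/1 outcome on a predictor is a linear probability model:
# the slope is the estimated change in renewal probability per extra year.
intercept, slope = fit_simple_ols(years_held, renewed)
print(f"baseline: {intercept:.3f}, change per year: {slope:.3f}")
```

The appeal for a non-technical decision maker is that the output is one sentence: each additional year as a ticket holder is associated with a given bump in the chance of renewal.  The known cost is that a straight line can predict probabilities below 0 or above 1 at the extremes.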

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This distinction will also matter when we shift to discussing modeling in a couple of weeks.  When I use the term variables I am referencing individual elements of data.  For example, a variable could be a player’s weight, the number of shots taken or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations.  A large company may have millions of customer accounts.  There may, however, be relatively few explanatory variables.  The firm may have only transaction history variables and limited demographics.  Even in sports marketing, a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points.  In this same example the team may have tens of thousands of season ticket holders.  If we think of this “information” as a database we would have a row for every customer account (tens of thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category.  In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the number of observations usually still far exceeds the number of variables.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings.  The issue in player selection applications is that there are relatively few player level observations.  In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data).  In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” type stats but now we have entered into the era of player tracking and biometrics.  Now we can generate player stats related to second-by-second movement or even detailed physiological data.  In sports ranging from MMA to soccer to basketball the number of variables has exploded.
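A small simulation shows why few observations combined with many variables is dangerous.  In the sketch below everything is random noise by construction, so no variable truly explains the outcome.  The counts (15 players, 500 variables) are assumptions chosen only to mimic a single position with tracking-style data.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)   # fixed seed so the run is repeatable
n_players = 15           # hypothetical: player histories at one position
n_variables = 500        # hypothetical: tracking and biometric measures

# The outcome and every candidate variable are independent random noise.
outcome = [rng.gauss(0, 1) for _ in range(n_players)]
noise_vars = [[rng.gauss(0, 1) for _ in range(n_players)]
              for _ in range(n_variables)]

abs_corr = [abs(pearson(v, outcome)) for v in noise_vars]
best_of_few = max(abs_corr[:5])   # search only the first 5 variables
best_of_many = max(abs_corr)      # search all 500 variables

print(f"strongest correlation among 5 variables:   {best_of_few:.2f}")
print(f"strongest correlation among 500 variables: {best_of_many:.2f}")
```

With only 15 players, searching through hundreds of meaningless variables will almost always turn up one that looks like a strong predictor.  The same search with millions of customer rows would turn up nothing, which is why the marketing setting is more forgiving.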

A big question as we move forward into more modeling-oriented topics is how we deal with this situation.

A Short Course on Sports Analytics – Part 1

  1. Sports Analytics in Organizations

This fall the plan is to do something a little different with the blog.  Rather than data driven analyses of sports marketing topics, I want to spend some time talking about using analytics to support player and in-game decision making.  The “Moneyball” side of the sports analytics space.

The focus will mainly be at the level of the organization rather than at the level of specific research questions.  In other words, we will talk about providing effective analytics support within an organization, rather than presenting a series of analyses.  My hope is that this evolves to being something of a web based course on using analytics to drive decisions in sports.

I’ve spent a lot of time over the past few decades working on analytics projects (across multiple industries) and I’ve developed opinions about what firms do right and where mistakes are made.  Over the last few years, I’ve thought a lot about how analytics can be used by sports organizations: specifically, how lessons from other industries can be applied, and where sports are just different.

The history of statistical analysis in sports goes way back, but interest obviously exploded with the publication of Moneyball.  A huge number of sports fans would love to be a General Manager but few people have the athletic ability to gain entry as a former player.  Using statistics to find ways to win is (maybe?) a more accessible route.

But this route is not without its complications.  Using stats and data to win games is an intriguing and challenging intellectual task.  What data should be collected?  How should the data be analyzed?  How should the analysis be included in the decision making structure?  These are all challenging questions that go beyond what a fan with some data can accomplish.

What I’m going to do in this series is talk about how to approach analytics from both a conceptual level and an operational level.  Conceptually, I will cover how humans make decisions in organizations.  At the operational level, we will discuss what types of analyses should be pursued.

What I won’t do in this series is talk about specific models.  At least not very much.  I may drop in a couple of analyses.  This limitation is on purpose.  It’s my feeling that the sports analytics space is littered with too many isolated projects and analyses.  The goal here is to provide a structure for building an analytics function and some general guidance on how to approach several broad classes of analyses.

What will this series include?  Some of the content will be based on whatever becomes top of mind or on the response I get from readers.  But some things will definitely appear.  There will be material about how analytics can best complement a human decision maker.  I will also talk about how lessons from other industries can be helpful in the sports context.  There are more similarities than differences between sports and “standard” businesses.  But there are some important differences.

We will also talk about models and statistical analysis.  But this will be done in broad terms.  What I mean is that we will discuss classes of analysis rather than specific studies.  For example, we will discuss player selection analyses but the emphasis will be on how to approach the problem rather than the creation of a particular forecasting model.  There are a variety of ways to analyze players.  We can use simple models like linear regression or more complex models that yield probabilities.  We can also forgo the stats and use raw data to look for player comparisons.  We will discuss the implementation challenges and benefits of each approach.
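As a rough illustration of the "raw data" route, a player comparison can be as simple as ranking past players by how close their stat lines are to a prospect’s.  The sketch below is one possible way to do this, not a recommended method.  The players, the stat lines and the function names are all invented for the example.

```python
from statistics import mean, pstdev

# Hypothetical per-game stat lines: (points, rebounds, assists).
history = {
    "Player A": (18.0, 4.0, 6.5),
    "Player B": (12.0, 9.5, 1.5),
    "Player C": (21.0, 5.0, 3.0),
    "Player D": (9.0, 3.0, 7.0),
}
prospect = (17.0, 4.5, 6.0)

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [pstdev(col) for col in columns]
    return [tuple((v - c) / s for v, c, s in zip(row, centers, scales))
            for row in rows]

def rank_comparables(prospect, history):
    """Return past players sorted from most to least similar."""
    names = list(history)
    # Standardize so that high-count stats like points do not dominate.
    scaled = standardize([history[n] for n in names] + [prospect])
    target = scaled[-1]
    distances = {
        name: sum((a - b) ** 2 for a, b in zip(row, target)) ** 0.5
        for name, row in zip(names, scaled)
    }
    return sorted(distances, key=distances.get)

ranking = rank_comparables(prospect, history)
print("most similar:", ranking[0])
```

The output is a list of names rather than a coefficient, which is often easier to discuss with scouts and coaches.  The trade-off is that the choice of which stats to include, and how to scale them, quietly drives the answer.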

This series is a work in progress.  I have a number of entries planned but I’m very open to questions.  Shoot me an email and I’ll be happy to respond in future entries or privately (time permitting).

Next: Understanding the Organization