A Quick Example of the Limitations of Analytics: Sports Analytics Series Part 3.1

In Part 3 we started to talk about the complementary roles of human decision makers and models.  Before we get to the next topic (Decision Biases), I want to take a moment to present an example that helps illustrate the points made in the last entry.

I’m going to make the point using an intentionally nontraditional example.  Part of the reason I’m using this example is that I think it’s worthwhile to think about what might be “questionable” in terms of the analysis.  So rather than look at some well-studied relationships in contexts like NFL quarterbacks or NBA players, I’m going to develop a model of Fullback performance in Major League Soccer.

To keep this simple, I’m going to try to figure out the relationship between a player’s Plus-Minus statistic and a few key performance variables.  I’m not going to provide a critique of Plus-Minus, but I encourage everyone to think about the value of such a statistic in soccer in general and for the Fullback position in particular.  This is an important exercise for thinking about combining statistical analysis and human insight.  What is the right bottom-line metric for a defensive player in a team sport?

The specific analysis is a simple regression model that quantifies the relationship between Plus-Minus and the following performance measures:

  • % of Defensive Ground Duels Won
  • % of Defensive Aerial Duels Won
  • Tackling Success Rate (%)
  • % of Successful Passes in the Opponent’s Half

This is obviously a very limited set of statistics.  One thing to keep in mind is that even with multiple years of data, I probably don’t have very many observations.  This is a common problem.  In any league there are usually about 30 teams and maybe 5 players at any position.  We can potentially capture massive amounts of data, but we may have only about 150 observations a year.  Note that in the case of MLS fullbacks we have fewer than that.  This matters because it means that in sports contexts we need parsimonious models.  We can’t throw every variable we have into the model because we don’t have enough observations to estimate that many parameters reliably.

The table below lists the regression output.  Basically, the output is saying that % Successful passes in the opponent’s half is the only statistic that is significantly and positively correlated with a Fullback’s Plus-Minus statistic.

Parameter Estimates

Variable                                 DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                 1   -1.66764             0.41380          -4.03     <.0001
% Defensive Ground Duels Won              1   -0.00433             0.00314          -1.38     0.1692
% Def Aerial Duels Won                    1   -0.00088542          0.00182          -0.49     0.6263
Tackling Success Percentage               1    0.39149             0.25846           1.51     0.1305
% Successful Passes in Opponent’s Half    1    0.02319             0.00480           4.83     <.0001
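
For readers who want to see where the t values come from, here is a minimal sketch in Python.  Each t value is simply the parameter estimate divided by its standard error.  The estimates and standard errors are copied from the regression output; the short variable keys and the |t| > 1.96 cutoff (a common rule of thumb for roughly 5% significance) are my own illustrative choices, not part of the original analysis.

```python
# Estimates and standard errors copied from the regression output above.
# The short keys are illustrative labels, not names from the original analysis.
rows = {
    "intercept": (-1.66764, 0.41380),
    "ground_duels_won_pct": (-0.00433, 0.00314),
    "aerial_duels_won_pct": (-0.00088542, 0.00182),
    "tackling_success_pct": (0.39149, 0.25846),
    "passes_opp_half_pct": (0.02319, 0.00480),
}


def t_value(estimate, std_error):
    """A t value is the estimate measured in units of its standard error."""
    return estimate / std_error


t_values = {name: t_value(est, se) for name, (est, se) in rows.items()}

for name, t in t_values.items():
    # Assumption: |t| > 1.96 as a rough stand-in for "significant at 5%".
    flag = "significant" if abs(t) > 1.96 else "not significant"
    print(f"{name:22s} t = {t:6.2f}  ({flag})")
```

Run as written, this reproduces the t Value column to two decimals, and only the intercept and the passing variable clear the rule-of-thumb cutoff.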

The more statistically oriented reader might be asking how well this model actually fits the data.  What is the R-Square?  It is small.  The preceding model explains about 5% of the variation in Fullbacks’ Plus-Minus statistics.

And that is the important point.  The model does its job in that it tells us there is a significant relationship between passing skill and goal differential.  But it is far from a complete picture.  The decision maker needs to understand what the model shows.  However, the decision maker also needs to understand what the model doesn’t reveal.  This model (and the vast majority of other models) is inherently limited.  Like I said last time, the model is a decision support tool, not something that makes the decision.
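
To make the “significant but small R-Square” idea concrete, here is a minimal sketch in Python.  It uses purely synthetic data, not the MLS data, and a single predictor rather than four.  The sample size, the true slope of 0.25, the noise level, and the random seed are all assumptions chosen so that the predictor has a real but weak effect.

```python
import math
import random


def simple_ols(x, y):
    """One-predictor least squares: returns (slope, t statistic, R-squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    std_error = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / std_error, 1 - sse / sst


# Synthetic data (an assumption for illustration): a weak true effect
# buried in a lot of unexplained variation.
rng = random.Random(0)
n = 600
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.25 * xi + rng.gauss(0, 1) for xi in x]

slope, t_stat, r_squared = simple_ols(x, y)
print(f"slope = {slope:.3f}, t = {t_stat:.2f}, R-squared = {r_squared:.3f}")
```

The design point is that the t statistic and the R-Square answer different questions.  The t statistic asks whether the relationship is distinguishable from zero; the R-Square asks how much of the outcome the relationship accounts for.  With enough observations, a weak relationship can pass the first test while leaving most of the variation unexplained.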

Admittedly I didn’t try to find a model that fits the data really well.  But I can tell you that in my experience in sports and really any context that involves predicting or explaining individual human behavior, the models usually only explain a small fraction of variance in performance data.

Major League Soccer (MLS) Social Media Equity Rankings: Sporting Kansas City & Seattle Sounders FC on Top

Fan base evaluation has always been a topic of interest for sports researchers.  The world of social media provides an opportunity to look at fan base support/loyalty without worrying about capacity constraints, pricing, or revenue data issues.  To calculate MLS teams’ “social media equity” we collected social media engagement metrics (Twitter mentions of the club, both with and without hashtags).  We then created a statistical model that predicts these measures of social media engagement as a function of factors such as market size, official club tweeting activity, team payroll, and team performance for this past season.  We then compared each team’s actual social media engagement against the model predictions.

To examine the social media equity of MLS teams, we collected tweets for each team from 2009 to 2013 and built a statistical model.  The logic behind this model is that social media engagement from fans is driven by a number of factors such as team performance and city demographics.  Unlike those factors, social media equity is not directly measurable.  So we attribute the contribution of social media equity to the model residuals, that is, to whatever engagement remains after controlling for measurable factors like team performance.

We first used data from 2009 to 2012 to calibrate the model, i.e. to estimate the coefficients for the explanatory variables, and then we used these estimates to calculate expected social media engagement using data for explanatory variables in 2013.  The difference between the engagement we actually observed and our prediction is the social media equity for each team.
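
The calibrate-then-predict procedure described above can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the model we actually estimated: the team names, the two explanatory variables (market size and win percentage), and every number below are made up, and the helper functions fit_ols, predict, and equity_ranking are hypothetical names.

```python
def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    A leading 1.0 is added to each row so that b[0] is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(k)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs, row):
    """Expected engagement for one team-season."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def equity_ranking(coefs, teams, rows, actual):
    """Equity = actual minus predicted engagement, sorted best first."""
    resid = {t: a - predict(coefs, r) for t, r, a in zip(teams, rows, actual)}
    return sorted(resid.items(), key=lambda kv: kv[1], reverse=True)


# Made-up calibration data: (market size, win percentage) -> engagement.
train_rows = [(1, 0.3), (2, 0.5), (3, 0.4), (4, 0.6), (5, 0.2)]
train_y = [21, 29, 28, 36, 26]
coefs = fit_ols(train_rows, train_y)

# Made-up holdout season for three hypothetical teams.
teams = ["Team A", "Team B", "Team C"]
new_rows = [(2, 0.4), (4, 0.5), (3, 0.3)]
new_actual = [31, 30, 25.5]

ranking = equity_ranking(coefs, teams, new_rows, new_actual)
for team, equity in ranking:
    print(f"{team}: {equity:+.2f}")
```

A team with a positive residual draws more engagement than its market size and performance would predict, which is what we interpret as equity.  A team with a negative residual draws less.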

Across the various specifications of the model, average attendance was significant.  This suggests that home game attendance is closely tied to fan engagement online.  The same was true for championships won by the club.  Social media equity rankings from the models are quite consistent.  Here, we present the ranking from the model whose dependent variable is the sum of all types of mentions on Twitter regarding the club.  It is not surprising to see Sporting Kansas City & Seattle Sounders FC at the top of the chart.  Sporting Kansas City has been very strong in recent years, and won the championship in 2013.  Seattle Sounders FC has the highest average attendance among all MLS teams, which is evidence of the enthusiasm of its fans.

Zhe Han, PhD Student