Fanalytics Podcast: Sports Analytics – Getting Your Foot in the Door

Houston Dynamo Data Analyst and Emory alumnus Sean Steffen joins Mike Lewis on this Fanalytics podcast episode to discuss how to get into the field of sports analytics. Getting your foot in the door can be quite competitive, and Sean shares his “non-traditional” journey.

Some background on Sean: In college, he majored in creative writing. From there, he started writing for American Soccer Analysis, a blog that focuses on Major League Soccer. The key to Sean’s success was that he backed up his writing with data and analytics.

The conversation gets a little deep in the weeds and even includes a discussion of the competing programming languages, R and Python. For prospective analysts, Sean recommends learning Excel and linear regression. Mike says SQL, R, and linear regression are good starting points for analyzing data.

They also talk about modern soccer analytics, such as the logic and mechanics behind expected goals.

You can reach Sean on Twitter @SeanSteffen or search him on LinkedIn.

Click on the logo below to listen to the podcast episode.

Fanalytics Podcast: FIFA World Cup Gender Pay Gap

A big topic making headlines right now is the FIFA Women’s World Cup and, more specifically, the gender pay gap at the FIFA World Cup.

Professors Mike Lewis and Tom Smith discuss their thoughts on the wage differences between the men’s and women’s soccer teams on this Fanalytics podcast episode.

Tom also looks at the earnings ratios of male and female athletes, primarily in the United States.

To listen to this podcast episode, click on the logo below.

You can also rate & subscribe to Fanalytics on iTunes and Google Play.

Fanalytics Podcast: Lucy Rushton and Building Atlanta United FC

How do you build a new team, like Atlanta United, from the ground up?

In this Fanalytics episode we meet Atlanta United’s Lucy Rushton. As the team’s Head of Technical Recruitment and Performance Analysis, she provides analytics, data, and insights that help the team build its roster. In the conversation with Lucy we talk about two types of analysis. One is the subjective analysis, which means watching the players on the field. The other is the objective analysis, which relies on data and statistics and takes emotion out of the evaluation. Rushton says it’s important to strike a balance between the two in order to run a successful department.

So what’s the game plan when searching for players for the team? Rushton says to get data and find players who fit the club’s philosophy and playing style. That style calls for players who attack quickly, entertain, and bring athleticism and speed. You also have to ask: what are the key attributes for the position being filled? How much do these players cost?

When it comes to statistical forecasting, how much of that do decision-makers want to see? They want to see the insights, not the models.

What’s next for Atlanta United? The head scout says the goal is to get better, earn another chance to play in the CONCACAF Champions League, and grow the club’s analysis work.

In the second half of the episode, we talk about some of the larger lessons related to performing and presenting analytics in any organization. Analytics is seldom a magic bullet for any organizational challenge. More often, analytics informs rather than directs decisions.

Along these lines, we frame the interview with Lucy and the challenge of building a championship roster in terms of decision support realities such as biases in human decision making and the limitations of statistical models.

To listen to this podcast episode, click on the logo below.

Fanalytics Video: NBA Finals & FIFA Women’s World Cup

This week on the Fanalytics video, we discuss the big storylines in the NBA Finals and the FIFA Women’s World Cup. Thanks for checking out the trending sports stories with us on Monday mornings!

A Quick Example of the Limitations of Analytics: Sports Analytics Series Part 3.1

In Part 3 we started to talk about the complementary role of human decision makers and models.  Before we get to the next topic – Decision Biases – I wanted to take a moment to present an example that helps illustrate the points being made in the last entry.

I’m going to make the point using an intentionally nontraditional example.  Part of the reason I’m using this example is that I think it’s worthwhile to think about what might be “questionable” in terms of the analysis.  So rather than look at some well-studied relationships in contexts like NFL quarterbacks or NBA players, I’m going to develop a model of Fullback performance in Major League Soccer.

To keep this simple, I’m going to try to figure out the relationship between a player’s Plus-Minus statistic and a few key performance variables.  I’m not going to provide a critique of Plus-Minus, but I encourage everyone to think about the value of such a statistic in soccer in general and for the Fullback position in particular.  This is an important exercise for thinking about combining statistical analysis and human insight.  What is the right bottom-line metric for a defensive player in a team sport?

The specific analysis is a simple regression model that quantifies the relationship between Plus-Minus and the following performance measures:

  • % of Defensive Ground Duels Won
  • % of Defensive Aerial Duels Won
  • Tackling Success Rate (%)
  • % of Successful Passes in the Opponents ½
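
Written out, a regression of this kind might take the following form. The symbols are notation assumed here for illustration and are not taken from the original analysis: i indexes fullback observations, the beta terms are the coefficients to be estimated, and epsilon is the error term.

```latex
\[
\text{PlusMinus}_i = \beta_0
  + \beta_1\,\text{GroundDuelPct}_i
  + \beta_2\,\text{AerialDuelPct}_i
  + \beta_3\,\text{TackleSuccess}_i
  + \beta_4\,\text{OppHalfPassPct}_i
  + \varepsilon_i
\]
```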

This is obviously a very limited set of statistics.  One thing to think about is that if I am creating this statistical model with even multiple years of data, I probably don’t have very many observations.  This is a common problem.  In any league there are usually about 30 teams and maybe 5 players at any position.  We can potentially capture massive amounts of data, but maybe we only have 150 observations a year.  Note that in the case of MLS fullbacks we have fewer than that.  This is important because it means that in sports contexts we need to have parsimonious models.  We can’t throw all of our data into the models because we don’t have enough observations.

The table below lists the regression output.  Basically, the output is saying that % Successful passes in the opponent’s half is the only statistic that is significantly and positively correlated with a Fullback’s Plus-Minus statistic.

Parameter Estimates

Variable                                DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                1          -1.66764            0.41380     -4.03     <.0001
% Defensive Ground Duels Won             1          -0.00433            0.00314     -1.38     0.1692
% Def Aerial Duels Won                   1          -0.00088542         0.00182     -0.49     0.6263
Tackling Success Percentage              1           0.39149            0.25846      1.51     0.1305
% Successful Passes in Opponents 1/2     1           0.02319            0.00480      4.83     <.0001
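
As a quick sanity check on how to read the output, each t value is simply the parameter estimate divided by its standard error. The short Python sketch below recomputes that ratio from the figures reported above; the variable and function names are illustrative only, and small differences from the reported values come from rounding in the published estimates.

```python
# Rows copied from the regression output above:
# (variable, parameter estimate, standard error, reported t value)
rows = [
    ("Intercept", -1.66764, 0.41380, -4.03),
    ("% Defensive Ground Duels Won", -0.00433, 0.00314, -1.38),
    ("% Def Aerial Duels Won", -0.00088542, 0.00182, -0.49),
    ("Tackling Success Percentage", 0.39149, 0.25846, 1.51),
    ("% Successful Passes in Opponents 1/2", 0.02319, 0.00480, 4.83),
]


def t_value(estimate: float, std_error: float) -> float:
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


for name, est, se, reported in rows:
    computed = t_value(est, se)
    print(f"{name:<40} computed t = {computed:6.2f}   reported t = {reported:6.2f}")
```

As a rough rule of thumb for samples of this size, an absolute t value above about 2 corresponds to significance at the 5% level, which is why only the passing variable (and the intercept) stands out.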

The more statistically oriented reader might be asking how well this model actually fits the data.  What is the R-Square?  It is small.  The preceding model explains only about 5% of the variation in Fullbacks’ Plus-Minus statistics.
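
It may seem odd that a coefficient can be highly significant while the model explains so little. The sketch below illustrates how the two can coexist. It uses synthetic data generated on the spot (not the MLS data), and the sample size, effect size, and noise level are arbitrary assumptions chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Synthetic data (NOT the MLS data): a weak but real linear effect buried in noise.
n = 1000
x = rng.normal(size=n)
y = 0.25 * x + rng.normal(size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares).
residuals = y - X @ beta
ss_res = float(residuals @ residuals)
ss_tot = float(((y - y.mean()) ** 2).sum())
r_squared = 1.0 - ss_res / ss_tot

# Standard error and t value of the slope coefficient.
sigma2 = ss_res / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_slope = float(beta[1] / np.sqrt(cov[1, 1]))

print(f"R-squared = {r_squared:.3f}, slope t value = {t_slope:.2f}")
```

With enough observations, even a small effect produces a large t value, while most of the variation in the outcome remains unexplained.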

And that is the important point.  The model does its job in that it tells us there is a significant relationship between passing skill and goal differential.  But it is far from a complete picture.  The decision maker needs to understand what the model shows.  However, the decision maker also needs to understand what the model doesn’t reveal.  This model (and the vast majority of other models) is inherently limited.  As I said last time, the model is a decision support tool, not something that makes the decision.

Admittedly I didn’t try to find a model that fits the data really well.  But I can tell you that in my experience in sports and really any context that involves predicting or explaining individual human behavior, the models usually only explain a small fraction of variance in performance data.