Player Analytics Fundamentals: Part 4 – Statistical Models

Today’s post introduces the topic of statistical modeling.  This is, maybe, the trickiest part of the series to write.  The problem is that mastering the technical side of statistical analysis usually takes years of education.  And, more critically, developing the wisdom and intuition to use statistical tools effectively and creatively takes years of practice.  The goal of this segment is to point people in the right direction, more than to provide detailed instruction.  That said – I can adjust if there is a call for more technical material.  (If you want to start from the beginning parts 1, 2 and 3 are a click away.)

Let’s start with a simple point.  The primary tool for every analytics professional (sports or otherwise) should be linear regression.  Linear regression allows the analyst to quantify the relationship between some focal variable of interest (dependent measure or DV) and a set of variables that we think drive that variable (independent variables).  In other words, regression is a tool that can produce an equation that shows how some inputs produce an outcome of interest.  In the case of player analytics, this might be a prediction of future performance based on a player’s past statistics or physical attributes.

To make this more concrete, let’s say we want to do an analysis of rookie quarterback performance (we’ve been talking a bit about QB metrics so far in the series).  Selecting QBs involves significant uncertainty.  The transition from the college game to the pro game requires the QB to be able to deal with more complex offensive systems, more sophisticated defenses and more talented opposing players.  The task of the general manager is to identify prospects that can successfully make the transition.

Data and statistical analysis can potentially play a part in this type of decision.  The premise is that observable data on college prospects can help predict rookie-year performance.  As a starting point, let's assume that general managers can obtain data on the number of games won as a college player, whether the player graduated (or will graduate) and the player's height.  (We just might be foreshadowing a famous set of rules for drafting quarterbacks.)

The other key decision for a statistical analysis of rookie QB performance is the choice of performance metric.  We could use the NFL passer rating formula that we have been discussing.  Or we could use something else.  For example, maybe the number of TD passes thrown as a rookie.  This metric is interesting because it captures something about both playing time and the ability to create scores.

Touchdowns are also a metric that "fits" linear regression.  Linear regression is best suited to the analysis of quantitative variables that vary continuously.  The number of touchdowns we observe in data will range from zero to whatever the rookie TD record is.  In contrast, other metrics such as whether the player becomes a starter or a pro bowler are categorical variables.  There are other techniques that are better suited to analyzing categorical variables.  (If you are a stats jockey and are objecting to the last couple of statements, please see the note below.)

The purpose of regression analysis is to create an equation of the following form:

Rookie TDs = β0 + β1(College Wins) + β2(Graduated) + β3(Height)

This equation says that TD passes are a function of college wins, graduation and height.  The βs are the weights that are determined by the linear regression analysis.  Specifically, linear regression determines the βs that best fit the data.  This is the important point: the weights, or βs, are determined from the data.  To illustrate how the equation works, let's imagine that we estimated the regression model and obtained the following equation:

Rookie TDs = 1 + 0.1(College Wins) + 5(Graduated) + 0(Height in inches)

This equation says that we can predict rookie TD passes by plugging in each player's data on college wins, graduation and height.  It also says that a history of winning is positively related to TDs, and that graduation is also a positive.  The coefficient for height is zero, which indicates that height is not a predictor of rookie TDs (I'm making these numbers up; height probably matters).  One benefit of developing a model is that we let the data speak.  Our "expert" judgment might be that height matters for quarterbacks.  The regression results can help identify decision biases if the coefficients don't match the experts' predictions.  I am neglecting the issue of statistical significance for now, just to keep the focus on intuition.

Let’s say we have two prospects.  Lewis Michaels out of the University of Illinois who won 40 college games (hypothetical and unrealistic), graduated (in engineering) and is 5’10” (a Flutiesque prospect).  Our second prospect is Manny Trips out of Duke.  Manny won 10 games, failed to graduate and is 6’ tall.  Michaels would seem to be the better prospect based on the available data.  The statistical model allows us to predict how much better.

We make our predictions by simply plugging our player level data into the equation.  We would predict Lewis would throw 10 TDs in his rookie year (1+.1*40+5*1+0*70).  For Manny the prediction would be 2 TDs.  For now, I am just making up the coefficients (βs).  In a later entry I will estimate the model using some data on actual NFL rookie QB performance.
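To make the plug-in arithmetic concrete, here is a minimal sketch using the made-up coefficients from above (the two prospects are the hypothetical players from the text):

```python
# Made-up estimated equation from the text:
# Rookie TDs = 1 + 0.1*(college wins) + 5*(graduated) + 0*(height in inches)
def predict_rookie_tds(wins, graduated, height_in):
    """Plug a prospect's data into the (hypothetical) estimated equation."""
    return 1 + 0.1 * wins + 5 * graduated + 0 * height_in

# Lewis Michaels: 40 wins, graduated, 5'10" (70 inches)
print(predict_rookie_tds(40, 1, 70))  # 10.0
# Manny Trips: 10 wins, did not graduate, 6'0" (72 inches)
print(predict_rookie_tds(10, 0, 72))  # 2.0
```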

Regression has its shortcomings and many analysts love to object to regression analyses.  But for the most part, linear regression is a solid tool for analyzing patterns in data.  It’s also relatively easy to implement.  We can run regressions in Excel!  We shouldn’t underestimate how important it is to be able to do our analyses in standard tools like Excel.
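For readers who want to go beyond Excel, the same model can be estimated in a few lines of Python.  This is a minimal sketch using numpy's least-squares solver on invented data; the "true" coefficients are the made-up values from the example above, and real data would of course include noise:

```python
import numpy as np

# Invented prospect data: columns are intercept, college wins, graduated, height (in)
X = np.array([
    [1, 40, 1, 70],
    [1, 10, 0, 72],
    [1, 25, 1, 74],
    [1, 30, 0, 71],
    [1, 15, 1, 69],
], dtype=float)

# Rookie TDs generated from the made-up equation TD = 1 + 0.1*wins + 5*grad
y = 1 + 0.1 * X[:, 1] + 5 * X[:, 2] + 0 * X[:, 3]

# Least squares finds the betas that best fit the data
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(betas, 2))  # approximately [1.0, 0.1, 5.0, 0.0]
```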

I will extend our tool kit in a future entry.  I briefly mentioned categorical variables such as whether or not a player is a starter.  For these types of Yes/No outcomes (starter or not a starter) there is a tool called logistic regression that should be in our repertoire.
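As a preview, logistic regression keeps the same kind of linear "score" but passes it through a function that maps the score to a probability between 0 and 1.  A minimal sketch (the score and its weights are invented for illustration):

```python
import math

def logistic(score):
    """Map any linear score to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-score))

# Hypothetical linear score for "becomes a starter": -3 + 0.1*(college wins)
score = -3 + 0.1 * 40   # a 40-win college QB, made-up weights
print(round(logistic(score), 2))  # 0.73
```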

*One reason this note is tricky is that I'm trying to get the right balance and tone.  I can already hear the objections.  Let's save these for now.  For example, readers do not need to alert me to the fact that TDs are censored at zero.  Or that there is a mass point at zero because many rookies don't play.  Or that TDs are counted in discrete units, so maybe a Poisson model is more appropriate.  You get the idea.  There are many ways to object to any statistical model.  The real question isn't whether a model is perfect.  The real question is whether the model provides value.

Player Analytics Fundamentals: Part 3 – Metrics, Experts and Models

Last time I introduced the topic of player "metrics." (If you want to get caught up you can start with Part 1 and Part 2 of the series.)  As I noted, determining the right metric is perhaps the most important task in player analytics.  It's almost too obvious a point to make, but the starting point for any analytics project should be deciding what to measure or manage.  It's a non-trivial task because while the end goal (profit, wins) might be obvious, how this goal relates to an individual player (or strategy) may not be.

However, before I get too deep into metric development, I want to take a small detour and talk briefly about statistical models.  We won’t get to modeling in this entry – the goal is to motivate the need for statistical models!  If we are doing player analytics we need some type of tool kit to move us from mere opinion to fact based arguments.

To illustrate what I mean by "opinion," let's consider the example of rating quarterbacks.  In the previous entry, I presented the Passer Rating formula used to rate NFL quarterbacks.  As a quick refresher, let's look at this beast one more time:

Rating = [5(COMP/ATT − 0.3) + 0.25(YARDS/ATT − 3) + 20(TD/ATT) + (2.375 − 25(INT/ATT))] × 100/6

The formula includes completion percentage (accuracy), yards per attempt (magnitude), touchdowns (ultimate success) and interceptions (failures).  Let's pretend for a second that the formula only contained touchdowns and interceptions (just to make it simple).  The question then becomes: how much should we weight touchdowns per attempt relative to interceptions per attempt?  The actual formula is hopelessly complex in some ways – we have fractional weights and statistics in different units – so let's take a step back from the actual formula.

Imagine we have two experts proposing Passer Rating statistics that are based on touchdowns and interceptions only.  One expert might say that touchdowns per attempt are twice as important as interceptions per attempt.  We will label this "expert" created formula ePR1, for expert 1 Passer Rating.  The formula would be:

ePR1 = 2(TD/ATT) − (INT/ATT)

Maybe this judgment would be accompanied by some logic along the lines of “touchdowns are twice as important because the opposing team doesn’t always score as the result of an interception.”

However, the second expert suggests that touchdowns and interceptions should be weighted equally.  Maybe the logic of the second expert is that interceptions have both direct negative consequences (loss of possession) and negative psychological effects (loss of momentum), and should therefore carry more weight relative to touchdowns than expert 1 allows.  The formula for expert 2 can be written as:

ePR2 = (TD/ATT) − (INT/ATT)

I suspect that many readers (or a high percentage of a few readers) are objecting to this approach to developing metrics.  The approach probably seems arbitrary.  It is.  I've intentionally presented things in a manner that highlights the subjective nature of the process.  I've reduced things to just two stats and I've chosen very simple weights.  But the reality is that this is the basic process through which novices tend to develop "new" or "advanced" statistics.  In fact, it is still very much a standard practice.  The decision maker or supporting analysts gather multiple pieces of information and then use a system of weights to determine a final "grade" or evaluation.
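To see how much the choice of weights matters, here is a small sketch scoring two hypothetical stat lines under both expert formulas (all numbers invented):

```python
def ePR1(td, ints, att):
    """Expert 1: touchdowns per attempt count double."""
    return 2 * td / att - ints / att

def ePR2(td, ints, att):
    """Expert 2: touchdowns and interceptions weighted equally."""
    return td / att - ints / att

# QB A: 30 TD, 10 INT on 500 attempts; QB B: 26 TD, 4 INT on 500 attempts
print(round(ePR1(30, 10, 500), 3), round(ePR1(26, 4, 500), 3))  # 0.1 0.096
print(round(ePR2(30, 10, 500), 3), round(ePR2(26, 4, 500), 3))  # 0.04 0.044
```

The same two quarterbacks rank in opposite order under the two formulas, which is exactly why the weighting question matters.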

The question then becomes which formula do we use?  Both formulas include multiple pieces of data and are based on a combination of logic and experience.  I am ignoring (for the moment) a critical element of this topic – the issue of decision biases.  In subsequent entries, I’m going to advocate for an approach that is based on data and statistical models.  Next time, we will start to talk more about statistical tools.

Player Analytics Fundamentals: Part 2 – Performance Metrics

I want to start the series with the topic of "Metric Development."  I'm going to use the term "metric," but I could just as easily have used words like stats, measures or KPIs.  Metrics are the key to sports and other analytics functions: we need to be sure that we have the right performance standards in place before we try to optimize.  Let me say that one more time – METRIC DEVELOPMENT IS THE KEY.

The history of sports statistics has focused on so called “box score” statistics such as hits, runs or RBIs in baseball.  These simple statistics have utility but also significant limitations.  For example, in baseball a key statistic is batting average.  Batting average is intuitively useful as it shows a player’s ability to get on base and to move other runners forward.  However, batting average is also limited as it neglects the difference between types of hits.  In a batting average calculation, a double or home run is of no greater value than a single.  It also neglects the value of walks.

These shortcomings motivated the development of statistics like OPS (on-base plus slugging).  Measures like OPS that are constructed from multiple statistics are appealing because they begin to capture the multiple contributions made by a player.  On the downside, these constructed statistics often have an arbitrary quality in terms of how the component statistics are weighted.

The complexity of player contributions and the "arbitrary nature" of how simple statistics are weighted is illustrated by the formula for the NFL passer rating:

Rating = [5(COMP/ATT − 0.3) + 0.25(YARDS/ATT − 3) + 20(TD/ATT) + (2.375 − 25(INT/ATT))] × 100/6

(each of the four terms is capped between 0 and 2.375)

This equation combines completion percentage (COMP/ATT), yards per attempt (YARDS/ATT), touchdown rate (TD/ATT) and interception rate (INT/ATT) to arrive at a single statistic for a quarterback.  On the plus side the metric includes data related to “accuracy” (completion percentage) to “scale” (yards per), to “conversion” (TDs), and to “failures” (interceptions).  We can debate if this is a sufficiently complete look at QBs (should we include sacks?) but it does cover multiple aspects of passing performance.   However, a common reaction to the formula is a question about where the weights come from.  Why is completion rate multiplied by 5 and touchdown rates multiplied by 20?
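For readers who want to experiment, the official calculation can be written directly in code.  This is the standard NFL formula, with each of the four components capped between 0 and 2.375:

```python
def nfl_passer_rating(comp, att, yards, td, ints):
    """Official NFL passer rating from raw season or game totals."""
    def cap(x):
        return max(0.0, min(x, 2.375))

    a = cap((comp / att - 0.3) * 5)       # completion percentage
    b = cap((yards / att - 3) * 0.25)     # yards per attempt
    c = cap((td / att) * 20)              # touchdown rate
    d = cap(2.375 - (ints / att) * 25)    # interception rate
    return (a + b + c + d) / 6 * 100

# A hypothetical season: 280 of 400, 3,200 yards, 28 TD, 8 INT
print(round(nfl_passer_rating(280, 400, 3200, 28, 8), 2))  # 108.75
```

The caps are what produce the well-known maximum rating of 158.3.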

Is it a great statistic?  One way to evaluate it is via a quick check of the historical record.  Does the historical ranking jibe with our intuition?  Here is a link to historical rankings.

Every sport has examples of these kinds of "multi-attribute" constructed statistics.  Basketball has player efficiency metrics that weight a player's good events (points, rebounds, steals) against negative outcomes (turnovers, fouls, etc.).  The OPS metric involves an implicit assumption that on-base percentage and slugging are of equal value.

One area I want to explore is how we should construct these types of performance metrics.  This is a discussion that involves some philosophy and some statistics.  We will take this piece by piece and also show a couple of applications along the way.

Player Analytics Fundamentals: Part 1

Each Spring I teach courses on Sports Analytics.  These courses include both Marketing Analytics and On-Field Analytics.  The "Blog" has tended to focus on the Marketing or Fan side.  Moving forward, I think the balance is going to change a bit.  My plan is to re-balance the blog to include more of the on-field topics.

Last year I published a series of posts related to the fundamentals of sports analytics.  This material is relevant to both the marketing and the team performance sides of sports analytics.  This series featured comments on organizational design and decision theory.

This series is going to be a bit different than the team and player “analytics” that we see on the web.  Rather than present specific studies, I am going to begin with some fundamental principles and talk about a “general” approach to player analytics.  There is a lot of material on the web related to very specific sports analytics questions.  Analytics can be applied to baseball, football, soccer and every other sport.  And within each of these games there are countless questions to be addressed.

Rather than contribute to the littered landscape, I want to talk about how I approach sports analytics questions.  In some ways, this series is the blueprint I use for thinking about sports analytics in the classroom.  My starting point is that I want to provide skills and insights that can be applied to any sport.  So we start with the fundamentals and we think a lot about how to structure problems.  I want to supply grounded general principles that can be applied to any player analytics problem.

So what’s the plan?  At a high level, sports analytics are about prediction.  We will start with a discussion about what we should be predicting.  This is a surprisingly complex issue.  From there we will talk a little bit about different statistical models.  This won’t be too bad, because I’m a firm believer in using the simplest possible models.  The second half of the series will focus on different types of prediction problems.  These will range from predicting booms and busts, to a look at how to do “comparables” in a better fashion.  In terms of the data, I think it will be a mix of football and the other kind of football.

 

NBA Fan Rankings: 2016 Edition

On an (almost) annual basis I present rankings of fan bases across major professional and collegiate leagues.  Today it is time for the NBA.   First, the winners and losers in this year’s rankings.  At the top of the list we have the Knicks, Lakers and Bulls. This may be the trifecta of who the league would love to have playing at Christmas and in the Finals.  At the bottom we have the Grizzlies, Nets and Hornets.

[Chart: 2016 NBA fan equity rankings]

Before I get into the details, it may be helpful to briefly mention what differentiates these rankings from other analyses of teams and fans.  My rankings are driven by statistical models of how teams perform on a variety of marketing metrics.  The key insight is that these models allow us to control for short-run variation in team performance and permanent differences in market potential.  In other words, the analysis uses data to identify engagement or passion (based on attendance and spending) beyond what is expected given how a team is performing and where the team is located.  More details on the methodology can be found here.


The Winners

This year's list contains no real surprises.  The top five teams are all major market teams with storied traditions.  The top fan base belongs to the Knicks.  The Lakers, Bulls, Heat and Celtics follow.  The Knicks highlight how the model works: while the Knicks might not be winning, Knicks fans still attend and spend.

The number two team on the list (The Lakers) is in much the same situation. A dominant brand with a struggling on-court product.   The Lakers and Clippers are an interesting comparison.  Last season, the Clippers did just a bit better in terms of attendance (100.7% versus 99.7%).  But the Lakers filled their seats with an average ticket price that was substantially higher.  The power of the Laker brand is shown in this comparison because these outcomes occurred in a season where the Clippers won many more games.

Why are the Lakers still the bigger draw?  Is this a star (Kobe) effect?  Probably in part, but fan loyalty is something that evolves over time.  The Lakers have the championships, tradition and therefore the brand loyalty.  It will be interesting to see how much equity is retained long-term if the team is unable to quickly reload.  The shared market makes this an interesting story to watch. I suspect that the Lakers will continue to be the stronger brand for quite a while.

The Losers

At the bottom of the list we have Memphis, Brooklyn and Charlotte.  The interesting one in this group is Brooklyn.  Why do the Nets rank poorly?  It ends up being driven by the relative success of the Knicks versus the Nets.  The Knicks have much more pricing power while the teams operate in basically the same market (we can debate this point).  According to ESPN, the Knicks drew 19,812 fans (100% of capacity) while the Nets filled 83.6% of their building.  The Knicks also command much higher ticket prices.  And while the Nets were worse (21 victories) the Knicks were far from special (32 wins).

What can the teams at the bottom of the list do?  When you go into the data and analyze what drives brand equity, the results are intuitive.  Championships, deep playoff runs and consistent playoff appearances are the key to building equity.  Easy to understand, but tough to accomplish.

And a Draw

An interesting aside in all this is what it means for the league.  The NBA has long been a star and franchise driven league.  In the 1980s it was about the Lakers (Magic) and Celtics (Bird).  In the 1990s it was Michael Jordan and the Bulls.  From there we shifted to Kobe and LeBron.

On one hand, the league might be (even) stronger if the top teams were the Bulls, Knicks and Lakers.  On the other hand, the emergence of Steph Curry and Golden State has the potential to help build another powerful brand.

Some more thoughts…

The Fan Equity metric is just one possible means for assessing fan bases.  In this year’s NFL rankings I reported several more analyses that focus on different market outcomes.  These were social media following, road attendance and win sensitivity (bandwagon fans).  Looking at social following tells us something about the future of the brand as it (broadly) captures fan interest of a younger demographic.  Road Attendance tells us something about national rather than local following.  These analyses also use statistical models to control for market and team performance effects.

Social Equity

Top Social Equity Team: The Lakers

Bottom Social Equity: The Nets

Comment: The Lakers are an immensely strong brand on many dimensions.  The Nets are a mid-range brand when you look at raw numbers, but they suffer when we account for the fact that they operate in the New York market.

Road Equity

Top Road Equity: The Lakers

Bottom Road Equity: Portland

Comment: The Lakers dominate.  And since this analysis uses fixed effects estimated across 15 years of data, the result is not solely due to Kobe Bryant.  Portland does well locally but is not of much interest nationally.

It is possible to do even more.  We can even look at factors such as win or price sensitivity. Win sensitivity (or bandwagon behavior) tells us whose fans only show up when a team is winning and price sensitivity tells us if a fan base is willing to show up when prices go up.  I’m skipping these latter two analyses today just to avoid overkill (available upon request).  The big message is that we can potentially construct a collection of metrics that provide a fairly comprehensive and deep understanding of each team’s fan base and brand.

Note: I have left one team off the list.  I have decided to stop reporting the local teams (Emory is in Atlanta).  The local teams have all been great to both myself and the Emory community.  This is just a small effort to eliminate some headaches for myself.

Finally… The complete list

City / Fan Equity Rank
Boston 5
Charlotte 27
Chicago 3
Cleveland 20
Dallas 15
Denver 11
Detroit 25
GoldenState 16
Houston 7
Indiana 21
LAClips 17
LALakers 2
Memphis 29
Miami 4
Milwaukee 14
Minnesota 22
Brooklyn 28
NewOrleans 24
NYKnicks 1
OKCity 13
Orlando 19
Philadelphia 26
Phoenix 9
Portland 6
Sacramento 10
SanAntonio 12
Toronto 18
Utah 8
Washington 23
 

Analytics, Trump, Clinton and the Polls: Sports Analytics Series Part 5.1

Recent presidential elections (especially 2008 and 2012) have featured heavy use of analytics by candidates and pundits.  The Obama campaigns were credited with using micro targeting and advanced analytics to win elections. Analysts like Nate Silver were hailed as statistical gurus who could use polling data to predict outcomes.  In the lead up to this year’s contest we heard a lot about the Clinton campaign’s analytical advantages and the election forecasters became regular parts of election coverage.

Then Tuesday night happened.  The polls were wrong (by a little) and the advanced micro targeting techniques didn’t pay off (enough).

Why did the analytics fail?

First, the polls and the election forecasts (I'll get to the value of analytics next week).  As background, commentators tend not to truly understand polls.  This creates confusion because commentators frequently over-interpret and misinterpret what polls are saying.  For example, whenever "margin of error" is mentioned they tend to get things wrong.  A poll's margin of error is driven by its sample size.  The common journalistic error is to apply a single poll's margin of error (3% or 4%) to a whole collection of polls.  When looking at an average of many polls, the margin of error is much smaller because the "poll of polls" has a much larger combined sample size.  This is a key point, because when we consider the combined polls it is even more clear that something went wrong in 2016.
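The arithmetic behind this point is straightforward: at 95% confidence, the margin of error for a sample proportion shrinks with the square root of the sample size.  A quick sketch, assuming a simple random sample and a proportion near 50% (real polls use weighting and design effects, so actual margins are somewhat larger):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

# One poll of 1,000 respondents vs. ten pooled polls of 1,000 each
print(round(margin_of_error(0.5, 1000) * 100, 1))   # 3.1 (percentage points)
print(round(margin_of_error(0.5, 10000) * 100, 1))  # 1.0
```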

Diagnosing what went wrong is complicated by two factors.  First, it should be noted that because every pollster does things differently we can’t make blanket statements or talk in absolutes.  Second, diagnosing the problem requires a deep understanding of the statistics and assumptions involved in polling.

In the 2016 election my suspicion is that two things went wrong.  As a starting point, we need to realize that polls include strong implicit assumptions about the nature of the underlying population and about voter passion (rather than preference).  When these assumptions don't hold, the polls will systematically fail.

First, most polls start with assumptions about the nature of the electorate.  In particular, there are assumptions about the base levels of Democrats, Republicans and Independents in the population.  Very often the difference between polls relates to these assumptions (LA Times versus ABC News).

The problem with assumptions about party affiliation in an election like 2016 is that the underlying coalitions of the two parties are in transition.  When I grew up, the conventional wisdom was that the Republicans were the party of the wealthy, the suburban professionals and the free-trading capitalists, while the Democrats were the party of the working man and unions.  Obviously these coalitions have changed.  My conjecture is that pollsters didn't sufficiently re-balance.  In the current environment it might make sense to place greater emphasis on demographics (race and income) when designing sampling segments.

The other issue is that more attention needs to be paid to avidity / engagement / passion (choose your own marketing buzzword).  Polls often differentiate between likely and registered voters.  This may have been insufficient in this election.  If Clinton's likely voters were 80% likely to show up and Trump's were 95% likely, then having a small percentage lead in a preference poll isn't going to hold up in an election.
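The turnout point is easy to see with invented numbers: multiply each candidate's share of likely voters by the probability those voters actually show up.

```python
def expected_vote(preference_share, turnout_rate):
    """Share of the electorate a candidate actually banks on election day."""
    return preference_share * turnout_rate

# Invented numbers: a 52-48 preference lead evaporates under a turnout gap
clinton = expected_vote(0.52, 0.80)  # about 0.416
trump = expected_vote(0.48, 0.95)    # about 0.456
print(clinton < trump)  # True
```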

The story of the 2016 election should be something every analytics professional understands.  From the polling side the lesson is that we need to understand and question the underlying assumptions of our model and data.  As the world changes do our assumptions still hold?  Is our data still measuring what we hope it does?  Is a single dependent measure (preference versus avidity in this case) enough?

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the designing and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision-making styles is also common across categories.  It's necessary to understand both the level of interest in using analytics and the "technical level" of the decision makers.  Less experienced data scientists and statisticians have a tendency to use overly complicated methods.  This can be a killer: if the models are too complex they won't be understood, and then they won't be used.  Linear regression, with perhaps a few extensions (fixed effects, linear probability models), is usually the way to go.  Because sports organizations have less history with analytics, the issue of balancing complexity can be especially challenging.

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This is an important point of distinction between sports and non-sports industries, and it will matter when we shift to discussing modeling in a couple of weeks.  When I use the term variables, I am referring to individual elements of data.  For example, an element of data could be a player's weight, the number of shots taken or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations, I am talking about "units of analysis" like players or games.

In many (most) business contexts we have many observations.  A large company may have millions of customer accounts.  There may, however, be relatively few explanatory variables.  The firm may have only transaction history variables and limited demographics.  Even in sports marketing, a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points.  In this same example the team may have tens of thousands of season ticket holders.  If we think of this "information" as a database, we would have a row for every customer account (tens of thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings.  The issue in player selection applications is that there are relatively few player-level observations.  In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data).  In contrast, we may have an enormous number of variables per player.

We have historically had many different types of "box score" stats, but now we have entered the era of player tracking and biometrics.  We can now generate player stats related to second-by-second movement or even detailed physiological data.  In sports ranging from MMA to soccer to basketball, the number of variables has exploded.

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?
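As a preview of why this matters, here is a quick illustration on purely synthetic data: when variables outnumber observations, least squares can "explain" even pure noise perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))   # 10 players, 50 tracking variables each
y = rng.normal(size=10)         # the outcome is pure random noise

# With more columns than rows, least squares can still fit the data exactly
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = np.linalg.norm(X @ betas - y)
print(residual < 1e-8)  # True: a "perfect" in-sample fit to noise
```

A model that fits noise perfectly in-sample will predict terribly out-of-sample, which is why tools like variable selection and regularization become essential in this setting.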

The Best NFL Fans 2016: The Dynamic Fan Equity Methodology

The Winners (and Losers) of this year's rankings!  First a quick graphic and then the details.

[Chart: 2016 best and worst NFL fan bases]

It's become a tradition for me to rank NFL teams' fan bases each summer.  The basic approach (more details here) is to use data to develop statistical models of fan interest.  These models are used to determine in which cities fans are more willing to spend on or follow their teams after controlling for factors like market size and short-term variations in performance.  In past years, two measures of engagement have been featured: Fan Equity and Social Media Equity.  Fan Equity focuses on home box office revenues (support via opening the wallet) and Social Media Equity focuses on fans' willingness to engage as part of a team's community (support exhibited by joining social media communities).

This year I have come up with a new method that combines these two measures: Dynamic Fan Equity (DFE).  The DFE measure leverages the best features of the two measures.  Fan Equity is based on the most important consumer trait – willingness to spend.  Social Equity captures fan support that occurs beyond the walls of the stadium and skews towards a younger demographic.  The key insight that allows for the two measures to be combined is that there is a significant relationship between the Social Media Equity trend and the Fan Equity measure.  Social media performance turns out to be a strong leading indicator for financial performance.

Dynamic Fan Equity is calculated using current fan equity and the trend in fan equity from the team’s social media performance.  I will spare the technical details on the blog but I’m happy to go into depth if there is interest.  On the data side we are working with 15 years of attendance data and 4 years of social data.

The Winners

We have a new number one on the list: the New England Patriots, followed by the Cowboys, Broncos, 49ers and Eagles.  The Patriots' victory is driven by fans' willingness to pay premium prices, strong attendance and a phenomenal social media following.  The final competition between the Cowboys and the Patriots was actually decided by the long-term value of the Patriots' greater social following.  The Patriots have about 2.4 million Twitter followers compared to 1.7 million for the Cowboys.  Of course, this is all relative: a team like the Jaguars has just 340 thousand followers.

The Eagles are the big surprise on the list, and they are also a good example of how the analysis works.  Most fan rankings are based on subjective judgments and lack controls for short-term winning rates.  This latter point is a critical shortcoming: it’s easy to be supportive of a winning team.  While Eagles fans might not be happy, they are supportive in the face of mediocrity.  Last year the Eagles struggled on the field, but fans still paid premium prices and filled the stadium.  We’ll come back to the Eagles in more detail in a moment.

The Strugglers

At the bottom we have the Bills, Rams, Chiefs, Raiders and Jaguars, a list similar to last year’s.  The Jaguars, for example, filled only 91% of capacity (ranked 27th) despite an average ticket price of just $57.  The Chiefs struggle because fan support doesn’t match the team’s performance: their capacity utilization rate ranks 17th in the league despite a winning record and low ticket prices.  The Raiders’ fans again finish low in our rankings, and every year the response is a great deal of anger and often threats.

The Steelers

The one result that gives me the most doubt is for the Pittsburgh Steelers.  The Steelers have long been considered one of the league’s premier teams and brands.  They have a history of championships and have been known to turn opposing stadiums into seas of yellow and black.  So why are the Steelers ranked 18th?


A comparison between the Steelers and the Eagles highlights the underlying issues.  Last year the Steelers had an average attendance of 64,356 and an average ticket price of $84 (from ESPN and Team Market Report).  In comparison, the Eagles averaged 69,483 fans at an average price of $98.69.  In terms of filling capacity, the Steelers were at 98.3% compared to the Eagles at 102.8%.  The key is that the Eagles enjoyed this greater support despite a much worse record.

One issue to consider is pricing.  It may well be that the Steelers’ ownership makes a conscious effort to underprice relative to what the market would allow.  The high attendance rates across the NFL do suggest that many teams could profitably raise prices.  It’s entirely reasonable to argue that the Steelers’ relationship with the Pittsburgh community results in a policy of pricing below market.

In past years the Steelers have been our social media champions, but this past year saw a bit of a dip: in the Social Media Equity rankings the Steelers dropped to 5th.  As a point of comparison, the Steelers have about 1.3 million Twitter followers compared to 2.4 million for the Patriots and 1.7 million for the Cowboys.


The Complete List

And finally, the complete rankings.  Enjoy!


[Image: complete 2016 rankings]

Fan Rankings 2014

Evaluating sports brands, or any brands, is a complicated endeavor.  The fundamental issue is that a brand is an intangible asset so the analyst must rely on indirect measures of the brand.  Last year, we introduced a measure of fan loyalty that we termed “fan equity.”  This measure was based on the degree to which fans were willing to support a franchise after controlling for factors such as population and winning percentage.  We also explored a social media based metric that used a similar approach to evaluate a team’s success in building a social media footprint.

This summer, we are updating our analyses across the four major sports leagues (NFL, NBA, MLB, & NHL) and the two major college sports (football & basketball).  We are also including several additional analyses that further illuminate fan support and brand equity.  Shifting to multiple measures of “fan support” provides significant benefits.  First, using multiple measures allows for a form of triangulation, since we expect that a great fan base will excel on most or all of the measures.  The second benefit is that since each measure has some unique elements, the construction of multiple measures allows for a richer description of each fan base.  Next, we provide basic descriptions and critiques of each of the metrics to be published.

Fan Equity

Our baseline concept of fan quality is something we term fan equity.  This is similar in spirit to “brand equity” but is adapted to focus specifically on the intensity of customer preference (rather than on market coverage or awareness).  We calculate fan equity using a revenue-premium model.  The basic approach is to develop a statistical model of team revenues based on team performance and market characteristics.  We then compare the forecasted revenues from this model for each team to actual revenues.  When a team’s actual revenues exceed predicted revenues, we take this as evidence of superior fan support.

The fan equity measure has some significant benefits.  First, since it is calculated using revenues, it is based on actual fan spending decisions.  In general, measures based on actual purchasing are preferred to survey-based data.  The other prime benefit is that a statistical model is used to control for factors such as market size and short-term variations in team performance.  This allows the measure to reflect true preference levels for a team rather than effects due to a team playing in a large market or currently winning.  However, the fan equity measure also has a couple of potential issues.  First, one of the distinguishing features of sports is capacity constraints: measures of attendance or revenues may underestimate true consumer demand simply because we do not observe demand above stadium capacity.  The second issue relates to owners’ pricing decisions, as the revenue-premium model implicitly assumes that teams are revenue maximizers.

Social Media Equity

Our social media equity metric is similar in spirit to our fan equity measure, but rather than focus on revenues we use social community size as the key dependent measure.  The calculation of social media equity involves a statistical model that predicts social media community size as a function of market characteristics and current season performance.  Social media equity is then based on a comparison of actual versus predicted social media following.

The social media equity metric provides two key advantages relative to the revenue-premium metric.  Since social media following is not constrained by stadium size and does not require fans to make a financial sacrifice, the metric 1) measures unconstrained demand and 2) avoids assumptions about owners’ pricing decisions.  On the negative side, social media equity does not differentiate between passive and engaged fans: following a team on Facebook or Twitter requires a minimal, one-time effort.

Trend Analysis (Fan Equity Growth)

A key issue in evaluating fan or brand equity is the time horizon used in the analysis.  The methods described above produce an estimate of “equity” for each season.  The dilemma is in determining how many years should be used to construct rankings.  The shorter the time horizon used, the more likely the results are to be biased by random fluctuations or one-time events.  On the other hand, using a long time horizon is problematic because fan equity is likely to evolve over time.  This year, we present an analysis of each team’s fan equity trajectory.

Price Elasticity and Win Elasticity

This year we are adding analyses that look at the sensitivity of attendance to winning and price at the team level.  This is accomplished by estimating a model of attendance (demand) as a function of factors such as price, population, and winning rates.  The key feature of this model specification is that we include team-level dummy variables and interactions between the team dummies and the focal variables of winning and price.
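The specification can be sketched by showing how one row of the design matrix would be built.  The exact variable set and team list in our model differ, so treat this as an illustrative layout (with hypothetical team abbreviations) rather than the actual estimation code.

```python
def design_row(team, teams, log_price, win_pct):
    """Build one observation's row for an attendance regression with
    team dummies and team-by-winning / team-by-price interactions.
    The first team in `teams` serves as the omitted baseline category."""
    row = [1.0, log_price, win_pct]                  # intercept + focal variables
    for t in teams[1:]:
        row.append(1.0 if team == t else 0.0)        # team dummy
    for t in teams[1:]:
        row.append(win_pct if team == t else 0.0)    # team x winning interaction
    for t in teams[1:]:
        row.append(log_price if team == t else 0.0)  # team x price interaction
    return row

teams = ["NE", "DAL", "PHI"]  # hypothetical three-team example
row = design_row("DAL", teams, 4.5, 0.600)
```

The interaction coefficients are what allow each team to have its own price and win sensitivity, which is the basis for the team-level elasticities discussed below.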

The win elasticity provides a measure of the importance of team quality in driving demand.  For example, if the statistical model finds that a team’s demand is unrelated to its winning rate, the implication is that fans have such a strong preference for the team that winning and losing don’t matter.  For a weaker team (brand), the model would produce a strong relationship between demand and winning.

The benefit of this measure is that the results come directly from the data.  A possible issue is that the results may be driven by omitted variables.  For example, prior to conducting the analysis we might speculate that demand for the Chicago Cubs is only slightly related to the team’s winning percentage.  This speculation is based on the fact that the Cubs never seem to win but always seem to have a loyal following.  Our finding would, however, need to be evaluated with care, since the “Cub” effect is perfectly correlated with a “Wrigleyville Neighborhood” effect.

Social Media Based Personality

This year we are adding another new analysis that uses social media (Twitter) data to evaluate the personality of different fan bases.  The foundation for this analysis is information on “sentiment.”  Sentiment is basically a measure of the tone of the conversation about a team.  To understand fan personality, we examine how Twitter sentiment varies over time and compare how much it varies across teams.  This tells us whether some fan bases are even-keeled while others are more volatile.  We can also look at whether some teams tend to have higher highs or lower lows.  These analyses are based on the distribution of sentiment scores over a multiple-year period.
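As a simple sketch (using only the Python standard library), a fan base’s “personality” could be summarized from its stream of sentiment scores as below.  The sentiment-scoring algorithm itself is a separate step and is not shown; the scores here are made up.

```python
from statistics import mean, pstdev

def sentiment_profile(scores):
    """Summarize the distribution of a team's tweet-sentiment scores:
    average tone, volatility (is the fan base even-keeled?), and the
    extremes (higher highs / lower lows)."""
    return {
        "mean": mean(scores),
        "volatility": pstdev(scores),
        "high": max(scores),
        "low": min(scores),
    }

# Example: an even-keeled fan base vs. a volatile one (made-up scores)
steady = sentiment_profile([0.1, 0.15, 0.2, 0.15])
volatile = sentiment_profile([-0.8, 0.9, -0.5, 0.7])
```

Two fan bases with the same average tone can then be distinguished by their volatility and by how extreme their best and worst moments are.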

Twitter-based sentiment has both positives and negatives.  On the positive side, Twitter conversations are useful because they represent the unfiltered opinions of fans: fans are free to be as happy or as distraught as they want to be.  The availability of sentiment over time is also useful, as it allows us to capture how opinion changes.  On the downside, Twitter sentiment scores are only as good as the algorithm used to evaluate each tweet, and Twitter data may be a bit biased towards the opinions of younger fans.

Mike Lewis & Manish Tripathi, Emory University 2014.

NFL Fan Equity: Method Limitations and Focus on the Falcons

Our analyses frequently generate criticism.  Our work has been described as “garbage,” “silly” and “annoying” (and this is just from Mike’s wife).  To us, one of the most interesting things about this project is that we are often surprised by whom we offend.  In the case of last week’s analysis, we were humored by the fact that Saints fans seemed equally interested in their 4th place ranking and the Falcons’ 31st place ranking.  Given that we are based in Atlanta, we thought it would be a good idea to discuss why the Falcons finished so low and, more importantly, how these results should be interpreted.

Our starting point in these analyses is that we are evaluating fandom from a marketing perspective.  This means that we are trying to identify which customer base is the most loyal in terms of willingness to support its team by buying tickets.  This may seem like a crass measure to some, but it is at least an objective and observable metric.  Most critics seem to want us to somehow read the minds of fans and make ratings based on “passion.”  This is a fine notion, but the implementation is somewhere between difficult and impossible: difficult because a large-scale survey would be needed to ask fans how passionate they are, and nearly impossible because the survey would need to be repeated year after year to control for variation in team quality.

Our method, like all methods, has some limitations.  In our case, two limitations are most notable.  First, we rely on publicly available data (FCI pricing data, ESPN attendance estimates, Forbes’ team value estimates, US Census data, Title IX reporting data, etc.).  Publicly available data (and private data) will always contain inaccuracies.  The real question is whether the publicly available data is inherently biased against certain teams or types of teams.  We are happy to listen to debate about this issue.

The second limitation relates to a team’s marketing objectives.  One issue in sports marketing is that we do not get to observe true demand due to the constraints imposed by stadiums with finite capacities.  For this reason, we primarily rely on estimates of revenue.  This is an important distinction because it means that we implicitly make an assumption about how teams price.  The implicit assumption is that teams are attempting to maximize revenues.

You can definitely criticize this assumption.  It comes into play when evaluating teams that regularly sell out (e.g., Green Bay).  How can these fans be any more loyal?  This leads to the question of why teams like the Packers don’t price higher.  I can think of a couple of potential answers.  One, perhaps they don’t have enough information or expertise to maximize revenues.  Demand forecasting for an NFL stadium is a non-trivial task: historical data is of limited use because demand for certain types of seats is censored, and the variation in ticket quality is also a problem, since a revenue-maximizing team would need to understand the cross-elasticities across ticket types.

But the salient question is: if not a revenue-maximizing assumption, then what?  The best answer, we believe, is that some teams may systematically underprice in order to build or invest in their customer base.  The logic is that because a team lacks an extended tradition of success, or competes locally with other sports offerings, it makes sense to charge below-market rates to get people into an exciting, sold-out stadium.  Of course, as more astute readers may have noticed, this explanation is also consistent with the story that the team lacks brand equity.  We could also argue that some teams price too high and may therefore be “harvesting” brand equity.

This brings us back to the case of the Atlanta Falcons.  The explanation for why the Falcons finished low despite recent success on the field and sellout attendance is that they price lower than would be expected.  According to the Team Market Report’s fan cost index, over the last decade the Falcons have tended to price below the league average.  But it isn’t sufficient to consider relative prices alone; we also need to consider the “quality” of the market.  The Atlanta metro area has population and median income levels that are well above the league averages.

The other issue mentioned locally is: what does this mean for the Falcons’ quest for a new stadium?  A case can be made that our findings support the need for one.  If we accept the assumption that professional sports are an important civic asset (because they draw attention, create economic value, enhance the culture, etc.), then it makes sense for the city to invest in the team.  The Falcons have a relatively short history and play in a city full of transplants.  Just as the Falcons may be underpricing in order to develop their fan equity, it may make sense for the local community to invest back into the team.

Click here for an alternative methodology for ranking fan bases that relies on social media data.

Mike Lewis & Manish Tripathi, Emory University 2013.