Player Analytics Fundamentals: Part 2 – Performance Metrics

I want to start the series with the topic of “Metric Development.”  I'm going to use the term “metric,” but I could just as easily have used words like stats, measures, or KPIs.  Metrics are the key to sports analytics (and other analytics functions) because we need the right performance standards in place before we try to optimize.  Let me say that one more time – METRIC DEVELOPMENT IS THE KEY.

The history of sports statistics has focused on so-called “box score” statistics such as hits, runs, or RBIs in baseball.  These simple statistics have utility but also significant limitations.  For example, in baseball a key statistic is batting average.  Batting average is intuitively useful because it captures a player's ability to get hits and move other runners forward.  However, batting average is also limited because it neglects the difference between types of hits.  In a batting average calculation, a double or home run is of no greater value than a single.  It also neglects the value of walks.

These shortcomings motivated the development of statistics like OPS (on-base plus slugging).  Measures like OPS that are constructed from multiple statistics are appealing because they begin to capture the multiple contributions made by a player.  On the downside, these constructed statistics are often arbitrary in terms of how the component statistics are weighted.

The complexity of player contributions and the “arbitrary nature” of how simple statistics are weighted is illustrated by the formula for the NFL quarterback (passer) rating.

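For reference, the league's published passer rating formula can be written as

$$\text{Rating} = \frac{a + b + c + d}{6} \times 100,$$

where $a = 5\left(\tfrac{COMP}{ATT} - 0.3\right)$, $b = 0.25\left(\tfrac{YARDS}{ATT} - 3\right)$, $c = 20\left(\tfrac{TD}{ATT}\right)$, $d = 2.375 - 25\left(\tfrac{INT}{ATT}\right)$, and each of $a$ through $d$ is capped to the range 0 to 2.375 (which is what bounds the rating at 158.3).
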
This equation combines completion percentage (COMP/ATT), yards per attempt (YARDS/ATT), touchdown rate (TD/ATT), and interception rate (INT/ATT) to arrive at a single statistic for a quarterback.  On the plus side, the metric includes data related to “accuracy” (completion percentage), “scale” (yards per attempt), “conversion” (touchdowns), and “failures” (interceptions).  We can debate whether this is a sufficiently complete look at QBs (should we include sacks?), but it does cover multiple aspects of passing performance.  However, a common reaction to the formula is a question about where the weights come from.  Why is completion rate multiplied by 5 and touchdown rate multiplied by 20?
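
For readers who prefer to see it run, here is a minimal Python sketch of the same published formula:

```python
# Minimal sketch of the NFL passer rating formula shown above.
def clamp(x, lo=0.0, hi=2.375):
    return max(lo, min(hi, x))

def passer_rating(comp, att, yards, td, ints):
    a = clamp((comp / att - 0.3) * 5)     # "accuracy": completion percentage
    b = clamp((yards / att - 3) * 0.25)   # "scale": yards per attempt
    c = clamp((td / att) * 20)            # "conversion": touchdown rate
    d = clamp(2.375 - (ints / att) * 25)  # "failures": interception rate
    return (a + b + c + d) / 6 * 100

# A hypothetical stat line: 25-of-35 for 300 yards, 3 TD, 1 INT.
print(round(passer_rating(25, 35, 300, 3, 1), 1))  # -> 114.0
```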

Is it a great statistic?  One way to evaluate it is via a quick check of the historical record.  Does the historical ranking jibe with our intuition?  Here is a link to historical rankings.

Every sport has examples of these kinds of “multi-attribute” constructed statistics.  Basketball has player efficiency metrics that involve weighting a player's good events (points, rebounds, steals) and negative outcomes (turnovers, fouls, etc.).  The OPS metric involves an implicit assumption that on-base percentage and slugging are of equal value.
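
Written out, the implicit weighting is plain to see:

$$\text{OPS} = 1.0 \times \text{OBP} + 1.0 \times \text{SLG}$$

There is no deep theory behind the two weights of exactly one; they are simply a convenient default.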

One area I want to explore is how we should construct these types of performance metrics.  This is a discussion that involves some philosophy and some statistics.  We will take this piece by piece and also show a couple of applications along the way.

Player Analytics Fundamentals: Part 1

Each spring I teach courses on sports analytics.  These courses include both Marketing Analytics and On-Field Analytics.  The blog has tended to focus on the marketing or fan side.  Moving forward, I think the balance is going to change a bit.  My plan is to re-balance the blog to include more of the on-field topics.

Last year I published a series of posts related to the fundamentals of sports analytics.  This material is relevant to both the marketing and the team performance sides of sports analytics.  This series featured comments on organizational design and decision theory.

This series is going to be a bit different from the team and player “analytics” that we see on the web.  Rather than present specific studies, I am going to begin with some fundamental principles and talk about a “general” approach to player analytics.  There is a lot of material on the web related to very specific sports analytics questions.  Analytics can be applied to baseball, football, soccer, and every other sport.  And within each of these games there are countless questions to be addressed.

Rather than contribute to the littered landscape, I want to talk about how I approach sports analytics questions.  In some ways, this series is the blueprint I use for thinking about sports analytics in the classroom.  My starting point is that I want to provide skills and insights that can be applied to any sport.  So we start with the fundamentals, and we think a lot about how to structure problems.  I want to supply grounded general principles that can be applied to any player analytics problem.

So what's the plan?  At a high level, sports analytics is about prediction.  We will start with a discussion of what we should be predicting.  This is a surprisingly complex issue.  From there we will talk a little bit about different statistical models.  This won't be too bad, because I'm a firm believer in using the simplest possible models.  The second half of the series will focus on different types of prediction problems.  These will range from predicting booms and busts to a look at how to do “comparables” in a better fashion.  In terms of the data, I think it will be a mix of football and the other kind of football.


NBA Fan Rankings: 2016 Edition

On an (almost) annual basis I present rankings of fan bases across major professional and collegiate leagues.  Today it is time for the NBA.   First, the winners and losers in this year’s rankings.  At the top of the list we have the Knicks, Lakers and Bulls. This may be the trifecta of who the league would love to have playing at Christmas and in the Finals.  At the bottom we have the Grizzlies, Nets and Hornets.

[Chart: 2016 NBA fan base rankings]

Before I get into the details, it may be helpful to briefly mention what differentiates these rankings from other analyses of teams and fans.  My rankings are driven by statistical models of how teams perform on a variety of marketing metrics.  The key insight is that these models allow us to control for short-run variation in team performance and permanent differences in market potential.  In other words, the analysis uses data to identify engagement or passion (based on attendance and spending) beyond what is expected given how a team performs and where it is located.  More details on the methodology can be found here.


The Winners

This year's list contains no real surprises.  The top five teams are all major market teams with storied traditions.  The top fan base belongs to the Knicks.  The Lakers, Bulls, Heat and Celtics follow.  The Knicks highlight how the model works.  While the Knicks might not be winning, Knicks fans still attend and spend.

The number two team on the list (the Lakers) is in much the same situation: a dominant brand with a struggling on-court product.  The Lakers and Clippers are an interesting comparison.  Last season, the Clippers did just a bit better in terms of attendance (100.7% versus 99.7%).  But the Lakers filled their seats at an average ticket price that was substantially higher.  The power of the Lakers brand is shown in this comparison because these outcomes occurred in a season in which the Clippers won many more games.

Why are the Lakers still the bigger draw?  Is this a star (Kobe) effect?  Probably in part, but fan loyalty is something that evolves over time.  The Lakers have the championships, tradition and therefore the brand loyalty.  It will be interesting to see how much equity is retained long-term if the team is unable to quickly reload.  The shared market makes this an interesting story to watch. I suspect that the Lakers will continue to be the stronger brand for quite a while.

The Losers

At the bottom of the list we have Memphis, Brooklyn and Charlotte.  The interesting one in this group is Brooklyn.  Why do the Nets rank poorly?  It ends up being driven by the relative success of the Knicks versus the Nets.  The Knicks have much more pricing power even though the two teams operate in basically the same market (we can debate this point).  According to ESPN, the Knicks drew 19,812 fans (100% of capacity) while the Nets filled 83.6% of their building.  The Knicks also command much higher ticket prices.  And while the Nets were worse (21 victories), the Knicks were far from special (32 wins).

What can the teams at the bottom of the list do?  When you go into the data and analyze what drives brand equity, the results are intuitive.  Championships, deep playoff runs and consistent playoff appearances are the key to building equity.  Easy to understand but tough to accomplish.

And a Draw

An interesting aside in all this is what it means for the league.  The NBA has long been a star and franchise driven league.  In the 1980s it was about the Lakers (Magic) and Celtics (Bird).  In the 1990s it was Michael Jordan and the Bulls.  From there we shifted into Kobe and LeBron.

On one hand, the league might be (even) stronger if the top teams were the Bulls, Knicks and Lakers.  On the other hand, the emergence of Steph Curry and Golden State has the potential to help build another powerful brand.

Some more thoughts…

The Fan Equity metric is just one possible means for assessing fan bases.  In this year’s NFL rankings I reported several more analyses that focus on different market outcomes.  These were social media following, road attendance and win sensitivity (bandwagon fans).  Looking at social following tells us something about the future of the brand as it (broadly) captures fan interest of a younger demographic.  Road Attendance tells us something about national rather than local following.  These analyses also use statistical models to control for market and team performance effects.

Social Equity

Top Social Equity Team: The Lakers

Bottom Social Equity: The Nets

Comment: The Lakers are an immensely strong brand on many dimensions.  The Nets are a mid-range brand when you look at raw numbers, but they suffer when we account for the fact that they operate in the NY market.

Road Equity

Top Road Equity: The Lakers

Bottom Road Equity: Portland

Comment: The Lakers dominate.  And since this analysis was done using fixed effects across 15 years of data, the result is not solely due to Kobe Bryant.  Portland does well locally but is not of much interest nationally.

It is possible to do even more.  We can also look at factors such as win or price sensitivity.  Win sensitivity (or bandwagon behavior) tells us whose fans only show up when the team is winning, and price sensitivity tells us whether a fan base is willing to show up when prices go up.  I'm skipping these latter two analyses today just to avoid overkill (available upon request).  The big message is that we can potentially construct a collection of metrics that provide a fairly comprehensive and deep understanding of each team's fan base and brand.

Note: I have left one team off the list.  I have decided to stop reporting the local teams (Emory is in Atlanta).  The local teams have all been great to both me and the Emory community.  This is just a small effort to eliminate some headaches for myself.

Finally… The complete list

City            Fan Equity Rank
Boston          5
Charlotte       27
Chicago         3
Cleveland       20
Dallas          15
Denver          11
Detroit         25
Golden State    16
Houston         7
Indiana         21
LA Clippers     17
LA Lakers       2
Memphis         29
Miami           4
Milwaukee       14
Minnesota       22
Brooklyn        28
New Orleans     24
NY Knicks       1
Oklahoma City   13
Orlando         19
Philadelphia    26
Phoenix         9
Portland        6
Sacramento      10
San Antonio     12
Toronto         18
Utah            8
Washington      23

Analytics, Trump, Clinton and the Polls: Sports Analytics Series Part 5.1

Recent presidential elections (especially 2008 and 2012) have featured heavy use of analytics by candidates and pundits.  The Obama campaigns were credited with using micro-targeting and advanced analytics to win elections.  Analysts like Nate Silver were hailed as statistical gurus who could use polling data to predict outcomes.  In the lead-up to this year's contest we heard a lot about the Clinton campaign's analytical advantages, and the election forecasters became regular parts of election coverage.

Then Tuesday night happened.  The polls were wrong (by a little) and the advanced micro-targeting techniques didn't pay off (enough).

Why did the analytics fail?

First, the polls and the election forecasts (I'll get to the value of analytics next week).  As background, commentators tend not to truly understand polls.  This creates confusion because commentators frequently over- and misinterpret what polls are saying.  For example, whenever “margin of error” is mentioned they tend to get things wrong.  A poll's margin of error is based on its sample size.  The common journalistic error is to apply a single poll's margin of error of 3% or 4% to a collection of polls.  When looking at an average of many polls, the “margin of error” is much smaller because the “poll of polls” has a much larger combined sample size.  This is a key point, because when we think about the combined polls it is even more clear that something went wrong in 2016.
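
As a back-of-the-envelope illustration (assuming simple random sampling and a race near 50/50), a poll's margin of error is roughly

$$\text{MOE} \approx 1.96\sqrt{\frac{p(1-p)}{n}}.$$

A single poll with $n = 1{,}000$ gives about $\pm 3.1$ points, but pooling ten such polls ($n = 10{,}000$) cuts that to roughly $\pm 1$ point.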

Diagnosing what went wrong is complicated by two factors.  First, it should be noted that because every pollster does things differently we can’t make blanket statements or talk in absolutes.  Second, diagnosing the problem requires a deep understanding of the statistics and assumptions involved in polling.

In the 2016 election my suspicion is that two things went wrong.  As a starting point, we need to realize that polls include strong implicit assumptions about the nature of the underlying population and about voter passion (rather than just preference).  When these assumptions don't hold, the polls will systematically fail.

First, most polls start with assumptions about the nature of the electorate.  In particular, there are assumptions about the base levels of Democrats, Republicans and Independents in the population.  Very often the difference between polls relates to these assumptions (LA Times versus ABC News).

The problem with assumptions about party affiliation in an election like 2016 is that the underlying coalitions of the two parties are in transition.  When I grew up, the conventional wisdom was that the Republicans were the wealthy, the suburban professionals, and the free-trading capitalists, while the Democrats were the party of the working man and unions.  Obviously these coalitions have changed.  My conjecture is that pollsters didn't sufficiently re-balance.  In the current environment it might make sense to place greater emphasis on demographics (race and income) when designing sampling segments.

The other issue is that more attention needs to be paid to avidity/engagement/passion (choose your own marketing buzzword).  Polls often differentiate between likely and registered voters.  This may have been insufficient in this election.  If Clinton's likely voters were 80% likely to show up and Trump's were 95% likely, then having a small percentage lead in a preference poll isn't going to hold up in an election.
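
The arithmetic, with hypothetical numbers, makes the point.  Suppose 52% of likely voters prefer Clinton and 48% prefer Trump, with the turnout rates above:

$$0.52 \times 0.80 = 0.416 \qquad \text{versus} \qquad 0.48 \times 0.95 = 0.456$$

A four-point preference lead becomes a four-point deficit among the voters who actually show up.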

The story of the 2016 election should be something every analytics professional understands.  From the polling side the lesson is that we need to understand and question the underlying assumptions of our model and data.  As the world changes do our assumptions still hold?  Is our data still measuring what we hope it does?  Is a single dependent measure (preference versus avidity in this case) enough?

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the design and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision-making styles is also common across categories.  It's necessary to understand both the level of interest in using analytics and the “technical level” of the decision makers.  Less experienced data scientists and statisticians have a tendency to use overly complicated methods.  This can be a killer.  If the models are too complex they won't be understood, and then they won't be used.  Linear regression, with perhaps a few extensions (fixed effects, linear probability models), is usually the way to go.  Because sports organizations have less history of using analytics, the issue of balancing complexity can be especially challenging.

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This is an important point of distinction between sports and non-sports industries, and it is also an important issue when we shift to discussing modeling in a couple of weeks.  When I use the term variables I am referencing individual elements of data.  For example, an element of data could be many different things, such as a player's weight, the number of shots taken, or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations.  A large company may have millions of customer accounts.  There may, however, be relatively few explanatory variables.  The firm may have only transaction history variables and limited demographics.  Even in sports marketing, a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points.  In this same example the team may have tens of thousands of season ticket holders.  If we think of this “information” as a database, we would have a row for every customer account (tens of thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings.  The issue in player selection applications is that there are relatively few player-level observations.  In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data).  In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” stats, but now we have entered the era of player tracking and biometrics.  We can generate player stats related to second-by-second movement or even detailed physiological data.  In sports ranging from MMA to soccer to basketball, the number of variables has exploded.

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?
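
As a small preview of the modeling discussion, one common response to having far more variables than observations is regularized regression.  The sketch below is purely illustrative (synthetic data, arbitrary settings), not a recommendation for any particular player model:

```python
# Illustrative only: with more variables than players, ordinary least squares
# is ill-posed, but a regularized model such as ridge regression still
# produces usable estimates.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_players, n_vars = 80, 400                # few observations, many variables
X = rng.normal(size=(n_players, n_vars))   # e.g., tracking/biometric features
beta = np.zeros(n_vars)
beta[:5] = 1.0                             # only a handful of variables truly matter
y = X @ beta + rng.normal(size=n_players)  # the player outcome we want to predict

model = Ridge(alpha=10.0).fit(X, y)        # penalty shrinks the 400 coefficients
print(round(model.score(X, y), 2))         # in-sample fit; real work needs cross-validation
```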

The Best NFL Fans 2016: The Dynamic Fan Equity Methodology

The Winners (and Losers) of this year's rankings!  First a quick graphic and then the details.

[Chart: 2016 NFL fan equity winners and losers]

It's become a tradition for me to rank NFL teams' fan bases each summer.  The basic approach (more details here) is to use data to develop statistical models of fan interest.  These models are used to determine which cities' fans are more willing to spend on or follow their teams after controlling for factors like market size and short-term variations in performance.  In past years, two measures of engagement have been featured: Fan Equity and Social Media Equity.  Fan Equity focuses on home box office revenues (support via opening the wallet) and Social Media Equity focuses on fans' willingness to engage as part of a team's community (support exhibited by joining social media communities).

This year I have come up with a new method that combines these two measures: Dynamic Fan Equity (DFE).  The DFE measure leverages the best features of the two measures.  Fan Equity is based on the most important consumer trait – willingness to spend.  Social Equity captures fan support that occurs beyond the walls of the stadium and skews towards a younger demographic.  The key insight that allows for the two measures to be combined is that there is a significant relationship between the Social Media Equity trend and the Fan Equity measure.  Social media performance turns out to be a strong leading indicator for financial performance.

Dynamic Fan Equity is calculated using current fan equity and the trend in fan equity implied by the team's social media performance.  I will spare the blog the technical details, but I'm happy to go into depth if there is interest.  On the data side we are working with 15 years of attendance data and 4 years of social data.
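
For readers who want a rough sense of the mechanics, one illustrative (and deliberately simplified) way to write the combination is

$$\text{DFE}_i = \text{FE}_i + \lambda \cdot \text{SocialTrend}_i,$$

where $\text{FE}_i$ is team $i$'s current fan equity, $\text{SocialTrend}_i$ is its trend in Social Media Equity, and $\lambda$ reflects the estimated relationship between social trend and subsequent fan equity.  The actual weighting is more involved than this.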

The Winners

We have a new number one on the list – the New England Patriots.  The Cowboys, Broncos, 49ers and Eagles follow.  The Patriots' victory is driven by fans' willingness to pay premium prices, strong attendance and a phenomenal social media following.  The final competition between the Cowboys and the Patriots was actually decided by the long-term value of the Patriots' greater social following.  The Patriots have about 2.4 million Twitter followers compared to 1.7 million for the Cowboys.  Of course, this is all relative; a team like the Jaguars has just 340 thousand followers.

The Eagles are the big surprise on the list.  The Eagles are also a good example of how the analysis works.  Most fan rankings are based on subjective judgments and lack controls for short-term winning rates.  This latter point is a critical shortcoming: it's easy to be supportive of a winning team.  While Eagles fans might not be happy, they are supportive in the face of mediocrity.  Last year the Eagles struggled on the field, but fans still paid premium prices and filled the stadium.  We'll come back to the Eagles in more detail in a moment.

The Strugglers

At the bottom we have the Bills, Rams, Chiefs, Raiders and Jaguars.  This is a similar list to last year's.  The Jags, for example, filled only 91% of capacity (ranked 27th) despite an average ticket price of just $57.  The Chiefs struggle because the fan support doesn't match the team's performance.  The Chiefs' capacity utilization rate ranks 17th in the league despite a winning record and low ticket prices.  The Raiders' fans again finish low in our rankings.  And every year the response is a great deal of anger and often threats.

The Steelers

The one result that gives me the most doubt is for the Pittsburgh Steelers.  The Steelers have long been considered one of the league's premier teams and brands.  The Steelers have a history of championships and have been known to turn opposing stadiums into seas of yellow and black.  So why are the Steelers ranked 18th?


A comparison between the Steelers and the Eagles highlights the underlying issues.  Last year the Steelers had an average attendance of 64,356 and an average ticket price of $84 (from ESPN and Team Market Report).  In comparison, the Eagles averaged 69,483 fans at an average price of $98.69.  In terms of filling capacity, the Steelers were at 98.3% compared to the Eagles at 102.8%.  The key is that the Eagles enjoyed greater support despite a much worse record.

One issue to consider is pricing.  It may well be that the Steelers' ownership makes a conscious effort to underprice relative to what the market would allow.  The high attendance rates across the NFL do suggest that many teams could profitably raise prices.  It's entirely reasonable to argue that the Steelers' relationship with the Pittsburgh community results in a policy of pricing below market.

In past years the Steelers have been our social media champions.  This past year did see a bit of a dip; in the Social Media Equity rankings the Steelers dropped to 5th.  As a point of comparison, the Steelers have about 1.3 million Twitter followers compared to 2.4 million for the Patriots and 1.7 million for the Cowboys.


The Complete List

And finally, the complete rankings.  Enjoy!


[Chart: complete 2016 NFL fan equity rankings]

Fan Rankings 2014

Evaluating sports brands, or any brands, is a complicated endeavor.  The fundamental issue is that a brand is an intangible asset so the analyst must rely on indirect measures of the brand.  Last year, we introduced a measure of fan loyalty that we termed “fan equity.”  This measure was based on the degree to which fans were willing to support a franchise after controlling for factors such as population and winning percentage.  We also explored a social media based metric that used a similar approach to evaluate a team’s success in building a social media footprint.

This summer, we are updating our analyses across the four major sports leagues (NFL, NBA, MLB, & NHL) and the two major college sports (football & basketball).  We are also including several additional analyses that further illuminate fan support and brand equity.  Shifting to multiple measures of “fan support” provides significant benefits.  First, using multiple measures allows for a form of triangulation, since we expect that a great fan base will excel on most or all of the measures.  The second benefit is that since each measure has some unique elements, the construction of multiple measures allows for a richer description of each fan base.  Next, we provide basic descriptions and critiques of each of the metrics to be published.

Fan Equity

Our baseline concept of fan quality is something we term fan equity.  This is similar in spirit to “brand equity” but is adapted to focus specifically on the intensity of customer preference (rather than considering market coverage or awareness).  We calculate fan equity using a revenue-premium model.  The basic approach is to develop a statistical model of team revenues based on team performance and market characteristics.  We then compare the forecasted revenues from this model for each team to actual revenues.  When a team's actual revenues exceed its predicted revenues, we take this as evidence of superior fan support.
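
A minimal sketch of the revenue-premium logic, with made-up data and an assumed linear specification (not our exact model), might look like this:

```python
# Revenue-premium sketch: regress revenue on performance and market factors,
# then treat the residual (actual minus predicted revenue) as fan equity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
teams = pd.DataFrame({
    "revenue": rng.normal(250, 40, 30),      # $M, synthetic numbers
    "win_pct": rng.uniform(0.3, 0.7, 30),
    "metro_pop": rng.uniform(1, 20, 30),     # millions
    "median_income": rng.normal(60, 8, 30),  # $K
})

fit = smf.ols("revenue ~ win_pct + metro_pop + median_income", data=teams).fit()
teams["fan_equity"] = fit.resid              # over/under-performance vs. prediction
print(teams.sort_values("fan_equity", ascending=False).head())
```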

The fan equity measure has some significant benefits.  First, since it is calculated using revenues, it is based on actual fan spending decisions.  In general, measures based on actual purchasing are preferred to survey-based data.  The other prime benefit is that a statistical model is used to control for factors such as market size and short-run variations in team performance.  This allows the measure to reflect true preference levels for a team rather than effects due to a team playing in a large market or being a current winner.  However, the fan equity measure also has a couple of potential issues.  First, one of the distinguishing features of sports is capacity constraints.  Measures of attendance or revenues may therefore underestimate true consumer demand simply because we do not observe demand above stadium capacity.  The second issue relates to owners' pricing decisions.  An implicit assumption in the revenue-premium model is that teams are revenue maximizers.

Social Media Equity

Our social media equity metric is similar in spirit to our fan equity measure, but rather than focus on revenues we use social community size as the key dependent measure.  The calculation of social media equity involves a statistical model that predicts social media community size as a function of market characteristics and current season performance.  Social media equity is then based on a comparison of actual versus predicted social media following.

The social media equity metric provides two key advantages relative to the revenue-premium metric.  Since social media following is not constrained by stadium size and does not require fans to make a financial sacrifice, this metric 1) provides a measure of unconstrained demand and 2) avoids assumptions about owners' pricing decisions.  On the negative side, social media equity does not differentiate between passive and engaged fans.  Following a team on Facebook or Twitter requires a minimal, one-time effort.

Trend Analysis (Fan Equity Growth)

A key issue in evaluating fan or brand equity is the time horizon used in the analysis.  The methods described above produce an estimate of “equity” for each season.  The dilemma is in determining how many years should be used to construct rankings.  The shorter the time horizon used, the more likely the results are to be biased by random fluctuations or one-time events.  On the other hand, using a long time horizon is problematic because fan equity is likely to evolve over time.  This year, we present an analysis of each team’s fan equity trajectory.

Price Elasticity and Win Elasticity

This year we are adding analyses that look at the sensitivity of attendance to winning and price at the team level.  This is accomplished by estimating a model of attendance (demand) as a function of various factors such as price, population, and winning rates.  The key feature of this model specification is that we include team-level dummy variables and interactions between the team dummies and the focal variables of winning and price.
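
In symbols, a stylized version of such a specification (the notation here is illustrative, not our exact model) is

$$\log(\text{Att}_{it}) = \alpha_i + \beta_i \text{Win}_{it} + \gamma_i \log(\text{Price}_{it}) + \delta' X_{it} + \varepsilon_{it},$$

where the team-specific intercepts $\alpha_i$ are the dummy variables, the team-specific slopes $\beta_i$ and $\gamma_i$ come from the interaction terms, and $X_{it}$ collects controls such as population.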

The win elasticity provides a measure of the importance of quality in driving demand.  For example, if the statistical model finds that a team's demand is unrelated to winning rate, then the implication is that fans have such a strong preference for the team that winning and losing don't matter.  For a weaker team (brand), the model would produce a strong relationship between demand and winning.

The benefit of this measure is that the results come directly from the data.  A possible issue with this analysis is that the results may be driven by omitted variables.  For example, prior to conducting the analysis we might speculate that demand for the Chicago Cubs is only slightly related to the team's winning percentage.  This speculation is based on the fact that the Cubs never seem to win but always seem to have a loyal following.  Our finding would, however, need to be evaluated with care, since the “Cub” effect is perfectly correlated with a “Wrigleyville neighborhood” effect.

Social Media Based Personality

This year we are adding another new analysis that uses social media (Twitter) data to evaluate the personality of different fan bases.  The foundation for this analysis is information on “sentiment.”  Sentiment is basically a measure of the tone of the conversation about a team.  To understand fan personality, we examine how Twitter sentiment varies over time.  We compare how much sentiment varies across teams.  This tells us whether some fan bases are even-keeled while others are more volatile.  We can also look at whether some teams tend to have higher highs or lower lows.  These analyses are based on the distribution of sentiment scores over a multiple-year period.

Twitter based sentiment has both positives and negatives.  On the positive side, Twitter conversations are useful because they represent the unfiltered opinions of fans.  Fans are free to be as happy or as distraught as they want to be.  The availability of sentiment over time is also useful as it allows for the capture of how opinion changes over time.  On the downside, Twitter sentiment scores are only as good as the algorithm used to evaluate each Tweet.  Twitter data may also be a bit biased towards the opinions of younger fans.

Mike Lewis & Manish Tripathi, Emory University 2014.

NFL Fan Equity: Method Limitations and Focus on the Falcons

Our analyses frequently generate criticism.  Our work has been described as “garbage,” “silly” and “annoying” (and this is just from Mike's wife).  To us, one of the most interesting things about this project is that we are often surprised by whom we offend.  In the case of last week's analysis, we were amused by the fact that Saints fans seemed equally interested in their 4th-place ranking and the Falcons' 31st-place ranking.  Given that we are based in Atlanta, we thought it would be a good idea to discuss why the Falcons finished so low and, more importantly, how these results should be interpreted.

Our starting point in these analyses is that we are evaluating fandom from a marketing perspective.  This means that we are trying to identify which customer base is the most loyal in terms of its willingness to support the team through buying tickets.  This may seem like a crass measure to some, but it is at least an objective and observable metric.  Most critics seem to want us to somehow read the minds of the fans and make ratings based on “passion.”  This is a fine notion, but the implementation is somewhere between difficult and impossible.  Difficult, because a large-scale survey would be needed to ask fans how passionate they are, and nearly impossible because the survey would need to be repeated year after year to control for variation in team quality.

Our method, like all methods, has some limitations.  In our case, two limitations are most notable.  First, we rely on publicly available data (FCI pricing data, ESPN attendance estimates, Forbes’ team value estimates, US Census data, Title IX reporting data, etc.).  Publicly available data (and private data) will always contain inaccuracies.  The real question is whether the publicly available data is inherently biased against certain teams or types of teams.  We are happy to listen to debate about this issue.

The second limitation relates to a team’s marketing objectives.  One issue in sports marketing is that we do not get to observe true demand due to the constraints imposed by stadiums with finite capacities.  For this reason, we primarily rely on estimates of revenue.  This is an important distinction because it means that we implicitly make an assumption about how teams price.  The implicit assumption is that teams are attempting to maximize revenues.

You can definitely criticize this assumption.  This assumption comes into play when evaluating teams that regularly sell out (e.g. Green Bay).  How can these fans be any more loyal?  This leads to the question of why teams like the Packers don't price higher.  I can think of a couple of potential answers.  One, perhaps they don't have enough information or expertise to maximize revenues.  Demand forecasting for an NFL stadium is a non-trivial task.  Historical data is of limited use because demand for certain types of seats is censored.  The variation in the quality of tickets is also a problem, as revenue-maximizing teams would also need to understand the cross-elasticities across ticket types.

But the salient question is: if not a revenue-maximizing assumption, then what?  The best answer, we believe, is that some teams may systematically underprice in order to build or invest in their customer base.  The logic is that because the team lacks an extended tradition of success, or because it competes locally with other sports offerings, it makes sense to charge below-market rates to get people into an exciting, sold-out stadium.  Of course, as more astute readers may have noticed, this explanation is also consistent with the story that the team lacks brand equity.  We could also argue that some teams price too high and may therefore be “harvesting” brand equity.

This brings us back to the case of the Atlanta Falcons.  The explanation for why the Falcons finished low despite recent success on the field and consistent sellout attendance is that they price lower than would be expected.  According to Team Market Report's fan cost index, over the last decade the Falcons have tended to price below the league average.  But it isn't sufficient to consider relative prices alone.  We also need to consider the “quality” of the market.  The Atlanta metro area has population and median income levels that are well above the league averages.

The other issue that was mentioned locally is: what does this mean for the Falcons' quest for a new stadium?  A case can be made that our findings support the need for a new stadium.  If we accept the assumption that professional sports are an important civic asset (because they draw attention, create economic value, enhance the culture, etc.), then it makes sense for the city to invest in the team.  The Falcons have a relatively short history and play in a city full of transplants.  Just as the Falcons may be underpricing in order to develop their fan equity, it may make sense for the local community to invest back into the team.

Click here for an alternative methodology for ranking fan bases that relies on social media data.

Mike Lewis & Manish Tripathi, Emory University 2013.

“Revenue Premium” Versus Survey-Based Attitudinal Measures

A criticism of our previous rankings of fan bases is that our approach is overly financial and doesn’t capture the “passion” of fans.  This critique has some validity but probably less than our critics realize.  When we talk about quantifying customer loyalty in sports or even in general marketing contexts we very quickly run into some challenges.

For example, when I speak to classes about what loyalty means, the first answer I get is that loyal customers engage in repeat buying of a brand.  I will then throw out the example of the local cable company.  The key to this example is that cable companies have very high repeat buying rates, but they also frequently have fairly unhappy customers.  When asked if a company can have loyal but unhappy customers, students quickly realize that it is difficult to cleanly measure loyalty.

Another distinction I make when teaching is the difference between observable and unobservable measures of loyalty.  As a marketer, I can often measure repeat buying and customer lifetime.  I can even convert this into some measure of customer lifetime value.  These are observable measures.  On the other hand, loyalty-oriented factors such as customer satisfaction, preference, or likelihood of repurchase are unobservable unless I field an explicit survey.

I think what our critics are getting at is that they would prefer to see primary/survey data on customer preference or intensity (questions such as: on a 1-to-7 scale, rate how much you love the Florida Gators).  BUT, what our critics don't seem to get is that this type of primary data collection would also suffer from some significant flaws.  First, whenever we do a consumer survey we worry about response bias.  The issue is: how do we collect a representative sample of college or pro sports fans?  This is an unsolvable problem that we tend to live with in marketing, since anyone who is willing to answer a survey (spend time with a marketing researcher) is by definition non-representative (a bit weird, I know).

A second and more profound issue is that it would be impossible to separate out the effects of current season performance from underlying loyalty using a survey.  I suspect that if you surveyed Michigan basketball fans this year you would find a great deal of loyalty to the team.  But I think we all know that fans of winning teams will be much happier and therefore respond much more positively during a winning season.

Related to the preceding two issues is that our critics seem to assume that they know what is in the hearts of various fan bases.  Mike DeCourcy took exception to our college basketball rankings that rated Louisville over Kentucky and Oklahoma State over Kansas.  A key mistake he makes is assuming that he just knows that Kentucky fans are more passionate than Louisville's, or that Kansas fans love their team more than Oklahoma State fans love theirs.  He knows this based not on any systematic review of data, but on a few anecdotes (this is especially convenient, since the reliance on anecdotes means that there is no need to control for team quality) and his keen insight into the psyches of fans everywhere.

The other issue is whether our “revenue premium” captures fan passion or just disposable income.  This is another impossible question to fully answer, but in our defense, the nice thing about this measure is that it is observable, and willingness to pay for a product is about the best measure of preference you can get short of climbing into someone's head.  I think another way in which our critics are confused is that they associate noise with loyalty.  Is an active and loud student section a true measure of fan base quality?  Perhaps so, but do we really believe that the 19-year-old face painter is a better fan than the alumnus who has been attending for 40 years but no longer stands up for the entire game?

Converting High School Talent into NBA Draft Picks: Ranking the ACC

The NBA Draft can be a time for college basketball fans to cheer about the “success” of their basketball program.  Kentucky, Duke, North Carolina, and Kansas fans can boast about the number of alums currently in the NBA.  This year, ESPN is taking that discussion one step further by describing the quality of NBA players produced and ranking the “NBA Pedigree” of colleges.

Our take is a bit different, as we will examine the process of taking high school talent and converting it into NBA draft picks.  In other words, we want to understand how efficient colleges are at transforming their available high school talent into NBA draft picks.  Today, we launch our NBA draft series by ranking the schools in the ACC based on their ability to convert talent into draft picks.

The initial approach is fairly simple.  Each year, (almost) every basketball program has an incoming freshman class.  The players in the class have been evaluated by several national recruiting/ranking companies (e.g. Rivals, Scout).  In theory, these evaluations provide a measure of each player's talent or quality*.  Each year, we also observe which players get drafted by the NBA.  Thus, we can measure conversion rates over time for each college.  Conversion rates may be indicative of a school's ability to coach up talent, to identify talent, or to invest in players.  These rates may also depend on the talent composition of all of the players on the team.  This last factor is particularly important from a recruiting standpoint.  Should players flock to places that other highly ranked players have selected?  Should they look for places where they have a higher probability of getting on the court quickly?  Next week we will present a statistical analysis (logistic regression) that includes multiple factors (quality of other recruits, team winning rates, tournament success, investment in the basketball program, etc.).  But for now we will just present simple statistics related to each school's ability to produce output (NBA draft picks) as a function of input (quality of recruits).

Our first set of rankings is for the ACC.  At the top of the list we have Boston College and Georgia Tech.  Boston College has done a good job of converting low-ranked talent into NBA picks (in this time period it had two three-star players and a non-rated player drafted).  Georgia Tech, on the other hand, has converted all of its five-star recruits and several of its four-star recruits.  A result that may at first glance seem surprising is the placement of UNC and Duke.  However, upon reflection these results make a good deal of sense.  When players choose these “blue blood” programs, they face stiff competition for playing time from both current and future teammates.

Here are some questions you probably have about our methodology:

What time period does this represent?

We examined recruiting classes from 2002 to 2011 (this represents the year of graduation from high school).  While the chart above ranks the ACC, we compiled data for over 300 Division 1 colleges (over 12,000 players).

How did you compute the conversion rate?

The conversion rate for each school is defined as (sum of draft picks for the 2002-2011 recruiting classes) / (weighted recruiting talent).  Weighted recruiting talent is determined by summing the recruiting “points” for each class.  These “points” are computed by weighting each recruit by the overall population-average probability of being drafted for recruits at that talent level.  We are using ratings data from Rivals.com.  The weights for each “type” of recruit were 0.51 for each five-star recruit, 0.13 for each four-star, 0.03 for each three-star, 0.008 for each two-star, and 0.004 for each unranked recruit.
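
For concreteness, here is a small sketch of the calculation (the class composition below is hypothetical; the star weights are the ones listed above):

```python
# Conversion rate = draft picks / weighted recruiting talent, where each
# recruit is weighted by the population-average probability of being drafted
# for his star rating.
STAR_WEIGHTS = {5: 0.51, 4: 0.13, 3: 0.03, 2: 0.008, "NR": 0.004}

def weighted_talent(recruits):
    """recruits: list of star ratings, e.g. [5, 4, 4, 3, "NR"]."""
    return sum(STAR_WEIGHTS[r] for r in recruits)

def conversion_rate(draft_picks, recruits):
    return draft_picks / weighted_talent(recruits)

# Hypothetical school: ten classes flattened into one list of 30 recruits.
recruits = [5] * 2 + [4] * 8 + [3] * 15 + [2] * 3 + ["NR"] * 2
print(round(conversion_rate(6, recruits), 2))  # picks produced per pick "expected"
```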

Second-round picks often don’t even make the team.  What if you only considered first round picks?

We have also computed the rates using first-round picks only; please see the table below.

NEXT: RANKING THE BIG 10

*We can already hear our friends at Duke explaining that players are rated more highly by services simply because they are being recruited by Duke.  We acknowledge that it is very difficult to get a true measure of a high school player's ability.  However, we also believe that, given all of the media exposure for high school athletes over the last ten years, this problem has diminished.

Mike Lewis & Manish Tripathi, Emory University 2013.