NBA Fan Rankings: 2016 Edition

On an (almost) annual basis I present rankings of fan bases across major professional and collegiate leagues.  Today it is time for the NBA.   First, the winners and losers in this year’s rankings.  At the top of the list we have the Knicks, Lakers and Bulls. This may be the trifecta of who the league would love to have playing at Christmas and in the Finals.  At the bottom we have the Grizzlies, Nets and Hornets.


Before I get into the details, it may be helpful to briefly mention what differentiates these rankings from other analyses of teams and fans. My rankings are driven by statistical models of how teams perform on a variety of marketing metrics. The key insight is that these models allow us to control for short-run variation in team performance and permanent differences in market potential. In other words, the analysis uses data to identify engagement or passion (based on attend and spend) beyond what is expected based on how a team is performing and where the team is located. More details on the methodology can be found here.
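To make the idea of "beyond what is expected" concrete, here is a minimal sketch in Python. It is not the actual model behind these rankings. It assumes a simple linear regression of a revenue-type metric on winning percentage and market size, and treats the residual as a rough stand-in for fan equity. The function name and every number below are made up for illustration.

```python
import numpy as np

def fan_equity_residuals(revenue, win_pct, market_size):
    """Residuals from an OLS fit of revenue on win percentage and market size.

    A positive residual means the team earns more than its performance
    and market would predict (an illustrative proxy for fan equity).
    """
    X = np.column_stack([np.ones(len(revenue)), win_pct, market_size])
    coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
    return revenue - X @ coef

# Hypothetical data for six unnamed teams (not real figures).
revenue = np.array([120.0, 95.0, 60.0, 70.0, 85.0, 55.0])   # e.g. gate revenue
win_pct = np.array([0.40, 0.65, 0.50, 0.60, 0.45, 0.55])    # season win rate
market_size = np.array([20.0, 13.0, 2.0, 5.0, 9.0, 3.0])    # metro population

residuals = fan_equity_residuals(revenue, win_pct, market_size)
ranking = np.argsort(-residuals)  # team indices, strongest residual first
```

A team that wins little but still earns far more than the fitted line predicts lands near the top of `ranking`, which is the pattern described for the Knicks.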


The Winners

This year’s list contains no real surprises. The top five teams are all major market teams with storied traditions. The top fan base belongs to the Knicks. The Lakers, Bulls, Heat and Celtics follow. The Knicks highlight how the model works. While the Knicks might not be winning, Knicks fans still attend and spend.

The number two team on the list (the Lakers) is in much the same situation: a dominant brand with a struggling on-court product. The Lakers and Clippers are an interesting comparison. Last season, the Clippers did just a bit better in terms of attendance (100.7% versus 99.7%). But the Lakers filled their seats with an average ticket price that was substantially higher. The power of the Laker brand is shown in this comparison because these outcomes occurred in a season where the Clippers won many more games.

Why are the Lakers still the bigger draw?  Is this a star (Kobe) effect?  Probably in part, but fan loyalty is something that evolves over time.  The Lakers have the championships, tradition and therefore the brand loyalty.  It will be interesting to see how much equity is retained long-term if the team is unable to quickly reload.  The shared market makes this an interesting story to watch. I suspect that the Lakers will continue to be the stronger brand for quite a while.

The Losers

At the bottom of the list we have Memphis, Brooklyn and Charlotte.  The interesting one in this group is Brooklyn.  Why do the Nets rank poorly?  It ends up being driven by the relative success of the Knicks versus the Nets.  The Knicks have much more pricing power while the teams operate in basically the same market (we can debate this point).  According to ESPN, the Knicks drew 19,812 fans (100% of capacity) while the Nets filled 83.6% of their building.  The Knicks also command much higher ticket prices.  And while the Nets were worse (21 victories) the Knicks were far from special (32 wins).

What can the teams at the bottom of the list do? When you go into the data and analyze what drives brand equity, the results are intuitive. Championships, deep playoff runs and consistent playoff appearances are the key to building equity. Easy to understand but tough to accomplish.

And a Draw

An interesting aside in all this is what it means for the league. The NBA has long been a star- and franchise-driven league. In the 1980s it was about the Lakers (Magic) and Celtics (Bird). In the 1990s it was Michael Jordan and the Bulls. From there we shifted into Kobe and LeBron.

On one hand, the league might be (even) stronger if the top teams were the Bulls, Knicks and Lakers.  On the other hand, the emergence of Steph Curry and Golden State has the potential to help build another powerful brand.

Some more thoughts…

The Fan Equity metric is just one possible means for assessing fan bases.  In this year’s NFL rankings I reported several more analyses that focus on different market outcomes.  These were social media following, road attendance and win sensitivity (bandwagon fans).  Looking at social following tells us something about the future of the brand as it (broadly) captures fan interest of a younger demographic.  Road Attendance tells us something about national rather than local following.  These analyses also use statistical models to control for market and team performance effects.

Social Equity

Top Social Equity Team: The Lakers

Bottom Social Equity: The Nets

Comment: The Lakers are an immensely strong brand on many dimensions.  The Nets are a mid-range brand when you look at raw numbers.  But they suffer when we account for them operating in the NY market.

Road Equity

Top Road Equity: The Lakers

Bottom Road Equity: Portland

Comment: The Lakers dominate.  And as this analysis was done looking at fixed effects across 15 years it is not solely due to Kobe Bryant.  Portland does well locally but is not of much interest nationally.

It is possible to do even more.  We can even look at factors such as win or price sensitivity. Win sensitivity (or bandwagon behavior) tells us whose fans only show up when a team is winning and price sensitivity tells us if a fan base is willing to show up when prices go up.  I’m skipping these latter two analyses today just to avoid overkill (available upon request).  The big message is that we can potentially construct a collection of metrics that provide a fairly comprehensive and deep understanding of each team’s fan base and brand.

Note: I have left one team off the list.  I have decided to stop reporting the local teams (Emory is in Atlanta).  The local teams have all been great to both myself and the Emory community.  This is just a small effort to eliminate some headaches for myself.

Finally… The complete list

City Fan Equity
Boston 5
Charlotte 27
Chicago 3
Cleveland 20
Dallas 15
Denver 11
Detroit 25
GoldenState 16
Houston 7
Indiana 21
LAClips 17
LALakers 2
Memphis 29
Miami 4
Milwaukee 14
Minnesota 22
Brooklyn 28
NewOrleans 24
NYKnicks 1
OKCity 13
Orlando 19
Philadelphia 26
Phoenix 9
Portland 6
Sacramento 10
SanAntonio 12
Toronto 18
Utah 8
Washington 23
 

Analytics, Trump, Clinton and the Polls: Sports Analytics Series Part 5.1

Recent presidential elections (especially 2008 and 2012) have featured heavy use of analytics by candidates and pundits.  The Obama campaigns were credited with using micro targeting and advanced analytics to win elections. Analysts like Nate Silver were hailed as statistical gurus who could use polling data to predict outcomes.  In the lead up to this year’s contest we heard a lot about the Clinton campaign’s analytical advantages and the election forecasters became regular parts of election coverage.

Then Tuesday night happened.  The polls were wrong (by a little) and the advanced micro targeting techniques didn’t pay off (enough).

Why did the analytics fail?

First, the polls and the election forecasts (I’ll get to the value of analytics next week). As background, commentators tend to not truly understand polls. This creates confusion because commentators frequently over- and misinterpret what polls are saying. For example, whenever “margin of error” is mentioned they tend to get things wrong. A poll’s margin of error is based on sample size. The common journalistic error is to apply the 3% or 4% margin of error of an individual poll to a collection of polls, even though the combined sample is much larger than that of any single poll. When looking at an average of many polls the “margin of error” is much smaller because the “poll of polls” has a much larger sample size. This is a key point because when we think about the combined polls it is even more clear that something went wrong in 2016.
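A quick back-of-the-envelope calculation shows how much the margin of error shrinks when polls are pooled. This is a sketch under textbook assumptions that are mine, not any pollster's: simple random sampling, a 95% confidence level, and polls that are independent draws from the same population. The sample sizes are hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Sampling margin of error for a proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# One hypothetical poll of 1,000 respondents.
single = margin_of_error(1000)        # roughly 0.031, i.e. about 3 points

# Twenty such polls treated as one pooled sample of 20,000.
pooled = margin_of_error(20 * 1000)   # roughly 0.007, i.e. under 1 point
```

The formula only captures random sampling error. If the polls share a common bias, pooling them narrows the interval without moving it any closer to the truth, which is why a miss by the combined polls points to a systematic problem.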

Diagnosing what went wrong is complicated by two factors.  First, it should be noted that because every pollster does things differently we can’t make blanket statements or talk in absolutes.  Second, diagnosing the problem requires a deep understanding of the statistics and assumptions involved in polling.

In the 2016 election my suspicion is that two things went wrong. As a starting point, we need to realize that polls include strong implicit assumptions about the nature of the underlying population and about voter passion (rather than preference). When these assumptions don’t hold the polls will systematically fail.

First, most polls start with assumptions about the nature of the electorate.  In particular, there are assumptions about the base levels of Democrats, Republicans and Independents in the population.  Very often the difference between polls relates to these assumptions (LA Times versus ABC News).

The problem with assumptions about party affiliation in an election like 2016 is that the underlying coalitions of the two parties are in transition. When I grew up the conventional wisdom was that the Republicans were the wealthy, the suburban professionals, and the free trading capitalists while the Democrats were the party of the working man and unions. Obviously these coalitions have changed. My conjecture is that pollsters didn’t sufficiently re-balance. In the current environment it might make sense to place greater emphasis on demographics (race and income) when designing sampling segments.

The other issue is that more attention needs to be paid to avidity / engagement / passion (choose your own marketing buzzword). Polls often differentiate between likely and registered voters. This may have been insufficient in this election. If Clinton’s likely voters were 80% likely to show up and Trump’s were 95% likely, then a small percentage lead in a preference poll isn’t going to hold up in an election.
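The arithmetic behind that claim is easy to check. The sketch below uses the 80% and 95% turnout figures from the paragraph above and assumes, purely for illustration, a 52 to 48 lead in the preference poll. The function name is hypothetical.

```python
def expected_vote_share(preference_a, turnout_a, turnout_b):
    """Share of votes actually cast for candidate A in a two-way race.

    preference_a: fraction of likely voters who prefer A (B gets the rest)
    turnout_a, turnout_b: probability that each side's supporters vote
    """
    votes_a = preference_a * turnout_a
    votes_b = (1 - preference_a) * turnout_b
    return votes_a / (votes_a + votes_b)

# Assumed 52-48 preference lead, with 80% versus 95% turnout.
share = expected_vote_share(0.52, 0.80, 0.95)   # about 0.477
```

Under these assumptions a four-point lead in preference becomes a deficit of roughly four and a half points among the votes actually cast.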

The story of the 2016 election should be something every analytics professional understands.  From the polling side the lesson is that we need to understand and question the underlying assumptions of our model and data.  As the world changes do our assumptions still hold?  Is our data still measuring what we hope it does?  Is a single dependent measure (preference versus avidity in this case) enough?

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the designing and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision making styles is also common across categories. It’s necessary to understand both the level of interest in using analytics and also the “technical level” of the decision makers. Less experienced data scientists and statisticians have a tendency to use methods that are too complicated. This can be a killer. If the models are too complex they won’t be understood and then they won’t be used. Linear regression with perhaps a few extensions (fixed effects, linear probability models) is usually the way to go. Because sports organizations have less history in terms of using analytics the issue of balancing complexity can be especially challenging.

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This is an important point of distinction between sports and non-sports industries and it is also an important issue for when we shift to discussing modeling in a couple of weeks.  When I use the term variables I am referencing individual elements of data.  For example, an element of data could be many different things such as a player’s weight or the number of shots taken or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations. A large company may have millions of customer accounts. There may, however, be relatively few explanatory variables. The firm may have only transaction history variables and limited demographics. Even in sports marketing a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points. In this same example the team may have tens of thousands of season ticket holders. If we think of this “information” as a database we would have a row for every customer account (tens of thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings. The issue in player selection applications is that there are relatively few player level observations. In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data). In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” type stats but now we have entered into the era of player tracking and biometrics. Now we can generate player stats related to second-by-second movement or even detailed physiological data. In sports ranging from MMA to soccer to basketball the number of variables has exploded.
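A small simulation illustrates why having more variables than observations is a problem. This is a sketch with entirely synthetic data: the outcome is pure noise, unrelated to any of the 200 made-up "player variables", and the sample has only 30 "players". Ordinary least squares still fits the sample essentially perfectly, and that fit says nothing about new players.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_variables = 30, 200   # far more columns than rows

# Synthetic data: y is pure noise and has no true relationship to X.
X = rng.normal(size=(n_players, n_variables))
y = rng.normal(size=n_players)

# Least squares (minimum-norm solution when columns outnumber rows).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
in_sample_error = np.max(np.abs(X @ coef - y))   # essentially zero

# Fresh synthetic "players" drawn the same way.
X_new = rng.normal(size=(n_players, n_variables))
y_new = rng.normal(size=n_players)
out_sample_mse = np.mean((X_new @ coef - y_new) ** 2)  # nowhere near zero
```

The model "explains" the sample only because it has more free parameters than data points, so in-sample fit tells us nothing in this setting.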

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?