NFL Fan Base and Brand Rankings 2017

NFL Fandom Report 2017: The “Best” NFL Fans

Who has the best fans in the NFL?  What are the best brands in the NFL? These are simple questions without simple answers.  First we have to decide what we mean by “best”.  What makes for a great fan or brand?  Fans that show up even when the team is losing?  Fans that are willing to pay the highest prices?  Fans that are willing to follow a team on the road or social media?

Even after we agree on the question, answering it is also a challenge.  How do we adjust for the fact that one team might have gone on a miraculous run that filled the stadium?  Or perhaps another team suffered a slew of injuries?  How do we compare fan behavior in a market like New York with fans in a place like Green Bay?

My approach to evaluating fan bases is to use data to develop statistical models of fan interest (more details here).  The key is that these models are used to determine which city’s fans are more willing to spend or follow their teams after controlling for factors like market size and short-term changes in winning and losing.

In past years, two measures of engagement have been featured: Fan Equity and Social Media Equity.  Fan Equity focuses on home box office revenues (support via opening the wallet) and Social Media Equity focuses on fan willingness to engage as part of a team’s community (support exhibited by joining social media communities).  This year I am adding a third measure – Road Equity.  Road Equity focuses on how teams draw on the road after adjusting for team performance.  These metrics provide a balance – a measure of willingness to spend, a measure unconstrained by stadium size and a measure of national appeal.

To get at an overall ranking, I’m going to use the simplest possible method.  We are just going to average across the three metrics.  (Similar analyses are available for the NBA and MLB.)
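For readers who want to see the mechanics, here is a minimal sketch of that averaging step in Python.  The teams and rank numbers are made up for illustration; only the averaging logic reflects the approach described above.

```python
# Minimal sketch of the averaging step; the rank numbers below are illustrative, not the actual data.
import pandas as pd

ranks = pd.DataFrame({
    "team": ["Cowboys", "Patriots", "Eagles", "Giants", "Steelers"],
    "fan_equity_rank": [1, 2, 6, 4, 12],
    "social_media_rank": [2, 1, 9, 5, 3],
    "road_equity_rank": [1, 6, 2, 8, 4],
})

# The overall ranking is simply the average of the three sub-rankings.
ranks["average_rank"] = ranks[["fan_equity_rank", "social_media_rank", "road_equity_rank"]].mean(axis=1)
print(ranks.sort_values("average_rank"))
```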

The Winners

The top five fan bases (team brands if you prefer) are the Cowboys, Patriots, Eagles, Giants and Steelers.  The Cowboys excel on all the metrics.  They finish first in Fan Equity (a revenue premium measure of brand strength) and Road Equity, and second in social media.  The underlying data (I will spare everybody the statistical models) reveal why Dallas does so well.  The Cowboys’ average home attendance (reported by ESPN) is more than 10,000 higher than the next team’s.  The Cowboys’ average ticket price is also well above average, and they have the second most Twitter followers after the Patriots.  The other thing to note is that the Cowboys achieve these results year in and year out, even in years when the team is not great.

There are likely some objections to the list.  Patriot fans are bandwagon fans!  The Steelers are too low!  The Eagles above the Packers or Bears?!  There is way too much to get into in a short blog post, but here are a couple of comments.

First, Patriot fans may be bandwagon fans.  But at this point it is tough to tell.  The team has been excellent and the fans have been supportive for a long time.  And even when things go wrong for the Patriots they come out ahead.  I believe that the Deflategate controversy had a significant positive impact on the Patriots’ social media following.

The Steelers are low in Fan Equity and higher on the other metrics.  We can trace this to the Steelers’ pricing.  The Steelers seem to price on the low side of what is possible.

The Eagles do surprise me.  They do get a bump from playing in the NFC East in terms of the Road Equity metric.  The NFC East is a strong collection of brands that benefit each other.  It is not easy to disentangle these effects.  And perhaps we shouldn’t, since we can make a case that the rivalries that benefit these teams exist because of the interest in the individual brands.

The Losers

At the other extreme we have the Bengals, Jaguars, Titans, Rams and Chiefs.  Some of these are no surprise.  At the top of the rankings we have the NFL’s royalty.  No one has ever placed the Bengals, Jaguars or Titans in that category.

The teams at the bottom of the rankings all suffer from relatively low attendance, have below average pricing power and have limited social followings.  The Rams are a special case.  While not a great brand in past years, the move to LA tends to punish the Rams because their results have not kept pace with the higher income and population levels in LA.

The Chiefs are the tough one on this list.  The Chiefs fill their stadium, but at relatively low prices.  Keep in mind that the analysis includes factors such as population and median income.  In addition, Kansas City was ranked 29th in terms of road attendance last year and the social media following (Twitter) is middle of the road.  The fundamental issue is that the Chiefs produce these below-average fan-based results while performing well above average on the field.

The Complete List

The complete list follows.  In addition to the overall ranking of fan bases, I also report rankings on the social and road measures.  Following the table, I provide a bit more detail regarding each of the metrics.

Three metrics are used to get a complete picture of fans.  But there are other ways to look at fan behavior and brand strength.  For example, we could look at pricing power (which teams are able to extract significant price premiums) or bandwagon fan behavior (which fans are most sensitive to winning).  I’m happy to provide these additional rankings if there is interest.

Fan Equity

Winners: Cowboys, Patriots and 49ers

Losers: Rams, Raiders, Jags

Fan Equity looks at home revenues relative to expected revenue based on team performance and market characteristics.  The goal of the metric is to measure over or under performance relative to other teams in the league.  In other words, statistical models are used to create an apples-to-apples type comparison to avoid distortions due to long-term differences in market size or short-term differences in winning rates.
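As a rough illustration of the idea, the sketch below regresses home revenue on performance and market variables and treats the residual as the over- or under-performance measure.  The file name, column names and specification are assumptions for illustration only; the actual models are more involved.

```python
# Sketch of a Fan Equity style calculation: regress home revenue on team performance and
# market characteristics, then treat the residual as over/under-performance.
# The file and column names are hypothetical; the real model specification differs.
import pandas as pd
import statsmodels.formula.api as smf

seasons = pd.read_csv("team_seasons.csv")  # hypothetical: one row per team-season

revenue_model = smf.ols(
    "home_revenue ~ win_pct + metro_population + median_income + new_stadium",
    data=seasons,
).fit()

# Positive residuals indicate revenue above what performance and market alone would predict.
seasons["fan_equity"] = revenue_model.resid
print(seasons.groupby("team")["fan_equity"].mean().sort_values(ascending=False).head(10))
```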

The 49ers are the interesting winner on this metric.  After the last couple of years, it is doubtful that people think of the 49ers as having a rabid fan base.  However, the 49ers are a great example of how the approach works.  On the field the 49ers have been terrible.  But despite the on-field struggles the 49ers still pack in the fans and charge high prices.  This is evidence of a very strong brand, because even while losing, the 49ers’ fans still attend and spend.  In terms of the overall rankings the 49ers don’t do all that well because the team does not perform as strongly as a road or social media draw.

In terms of business concepts, this “Fan Equity” measure is similar to a “revenue premium” measure of brand equity.  It captures the differentials in fans’ willingness to financially support teams of similar quality.  From a business or marketing perspective this is a gold standard of metrics, as it directly relates to how a strong brand translates to revenues and profits.

One important thing to note is that some teams may not be trying to maximize revenues.  Perhaps the team is trying to build a fan base by keeping prices low.  Or a team may price on the low side based on some notion of loyalty to its community.  In these cases the Fan Equity metric may understate the engagement of fans.

Social Media Equity

Winners: Patriots, Cowboys and Broncos

Losers: Chiefs, Rams and Cardinals

Social Media Equity is also an example of a “premium” based measure of brand equity.  It differs from Fan Equity in that it focuses on how many fans a team has online rather than fans’ willingness to pay higher prices.  Similar to Fan Equity, Social Media Equity is also constructed using statistical models that control for performance and market differences.

In terms of business application, the social media metric has several implications both on its own merits and in conjunction with the Fan Equity measure.  For example, the lack of local constraints means that the Social Media Equity measure is more of a national-level measure.  So while the Fan Equity metric focuses on local box office revenues, the social metric provides insight into how a team’s fandom extends beyond a metro area.

Social Media Equity may also serve as a leading indicator of a team’s future fortunes.  For a team to grow revenues it is often necessary to implement controversial price increases.  Convincing fans to sign expensive contracts to buy season tickets can also be a challenge.  Increasing prices and acquiring season ticket holders can therefore take time, while social media communities can grow quickly.  Some preliminary analysis suggests that vibrant social communities are positively correlated with future revenue growth.

A comparison of Fan Equity and Social Media Equity can also be useful.  If Social Media Equity exceeds Fan Equity it is evidence that the team has some marketing potential that is not being exploited.  For example, one issue that is common in sports is that it is difficult to estimate the price elasticity of demand because demand is often highest for the best teams and best seats.  The unconstrained nature of social media can provide an important data point for assessing whether a team has additional pricing flexibility.

Road Equity

Winners: Cowboys, Eagles and Raiders

Losers: Texans, Titans and Seahawks

Another way to look at fan quality is to look at how a team draws on the road.  In the NBA these effects are pronounced.  LeBron or a retiring Kobe coming to town can often lead to sellouts.  At the college level some teams are known to travel very well.  A fan base that travels is almost by definition incredibly passionate.

This one has a bit of a muddled interpretation.  If a team has great road attendance, is it because the fans are following the team or because it has a national following?  In other words, do the local fans travel, or does a team with high road attendance have a national following?  When the Steelers turned the Georgia Dome yellow and black, was it because Steelers fans came down from Pittsburgh or because the Steelers have fans everywhere?

Furthermore, if it is a national following, is it because the team is popular across the country or because a lot of folks have moved from places like Pittsburgh or Buffalo to the Sun Belt?  A national following is a great characteristic that might suggest that a team’s brand is on an upswing.  Or it might be that the city itself is on a downward trajectory.


MLB Fan Base and Brand Rankings 2017

MLB Fandom Report 2017: The “Best” Fans in Baseball – Rough Draft

Who has the best fans in Major League Baseball?  What are the best brands in MLB? These are simple questions without simple answers.  What makes for a great fan or brand?  Fans that show up even when the team is losing?  Fans that are willing to pay the most?  Fans that are willing to follow a team on the road or social media?

Even after we agree on the question(s), answering it is also a challenge.  How do we adjust for the fact that one team might have gone on a miraculous run that filled the stadium?  Or perhaps another team suffered a slew of injuries?  How do we compare fan behavior in a market like New York with fans in a place like Milwaukee?  What if a team just opened a new stadium?

My approach to evaluating fan bases is to use data to develop statistical models of fan interest (more details here).  The key is that these models are used to determine which city’s fans are more willing to spend or follow their teams after controlling for factors like market size and short-term variations in performance.

This year’s overall rankings are based on three sub-rankings.  In past years, two measures of engagement have been featured: Fan Equity and Social Media Equity.  Fan Equity focuses on home box office revenues (support via opening the wallet) and Social Media Equity focuses on fan willingness to engage as part of a team’s community (support exhibited by joining social media communities).  This year I am adding a third measure – Road Equity.  Road Equity focuses on how teams draw on the road after adjusting for team performance.   These metrics provide a balance – a measure of willingness to spend, a measure unconstrained by stadium size and a measure of national appeal.

To get at an overall ranking, I’m going to use the simplest method possible.  We are just going to average across the three metrics.

Today’s post is focused on MLB but if you are interested you can see last year’s NBA fan rankings here and this year’s  NFL rankings will be posted soon.

The Winners

Overall, the group of clubs that comprise the Top 5 contains little in the way of surprises.  The Yankees rank number one and are followed by the Cubs, Red Sox, Giants and Dodgers.  The Yankees “win” because they draw fans (usually top 5) and charge high prices even when on-field results dip.  The Yankees are also a great attraction on the road and have an enormous social media following.

In general, the clubs at the top of the list share these same traits.  They are all able to motivate fans to attend and spend as they all possess great attendance numbers and relatively high prices.  More to the point, these teams are even able to draw well and command price premiums when they are not winning.  The Cubs are the best example of this.

The list of winners probably raises an issue of “large” market bias.  However, keep in mind that the methodology is explicitly designed to control for home market effects – differences in market demographics as well as team performance.  While the “winners” tend to come from the bigger and more lucrative markets, other major market teams do not fare particularly well (see below).

The Laggards

The bottom of the list features the Marlins, Indians, Athletics, Angels and White Sox.  It is interesting that the bottom also includes teams from major markets such as LA, Chicago and Miami.

The Marlins’ finish is a reflection of how the team struggles on multiple dimensions.  Attendance is often in the bottom 5 of the league despite the team being located in a major metro area.  Pricing is also below average for MLB.  Cleveland also struggles on these metrics, but given the advantages of the Miami market, the Marlins’ relative performance is just a bit worse.

From a branding perspective it is not surprising that we see one dominant brand in the cities with two clubs.  Being a sports fan is about being part of a community.  Many fans are drawn to the bigger and more dominant community – Yankees, Cubs or Dodgers rather than the Mets, White Sox or Angels.  The A’s probably suffer a similar set of problems as they compete against the Giants in the Bay Area.

The Complete List

The complete list follows.  In addition to the overall ranking of fan bases, I also report rankings on the social and road measures.  Following the table, I provide a bit more detail regarding each of the metrics.

The Details

Fan Equity

The Winners: Red Sox, Yankees and Cardinals

The Losers: Mets, Indians and Marlins

Fan Equity looks at home revenues relative to expected revenue based on team performance and market characteristics.  The goal of the metric is to measure over (or under) performance relative to other teams in the league.  In other words, statistical models are used to create an apples-to-apples type comparison to avoid distortions due to long-term differences in market size or short-term differences in winning rates.

In terms of business concepts, this measure is similar to a “revenue premium” measure of brand equity.  It captures the differentials in fans’ willingness to financially support teams of similar quality.  From a business or marketing perspective this is a gold standard of metrics, as it directly relates to how a strong brand translates to revenues and profits.

However, the context is sports, and that does make things different.  At a basic level sports organizations have dual objectives.  They care about winning and profit.  That is important because some teams may not be trying to maximize revenues.  Perhaps the team is trying to build a fan base by keeping prices low.  If this is the case the Fan Equity metric understates the engagement of fans.

The Cardinals are the big story in terms of Fan Equity.  St. Louis is a unique baseball town, with amazingly supportive fans for a market its size.  The Cardinals just fall short on the other, more national metrics.

Social Media Equity

Winners: Blue Jays, Braves, and Yankees

Losers: Mariners, A’s and Nationals

Social Media Equity is also an example of a “premium” based measure of brand equity.  It differs from Fan Equity in that it focuses on how many fans a team has online rather than fans’ willingness to pay higher prices.  Similar to the Fan Equity metric, Social Media Equity is also constructed using statistical models that control for performance and market differences.  Social Media Equity is more about potential.  I think that social equity is an indicator of what can be built, but teams still have to win to make the conversion.

In terms of business application, the social media metric has several implications both on its own merits and in conjunction with the Fan Equity measure.  For example, the lack of local constraints means that the Social Media Equity measure is more of a national-level measure.  The Fan Equity metric focuses on local box office revenues.  In contrast, the social metric provides insight into how a team’s fandom extends beyond a metro area.

Social Media Equity may also serve as a leading indicator of a team’s future fortunes.  For a team to grow revenues it is often necessary to implement controversial price increases.  Convincing fans to sign expensive contracts to buy season tickets can also be a challenge.  Increasing prices and acquiring season ticket holders can take time while social media communities can grow quickly.  Social community size has been found to be positively correlated with future revenue growth.

A comparison of Fan Equity and Social Media can be useful.  If Social Media equity exceeds Fan Equity it is evidence that the team has some marketing potential that is not being exploited.  For example, one issue that is common in sports is that it is difficult to estimate the price elasticity of demand because demand is often highest for the best teams and best seats.  The unconstrained nature of social media can provide an important data point for assessing whether teams have additional pricing flexibility.

This is an interesting list of winners.  My guess is that the Braves and Blue Jays are on the upswing as brands.  For the teams at the bottom – it’s a concerning situation.  These teams don’t seem to be capturing the next generation.

Road Equity

Winners: Yankees, Dodgers and Cubs

Losers: Marlins, White Sox and Indians

This is a new metric for the blog.  One way to look at fan quality is to look at how a team draws on the road.  In the NBA these effects are pronounced.  LeBron or a retiring Kobe coming to town can often lead to sellouts.  At the college level some teams are known to travel very well.  A fan base that travels is almost by definition incredibly passionate.

This one has a bit of a muddled interpretation.  If a team has great road attendance, is it because the fans are following the team or because it has a national following?  If the Yankees play the Rays and attendance spikes, is it because Yankees fans travel or because Tampa residents come out to see the Yankees?

The winners on this list are no surprise.  One reason I like this metric is that it is consistent with the conventional wisdom.  It has tons of face validity.

At the bottom of the rankings we have the Marlins, Indians and White Sox.  These seem to be struggling brands that lack local and national appeal.


NBA Fan Rankings: 2016 Edition

On an (almost) annual basis I present rankings of fan bases across major professional and collegiate leagues.  Today it is time for the NBA.   First, the winners and losers in this year’s rankings.  At the top of the list we have the Knicks, Lakers and Bulls. This may be the trifecta of who the league would love to have playing at Christmas and in the Finals.  At the bottom we have the Grizzlies, Nets and Hornets.


Before I get into the details it may be helpful to briefly mention what differentiates these rankings from other analyses of teams and fans.  My rankings are driven by statistical models of how teams perform on a variety of marketing metrics.  The key insight is that these models allow us to control for short-run variation in team performance and permanent differences in market potential.  In other words, the analysis uses data to identify engagement or passion (based on attendance and spending) beyond what is expected given how a team is performing and where the team is located.  More details on the methodology can be found here.


The Winners

This year’s list contains no real surprises.  The top five teams are all major market teams with storied traditions.  The top fan base belongs to the Knicks.  The Lakers, Bulls, Heat and Celtics follow.  The Knicks highlight how the model works.  While the Knicks might not be winning, Knicks fans still attend and spend.

The number two team on the list (The Lakers) is in much the same situation. A dominant brand with a struggling on-court product.   The Lakers and Clippers are an interesting comparison.  Last season, the Clippers did just a bit better in terms of attendance (100.7% versus 99.7%).  But the Lakers filled their seats with an average ticket price that was substantially higher.  The power of the Laker brand is shown in this comparison because these outcomes occurred in a season where the Clippers won many more games.

Why are the Lakers still the bigger draw?  Is this a star (Kobe) effect?  Probably in part, but fan loyalty is something that evolves over time.  The Lakers have the championships, tradition and therefore the brand loyalty.  It will be interesting to see how much equity is retained long-term if the team is unable to quickly reload.  The shared market makes this an interesting story to watch. I suspect that the Lakers will continue to be the stronger brand for quite a while.

The Losers

At the bottom of the list we have Memphis, Brooklyn and Charlotte.  The interesting one in this group is Brooklyn.  Why do the Nets rank poorly?  It ends up being driven by the relative success of the Knicks versus the Nets.  The Knicks have much more pricing power while the teams operate in basically the same market (we can debate this point).  According to ESPN, the Knicks drew 19,812 fans (100% of capacity) while the Nets filled 83.6% of their building.  The Knicks also command much higher ticket prices.  And while the Nets were worse (21 victories) the Knicks were far from special (32 wins).

What can the teams at the bottom of the list do?  When you go into the data and analyze what drives brand equity the results are intuitive.  Championships, deep playoff runs and consistent playoff appearances are the key to building equity.  Easy to understand but tough to accomplish.

And a Draw

An interesting aside in all this is what it means for the league.  The NBA has long been a star and franchise driven league.  In the 1980s it was about the Lakers (Magic) and Celtics (Bird).  In the 1990s it was Michael Jordan and the Bulls.  From there we shifted into Kobe and LeBron.

On one hand, the league might be (even) stronger if the top teams were the Bulls, Knicks and Lakers.  On the other hand, the emergence of Steph Curry and Golden State has the potential to help build another powerful brand.

Some more thoughts…

The Fan Equity metric is just one possible means for assessing fan bases.  In this year’s NFL rankings I reported several more analyses that focus on different market outcomes.  These were social media following, road attendance and win sensitivity (bandwagon fans).  Looking at social following tells us something about the future of the brand as it (broadly) captures fan interest of a younger demographic.  Road Attendance tells us something about national rather than local following.  These analyses also use statistical models to control for market and team performance effects.

Social Equity

Top Social Equity Team: The Lakers

Bottom Social Equity: The Nets

Comment: The Lakers are an immensely strong brand on many dimensions.  The Nets are a mid-range brand when you look at raw numbers.  But they suffer when we account for them operating in the NY market.

Road Equity

Top Road Equity: The Lakers

Bottom Road Equity: Portland

Comment: The Lakers dominate.  And as this analysis was done looking at fixed effects across 15 years it is not solely due to Kobe Bryant.  Portland does well locally but is not of much interest nationally.

It is possible to do even more.  We can even look at factors such as win or price sensitivity. Win sensitivity (or bandwagon behavior) tells us whose fans only show up when a team is winning and price sensitivity tells us if a fan base is willing to show up when prices go up.  I’m skipping these latter two analyses today just to avoid overkill (available upon request).  The big message is that we can potentially construct a collection of metrics that provide a fairly comprehensive and deep understanding of each team’s fan base and brand.

Note: I have left one team off the list.  I have decided to stop reporting the local teams (Emory is in Atlanta).  The local teams have all been great to both myself and the Emory community.  This is just a small effort to eliminate some headaches for myself.

Finally… The complete list

City             Fan Equity Rank
Boston           5
Charlotte        27
Chicago          3
Cleveland        20
Dallas           15
Denver           11
Detroit          25
Golden State     16
Houston          7
Indiana          21
LA Clippers      17
LA Lakers        2
Memphis          29
Miami            4
Milwaukee        14
Minnesota        22
Brooklyn         28
New Orleans      24
NY Knicks        1
Oklahoma City    13
Orlando          19
Philadelphia     26
Phoenix          9
Portland         6
Sacramento       10
San Antonio      12
Toronto          18
Utah             8
Washington       23

Analytics, Trump, Clinton and the Polls: Sports Analytics Series Part 5.1

Recent presidential elections (especially 2008 and 2012) have featured heavy use of analytics by candidates and pundits.  The Obama campaigns were credited with using micro targeting and advanced analytics to win elections. Analysts like Nate Silver were hailed as statistical gurus who could use polling data to predict outcomes.  In the lead up to this year’s contest we heard a lot about the Clinton campaign’s analytical advantages and the election forecasters became regular parts of election coverage.

Then Tuesday night happened.  The polls were wrong (by a little) and the advanced micro targeting techniques didn’t pay off (enough).

Why did the analytics fail?

First the polls and the election forecasts (I’ll get to the value of analytics next week).  As background, commentators tend not to truly understand polls.  This creates confusion because commentators frequently over- and misinterpret what polls are saying.  For example, whenever “margin of error” is mentioned they tend to get things wrong.  A poll’s margin of error is based on its sample size.  The common journalistic error is to apply the margin of error of a single poll (3% or 4%) to a collection of polls.  When looking at an average of many polls, the “margin of error” is much smaller because the “poll of polls” has a much larger combined sample size.  This is a key point because when we think about the combined polls it is even more clear that something went wrong in 2016.
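A quick back-of-the-envelope calculation illustrates the point.  The sketch below uses the standard formula for a proportion’s 95% margin of error; the poll sizes are hypothetical and the calculation ignores complications like correlated house effects across pollsters.

```python
# Back-of-the-envelope 95% margin of error for a proportion near 50%.
# Poll sizes are hypothetical; correlated errors across pollsters are ignored.
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(1_000), 1))   # one poll of 1,000: ~3.1 points
print(round(100 * margin_of_error(10_000), 1))  # ten such polls pooled: ~1.0 point
```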

Diagnosing what went wrong is complicated by two factors.  First, it should be noted that because every pollster does things differently we can’t make blanket statements or talk in absolutes.  Second, diagnosing the problem requires a deep understanding of the statistics and assumptions involved in polling.

In the 2016 election my suspicion is that two things went wrong.  As a starting point, we need to realize that polls include strong implicit assumptions about the nature of the underlying population and about voter passion (rather than preference).  When these assumptions don’t hold the polls will systematically fail.

First, most polls start with assumptions about the nature of the electorate.  In particular, there are assumptions about the base levels of Democrats, Republicans and Independents in the population.  Very often the difference between polls relates to these assumptions (LA Times versus ABC News).

The problem with assumptions about party affiliation in an election like 2016 is that the underlying coalitions of the two parties are in transition.  When I grew up the conventional wisdom was that the Republicans were the wealthy, the suburban professionals and the free trading capitalists, while the Democrats were the party of the working man and unions.  Obviously these coalitions have changed.  My conjecture is that pollsters didn’t sufficiently re-balance.  In the current environment it might make sense to place greater emphasis on demographics (race and income) when designing sampling segments.

The other issue is that more attention needs to be paid to avidity/engagement/passion (choose your own marketing buzzword).  Polls often differentiate between likely and registered voters.  This may have been insufficient in this election.  If Clinton’s likely voters were 80% likely to show up and Trump’s were 95% likely, then having a small percentage lead in a preference poll isn’t going to hold up in an election.
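The arithmetic is simple enough to work through explicitly.  The numbers below are invented to match the hypothetical in the previous paragraph, not estimates of the actual 2016 electorate.

```python
# Invented numbers matching the hypothetical above: a preference lead can vanish once turnout differs.
clinton_supporters, trump_supporters = 52, 48    # stated preference per 100 likely voters
clinton_turnout, trump_turnout = 0.80, 0.95      # assumed probability of actually voting

clinton_votes = clinton_supporters * clinton_turnout   # 41.6
trump_votes = trump_supporters * trump_turnout         # 45.6
print(clinton_votes, trump_votes)                      # the 4-point preference lead flips
```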

The story of the 2016 election should be something every analytics professional understands.  From the polling side the lesson is that we need to understand and question the underlying assumptions of our model and data.  As the world changes do our assumptions still hold?  Is our data still measuring what we hope it does?  Is a single dependent measure (preference versus avidity in this case) enough?

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing.  In many respects, the development of an analytics function is similar across categories and contexts.  For instance, a key issue in any analytics function is the designing and creation of an appropriate data structure.  Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision making styles is also common across categories.  It’s necessary to understand both the level of interest in using analytics and the “technical level” of the decision makers.  Less experienced data scientists and statisticians have a tendency to use overly complicated methods.  This can be a killer.  If the models are too complex they won’t be understood, and then they won’t be used.  Linear regression, with perhaps a few extensions (fixed effects, linear probability models), is usually the way to go.  Because sports organizations have less history of using analytics, the issue of balancing complexity can be especially challenging.
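To make the “keep it simple” point concrete, here is a minimal sketch of the kind of linear model with fixed effects mentioned above.  The data file and variable names are assumptions for illustration, not an actual team’s data.

```python
# Sketch of a simple, explainable model: OLS with team fixed effects.
# The file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

games = pd.read_csv("games.csv")  # hypothetical: one row per home game

# C(team) adds a dummy variable (fixed effect) for each team; the remaining terms stay
# readable as "holding the team constant, how do these factors move attendance?"
attendance_model = smf.ols(
    "attendance ~ C(team) + win_pct_to_date + opponent_win_pct + weekend_game",
    data=games,
).fit()
print(attendance_model.summary())
```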

A key distinction between many sports and marketing applications is the number of variables versus the number of observations.  This is an important point of distinction between sports and non-sports industries and it is also an important issue for when we shift to discussing modeling in a couple of weeks.  When I use the term variables I am referencing individual elements of data.  For example, an element of data could be many different things such as a player’s weight or the number of shots taken or the minutes played.  We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with).  When I use the term observations I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations.  A large company may have millions of customer accounts.  There may, however, be relatively few explanatory variables.  The firm may have only transaction history variables and limited demographics.  Even in sports marketing, a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points.  In this same example the team may have tens of thousands of season ticket holders.  If we think of this “information” as a database, we would have a row for every customer account (tens of thousands of rows) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings.  The issue in player selection applications is that there are relatively few player-level observations.  In particular, when we drill down into specific positions we often find ourselves having only tens or hundreds of player histories (depending on how far back we want to go with the data).  In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” stats, but now we have entered the era of player tracking and biometrics.  Now we can generate player stats related to second-by-second movement or even detailed physiological data.  In sports ranging from MMA to soccer to basketball, the number of variables has exploded.

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?

Decision Biases: Sports Analytics Series Part 4

One way to look at on-field analytics is that it is a search for decision biases.  Very often, sports analytics takes the perspective of challenging the conventional wisdom.  This can take the form of identifying key statistics for evaluating players.  For example, one (too) simple conclusion from “Moneyball” would be that people in baseball did not adequately value walks and on-base percentage.  The success of the A’s (again – way oversimplifying) was based on finding flaws in the conventional wisdom.

Examples of “challenges” to conventional wisdom are common in analyses of on-field decision making.  For example, in past decades the conventional wisdom was that it is a good idea to use a sacrifice bunt to move players into scoring position or that it is almost always a good idea to punt on fourth down.  I should note that even the term conventional wisdom is problematic as there have likely always been long-term disagreements about the right strategies to use at different points in a game.  Now, however, we are increasingly in a position to use data to determine the right or optimal strategies.

As we discussed last time, humans tend to be good at overall or holistic judgments while models are good at precise but narrow evaluations.  When the recommendations implied by the data or model are at odds with how decisions are made, there is often an opportunity for improvement.  Using data to find types of undervalued players or to find beneficial tactics represents an effort to correct human decision making biases.

This is an important point.  Analytics will almost never outperform human judgment when it comes to individuals.  What analytics are useful for is helping human decision makers self-correct.  When the model yields different insights than the person it’s time to drill down and determine why.  Maybe it’s a shortcoming of the model or maybe it’s a bias on the part of the general manager.

The term bias has a negative connotation.  But it shouldn’t for this discussion.  For this discussion a bias should just be viewed as a tendency to systematically make decisions based on less than perfect information.

The academic literature has investigated many types of biases.  Wikipedia provides a list of a large number of biases that might lead to decision errors.  This list even includes the sports-inspired “hot-hand fallacy,” which is described as a “belief that a person who has experienced success with a random event has a greater chance of further success in additional attempts.”  From a sports analytics perspective, the question is whether the hot hand is a real thing or just a belief.  The analyst might be interested in developing a statistical test to assess whether a player on a hot streak is more likely to be successful on his next attempt.  This model would have implications for whether a coach should “feed” the hot hand.
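One simple version of such a test is sketched below: compare the make probability immediately after a make with the make probability immediately after a miss, and use a permutation test to judge whether the gap is larger than chance.  The shot sequence is simulated here; with real data you would substitute a player’s actual makes and misses.

```python
# Sketch of a simple hot-hand test on a (simulated) sequence of makes/misses.
import numpy as np

rng = np.random.default_rng(0)
shots = rng.binomial(1, 0.45, size=500)  # stand-in for one player's shots in order

after_make = shots[1:][shots[:-1] == 1]
after_miss = shots[1:][shots[:-1] == 0]
observed_gap = after_make.mean() - after_miss.mean()

# Permutation test: shuffling the sequence breaks any streakiness, giving the gap expected by chance.
null_gaps = []
for _ in range(2000):
    s = rng.permutation(shots)
    null_gaps.append(s[1:][s[:-1] == 1].mean() - s[1:][s[:-1] == 0].mean())

p_value = np.mean(np.abs(null_gaps) >= abs(observed_gap))
print(round(observed_gap, 3), round(p_value, 3))
```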

Academic work has also looked at the impact of factors like sunk costs on player decisions.  The idea behind “sunk costs” is that if costs have already been incurred then those costs should not impact current or future decision making.  In the case of player decisions “sunk costs” might be factors like salary or when the player was drafted.  Ideally, a team would use the players with the highest expected performance.  A tendency towards playing individuals based on the past would represent a bias.

Other academic work has investigated the idea of “status” bias.  In this case the notion is that referees might call a game differently depending on the players involved.  It’s probably obvious that this is the case.  Going old school for a moment, even the most fervent Bulls fans of the ’90s would have to admit that Craig Ehlo wouldn’t get the same calls as Michael Jordan.

In these cases, it is possible (though tricky) to look for biases in human decision making.  In the case of sunk costs investigators have used statistical models to examine the link between when a player was drafted and the decision to play an athlete (controlling for player performance).  If such a bias exists, then the analysis might be used to inform general managers of this trait.
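A hedged sketch of how such a sunk-cost analysis might be set up appears below.  The variable names and data file are hypothetical; the point is simply that draft position should not predict playing time once current performance is controlled for.

```python
# Sketch of a sunk-cost check: does draft position still predict playing time
# after controlling for current performance? Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

players = pd.read_csv("player_seasons.csv")  # hypothetical: one row per player-season

sunk_cost_model = smf.ols(
    "minutes_played ~ performance_rating + draft_pick + salary",
    data=players,
).fit()

# A significant coefficient on draft_pick, with performance held constant,
# is consistent with a sunk-cost bias in playing-time decisions.
print(sunk_cost_model.summary())
```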

In the case of advantageous calls for high profile players, an analysis might lead to a different type of conclusion. If such a bias exists, then perhaps leagues should invest more heavily in using technology to monitor and correct referee’s decisions.

  • People suffer from a variety of decision biases. These biases are often the result of decision making heuristics or rules of thumbs.
  • One use of statistical models is to help identify decision making biases.
  • The identification of widespread biases is potentially of great value as these biases can help identify imperfections in the market for players or improved game strategies.

A Quick Example of the Limitations of Analytics: Sports Analytics Series Part 3.1

In Part 3 we started to talk about the complementary role of human decision makers and models.  Before we get to the next topic – Decision Biases – I wanted to take a moment to present an example that helps illustrate the points being made in the last entry.

I’m going to make the point using an intentionally nontraditional example.  Part of the reason I’m using this example is that I think it’s worthwhile to think about what might be “questionable” in terms of the analysis.  So rather than look at some well-studied relationships in contexts like NFL quarterbacks or NBA players, I’m going to develop a model of Fullback performance in Major League Soccer.

To keep this simple, I’m going to try and figure out the relationship between a player’s Plus-Minus statistic and a few key performance variables.  I’m not going to provide a critique of Plus-Minus but I encourage everyone to think about the value of such a statistic in soccer in general and for the Fullback position in particular.  This is an important exercise for thinking about combining statistical analysis and human insight.  What is the right bottom line metric for a defensive player in a team sport?

The specific analysis is a simple regression model that quantifies the relationship between Plus-Minus and the following performance measures:

  • % of Defensive Ground Duels Won
  • % of Defensive Aerial Duels Won
  • Tackling Success Rate (%)
  • % of Successful Passes in the Opponents ½

This is obviously a very limited set of statistics.  One thing to think about is that if I am creating this statistical model with even multiple years of data, I probably don’t have very many observations.  This is a common problem.  In any league there are usually about 30 teams and maybe 5 players at any position.  We can potentially capture massive amounts of data but maybe we only have 150 observations a year.  Note that in the case of MLS fullbacks we have less than that.  This is important because it means that in sports contexts we need to have parsimonious models.  We can’t throw all of our data into the models because we don’t have enough observations.
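For concreteness, a regression along these lines could be fit with just a few lines of code.  The sketch below is only illustrative: the file and column names are placeholders, and the output reported in the table below was produced separately with the actual data.

```python
# Sketch of the fullback Plus-Minus regression described above.
# The CSV and column names are placeholders, not the actual data set.
import pandas as pd
import statsmodels.formula.api as smf

fullbacks = pd.read_csv("mls_fullbacks.csv")  # hypothetical: one row per fullback-season

pm_model = smf.ols(
    "plus_minus ~ ground_duels_won_pct + aerial_duels_won_pct + tackle_success_pct + opp_half_pass_pct",
    data=fullbacks,
).fit()

print(pm_model.summary())    # coefficient table comparable to the output below
print(pm_model.rsquared)     # the (small) share of variance explained
```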

The table below lists the regression output.  Basically, the output is saying that % Successful passes in the opponent’s half is the only statistic that is significantly and positively correlated with a Fullback’s Plus-Minus statistic.

Parameter Estimates

Variable                                 DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                 1   -1.66764             0.41380           -4.03    <.0001
% Defensive Ground Duels Won              1   -0.00433             0.00314           -1.38    0.1692
% Def Aerial Duels Won                    1   -0.00088542          0.00182           -0.49    0.6263
Tackling Success Percentage               1    0.39149             0.25846            1.51    0.1305
% Successful Passes in Opponents 1/2      1    0.02319             0.00480            4.83    <.0001

The more statistically oriented reader might be asking the question of how well does this model actually fit the data.  What is the R-Square?  It is small.  The preceding model explains about 5% of the variation in Fullback’s Plus-Minus statistics.

And that is the important point.  The model does its job in that it tells us there is a significant relationship between passing skill and goal differential.  But it is far from a complete picture.  The decision maker needs to understand what the model shows.  However, the decision maker also needs to understand what the model doesn’t reveal.   This model (and the vast majority of other models) is inherently limited.  Like I said last time – the model is a decision support tool / not something that makes the decision.

Admittedly I didn’t try to find a model that fits the data really well.  But I can tell you that in my experience in sports and really any context that involves predicting or explaining individual human behavior, the models usually only explain a small fraction of variance in performance data.

A Non-Judgmental Analysis of the NFL Rating Decline

Over the last week there has been a lot of discussion regarding the decline in NFL ratings this season.  The facts seem to be that the NFL is experiencing a weakness in prime time games that has resulted in an 11% drop in ratings.  The NFL has circulated a memo that cites a variety of factors such as the presidential campaign.  Notably the memo states that there is no evidence that “concern over player protests is having a material impact on our ratings.”

I have gone on record in multiple articles over the past few years talking about the likely impact of high profile or controversial events such as domestic violence incidents and concussions on NFL fandom.  My opinion has been that the NFL would continue to be strong and fans would continue to watch.  So what’s different now?  On this blog, the emphasis is almost always on data driven analyses.  In this case, it’s not possible to take that approach.  I would need much more detailed data on TV ratings and even then I likely wouldn’t have the ability to rule out different possible causes.

The NFL has suggested a confluence of events as the culprit.  I think this is true but perhaps not in the manner the NFL is implying.  I think the NFL is right that the presidential campaign is having an impact.  But I suspect it is having less of a direct impact due to people’s attention being shifted in a different direction.  College football does not seem to have experienced a decline in viewership.

I think it is the nature of the current political campaign and the emotions the campaign is generating.  This campaign has highlighted very distinct cultural differences.  The world views of Trump and Clinton supporters seem to be fundamentally different.  The potential problem is that a lot (majority?) of the NFL’s fan base may lean in the Trump direction while the protests lean in the Clinton direction.  In what follows I’m going to talk about this situation on a theoretical level.  I am making no value judgments about any protests or response to protests – I’m just looking at the marketing and branding issues.

Why are the protests potentially damaging to the NFL brand?  I think there are a couple of related issues.

First, the NFL has been known for shutting down individual expression by players.  Remember it is the No Fun League and it’s all about protecting the shield (brand).  Now, however, we seem to have a protest that is allowed.  And it is a protest with which many fans may disagree.  On some level the NFL seems to be changing its policies to accommodate the protests.  I think it is this “change” that may be the key issue.  Especially if the “change” is to accommodate something that is controversial to the core audience.  If a league is known for shutting down everything from TD celebrations to minor uniform violations then is not shutting something down an implicit endorsement?

The stridency of the current presidential campaign in terms of insiders versus outsiders and political correctness makes this type of “authenticity” issue especially salient to certain segments of fans.  The impact may be subtle.  It may manifest as a softening in enthusiasm or engagement with the NFL brand rather than a decrease in stated preference.  Fans still like the game and the players, but maybe they are just not as compelled to watch.  (I don’t have access to the NFL’s data, but this may be a tricky issue to assess using traditional marketing research techniques.)

The second, and related, issue is that there are other factors impacting the brand.  The current protests occurred in the wake of seasons that featured domestic abuse and concussion issues.  The NFL brand may be resilient to any one event, but over time problems can weaken the foundation.  This type of subtle brand weakness may be especially relevant given that the NFL is currently lacking some star power.  While the NFL is less of a star driven league than, say, the NBA, having Peyton Manning retired and Tom Brady suspended makes the league more vulnerable.


Questioning the Value of Analytics: Sports Analytics Series Part 3

Continuing the discussion about organizational issues and challenges, a fundamental issue is understanding and balancing the relative strengths and weaknesses of human decision makers and mathematical models.  This is an important discussion because before diving into specific questions related to predicting player performance it’s worthwhile to first think about how modeling and statistics should fit into an overall structure for decision making.  The short answer is that analytics should serve as a complement to human insight. 

The “value” of analytics in sports has been the topic of debate.  A high profile example of this occurred between Charles Barkley and Daryl Morey.  Barkley has gone on record questioning the value of analytics.

“Analytics don’t work at all. It’s just some crap that people who were really smart made up to try to get in the game because they had no talent. Because they had no talent to be able to play, so smart guys wanted to fit in, so they made up a term called analytics.  Analytics don’t work.” 

The quote reflects an extreme perspective, and it is legitimate to question whether Charles Barkley has the background to assess the value of analytics (or maybe he does, who knows?).  But I do think that Barkley’s opinion has significant merit.

In much of the popular press surrounding books like Moneyball or The Extra 2%, analytics often seem like a magic bullet.  The reality is that statistical models are better viewed as decision support aids.  Note that I am talking about the press rather than the books.

The fundamental issue is that models and statistics are incomplete.  They don’t tell the whole story.  A lot of analytics revolves around summarizing performance into statistics and then predicting how performance will evolve. Defining a player based on a single number is efficient but it can only capture a slice of the person’s strengths and weaknesses.  Predicting how human performance will evolve over time is a tenuous proposition.

What statistics and models are good at is quantifying objective relationships in the data.  For example, if we were interested in building a model of how quarterback performance translates from college to professional football, we could estimate the mathematical relationship between touchdown passes at the college level and touchdown passes at the pro level.  A regression model would give us the numerical patterns in the data, but such a model would likely have little predictive power since many other factors come into play.

The question is whether the insights generated from analytics or the incremental forecasting power actually translate into something meaningful.  They can.  But the effects may be subtle and they may play out over years.  And remember we are not even considering the financial side of things.  If the best predictive models improve player evaluations by a couple of percent, maybe it translates to your catcher having a 5% higher on-base percentage or your quarterback having a passer rating that is 1 or 2 points higher.  These things matter.  But are they dwarfed by being able to throw 10 or 20 million more into signing a key player?

If the key to winning a championship is having a couple of superstars, then maybe analytics don’t matter much.  What matters is being able to manage the salary cap and attract the talent.  But maybe the goal is to make the playoffs in a resource or salary cap constrained environment.  Then spending efficiently and generating a couple of extra wins is the objective.  In this case analytics can be a difference maker.

Understanding the Organization: Sports Analytics Series Part 2

The purpose of this series is to discuss the use of analytics in sports organizations (see part 1).  Rather than jump into a discussion of models, I want to start with something more fundamental.  I want to talk about how organizations work and how people make decisions.  Sophisticated statistics and detailed data are potentially of great value.  However, if the organization or the decision maker is not interested in or comfortable with advanced statistics then it really doesn’t matter if the analyses are of high quality.

Analytics efforts can fail to deliver optimal value for a variety of reasons in almost any industry.  The idea that we can use data to guide decisions is intuitively appealing.  It seems like more data can only create more understanding and therefore better decisions.  But going from this logic to improved decision making can be a difficult journey.

Difficulties can arise from a variety of sources.  The organization may lack commitment in terms of time and resources.  Individual decision makers may lack sufficient interest in, or understanding of, analytics.  Sometimes the issue can be a lack of vision as to what analytics is supposed to accomplish.  There can also be a disconnect between the problems to be solved and the skills of the analytics group.

These challenges can be particularly significant in the sports industry because there is often a lack of institutional history of using analytics.  Usually organizations have existing approaches and structures for decision making and the incorporation of new data structures or analytical techniques requires some sort of change.  In the earliest stages, the shift towards analytics involves moving into uncharted territory.  The decision maker is (implicitly) asked to alter how he operates and this change may be driven by information that is derived from unfamiliar techniques.

Several key concerns can be best illustrated by considering two categories of analyses.  The first category involves long-term projects for addressing repeated decisions.  For instance, a common repeated decision might be drafting players.  Since a team drafts every year it makes sense to assemble extensive data and to build high quality predictive models to support annual player evaluation.  This kind of organizational decision demands a consistent and committed approach.  But the important point is that this type of decision may require years of investments before a team can harvest significant value. 

It is also important to realize that with repeated tasks there will be an existing decision making structure in place.  The key is to think about how the “analytics” add to or complement this structure, rather than thinking that “analytics” is a new or replacement system (we will discuss why this is true in detail soon).  The existing approach to scouting and drafting likely involves many people and multiple systems.  The analytics elements need to be integrated rather than imposed.

A second category of analyses is short-term, one-off projects.  These projects can be almost anything, ranging from questions about in-game strategies to very specific evaluations of player performance.  These projects primarily demand flexibility.  Someone in the organization may see or hear something that generates a question.  This question then gets tossed to the analytics group (or person) and a quick turn-around is required.

Since these questions can come from anywhere the analytics function may struggle with even having the right data or having the data in an accessible format.  Given the time sensitive nature of these requests there will likely be a need to use flawed data or imperfect methods.  The organization needs to be realistic about what is possible in the short-term and more critically the analysis needs to be understood at a level where the human decision maker can adjust for any shortcomings (and there are always shortcomings).  In other words, the decision maker needs to understand the limitations associated with a given analysis so that the analytics can inform rather than mislead.

The preceding two classes of problems highlight issues that arise when an organization starts on the path towards being more analytically driven.  In addition, there can also be problems caused by inexperienced analysts.  For example, many analysts (particularly those coming from academia) fail to grasp that problems are seldom solved through the creation of an ideal statistic or equation.  Decision making in organizations is often driven by short-term challenges (putting out fires).  Decision support capabilities need to be designed to support fast moving, dynamic organizations rather than perfectly and permanently solving well defined problems.

In the next entry, we will start to take a more in-depth look at how analytics and human decision making can work together.  We will talk about the relative merits of human decision making versus statistical models.  After that we will get into a more psychological topic – decision-making biases.

Part 2 Key Takeaways…

  • The key decision makers need to be committed to and interested in analytics.
  • Sufficient investment in people and data is a necessary condition.
  • Many projects require a long-term commitment. It may be necessary to invest in multiyear database building efforts before value can be obtained.
