2015 NFL Fan Equity Rankings

Note: For Part 2 of our rankings (NFL Social Media Equity) click here 

For the past three years, we have tried to answer the question of which teams have the “best” fans. “Best” is a funny word that can mean a lot of things but what we are really trying to get at is what team has the most avid, engaged, passionate and supportive fans. The twist is that we are doing this using hard data, and that we are doing it in a very controlled and statistically careful fashion.

By hard data we mean data on actual fan behavior. In particular, we are focused on market outcomes like attendance, prices or revenues. A lot of marketing research focused on branding issues relies on things like consumer surveys. This is fine in some ways, but opinion surveys are also problematic. It’s one thing to just say you are a fan of a local team, and quite another to be willing to pay several thousand dollars to purchase a season ticket.

To truly understand fan engagement, it’s important to statistically control for temporary changes in the environment. This is a huge issue in sports because fans almost always chase a winner. The real quality of a sports brand is revealed when fans support a team through the tough times. The Packers or Steelers will sell out the year after they go 6-10; not so much the Jaguars. The other thing that separates sports brands from consumer brands is the cities themselves. The support a New York team gets in terms of attendance and pricing is always going to be tough to achieve for the team in Charlotte.

In terms of the nuts and bolts of what we are about to present, we use fifteen years of data on NFL team performance, ticket prices, market populations, median incomes, won-loss records and multiple other factors. We create statistical models of box office revenue, and then see which teams over- and under-perform the model’s predictions. For a much fuller description, and some limitations of what we are doing, click here.

So who has the best fans? The winner this year is the Dallas Cowboys, followed by the Patriots, Giants, Ravens, and Jets. The Cowboys have a storied history, a market that loves all forms of football, and a world-class stadium. “Deflate-gate” hasn’t hit the window of our analysis yet (it occurred after the 2014-2015 season), but the Pats’ strong showing in our ranking suggests that the impact will be small. The Jets’ position might be somewhat surprising, but this team draws well, and has great pricing power without a lot of winning on the field.

Maybe the biggest surprise is some of the teams that aren’t at the top. The Steelers and Packers have great fan followings. The Seahawks are slowly developing a great fan base. And these teams will do better when we switch to non-financial metrics such as social media following. But for the current “revenue premium” model these teams just don’t price high enough. In a way, these teams with massive season ticket waiting lists are the most supportive of their fans: they leave money on the table by pricing below what demand would bear.

At the bottom we have the Bills, Jags, Raiders, Browns and Dolphins. There are some interesting and storied teams on this list. The Raiders have a ton of passion in the end zone but maybe not throughout the stadium. Cleveland may have never recovered from the loss of the Ravens and the recreation of the Browns. Florida is almost always a problem on our lists. Whether it is the weather or the fact that many of the locals are transplants who didn’t grow up with the team, Florida teams just don’t get the support of teams in other regions.


Mike Lewis & Manish Tripathi, Emory 2015.

2015 NBA Draft Efficiency

Last night, the NBA held its annual draft.  The NBA draft is often a time for colleges to extol the success of their programs based on the number of draft picks they have produced.  Fans and programs seem to be primarily focused on the output of the draft.  Our take is a bit different, as we examine the process of taking high school talent and converting it into NBA draft picks.  In other words: how efficient are colleges at transforming their available high school talent into NBA draft picks?  Today, we present our third annual ranking of schools based on their ability to convert talent into NBA draft picks.

Our approach is fairly simple.  Each year, (almost) every basketball program has an incoming freshman class.  The players in the class have been evaluated by several national recruiting/ranking companies (e.g. Rivals, Scout, etc…).  In theory, these evaluations provide a measure of the player’s talent or quality.  Each year, we also observe which players get drafted by the NBA.  Thus, we can measure conversion rates over time for each college.  Conversion rates may be indicative of the school’s ability to coach up talent, to identify talent, or to invest in players.  These rates may also depend on the talent composition of all of the players on the team.  This last factor is particularly important from a recruiting standpoint.  Should players flock to places that other highly ranked players have selected?  Should they look for places where they have a higher probability of getting on the court quickly?  A few years ago, we conducted a statistical analysis (logistic regression) that included multiple factors (quality of other recruits, team winning rates, tournament success, investment in the basketball program, etc…).  But today, we will just present simple statistics related to schools’ ability to produce output (NBA draft picks) as a function of input (quality of recruits).

For our analysis, we only focused on first round draft picks, since second round picks often don’t make the NBA.  We also only considered schools that had at least two first round draft picks in the past six years.  Here are our rankings:

NBA First Round Draft Efficiency 2010-2015

Colorado may be a surprise at the top of the list.  However, they have converted two three-star players into first round NBA draft picks in the last six years.  This is impressive since less than 1.5% of three-star players become first round draft picks.  Kentucky also stands out because while they do attract a lot of great HS talent, they have done an amazing job of converting that talent into a massive number of 1st round draft picks.

Here are some questions you probably have about our methodology:

What time period does this represent?

We examined recruiting classes from 2006 to 2014 (this represents the year of graduation from high school), and NBA drafts from 2010 to 2015.  We compiled data for over 300 Division 1 colleges.

How did you compute the conversion rate?

The conversion rate for each school is defined as (Sum of draft picks for the 2010-2015 NBA Drafts)/(Weighted Recruiting Talent).  Weighted Recruiting Talent is determined by summing the recruiting “points” for each class.  These “points” are computed by weighting each recruit by the overall population average probability of being drafted for recruits at that corresponding talent level.  We are trying to control for the fact that a five-star recruit is much more likely to get drafted than a four or three-star recruit.  We are using ratings data from Rivals.com.  We index the conversion rate for the top school at 100.
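As a sketch, the conversion-rate arithmetic described above could look like the following. The star-level baseline probabilities, recruiting classes, and pick counts are all made-up numbers for illustration, not our actual data:

```python
# Hypothetical illustration of the conversion-rate index described above.
# Population-average probability that a recruit at each star level becomes
# a first round pick (assumed values; only the 3-star ~1.5% figure is from the text).
BASE_PROB = {5: 0.30, 4: 0.05, 3: 0.015}

def weighted_talent(classes):
    """Sum recruiting 'points': each recruit is weighted by the baseline
    draft probability for his star level."""
    return sum(BASE_PROB[stars] for cls in classes for stars in cls)

def conversion_rate(draft_picks, classes):
    return draft_picks / weighted_talent(classes)

# Two hypothetical schools: recruiting classes are lists of star ratings.
school_a = {"picks": 2, "classes": [[3, 3, 4], [3, 4]]}   # mostly 3-stars
school_b = {"picks": 3, "classes": [[5, 5, 4], [5, 4]]}   # mostly 5-stars

rates = {name: conversion_rate(s["picks"], s["classes"])
         for name, s in [("A", school_a), ("B", school_b)]}

# Index the top school at 100, as in the rankings above.
top = max(rates.values())
index = {name: round(100 * r / top, 1) for name, r in rates.items()}
print(index)
```

Note how school A, with fewer picks from far less heralded recruits, indexes higher than school B: the weighting rewards converting unlikely talent, which is exactly why Colorado can top the list.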

Mike Lewis & Manish Tripathi, Emory University 2015

Analytics vs Intuition in Decision Making Part IV: Outliers

We have been talking about developing predictive models for tasks like evaluating draft prospects.  Last time we focused on the question of what to predict.  For drafting college prospects, this amounts to predicting things like rookie year performance measures.  In statistical parlance, these are the dependent or Y variables.  We did this in the context of basketball and talked broadly about linear models that deliver point estimates and probability models that give the likelihood of various categories of outcomes.

Before we move to the other side of the equation and talk about the “what” and the “how” of working with the explanatory or X variables, we wanted to take a quick diversion and discuss predicting draft outliers.  What we mean by outliers is the identification of players that significantly over or under perform relative to their draft position.  In the NFL, we can think of this as the “how to avoid Ryan Leaf with the second overall pick and grab Tom Brady before the sixth round” problem.

In our last installment, we focused on predicting performance regardless of when a player is picked.  In some ways, this is a major omission.  All the teams in a draft are trying to make the right choices.  This means that what we are really trying to do is to exploit the biases of our competitors to get more value with our picks.

There are a variety of ways to address this problem, but for today we will focus on a relatively simple two-step approach.  The key to this approach is to create a dependent variable that indicates how much a player over-performs relative to his draft position, and then to try to understand whether there is data systematically related to these over- and under-performing picks.

For illustrative purposes, let us assume that our key performance metric is rookie year player efficiency (PER(R)).  If teams draft rationally and efficiently (and PER is the right metric), then there should be a strong linkage between rookie year PER and draft position in the historical record.  Perhaps we estimate the following equation:

PER(R) = β0 + βDP × DraftPosition + ⋯

where PER(R) is rookie year efficiency and draft position is the order in which the player is selected.  In this “model” we expect that when we estimate it, βDP will be negative, since as draft position increases (later picks) we would expect lower rookie year performance.  As always in these simple illustrations, the proposed model is too simple.  Maybe we need a quadratic term or some other nonlinear transformation of the explanatory variable (draft position).  But we are keeping it simple to focus on the ideas.
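As a sketch of what this first-step estimation involves, a single-regressor model can be fit in closed form; every number below is fabricated for illustration:

```python
# Minimal sketch of the first-step regression PER(R) = b0 + b_dp * DraftPosition.
# Draft positions and rookie-year PER values are fabricated.
draft_pos = [1, 2, 5, 10, 15, 20, 25, 30]
per_rookie = [22.0, 19.5, 17.0, 15.5, 13.0, 12.5, 11.0, 10.0]

n = len(draft_pos)
mean_x = sum(draft_pos) / n
mean_y = sum(per_rookie) / n

# Closed-form ordinary least squares for a single regressor:
# slope = cov(x, y) / var(x); intercept recovered from the means.
b_dp = (sum((x - mean_x) * (y - mean_y) for x, y in zip(draft_pos, per_rookie))
        / sum((x - mean_x) ** 2 for x in draft_pos))
b0 = mean_y - b_dp * mean_x

print(round(b0, 2), round(b_dp, 3))  # slope comes out negative, as expected
```

Adding a quadratic term just adds another column to the calculation; at that point a library least-squares routine is the practical choice.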

The second step would then be to calculate how specific players deviate from their predicted performance based on draft position.  A measure of over or under performance could then be computed by taking the difference between the player’s actual PER(R) and the predicted PER(R) based on draft position.

DraftPremium = ActualPER(R) − PredictedPER(R)
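In code, this second step is just a residual calculation. The coefficients below are assumed for illustration, not estimated from real data:

```python
# Step-two sketch: a player's draft premium is the residual from the
# first-step model. The fitted coefficients here are assumed values.
b0, b_dp = 21.0, -0.4          # assumed intercept and slope

def draft_premium(per_rookie, draft_position):
    predicted = b0 + b_dp * draft_position
    return per_rookie - predicted

# A hypothetical late pick who outperforms his draft slot:
print(round(draft_premium(per_rookie=15.0, draft_position=28), 1))  # 5.2
```

A positive value means the player delivered more than his slot predicted (a Brady-type pick); a negative value is a Leaf-type deficit.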

Draft Premium (or deficit) would then be the dependent variable in an additional analysis.  For example, we might theorize that teams overweight the value of the most recent season.  In this case the analyst might specify the following equation.

DraftPremium = β0 + βP PER(4) + βDIFF (PER(4) − PER(3)) + ⋯

This expression explains the over (or under) performance (DraftPremium) based on PER in the player’s senior season (PER(4)) and the change in PER between the 3rd and 4th seasons.  If the statistical model yielded a negative value for βDIFF, it would suggest that players with dramatic improvements tended to be a bit of a fluke.  We might also include physical traits or level of play (Europe versus the ACC?).  Again, we will call these empirical questions that must be answered by spending (a lot of) time with the data.

We could also define “booms” or “busts” based on the degree of deviation from the predicted PER.  For example, we might label players in the top 15% of over performers to be “booms” and players in the bottom 15% to be “busts”.  We could then use a probability model like a binary probit to predict the likelihood of boom or bust.
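The labeling step described above (the probit estimation itself is a separate exercise) can be sketched as follows, with fabricated premium values:

```python
# Sketch of labeling "booms" and "busts" from draft premiums using the
# 15% tail cutoffs mentioned above. Premium values are fabricated.
premiums = [5.2, -3.1, 0.4, 7.8, -6.0, 1.1, -0.9, 2.5, -4.4, 0.0]

def label(premiums, tail=0.15):
    """Tag the top `tail` fraction as booms and the bottom fraction as busts."""
    ranked = sorted(premiums)
    k = max(1, int(len(ranked) * tail))
    busts = set(ranked[:k])
    booms = set(ranked[-k:])
    return [("boom" if p in booms else "bust" if p in busts else "typical")
            for p in premiums]

print(label(premiums))
```

The resulting boom/bust labels would become the 0/1 dependent variable in a binary probit (or logit) model.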

Boom / Bust methodologies can be an important and specialized tool.  For instance, a team drafting in the top five might want to statistically assess the risk of taking a player with a minimal track record (1 year wonders, high school preps, European players, etc…).   Alternatively, when drafting in late rounds maybe it’s worth it to pick high risk players with high upsides.  The key point about using statistical models is that words like risk and upside can now be quantified.

For those following the entire series it is worth noting that we are doing something very different in this “outlier” analysis compared to the previous “predictive” analyses.  Before, we wanted to “predict” the future based on currently available data.  Today we have shifted to trying to find “value” by identifying the biases of other decision makers.

Mike Lewis & Manish Tripathi, Emory University 2015.

For Part 1 Click Here

For Part 2 Click Here

For Part 3 Click Here

Analytics vs Intuition in Decision-Making Part III: Building Predictive Models of Performance

So far in our series on draft analytics, we have discussed the relative strengths and weaknesses of statistical models relative to human experts, and we have talked about some of the challenges that occur when building databases.  We now turn to questions and issues related to building predictive models of athlete performance.

“What should we predict?” is a deceptively simple question that needs to be answered early and potentially often throughout the modeling process.  Early – because we need to have some idea of what we want to predict before the database can be fully assembled.  Often – because frequently it will be the case that no one performance metric will be ideal.

There is also the question of what “type” of thing should be predicted.  It can be a continuous variable – how much of something.  Yards gained in football, batting average in baseball or points scored in basketball would be examples.  It can also be categorical (e.g. is the player an all-star or not).

A Simple Example

So what to predict?  For now, we will focus on basketball with a few comments directed towards other sports.  We have options.  We can start with something simple like points or rebounds (note that these are continuous quantities – things like points that vary from zero to the high twenties rather than categories like whether a player is a starter or not).  We don’t think these are bad metrics but they do have limitations.  The standard complaint is that these single statistics are too one dimensional.  This is true (by definition, in this case) but there may be occasions when this is a useful analysis.

First, maybe the team seeks a one dimensional player.  The predicted quantity doesn’t need to be points.  Perhaps, there is a desperate need for rebounding or assists.  It’s a team game, and it is legitimate to try and fill a specialist role.  A single measure like points might also be useful because it could be correlated with other good “things” that are of interest to the team.

For a moment, let us assume that we select points per game as the measure to be predicted, and we predict this using all sorts of collegiate statistics (the question of the measures we should use to predict is for next time).   In the equation below, we write what might be the beginning of a forecasting equation.  In this expression, points scored during the rookie season (Points(R)) is to be predicted using points scored in college (Points(C)), collegiate strength of schedule (SOS), an interaction of points scored and strength of schedule (Points(C) X SOS) and potentially other factors.

Points(R) = β0 + βP Points(C) + βSOS SOS + βPS Points(C) × SOS + ⋯

The logic of this equation is that points scored rookie year is predictable from college points, level of competition and an adjustment for whether the college points were scored against high level competition.  When we take this model to the data via a linear regression procedure, we get numerical values for the beta terms.  This gives us a formula that we can use to “score” or predict the performance of a set of prospects.
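To make the “scoring” step concrete, here is a minimal sketch that plugs assumed (not estimated) beta values into the equation above:

```python
# Sketch of scoring prospects with the fitted equation above.
# The beta values are assumed for illustration only.
betas = {"b0": 2.0, "bP": 0.45, "bSOS": 1.5, "bPS": 0.02}

def predict_rookie_points(points_college, sos):
    """Predicted rookie-year scoring from college scoring, strength of
    schedule, and their interaction."""
    return (betas["b0"]
            + betas["bP"] * points_college
            + betas["bSOS"] * sos
            + betas["bPS"] * points_college * sos)

# Two hypothetical prospects: identical college scoring, different schedules.
print(round(predict_rookie_points(20.0, sos=1.0), 2))   # weaker schedule
print(round(predict_rookie_points(20.0, sos=2.0), 2))   # stronger schedule
```

The interaction term βPS is what lets 20 points per game against ACC-level competition count for more than 20 points per game in a lower tier conference.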

The preceding is a “toy” specification in that a serious analysis would likely use a greatly expanded specification.  In the next part of our series we will focus on the right side of the equation.  What should be used as explanatory variables and what form these variables should take.

Some questions naturally arise from this discussion…

  • What pro statistics are predictable based on college performance? Maybe scoring doesn’t translate but steals do?
  • Is predicting rookie year scoring appropriate? Should we predict 3rd year scoring to get a better sense of what the player will eventually become?
  • Should the model vary based on position? Are the variables that predict something like scoring or rebounding the same for guards versus forwards?

Most of these questions are things that should be addressed by further analysis.  One thing that the non-statistically inclined tend not to appreciate is that there is value in looking at multiple models.  It is seldom clear-cut what the model should look like, and it’s rare that one size fits all (same model for point guards and centers?).  And maybe models only work sometimes.  Maybe we can predict pro steals but not points.  One reason why the human experts need to become at least statistically literate is that if they aren’t, the results from the analytics guys either need to be overly simplified or the experts will tend to reject the analytics because the multitude of models is just too complex.

A simple metric like points (or rebounds, or steals, etc…) is inherently limited.  There are a variety of other statistics that could be predicted that better capture the all-round performance of a player or the player’s impact on the team.  But the basic modeling procedure is the same.  We use data on existing pros to estimate a statistical model that predicts the focal metric based on data available about college prospects.

Some other examples of continuous variables we might want to predict…

  1. Player Efficiency

How about something that includes a whole spectrum of player statistics like John Hollinger’s Player Efficiency Rating (PER)?  PER involves a formula that weights points, steals, rebounds, assists and other measures by fixed weights (not weights estimated from data as above).  For instance, points are multiplied by 1 while defensive rebounds are worth .3.

There are some issues with PER, such as the formula being structured such that even low percentage shooters can increase their efficiency rates by taking more shots.  But the use of multiple types of statistics does provide a more holistic measurement.  In our project with the Dream we used a form of PER adapted to account for some of the data limitations.  In this project, some questions were raised about whether PER was an appropriate metric for the women’s game or whether the weights should be different.

  2. Plus/Minus

Plus/Minus rates are a currently popular metric.  Plus/Minus stats basically measure how a player’s team performs when he or she is on the court.  Plus/Minus is great because it captures the fact that teams play better or worse when a given player is on the court.  But Plus/Minus can also be argued against when substitution patterns are highly correlated, since a player’s rating then becomes confounded with the teammates he or she regularly shares the floor with.  In our project with the Dream, Plus/Minus wasn’t considered simply because we did not have a source.

  3. Minutes played

One metric that we like is simply minutes played.  While this may seem like a primitive metric, it has some nice properties.  The biggest plus is that it reflects the coach’s (a human expert) judgment.  Assuming that the human decision is influenced by production (points, rebounds, etc…) this metric is more of an intuition / analysis hybrid.  On the downside, minutes played are obviously a function of the other players on the team and injuries.

Categories of Success & Probability Models

As noted, the preceding discussion revolves around predicting numerical quantities.  There is also a tradition of placing players into broad categories.  A player that starts for a decade is probably viewed as a great draft pick, while someone that doesn’t make a roster is a disaster.  Our goal with “categories” is to predict the probability that each outcome occurs.

This type of approach likely calls for a different class of models.  Rather than use linear regression, we would use a probability model.  For example, there is something called an ordered logistic regression model that we can use to predict the probability of “ordered” career outcomes.  For example, we could predict the probabilities of a player becoming an all-star, a long-term starter, an occasional starter, a career backup or a non-contributor with this type of model.  Again, we can make this prediction as a function of the player’s college performance and other available data.

Below we write an equation that captures this.

Pr(Category = j) = f(college stats, physical attributes, etc…)

This equation says that the probability that a player becomes some category “j” is some function of a bunch of observable traits.  We are going to skip the math but these types of models do require a bit “more” than linear regression models (specialized software mostly) and are more complicated to interpret.

A nice feature of probability models is that the predictions are useful for risk assessment.  For example, an ordered logistic model would provide probability estimates for the range of player categories.  A given prospect might have a 5% chance of becoming an all-star, a 60% chance of becoming a starter and a 35% chance of being a career backup.  In contrast, the linear models described previously will only produce a “point” estimate.  Something along the lines of: a given prospect is predicted to score 6.5 points per game or to grab 4 rebounds per game as a pro.
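To show how an ordered logistic model turns a prospect’s linear score into category probabilities, here is a minimal sketch; the cut points and the score are assumed values, not estimates:

```python
import math

# Sketch of ordered-logit prediction: Pr(Category <= j) = logistic(tau_j - score).
# Thresholds (cut points) are assumed for illustration.
CATEGORIES = ["non-contributor", "career backup", "occasional starter",
              "long-term starter", "all-star"]
THRESHOLDS = [-2.0, 0.0, 1.5, 3.5]   # assumed cut points between categories

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(score):
    """Category probabilities as differences of cumulative logistic terms."""
    cum = [logistic(t - score) for t in THRESHOLDS] + [1.0]
    probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
    return dict(zip(CATEGORIES, (round(p, 3) for p in probs)))

# A hypothetical prospect whose (assumed) linear score is 1.0:
print(category_probs(score=1.0))
```

The full distribution over outcomes is exactly the risk-assessment payoff described above: a team can see upside (all-star probability) and downside (non-contributor probability) for the same prospect, not just a single point estimate.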

This is probably a good place to break.  There is much more to come.  Next time we will talk about predicting outliers and then spend some time on the explanatory variables (what we use to predict).  On a side note – this series is going to form the foundation for several sessions of our sports analytics course.  So, if there are any questions we would love to hear them (Tweet us @sportsmktprof).

Click here for Part I

Click here for Part II 

Mike Lewis & Manish Tripathi, Emory University 2015.

Analytics vs Intuition in Decision-Making Part II: Too Much and Too Little Data

The use of analytics in sports personnel decisions such as drafting and free agency signings is a topic with obvious popular appeal. Sports personnel decisions are fundamentally about how people will perform in the future. These are also tough, complex, high-risk decisions that are the fodder for talk radio and second-guessing from just about everyone.

So how can we make these decisions? As we noted in our last post, the choice between using analytics versus using the “gut” is probably a decision that doesn’t need to be made. Analytics and data should have a role. The question is how much emphasis should be placed on the “models” and how much on the intuition of the “experts.”

In this second installment of the series, we begin the process of going deeper into the mechanics and challenges involved in leveraging data and building models to support personnel decisions. As a backdrop for this discussion, we are going to tell the story of a project we helped a group of Emory students complete for the WNBA’s Atlanta Dream. Going into detail about this story / process should illuminate a couple of things. First, there is logic to how these types of analyses can best be structured. Second, a careful and systematic discussion of a project may clarify both the weaknesses and strengths of “Moneyball” type approaches to decision making.

To begin, we want to thank the Dream. This was a great project that the students loved, and it gave us an opportunity to think about the challenges in modeling draft prospects in a whole new arena. An early step in any analytics project is the building of the data infrastructure. For the WNBA, this was a challenge. Storehouses of sports data come from all sorts of places but they often start out as projects driven more by fan passion than any formal effort from an established organization. Baseball is probably the gold standard for information with detailed data going back a century. In contrast, for women’s professional and college basketball the information is comparatively sparse. There’s not a lot and it doesn’t go back very far.

After some searching (with a lot of great assistance from the Dream) we were able to identify information sources for both professional and collegiate stats. As we started to assemble databases a few things became apparent:

  • First, the data available was nowhere as detailed as what could be found for the men’s game. We were limited to season level stats at both the pro and college level. Furthermore, all we had were the basics – the data in box scores. This is good information, but it does leave the analyst wanting more.
  • Second, the data fields on professional performance were not identical to the data on collegiate performance. For example, the pro level data breaks rebounds down into offensive and defensive boards. Maybe this is a big deal and maybe not. It does make it difficult to use established metrics that place different value on the two types of rebounds.
  • Third, there was a LOT of missing data, and multiple types of missing data. In terms of player statistics, information on turnovers was at best scarce. Again, this makes it difficult to use established metrics like PER. The other thing that was missing is players themselves. We never were able to create a repository of data on international players that didn’t participate in NCAA basketball. As a side note, even if we had found international data it would be hard to interpret. How would we judge the importance of a rebound in Europe versus a rebound in South America? This isn’t just a problem for women’s basketball as this is also an issue in any global sport.

There were also a lot of things that we would have liked to have had. Some of this may have been available, and maybe we did not look hard enough. But we always need to ask the question of the incremental value versus the required effort. For example, information on players’ physical traits was very limited. We could obtain height but even basics like weight were difficult to find. And as far as we know – there is no equivalent to the NFL combine.

While these might seem like severe limitations, we think it’s really just par for the course in this type of research. Especially in the first go around! In analytics, you often work with what you have and you try to be clever in order to get the most from the data. We will get to how to approach this type of problem soon. But even with the limitations, we actually have a LOT of data. At the college level we have 4 years of data on games played, field goals made, field goals attempted, rebounds, steals, 3 pointers, etc… If we have 15 data fields for 4 years, we have 60 statistics per player. Add in data on height, strength of schedule and assorted miscellaneous fields and we have maybe 70 pieces of data per player. And maybe we want to do things like combine pieces of information; things like multiplying points per game by strength of schedule to get a measure that accounts for the greater difficulty of scoring in the ACC versus a lower tier conference. So maybe we end up with 100 variables we want to investigate.

Why are we discussing how many fields we have per prospect? Because it brings us to our next problem – the relatively small number of observations in most sports contexts. Remember, the basic game in this analysis is to understand “what” predicts a successful pro career. This means that we need observations on successful and less successful pro careers.

The WNBA consists of twelve teams with rosters of twelve players. This means if we go back and collect a few years of data we are looking at just a couple hundred players with meaningful professional careers. While this may seem like a sizeable amount of data, to the data scientist this is almost nothing. Our starting point is trying to relate professional career performance to college data, which in this case means maybe two hundred pro careers to be explained by potentially about a hundred explanatory variables.

It really is a weird starting point. We have serious limitations on the explanatory data available, but we also wish the ratio of observations (players) to explanatory data fields was higher. In our next installment, we will start to talk about what we are trying to predict (measures of pro career success). Following that, we will talk about how to best use our collection of explanatory variables (college stats).

Mike Lewis & Manish Tripathi, Emory University 2015.

Analytics vs Intuition in Decision-Making

Charles Barkley: “I’m not worried about Daryl Morey. He’s one of those idiots who believe in analytics.”

Whenever the Houston Rockets do anything good (make the Western Conference Finals) or bad (lose the Western Conference Finals) it’s a sure thing that the preceding Charles Barkley quote about Daryl Morey will be dusted off.  We teach a couple of courses focused on the use of analytics, so these occasions always feel like what a more traditional academic would refer to as a teachable moment.  For us, it’s an occasion to rant on a favorite topic.  The value of data and analytics to business problems is something we think a lot about.  When the business is sports, then this becomes a topic of wide ranging interest.  Before we get into this, one thing to note is that this isn’t going to be a blanket defense of the goodness of analytics.  Sir Charles has a point.

Of course, the reality is that there is probably less distance between the perspectives of Mr. Barkley and Mr. Morey than either party realizes.  The key to the quote, and the likelihood that there is a misunderstanding, is in the word “believe.”  Belief is a staple of religion, so the quote implies that Daryl Morey is unthinking and just guided by whatever data or statistical analysis is available.  From the other direction, the simplistic interpretation is that Charles Barkley sees no value in data or analysis, and believes that all decisions should be made based on “gut feel.”  These are obviously smart guys, so these characterizations undoubtedly don’t reflect reality.

However, the Barkley quote and the notion that decisions are either driven by data analysis or by intuition and gut is a useful starting point for talking about analytics in sports (and other businesses).  As the NBA draft approaches, we are going to discuss some key points related to using analytics to support player decisions.

As a starting point for this series we wanted to discuss the proper use of “analytics” and “intuition” in some general terms.  In regards to analytics, one thing that we have learned from time in the classroom is that statistical analysis and big data are mysterious things to most folks.  The vast majority of the world just isn’t comfortable with building and interpreting statistical models.  And the percentage of people that both really understand statistical models (strengths and limitations) and who also truly understand the underlying domain (be it marketing or sports) is even rarer.

One key truism about statistical models is that they are always incomplete and incorrect.  For example, let’s say that we want to predict college prospects’ success in the NBA.  What this typically boils down to is creating a mathematical equation that relates performance at the college level, physical traits and other factors (personality tests?) to NBA performance.  (For now we will neglect the potential difficulties involved in figuring out the right measure of NBA success, but this is potentially a huge issue.)

In some ways, the analytics game is simple.  We want to relate “information” to pro performance.  Potentially teams can track data on many statistics going back to high school.  These stats may be at the season, game or even play-by-play level.  The challenging part is determining what information to use and what form the data should take.  Assuming we can create the right type of statistical model, we can then identify college players with the right measurables.  On a side note, this is what marketers do all the time – figure out the variables that are correlated with future buying, and then target the best prospects.

Computers are great at this kind of analysis.  Given the necessary data, a computer with the right software will quantify the relationship between two variables.  For example, maybe college steal stats are very predictive of professional steal stats, but maybe rebounding is not.  An appropriate statistical analysis will quantify how these relationships work on average.  The computer will give us the facts without bias.  It will also incorporate all the data we give it.

This is what computers, stats, and data are good at: summarizing relationships without bias.  But analytics also has its pitfalls.  We will deal with these in detail in later posts, but the big problem is the relative “incompleteness” of models.  Statistical models, and any fancy stat, are by definition limited to the information used in their creation.  While results vary, when predicting individual-level outcomes such as player performance, statistical models ALWAYS leave a lot unexplained.

And this is where the human element comes in.  Human beings are great at combining multiple factors into overall judgments.  Charles Barkley has been watching basketball for decades.  His evaluations likely include his sense of the athlete’s past performances, the athlete’s physical capabilities and the player’s mental approach to the game.  Without much conscious thought, an expert like Barkley condenses a massive amount of diverse information into a summary judgment.  Barkley may automatically incorporate judgments about factors ranging from player work ethic, level of competition, past coaching, obscure physical traits, and observations about skills not captured in box scores, along with observable data like points scored.  It’s an overused academic word, but experts like Barkley are great at making holistic judgments.

But experts are people, which means that they are the product of their experiences and prone to biases.  Perhaps Charles Barkley underestimates the value of height or wingspan because he never had the dimensions of a classic power forward.  Or maybe he overestimates the importance of height and wingspan out of overcompensation.  The point is that he may not get the importance of any given trait exactly right.

To some extent we have two systems for making decisions: computers that crunch numerical data and people who make heuristic judgments.  Both systems have good traits and both have flaws.  Computers are fast, can process lots of data, and are unbiased, but they are limited by the design of the models, and their conclusions are always incomplete.  Experts can come up with complex and complete evaluations, but there is always the issue of bias.

What this whole discussion boils down to is an issue of balance.  In one-off decisions like selecting a player or signing a free agent, analytics should not be the complete driver of the decision.  These are evaluations of relatively small sets of players, and it’s hard, for a variety of reasons, to create good statistical models.  Since we are usually looking for a complex overall judgment, holistic expert judgments are probably the best way to go.  More generally, in this type of decision making – think about tasks like hiring an executive – analytics should play a supporting role.  But it should play a role.  Neglecting information, especially unbiased information, can only be a suboptimal approach.  The trick is ensuring that the expert fully understands the analytics and can use analytics-based information to improve decision making.

In the lead up to this year’s NBA draft, we are going to discuss some issues related to player analytics.  As part of this we are going to tell the story of a project focused on draft analytics that we recently partnered on with the Atlanta Dream and members of the Emory women’s basketball team.  We think it’s an interesting story and it provides an opportunity to discuss several data analysis principles relevant to player selection in more detail.  Stay tuned!

 Mike Lewis & Manish Tripathi, Emory University, 2015.

WNBA Social Media Equity Rankings

We begin our summer of fan base rankings with a project done by one of our favorite Emory students – Ilene Tsao.  Ilene presents a multi-dimensional analysis of the WNBA across Facebook, Twitter and Instagram.  The first set of rankings speaks to the current state of affairs: Seattle leads the way, followed by LA and Atlanta.  In the second analysis, Ilene takes a look at what is possible in each market (by controlling for time in market and championships).  In this analysis, the Atlanta Dream leads the way, followed by Minnesota and Chicago.

The teams in the WNBA are constantly looking for ways to improve their brands and expand their fan bases. Social media provides a way to measure fan loyalty and support. To calculate WNBA teams’ social media equity, we collected data on each team’s followers across the three main social media platforms of Facebook, Twitter, and Instagram. We then ran a regression model to predict followers on each platform as a function of factors such as metropolitan population, number of professional teams, team winning percentage, and playoff achievements. After creating this model, we compared the predicted number of followers to each team’s actual number of social media followers.  Our goal is to see which teams over- or under-achieve relative to what their markets and records would predict. We then ranked the WNBA teams based on the results.
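The over/under-achievement logic can be sketched in a few lines. This is a minimal illustration, not Ilene’s actual model: the team labels, follower counts, and predictors below are invented, and the real analysis also included pro-team counts and playoff achievements.

```python
# Minimal sketch of residual-based "over/under-achievement" rankings.
# Teams, follower counts, and predictor values are hypothetical.
import numpy as np

teams      = ["A", "B", "C", "D", "E", "F"]
population = np.array([6.0, 2.5, 4.0, 1.5, 3.2, 5.1])          # metro pop, millions
win_pct    = np.array([0.55, 0.65, 0.40, 0.60, 0.45, 0.50])
followers  = np.array([120_000, 90_000, 60_000, 50_000, 40_000, 85_000])

# Fit followers ~ intercept + population + win_pct by ordinary least squares.
X = np.column_stack([np.ones(len(teams)), population, win_pct])
coefs, *_ = np.linalg.lstsq(X, followers, rcond=None)
predicted = X @ coefs

# A positive residual means more followers than market size and record predict.
residuals = followers - predicted
ranking = [teams[i] for i in np.argsort(-residuals)]
print(ranking)
```

Ranking on residuals rather than raw follower counts is what lets a small-market team outrank a big-market team that merely coasts on its city’s size.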

The first model used only the metropolitan population and winning percentage of each team. After taking the average of the Facebook, Twitter, and Instagram rankings, we found the Seattle Storm had the best performance. The Connecticut Sun and Washington Mystics consistently ranked as the bottom two teams across all three platforms, but teams like the Los Angeles Sparks and Atlanta Dream had more variation. The Dream ranked 6th for Twitter but 1st for Instagram, while the Sparks ranked 1st for Twitter and 6th for Instagram. This could be because Instagram and the Dream are both relative newcomers (to social media and to the WNBA, respectively), while the Sparks and Twitter have been around longer. Based on raw numbers, the New York Liberty performs well in terms of social media followers, but when we adjust for market size and winning percentage, the team does poorly.

Rankings for Facebook, Twitter, and Instagram based on the metropolitan population and the teams’ winning percentages:

WNBA Social Media 1

The second model extended the previous analysis by incorporating the number of other professional teams in the area and the number of WNBA championships won into the regression. This model seemed to be a better fit for our data and resulted in small adjustments to the rankings. After taking the average of all three rankings with the new factors, the Atlanta Dream ranked first, passing the Seattle Storm and Los Angeles Sparks. The Mystics were no longer consistently the worst team, but were still in the bottom half of the rankings.

Rankings based on metropolitan population, winning percentage, number of other professional teams, and number of WNBA championships:

WNBA Social Media 2

Ilene Tsao, Emory University, 2015.

2015 NFL Draft Efficiency: A Good Sign of Things to Come for Gator fans?

The first three rounds of the 2015 NFL Draft concluded last night. While there was no Twitter-breaking Manziel event like last year, the event was once again a marketing success for the NFL.

For the past two years, we have examined the NFL draft from a unique perspective.  We analyze the process of taking high school talent and converting it into NFL draft picks.  In other words, we want to understand how efficient colleges are at transforming their available high school human capital into NFL draft picks.

Our approach is fairly simple. Each year, every FBS football program has an incoming class. The players in the class have been evaluated by several national recruiting/ranking companies (e.g. Rivals, Scout, etc.). In theory, these evaluations provide a measure of the player’s talent or quality. Each year, we also observe which players get drafted by the NFL. Thus, we can measure conversion rates over time for each college. Conversion rates may be indicative of the school’s ability to coach-up talent, to identify talent, or to invest in players. These rates may also depend on the talent composition of all of the players on the team. This last factor is particularly important from a recruiting standpoint. Should players flock to places that other highly ranked players have selected?

 How did you compute the conversion rate?

The conversion rate for each school is defined as (Sum of draft picks for the first three rounds of 2015 Draft)/(Weighted Recruiting Talent). Weighted Recruiting Talent is determined by summing the recruiting “points” for the relevant eligible class for the 2015 NFL Draft for each program (this can include eligible juniors as well as fifth year seniors). These “points” are computed by weighting each recruit by the overall population average probability of being drafted in the first three rounds for recruits at that corresponding talent level over the last three years. For example, a five-star recruit is much more likely to get drafted than a four or three-star recruit. We are using ratings data from Rivals.com.
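The formula above can be made concrete with a small worked example. The star-level draft probabilities and the roster below are illustrative only, not the actual population averages computed from the 2012-2014 data.

```python
# Hypothetical worked example of the conversion-rate formula.
# P(drafted in rounds 1-3) by star rating -- illustrative values, not the
# actual population averages used in the analysis.
p_drafted = {5: 0.25, 4: 0.08, 3: 0.02}

# A school's draft-eligible class for 2015 (made-up roster), as star ratings.
eligible_class = [5, 4, 4, 3, 3, 3, 3]
picks_2015 = 2   # players actually drafted in the first three rounds

# Weighted Recruiting Talent: each recruit counts for his draft probability.
weighted_talent = sum(p_drafted[stars] for stars in eligible_class)
conversion_rate = picks_2015 / weighted_talent

print(f"weighted talent = {weighted_talent:.2f}")   # 0.49
print(f"conversion rate = {conversion_rate:.2f}")   # 4.08
```

Weighting by draft probability is what keeps a school full of five-stars from looking efficient merely for producing the picks its talent already predicted.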

2015 nfl draft

The table above shows the results of our analysis of the first three rounds of the draft.  Colorado State had two draft picks in the first three rounds, both of whom were 3-stars or below coming out of high school.  It will be interesting to see how Jim McElwain shapes the higher level of talent he will most likely attract at the University of Florida.  Please note that we did not include schools that had only one player drafted in the first three rounds, as that could be considered an aberration. Of course, a similar argument could be made that one draft is too small a sample to rate the efficiency of a college. Thus, the table below presents results from the last four years of drafts (2012-2015).

2012-2015 NFL Draft

The school that really stands out over the last four years with respect to the development of talent is Stanford University.  While Connecticut and Boise State may be rated higher, Stanford has produced more than double the number of draft picks of the other two schools.

Mike Lewis & Manish Tripathi, Emory University, 2015.
