Player Analytics Fundamentals: Part 4 – Statistical Models

Today’s post introduces the topic of statistical modeling.  This is, maybe, the trickiest part of the series to write.  The problem is that mastering the technical side of statistical analysis usually takes years of education.  And, more critically, developing the wisdom and intuition to use statistical tools effectively and creatively takes years of practice.  The goal of this segment is to point people in the right direction, more than to provide detailed instruction.  That said – I can adjust if there is a call for more technical material.  (If you want to start from the beginning parts 1, 2 and 3 are a click away.)

Let’s start with a simple point.  The primary tool for every analytics professional (sports or otherwise) should be linear regression.  Linear regression allows the analyst to quantify the relationship between some focal variable of interest (dependent measure or DV) and a set of variables that we think drive that variable (independent variables).  In other words, regression is a tool that can produce an equation that shows how some inputs produce an outcome of interest.  In the case of player analytics, this might be a prediction of future performance based on a player’s past statistics or physical attributes.

To make this more concrete, let’s say we want to do an analysis of rookie quarterback performance (we’ve been talking a bit about QB metrics so far in the series).  Selecting QBs involves significant uncertainty.  The transition from the college game to the pro game requires the QB to be able to deal with more complex offensive systems, more sophisticated defenses and more talented opposing players.  The task of the general manager is to identify prospects that can successfully make the transition.

Data and statistical analysis can potentially play a part in this type of decision.  The starting point would be the idea that observable data on college prospects can help predict rookie year performance.  For now, let’s assume that general managers can obtain data on the number of games won as a college player, whether the player graduated (or will graduate) and the player’s height.  (We just might be foreshadowing a famous set of rules for drafting quarterbacks.)

The other key decision for a statistical analysis of rookie QB performance versus college career and physical data is a performance metric.  We could use the NFL passer rating formula that we have been discussing.  Or we could use something else.  For example, maybe the number of TD passes thrown as a rookie.  This metric is interesting as it captures something about playing time and ability to create scores.

Touchdowns are also a metric that “fits” linear regression.  Linear regression is best suited to the analysis of quantitative variables that vary continuously.  The number of touchdowns we observe in data will range from zero to whatever the rookie TD record is.  In contrast, other metrics such as whether the player becomes a starter or a pro bowler are categorical variables.  There are other techniques that are better for analyzing categorical variables.  (If you are a stats jockey and are objecting to the last couple of statements, please see the note below.)

The purpose of regression analysis is to create an equation of the following form:

Rookie TDs = β0 + β1 × College Wins + β2 × Graduated + β3 × Height

This equation says that TD passes are a function of college wins, graduation and height.  (Graduation is coded 1 for graduates and 0 otherwise; height is measured in inches.)  The βs are the weights that are determined by the linear regression analysis.  Specifically, linear regression determines the βs that best fit the data.  This is the important point.  The weights or βs are determined from the data.  To illustrate how the equation works let’s imagine that we estimated the regression model and obtained the following equation.

Rookie TDs = 1 + 0.1 × College Wins + 5 × Graduated + 0 × Height

This equation says that we can predict rookie TD passes by plugging in each player’s data related to college wins, graduation and height.  It also says that a history of winning is positively related to TDs and graduation also is a positive.  The coefficient for height is zero.  This indicates that height is not a predictor of rookie TDs (I’m making these numbers up – height probably matters).  One benefit of developing a model is that we let the data speak.  Our “expert” judgment might be that height matters for quarterbacks.  The regression results can help identify decision biases if the coefficients don’t match the experts’ predictions.  I am neglecting the issue of significance for now – just to keep the focus on intuition.
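To make “the weights are determined from the data” a little more concrete, here is a minimal sketch of least squares in plain Python.  Everything in it is an assumption for illustration: the six prospects are invented, and their rookie TD totals are generated from the made-up coefficients used in this post (intercept 1, 0.1 per college win, 5 for graduating, 0 for height), so the fit should simply hand those numbers back.  The helper names fit_ols and solve_linear_system are hypothetical and do not come from any library.

```python
# Ordinary least squares via the normal equations (X'X) b = X'y, in plain Python.
# The data are synthetic: TD totals are generated from made-up coefficients,
# so a correct fit should recover roughly [1, 0.1, 5, 0].

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize squared prediction error."""
    x = [[1.0] + list(r) for r in rows]  # prepend a constant for the intercept
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical prospects: (college wins, graduated 1/0, height in inches)
prospects = [(40, 1, 70), (10, 0, 72), (25, 1, 75),
             (30, 0, 74), (18, 1, 73), (35, 0, 71)]
rookie_tds = [1 + 0.1 * w + 5 * g + 0 * h for w, g, h in prospects]

betas = fit_ols(prospects, rookie_tds)
print([round(b, 3) for b in betas])
```

With real data the TD totals would not sit exactly on the line, and the estimated weights would be whatever minimizes the squared misses.  In practice a spreadsheet or a statistics package does this step for us.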

Let’s say we have two prospects.  Lewis Michaels out of the University of Illinois who won 40 college games (hypothetical and unrealistic), graduated (in engineering) and is 5’10” (a Flutiesque prospect).  Our second prospect is Manny Trips out of Duke.  Manny won 10 games, failed to graduate and is 6’ tall.  Michaels would seem to be the better prospect based on the available data.  The statistical model allows us to predict how much better.

We make our predictions by simply plugging our player level data into the equation.  We would predict Lewis would throw 10 TDs in his rookie year (1+.1*40+5*1+0*70).  For Manny the prediction would be 2 TDs (1+.1*10+5*0+0*72).  For now, I am just making up the coefficients (βs).  In a later entry I will estimate the model using some data on actual NFL rookie QB performance.
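The plug-in arithmetic can be sketched in a few lines of Python.  The coefficients are the made-up ones from this example, heights are entered in inches (so 5’10” is 70 and 6’ is 72), and the function name predict_rookie_tds is hypothetical.

```python
# Made-up coefficients from the illustrative equation; not estimated from real data.
COEFFICIENTS = {"intercept": 1.0, "college_wins": 0.1, "graduated": 5.0, "height_in": 0.0}

def predict_rookie_tds(college_wins, graduated, height_in):
    """Plug one prospect's data into the linear equation."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["college_wins"] * college_wins
            + c["graduated"] * (1 if graduated else 0)
            + c["height_in"] * height_in)

lewis = predict_rookie_tds(college_wins=40, graduated=True, height_in=70)
manny = predict_rookie_tds(college_wins=10, graduated=False, height_in=72)

print("Lewis Michaels:", lewis)
print("Manny Trips:", manny)
print("Predicted gap:", lewis - manny)
```

The gap between the two predictions is the “how much better” that the model lets us put a number on.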

Regression has its shortcomings and many analysts love to object to regression analyses.  But for the most part, linear regression is a solid tool for analyzing patterns in data.  It’s also relatively easy to implement.  We can run regressions in Excel!  We shouldn’t underestimate how important it is to be able to do our analyses in standard tools like Excel.

I will extend our tool kit in a future entry.  I briefly mentioned categorical variables such as whether or not a player is a starter.  For these types of Yes/No outcomes (starter or not a starter) there is a tool called logistic regression that should be in our repertoire.

*One reason this note is tricky is that I’m trying to get the right balance and tone.  I can already hear the objections.  Let’s save these for now.  For example, readers do not need to alert me to the fact that TDs are censored at zero.  Or that there is a mass point at zero because many rookies don’t play.  Or that TDs are counted in discrete units so maybe a Poisson model is more appropriate.  You get the idea.  There are many ways to object to any statistical model.  The real question isn’t whether a model is perfect.  The real question should be whether the model provides value.

Analytics vs Intuition in Decision Making Part IV: Outliers

We have been talking about developing predictive models for tasks like evaluating draft prospects.  Last time we focused on the question of what to predict.  For drafting college prospects, this amounts to predicting things like rookie year performance measures.  In statistical parlance, these are the dependent or Y variables.  We did this in the context of basketball and talked broadly about linear models that deliver point estimates and probability models that give the likelihood of various categories of outcomes.

Before we move to the other side of the equation and talk about the “what” and the “how” of working with the explanatory or X variables, we wanted to take a quick diversion and discuss predicting draft outliers.  What we mean by outliers is the identification of players that significantly over or under perform relative to their draft position.  In the NFL, we can think of this as the “how to avoid Ryan Leaf with the second overall pick and grab Tom Brady before the sixth round” problem.

In our last installment, we focused on predicting performance regardless of when a player is picked.  In some ways, this is a major omission.  All the teams in a draft are trying to make the right choices.  This means that what we are really trying to do is to exploit the biases of our competitors to get more value with our picks.

There are a variety of ways to address this problem, but for today we will focus on a relatively simple two-step approach.  The key to this approach is to create a dependent variable that indicates that a player over-performs relative to their draft position, and then to try to understand whether there is data that is systematically related to these over and under performing picks.

For illustrative purposes, let us assume that our key performance metric is rookie year player efficiency (PER(R)).  If teams draft rationally and efficiently (and PER is the right metric), then there should be a strong linkage between rookie year PER and draft position in the historical record.  Perhaps we estimate the following equation:

PER(R) = B0 + BDP × DraftPosition + …

where PER(R) is rookie year efficiency and draft position is the order in which the player is selected.  In this “model” we expect the estimate of BDP to be negative, since we would expect lower rookie year performance as draft position increases.  As always in these simple illustrations, the proposed model is too simple.  Maybe we need a quadratic term or some other nonlinear transformation of the explanatory variable (draft position).  But we are keeping it simple to focus on the ideas.

The second step would then be to calculate how specific players deviate from their predicted performance based on draft position.  A measure of over or under performance could then be computed by taking the difference between the player’s actual PER(R) and the predicted PER(R) based on draft position.
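A minimal sketch of these two steps in plain Python follows.  It assumes the simplest one-variable linear fit, and the draft positions and rookie PER values are invented for illustration (they are not real players).  The function name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic history: draft position and rookie-year PER for eight made-up players.
draft_position = [1, 2, 5, 10, 15, 20, 30, 45]
rookie_per = [18.0, 9.5, 15.0, 13.5, 14.0, 10.0, 11.5, 8.0]

# Step 1: relate rookie PER to draft position.
b0, b_dp = fit_line(draft_position, rookie_per)

# Step 2: over or under performance = actual PER minus PER predicted from draft slot.
draft_premium = [actual - (b0 + b_dp * pos)
                 for pos, actual in zip(draft_position, rookie_per)]

for pos, premium in zip(draft_position, draft_premium):
    print(f"pick {pos:2d}: premium {premium:+.2f}")
```

A positive value marks a player who beat what his draft slot predicted, and a negative value marks one who fell short.  In this invented data the second pick is the obvious under performer.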

Draft Premium (or deficit) would then be the dependent variable in an additional analysis.  For example, we might theorize that teams overweight the value of the most recent season.  In this case the analyst might specify the following equation.

DraftPremium = B0 + BP × PER(4) + BDIFF × (PER(4) – PER(3)) + …

This expression explains the over (or under) performance (DraftPremium) based on PER in the player’s senior season (PER(4)) and the change in PER between the 3rd and 4th seasons.  If the statistical model yielded a negative value for BDIFF it would suggest that players with dramatic improvements tended to be a bit of a fluke.  We might also include physical traits or level of play (Europe versus the ACC?).  Again, we will call these empirical questions that must be answered by spending (a lot of) time with the data.

We could also define “booms” or “busts” based on the degree of deviation from the predicted PER.  For example, we might label players in the top 15% of over performers to be “booms” and players in the bottom 15% to be “busts”.  We could then use a probability model like a binary probit to predict the likelihood of boom or bust.
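As a rough sketch of the labeling step only, the snippet below tags the top and bottom 15% of a set of draft premiums.  The premiums are invented numbers, the function name label_boom_bust is hypothetical, and the probability model itself (the probit) is not shown; the labels produced here would simply become its dependent variable.

```python
def label_boom_bust(premiums, percent=15):
    """Label the top `percent`% of premiums 'boom' and the bottom `percent`% 'bust'."""
    n = len(premiums)
    k = max(1, n * percent // 100)  # players per tail, at least one
    order = sorted(range(n), key=lambda i: premiums[i])  # worst to best
    labels = ["neither"] * n
    for i in order[:k]:
        labels[i] = "bust"
    for i in order[-k:]:
        labels[i] = "boom"
    return labels

# Synthetic draft premiums (actual minus predicted rookie PER) for 20 made-up players.
premiums = [4.9, -4.9, 1.2, 0.3, -0.8, 2.7, -3.1, 0.0, 1.9, -1.4,
            0.6, -2.2, 3.5, -0.2, 0.9, -0.5, 1.1, -1.9, 2.1, -0.1]

labels = label_boom_bust(premiums)
for premium, label in zip(premiums, labels):
    if label != "neither":
        print(f"{premium:+.1f} -> {label}")
```

The 15% cutoff is a judgment call, as in the text; a different share only changes the `percent` argument.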

Boom / Bust methodologies can be an important and specialized tool.  For instance, a team drafting in the top five might want to statistically assess the risk of taking a player with a minimal track record (1 year wonders, high school preps, European players, etc…).   Alternatively, when drafting in late rounds maybe it’s worth it to pick high risk players with high upsides.  The key point about using statistical models is that words like risk and upside can now be quantified.

For those following the entire series it is worth noting that we are doing something very different in this “outlier” analysis compared to the previous “predictive” analyses.  Before, we wanted to “predict” the future based on currently available data.  Today we have shifted to trying to find “value” by identifying the biases of other decision makers.

Mike Lewis & Manish Tripathi, Emory University 2015.

Analytics vs Intuition in Decision-Making Part III: Building Predictive Models of Performance

So far in our series on draft analytics, we have discussed the relative strengths and weaknesses of statistical models relative to human experts, and we have talked about some of the challenges that occur when building databases.  We now turn to questions and issues related to building predictive models of athlete performance.

“What should we predict?” is a deceptively simple question that needs to be answered early and potentially often throughout the modeling process.  Early – because we need to have some idea of what we want to predict before the database can be fully assembled.  Often – because frequently it will be the case that no one performance metric will be ideal.

There is also the question of what “type” of thing should be predicted.  It can be a continuous variable, like how much of something.  Yards gained in football, batting average in baseball or points scored in basketball would be examples.  It can also be categorical (e.g. is the player an all-star or not).

A Simple Example

So what to predict?  For now, we will focus on basketball with a few comments directed towards other sports.  We have options.  We can start with something simple like points or rebounds (note that these are continuous quantities – things like points that vary from zero to the high twenties rather than categories like whether a player is a starter or not).  We don’t think these are bad metrics but they do have limitations.  The standard complaint is that these single statistics are too one dimensional.  This is true (by definition, in this case) but there may be occasions when this is a useful analysis.

First, maybe the team seeks a one dimensional player.  The predicted quantity doesn’t need to be points.  Perhaps, there is a desperate need for rebounding or assists.  It’s a team game, and it is legitimate to try and fill a specialist role.  A single measure like points might also be useful because it could be correlated with other good “things” that are of interest to the team.

For a moment, let us assume that we select points per game as the measure to be predicted, and we predict this using all sorts of collegiate statistics (the question of the measures we should use to predict is for next time).   In the equation below, we write what might be the beginning of a forecasting equation.  In this expression, points scored during the rookie season (Points(R)) is to be predicted using points scored in college (Points(C)), collegiate strength of schedule (SOS), an interaction of points scored and strength of schedule (Points(C) X SOS) and potentially other factors.

Points(R) = β0 + βP Points(C) + βSOS SOS + βPS Points(C) × SOS + ⋯

The logic of this equation is that points scored rookie year is predictable from college points, level of competition and an adjustment for whether the college points were scored against high level competition.  When we take this model to the data via a linear regression procedure we get numerical values for the beta terms.  This gives us a formula that we can use to “score” or predict the performance of a set of prospects.
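To show what the interaction term does when we “score” prospects, here is a small sketch.  The beta values are hypothetical placeholders (nothing here is estimated from data), and strength of schedule is assumed to be an index centered at zero, with positive values meaning tougher competition.  The function names are also hypothetical.

```python
# Hypothetical betas for illustration only; a real analysis would estimate these.
BETAS = {"intercept": 1.0, "points_c": 0.2, "sos": 0.5, "points_c_x_sos": 0.1}

def predict_rookie_points(points_c, sos):
    """Score a prospect from college points, strength of schedule and their product."""
    interaction = points_c * sos
    return (BETAS["intercept"]
            + BETAS["points_c"] * points_c
            + BETAS["sos"] * sos
            + BETAS["points_c_x_sos"] * interaction)

def value_of_one_college_point(sos):
    """With an interaction, the payoff to a college point depends on the schedule."""
    return BETAS["points_c"] + BETAS["points_c_x_sos"] * sos

# Two made-up prospects with the same college scoring but different schedules.
tough_schedule = predict_rookie_points(points_c=20, sos=1.0)
weak_schedule = predict_rookie_points(points_c=20, sos=-1.0)
print(tough_schedule, weak_schedule)
print(value_of_one_college_point(1.0), value_of_one_college_point(-1.0))
```

The same 20 college points translate into a higher prediction when they were scored against stronger competition, which is exactly the adjustment the interaction term is meant to capture.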

The preceding is a “toy” specification in that a serious analysis would likely use a greatly expanded specification.  In the next part of our series we will focus on the right side of the equation: what should be used as explanatory variables and what form these variables should take.

Some questions naturally arise from this discussion…

• What pro statistics are predictable based on college performance? Maybe scoring doesn’t translate but steals do?
• Is predicting rookie year scoring appropriate? Should we predict 3rd year scoring to get a better sense of what the player will eventually become?
• Should the model vary based on position? Would the variables that predict something like scoring or rebounding be the same for guards versus forwards?

Most of these questions are things that should be addressed by further analysis.  One thing that the non-statistically inclined tend not to get is that there is value in looking at multiple models.  It is seldom clear-cut what the model should look like, and it’s rare that one size fits all (same model for point guards and centers?).  And maybe models only work sometimes.  Maybe we can predict pro steals but not points.  One reason why the human experts need to become at least statistically literate is that if they aren’t, the results from the analytics guys either need to be overly simplified or the expert will tend to reject the analytics because the multitude of models is just too complex.

A simple metric like points (or rebounds, or steals, etc…) is inherently limited.  There are a variety of other statistics that could be predicted that better capture the all-round performance of a player or the player’s impact on the team.  But the basic modeling procedure is the same.  We use data on existing pros to estimate a statistical model that predicts the focal metric based on data available about college prospects.

Some other examples of continuous variables we might want to predict…

1. Player Efficiency

How about something that includes a whole spectrum of player statistics like John Hollinger’s Player Efficiency Rating (PER)?  PER involves a formula that weights points, steals, rebounds, assists and other measures by fixed weights (not weights estimated from data as above).  For instance, points are multiplied by 1 while defensive rebounds are worth .3.

There are some issues with PER, such as the formula being structured so that even low percentage shooters can increase their efficiency ratings by taking more shots.  But the use of multiple types of statistics does provide a more holistic measurement.  In our project with the Dream we used a form of PER adapted to account for some of the data limitations.  In this project some questions were raised about whether PER was an appropriate metric for the women’s game or if the weights should be different.

2. Plus/Minus

Plus/Minus rates are a currently popular metric.  Plus/Minus stats basically measure how a player’s team performs when he or she is on the court.  Plus/Minus is great because it captures the fact that teams play better or worse when a given player is on the court.  But Plus/Minus can also be argued against if substitution patterns are highly correlated.  In our project with the Dream, Plus/Minus wasn’t considered simply because we did not have a source.

3. Minutes played

One metric that we like is simply minutes played.  While this may seem like a primitive metric, it has some nice properties.  The biggest plus is that it reflects the coach’s (a human expert) judgment.  Assuming that the human decision is influenced by production (points, rebounds, etc…) this metric is more of an intuition / analysis hybrid.  On the downside, minutes played are obviously a function of the other players on the team and injuries.

Categories of Success & Probability Models

As noted, the preceding discussion revolves around predicting numerical quantities.  There is also a tradition of placing players into broad categories.  A player that starts for a decade is probably viewed as a great draft pick while someone that doesn’t make a roster is a disaster.  Our goal with “categories” is to predict the probability that each outcome occurs.

This type of approach likely calls for a different class of models.  Rather than use linear regression we would use a probability model.  For example, there is something called an ordered logistic regression model that we can use to predict the probability of “ordered” career outcomes.  For example, we could predict the probabilities of a player becoming an all-star, a long-term starter, an occasional starter, career backup or a non-contributor with this type of model.  Again, we can make this prediction as a function of the player’s college performance and other available data.

Below we write an equation that captures this.

Pr(Category = j) = f(college stats, physical attributes, etc…)

This equation says that the probability that a player becomes some category “j” is some function of a bunch of observable traits.  We are going to skip the math but these types of models do require a bit “more” than linear regression models (specialized software mostly) and are more complicated to interpret.

A nice feature of probability models is that the predictions are useful for risk assessment.  For example, an ordered logistic model would provide probability estimates for the range of player categories.  A given prospect might have a 5% chance of becoming an all-star, a 60% chance of becoming a starter and a 35% chance of being a career backup.  In contrast, the linear regression models described previously will only produce a “point” estimate.  Something along the lines of a given prospect is predicted to score 6.5 points per game or to grab 4 rebounds per game as a pro.
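For readers who want to see roughly how an ordered logistic model turns a prospect’s data into category probabilities, here is a sketch of the standard calculation.  It assumes the model has already been estimated: the prospect’s score (the weighted sum of his or her college stats) and the cutpoints between categories are hypothetical numbers, not estimates from any data.

```python
import math

# Ordered from worst to best career outcome.
CATEGORIES = ["non-contributor", "career backup", "occasional starter",
              "long-term starter", "all-star"]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(score, cutpoints):
    """Category probabilities from Pr(outcome <= j) = logistic(cutpoint_j - score)."""
    cumulative = [logistic(c - score) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # probability of landing exactly in this category
        previous = cum
    return probs

# Hypothetical values: four increasing cutpoints separate the five categories.
cutpoints = [-2.0, -0.5, 1.0, 2.5]
probs = ordered_logit_probs(0.8, cutpoints)

for name, p in zip(CATEGORIES, probs):
    print(f"{name}: {p:.1%}")
```

A higher score shifts probability toward the better categories, but every category keeps some probability, which is what makes this kind of output useful for thinking about risk.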

This is probably a good place to break.  There is much more to come.  Next time we will talk about predicting outliers and then spend some time on the explanatory variables (what we use to predict).  On a side note – this series is going to form the foundation for several sessions of our sports analytics course.  So, if there are any questions we would love to hear them (Tweet us @sportsmktprof).

Mike Lewis & Manish Tripathi, Emory University 2015.

Analytics vs Intuition in Decision-Making Part II: Too Much and Too Little Data

The use of analytics in sports personnel decisions such as drafting and free agency signings is a topic with obvious popular appeal. Sports personnel decisions are fundamentally about how people will perform in the future. These are also tough, complex, high-risk decisions that are the fodder for talk radio and second guessing from just about everyone.

So how can we make these decisions? As we noted in our last post, the choice between using analytics versus using the “gut” is probably a decision that doesn’t need to be made. Analytics and data should have a role. The question is how much emphasis should be placed on the “models” and how much on the intuition of the “experts.”

In this second installment of the series, we begin the process of going deeper into the mechanics and challenges involved in leveraging data and building models to support personnel decisions. As a backdrop for this discussion, we are going to tell the story of a project we helped a group of Emory students complete for the WNBA’s Atlanta Dream. Going into detail about this story / process should illuminate a couple of things. First, there is logic to how these types of analyses can best be structured. Second, a careful and systematic discussion of a project may clarify both the weaknesses and strengths of “Moneyball” type approaches to decision making.

To begin, we want to thank the Dream. This was a great project that the students loved, and it gave us an opportunity to think about the challenges in modeling draft prospects in a whole new arena. An early step in any analytics project is the building of the data infrastructure. For the WNBA, this was a challenge. Storehouses of sports data come from all sorts of places but they often start out as projects driven more by fan passion than any formal effort from an established organization. Baseball is probably the gold standard for information with detailed data going back a century. In contrast, for women’s professional and college basketball the information is comparatively sparse. There’s not a lot and it doesn’t go back very far.

After some searching (with a lot of great assistance from the Dream) we were able to identify information sources for both professional and collegiate stats. As we started to assemble databases a few things became apparent:

• First, the data available was nowhere near as detailed as what could be found for the men’s game. We were limited to season level stats at both the pro and college level. Furthermore, all we had were the basics – the data in box scores. This is good information, but it does leave the analyst wanting more.
• Second, the data fields on professional performance were not identical to the data on collegiate performance. For example, the pro level data breaks rebounds down into offensive and defensive boards. Maybe this is a big deal and maybe not. It does make it difficult to use established metrics that place different value on the two types of rebounds.
• Third, there was a LOT of missing data, and multiple types of missing data. In terms of player statistics, information on turnovers was at best scarce. Again, this makes it difficult to use established metrics like PER. The other thing that was missing is players themselves. We never were able to create a repository of data on international players that didn’t participate in NCAA basketball. As a side note, even if we had found international data it would be hard to interpret. How would we judge the importance of a rebound in Europe versus a rebound in South America? This isn’t just a problem for women’s basketball as this is also an issue in any global sport.

There were also a lot of things that we would have liked to have had. Some of this may have been available, and maybe we did not look hard enough. But we always need to ask the question of the incremental value versus the required effort. For example, information on players’ physical traits was very limited. We could obtain height but even basics like weight were difficult to find. And as far as we know – there is no equivalent to the NFL combine.

While these might seem like severe limitations, we think it’s really just par for the course in this type of research. Especially in the first go around! In analytics, you often work with what you have and you try to be clever in order to get the most from the data. We will get to how to approach this type of problem soon. But even with the limitations, we actually have a LOT of data. At the college level we have 4 years of data on games played, field goals made, field goals attempted, rebounds, steals, 3 pointers, etc… If we have 15 data fields for 4 years we have 60 statistics per player. Add in data on height, strength of schedule and assorted miscellaneous fields and we have maybe 70 pieces of data per player. And maybe we want to do things like combine pieces of information; things like multiplying points per game by strength of schedule to get a measure that accounts for the greater difficulty of scoring in the ACC versus a lower tier conference. So maybe we end up with 100 variables we want to investigate.

Why are we discussing how many fields we have per prospect? Because it brings us to our next problem – the relatively small number of observations in most sports contexts. Remember the basic game in this analysis is to understand “what” predicts a successful pro career. This means that we need observations on successful and less successful pro careers.

The WNBA consists of twelve teams with rosters of twelve players. This means if we go back and collect a few years of data we are looking at just a couple hundred players with meaningful professional careers. While this may seem like a sizeable amount of data, to the data scientist this is almost nothing. Our starting point is trying to relate professional career performance to college data, which in this case means maybe two hundred pro careers to be explained by potentially about a hundred explanatory variables.
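The back-of-the-envelope arithmetic behind this problem can be tallied in a few lines.  The counts are the rough figures quoted in this post (the “10” miscellaneous fields and the round numbers of 100 variables and 200 careers are approximations, not exact data).

```python
# Rough counts from the discussion; all figures are approximate.
fields_per_season = 15
college_seasons = 4
season_stats = fields_per_season * college_seasons  # statistics per player
base_variables = season_stats + 10                   # plus height, SOS, misc. (approx.)
candidate_variables = 100                            # after adding combinations (approx.)

teams, roster_size = 12, 12
roster_spots = teams * roster_size                   # roster spots in a single season
pro_careers = 200                                    # "a couple hundred" players

print("stats per player:", season_stats)
print("base variables:", base_variables)
print("roster spots per season:", roster_spots)
print("observations per candidate variable:", pro_careers / candidate_variables)
```

Roughly two pro careers per candidate variable is far too few to let a model sort out which variables matter, which is why the choice and construction of explanatory variables deserves so much care.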

It really is a weird starting point. We have serious limitations on the explanatory data available, but we also wish the ratio of observations (players) to explanatory data fields was higher. In our next installment, we will start to talk about what we are trying to predict (measures of pro career success). Following that, we will talk about how to best use our collection of explanatory variables (college stats).

Mike Lewis & Manish Tripathi, Emory University 2015.

Analytics vs Intuition in Decision-Making

Charles Barkley: “I’m not worried about Daryl Morey. He’s one of those idiots who believe in analytics.”

Whenever the Houston Rockets do anything good (make the Western Conference Finals) or bad (lose the Western Conference Finals) it’s a sure thing that the preceding Charles Barkley quote about Daryl Morey will be dusted off.  We teach a couple of courses focused on the use of analytics, so these occasions always feel like what a more traditional academic would refer to as a teachable moment.  For us, it’s an occasion to rant on a favorite topic.  The value of data and analytics to business problems is something we think a lot about.  When the business is sports, then this becomes a topic of wide ranging interest.  Before we get into this, one thing to note is that this isn’t going to be a blanket defense of the goodness of analytics.  Sir Charles has a point.

Of course, the reality is that there is probably less distance between the perspectives of Mr. Barkley and Mr. Morey than either party realizes.  The key to the quote and the likelihood that there is a misunderstanding is in the word “believe.”  Belief is a staple of religion, so the quote implies that Daryl Morey is unthinking and just guided by whatever data or statistical analysis is available.  From the other direction, the simplistic interpretation is that Charles Barkley sees no value in data or analysis, and believes that all decisions should be made based on “gut feel.”  These are obviously smart guys so these characterizations undoubtedly don’t reflect reality.

However, the Barkley quote and the notion that decisions are either driven by data analysis or by intuition and gut is a useful starting point for talking about analytics in sports (and other businesses).  As the NBA draft approaches, we are going to discuss some key points related to using analytics to support player decisions.

As a starting point for this series we wanted to discuss the proper use of “analytics” and “intuition” in some general terms.  In regards to analytics, one thing that we have learned from time in the classroom is that statistical analysis and big data are mysterious things to most folks.  The vast majority of the world just isn’t comfortable with building and interpreting statistical models.  And the percentage of people that both really understand statistical models (strengths and limitations) and who also truly understand the underlying domain (be it marketing or sports) is even smaller.

One key truism about statistical models is that they are always incomplete and incorrect.  For example, let’s say that we want to predict college prospects’ success in the NBA.  What this typically boils down to is creating a mathematical equation that relates performance at the college level, physical traits and other factors (personality tests?) to NBA performance.  (For now we will neglect the potential difficulties involved in figuring out the right measure of NBA success, but this is potentially a huge issue.)

In some ways, the analytics game is simple.  We want to relate “information” to pro performance.  Potentially teams can track data on many statistics going back to high school.  These stats may be at the season, game or even play-by-play level.  The challenging part is determining what information to use and what form the data should take.  Assuming we can create the right type of statistical model, we can then identify college players with the right measurables.  On a side note, this is what marketers do all the time – figure out the variables that are correlated with future buying, and then target the best prospects.

Computers are great at this kind of analysis.  Given the necessary data, a computer with the right software will tell us the exact relationship between two pieces of data.  For example, maybe college steal stats are very predictive of professional steal stats, but maybe rebounding is not.  An appropriate statistical analysis will quantify how these relationships work on average.  The computer will give us the facts without bias.  It will also incorporate all the data we give it.

This is what computers, stats, and data are good at.  Summarizing relationships without bias.  But analytics also has its pitfalls.  We will deal with these in detail in later posts, but the big problem is the relative “incompleteness” of models.  Statistical models, and any fancy stat, are by definition limited to what is used in their creation.  While results vary, when predicting individual level results such as player performance statistical models ALWAYS leave a lot unexplained.

And this is where the human element comes in.  Human beings are great at combining multiple factors to determine overall judgments.  Charles Barkley has been watching basketball for decades.  His evaluations likely include his sense of the athlete’s past performances, the athlete’s physical capabilities and the player’s mental approach to the game.  Without much conscious thought an expert like Barkley is condensing a massive amount of diverse information into a summary judgment.  Barkley may automatically incorporate judgments about factors ranging from player work ethic, level of competition, past coaching, obscure physical traits, observations about skills not captured in box scores and myriad other factors along with observable data like points scored into his evaluations.  It’s an overused academic word, but experts like Barkley are great at making holistic judgments.

But experts are people, which means that they are the product of their experiences and prone to biases.  Perhaps Charles Barkley underestimates the value of height or wing-span because he never had the dimensions of a classic power forward, or, maybe not.  It could also be that maybe he overestimates the importance of height and wing span based on some overcompensation.  The point is that he may not get the importance of any given trait exactly right.

To some extent we have two systems for making decisions: computers that crunch numerical data and people that make heuristic judgments.  Both systems have good traits and both have flaws.  Computers are fast, can process lots of data and are unbiased. But they are limited by the design of the models and the conclusions are always incomplete or limited.  Experts can come up with complex and complete evaluations but there is always the issue of bias.

What this whole discussion boils down to is an issue of balance.  In one-off decisions like selecting a player or signing a free agent analytics should not be the complete driver of the decision.  These are evaluations of relatively small sets of players and it’s hard, for a variety of reasons, to create good statistical models.  Since we are usually looking for a complex overall judgment the holistic expert judgments are probably the best way to go.  More generally, in this type of decision making – think about tasks like hiring an executive – analytics should play a supporting role.  But it should play a role.  Neglecting information, especially unbiased information can only be a suboptimal approach.  The trick is that the expert fully understands the analytics and can use the analytics based information to improve decision making.

In the lead up to this year’s NBA draft, we are going to discuss some issues related to player analytics.  As part of this we are going to tell the story of a project focused on draft analytics that we recently partnered on with the Atlanta Dream and members of the Emory women’s basketball team.  We think it’s an interesting story and it provides an opportunity to discuss several data analysis principles relevant to player selection in more detail.  Stay tuned!

Mike Lewis & Manish Tripathi, Emory University, 2015.