Fanalytics Podcast: Sales Force Analytics

In this podcast episode, I sit down with Jon Adler, Director of New Membership and Ticket Sales for the Atlanta Hawks. In the first half of the episode, Jon and I talk about the mechanics of selecting and managing a team of entry-level sales professionals. The conversation focuses on using incentives both to motivate employees and to teach effective sales tactics. We also talk a little about applying “Moneyball” techniques to sales force management. This is an important point because the same basic techniques that can be used to predict the performance of a point guard can be used to select a salesperson.

In the second half of the episode, I take a deeper dive into how different techniques and theories can help sales managers. Sales force management has some real challenges related to forecasting performance and using dynamic incentive schemes to motivate performance. Recognizing some of the underlying complexity can be helpful because it provides a guide to decomposing managerial problems and identifying the best analytic approaches.


You can find the episode on iTunes, Spotify, SoundCloud, TuneIn, Stitcher, and Google Play Music. Please rate, review, and subscribe!

Fanalytics Podcast: Ezekiel Who? Analytics & the Collective Bargaining Agreement

 

In this episode, Professor Lewis discusses why the Ezekiel Elliott holdout is the most important off-season NFL story. It’s a story about how the collective bargaining agreement’s rules for rookie contracts come into conflict with analytics. The conflict occurs because, for some positions like running back, NFL rookie contract rules allow teams to avoid paying market rates for most or all of a player’s career. The episode talks about how last year’s Todd Gurley and Le’Veon Bell deals have brought us to the point where players may be increasingly willing to hold out and teams may be less likely to invest in running backs.


Fanalytics Podcast: Sports Analytics – Getting Your Foot in the Door

Houston Dynamo Data Analyst and Emory alumnus Sean Steffen joins Mike Lewis on this Fanalytics podcast episode, where they discuss how to get into the field of sports analytics. Getting your foot in the door can be quite competitive. Sean shares his “non-traditional” journey.

Some background on Sean: In college, he majored in creative writing. From there, he started writing for American Soccer Analysis, a blog that focuses on Major League Soccer. The key to Sean’s success was that he backed up his writing with data and analytics.

The conversation gets a little into the weeds and even includes a discussion of the competing programming languages, R and Python. For prospective analysts, Sean recommends learning Excel and linear regression. Mike says SQL, R, and linear regression are good starting points for analyzing data.

They also talk about modern soccer analytics, such as the logic and mechanics behind expected goals.

You can reach Sean on Twitter @SeanSteffen or search for him on LinkedIn.


 

Fanalytics Podcast: Three-Point Field Goal

This week, Professor Mike Lewis and Emory student Alex Notis examine the three-point field goal (also 3-pointer) in the NBA.

The modern NBA has been transformed by the three-point shot.  Points are up, turnovers are down and NBA rosters are now built to shoot the three.

Some key facts…

When the NBA introduced the three-point line in the 1979–80 season, only 3% of shots were three-point attempts.

This season, 36% of shots were three-pointers.

In this episode, we talk about Alex’s project which looks into trends and outcomes related to the three-point shot.

In the second half of the episode, Professor Lewis takes a step back and talks about the concept of expected value.  Expected value is a key concept in sports analytics. In decisions ranging from taking a three-point shot in the NBA, pulling the goalie in hockey, going for 2 in the NFL, or bunting to move a runner to second in MLB, expected value calculations are the key.
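To make the expected value logic concrete, here is a minimal Python sketch comparing a two-point and a three-point attempt. The shooting percentages (50% on twos, 35% on threes) are hypothetical round numbers chosen for illustration, not figures from Alex’s project, and the `expected_points` helper is likewise just an illustrative name.

```python
# Hypothetical shooting percentages, chosen for illustration only.
TWO_POINT_PCT = 0.50
THREE_POINT_PCT = 0.35


def expected_points(points_per_make: int, make_probability: float) -> float:
    """Expected value of one attempt: points for a make times the probability of a make."""
    return points_per_make * make_probability


two_point_ev = expected_points(2, TWO_POINT_PCT)
three_point_ev = expected_points(3, THREE_POINT_PCT)

print(f"Two-point attempt:   {two_point_ev:.2f} expected points")
print(f"Three-point attempt: {three_point_ev:.2f} expected points")
```

Under these assumed percentages the three is worth about 0.05 more points per attempt. The same multiply-and-compare structure applies to pulling the goalie or going for 2, with the relevant probabilities and payoffs swapped in.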


Major League Baseball Fandom Report 2019: The “Best” Fans in Baseball

Major League Baseball seems to be perennially in crisis in terms of its relationship with its fan base.  Free agency, strikes, steroids, competitive imbalances, short attention spans of millennials, a lack of stars, an aging fan base, and other factors have been cited to explain why baseball has either lost or is in danger of losing its position as the national pastime.

On the other hand, the league keeps setting revenue records. This article from Forbes reports that baseball has set revenue records for 16 straight years.

When thinking about baseball and its fans, it is safest to say it is a mixed bag of positives and warning signs.  Record revenues show that baseball has been able to develop innovative revenue streams and attract high value sponsors. But there does seem to be trouble on the horizon in terms of the next generation of fans.

In terms of demographics, MLB has one of the oldest fan bases (up there with golf). It is also widely viewed as the most family-oriented sport. This is an interesting pattern. An aging fan base is a concern for the future if fans are aging out of attending games. Of course, an aging fan base is also potentially an increasingly wealthy fan base. The family orientation is an enormous positive. Sports fandom is largely transmitted through the family, and the nature of the game helps bring the next generation into the fold (summer schedule, 81 home games, relatively cheap tickets, etc.).

There is also the issue of “tanking.” Tanking has been most frequently mentioned in the context of the NBA but it’s also a concern for baseball. Losing 100 games has long been considered epic futility. In 2018, the Orioles lost 115 games, the Royals lost 104, and the White Sox lost 100. The Marlins and Tigers lost 98.  Tanking is a fan issue because it speaks to the quality of the product that fans (in certain markets) are asked to buy.

Tanking brings us to the issue of the Collective Bargaining Agreement. While the Collective Bargaining Agreement is usually discussed in terms of the labor relationship between owners and players, the CBA is a critical issue for fandom because this agreement essentially defines how the league operates. For example, the CBA largely defines the rules related to revenue sharing, luxury taxes, and salary caps. These rules directly impact fans by creating the structures that influence player movements and competitive balance.

Baseball is notable for having relatively little revenue sharing and no salary cap. Controlling the distribution of spending by teams matters because there is a significant correlation between spending and winning in baseball. The Red Sox won the World Series and also led the league with a payroll of about $240 million. At the bottom, the White Sox had a payroll of around $80 million. Remember, the White Sox lost 100 games.

The CBA matters because teams will find strategies that work for their circumstances.  There is some speculation that small market teams like the Royals are using business models that involve developing low cost homegrown talent, trying to win for a few years and then dumping payroll to pursue draft picks. The Royals reduced payroll from $185 million in 2017 to $135 million in 2018. The Royals lost 104 games in 2018 after making the playoffs in 2014 and 2015. In 2018 the Dodgers had the 4th highest payroll at about $195 million.  But that payroll was $59 million less than the previous season’s amount.  This decline allowed the team to drop below the luxury tax threshold. Are these strategies designed to maximize fan enjoyment?

The run-up to the 2019 MLB season has also included a “glacially” slow free agent market. Eventually, the big-name stars signed with teams outside of the major markets. Manny Machado signed with the Padres for $300 million over ten years, and Bryce Harper signed with the Phillies for $330 million over 13 years. “Stars” matter to fans. Fans like winners, but they also like stars. While the NBA has long been all about stars – Larry, Magic, Michael, Kobe, LeBron, Steph… – MLB doesn’t seem to produce household names anymore. This article states that ESPN’s annual ranking of the most famous athletes includes 13 basketball players, 2 table tennis stars and no baseball players. This lack of “media” stars matters. Maybe not in the short term, where winning mostly drives attendance, but likely in the long term. When I have looked at the factors that build brand equity in sports, two items really jump out: winning championships and having a history of Hall of Famers and All-Stars.

 

The Best Baseball Brands

My last statement about how brands are built is based on logic and on running the numbers on fandom in MLB and other sports leagues. As we enter the 2019 season, it’s time for my annual data-based look at fandom across the MLB brands. This analysis starts from questions like “Who has the best fans in Major League Baseball?” and “What are the best brands in MLB?”

These are simple questions without simple answers.  What makes for a great fan or brand?  Fans that show up even when the team is losing?  Fans that are willing to pay the most?  Fans that are willing to follow a team on the road or social media?  Even after we agree on the question(s), answering it is also a challenge.  How do we adjust for the fact that one team might have gone on a miraculous run that filled the stadium?  Or perhaps another team suffered a slew of injuries?  How do we compare fan behavior in a market like New York with fans in a place like Milwaukee?  What if a team just opened a new stadium?  Did the fans stream in to see the building or to see the team?

For the past few years, I have been studying fandom across professional and college sports. My approach to evaluating fan bases is to use data to develop statistical models of fan interest (more details here). The key is that these models are used to determine in which cities fans are more willing to spend on or follow their teams after controlling for factors like market size and short-term variations in performance.

The “Overall” rankings are based on three sub-rankings – Fan Equity, Social Equity and Road Equity. Fan Equity is a revenue-premium-based metric that compares the team’s box office results with league standards. In other words, Fan Equity assesses how much fans are willing to “attend and spend” relative to fans across the league. The KEY idea is that we measure this while controlling for team success and market characteristics like incomes and populations.

  • Fan Equity is a great metric for assessing the CURRENT level of passion or engagement in a local fan base.
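As a rough illustration of the revenue premium idea, the sketch below fits a simple least-squares “league standard” for box office revenue using win percentage, population and income, then treats each team’s average residual as its premium. The team data, the choice of controls and the residual-averaging step are all assumptions made for illustration; the actual Fan Equity model is described only in general terms here.

```python
import numpy as np

# Hypothetical team-season data, made up for illustration only.
teams = np.array(["A", "A", "B", "B", "C", "C"])
# Control columns: win percentage, metro population (millions), median income ($000s)
controls = np.array([
    [0.55, 4.8, 62.0],
    [0.48, 4.8, 63.0],
    [0.60, 2.1, 55.0],
    [0.52, 2.1, 56.0],
    [0.40, 7.5, 70.0],
    [0.45, 7.5, 71.0],
])
box_office = np.array([210.0, 195.0, 160.0, 150.0, 170.0, 175.0])  # $ millions

# "League standard": least-squares fit of box office on an intercept plus the controls.
X = np.column_stack([np.ones(len(box_office)), controls])
beta, *_ = np.linalg.lstsq(X, box_office, rcond=None)

# A team's premium is how far its box office sits above (or below) that standard,
# averaged over its seasons.
residuals = box_office - X @ beta
premium = {str(t): float(residuals[teams == t].mean()) for t in np.unique(teams)}

for team, value in sorted(premium.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Team {team}: {value:+.1f}")
```

A positive premium means a team draws more revenue than its performance and market would predict, which is the sense in which the metric captures willingness to “attend and spend.” With only six made-up team-seasons the fit itself means little; the point is the structure.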

Social Equity is focused on the team’s social media followings (Facebook and Twitter).  Again, the rankings are based on how a team’s social media results compare across the league after controlling for team success.

  • The Social Equity metric provides insight into the team’s POTENTIAL fan passion.

The third metric is Road Equity. This metric is based on a statistical model that looks at how teams draw incremental fans when on the road. The KEY idea is that a team’s draw outside of its home market reveals something about the club’s national appeal.

  • Road Equity provides a metric of passion beyond the local market. This passion can be positive (love the Cubs) or negative (hate the Yankees).

I could go on.  In the past I have developed additional metrics related to win sensitivity or price sensitivity.  Willingness to attend even when the team loses probably says something about loyalty.  Fans that don’t watch a loser might be termed bandwagon fans.  Willingness to pay is a great marketing metric.  Willingness to pay to see a team that isn’t winning is another great indication of loyalty.  These metrics are available upon request (mike [dot] lewis [at] emory [dot] edu – FYI, I don’t look at the comments) but I want to keep this article brief.

So, we have three metrics with different pluses and minuses.  In the quest to find an overall winner, I use a weighted average of the three metrics (more weight on the Fan Equity metric).  This may not be the right weighting but it’s usually a good idea to emphasize how customers actually spend.
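The weighting step itself is simple. This sketch assumes weights of 0.5 on Fan Equity and 0.25 each on Social and Road Equity, and uses made-up standardized scores for three hypothetical clubs. The post says only that Fan Equity gets more weight, so these numbers are illustrative rather than the ones behind the rankings.

```python
# Hypothetical weights: the post says only that Fan Equity is weighted most heavily.
WEIGHTS = {"fan": 0.5, "social": 0.25, "road": 0.25}

# Hypothetical standardized scores for three made-up clubs (higher is better).
clubs = {
    "Club X": {"fan": 1.2, "social": 0.4, "road": 0.9},
    "Club Y": {"fan": 0.3, "social": 1.5, "road": 1.1},
    "Club Z": {"fan": -0.8, "social": -0.2, "road": 0.1},
}


def overall_score(metrics: dict[str, float]) -> float:
    """Weighted average of the three equity metrics."""
    total = sum(WEIGHTS[name] * value for name, value in metrics.items())
    return total / sum(WEIGHTS.values())


ranking = sorted(clubs, key=lambda club: overall_score(clubs[club]), reverse=True)
for rank, club in enumerate(ranking, start=1):
    print(rank, club, round(overall_score(clubs[club]), 3))
```

Because Fan Equity carries half the weight here, Club X finishes ahead of Club Y even though Club Y leads on both the social and road metrics.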

 

The Winners

Overall, the group of clubs that make up the Top 6 contains little in the way of surprises. The Red Sox rank number one and are followed by the Yankees, Giants, Dodgers, Cubs and Cardinals. The Red Sox are perennially strong and finished first last year. They also won the World Series. Boston is probably the best sports town in America.

In general, the clubs at the top of the list share these same traits.  They are all able to motivate fans to attend and spend as they all possess great attendance numbers and relatively high prices.  More to the point, these teams are even able to draw well and command price premiums when they are not winning.  Historically, the Cubs are the best example of this.

The list of winners probably raises an issue of “large” market bias. However, keep in mind that the methodology is explicitly designed to control for differences in market demographics (and team performance). While the “winners” tend to come from the bigger and more lucrative markets, other major market teams do not fare particularly well (White Sox, A’s). There is also a more subtle point. The large market teams likely have the best fan bases because they often have significant histories of success and are often featured in the media.

The topic of how these brands are built over time is another one of my favorite things to talk about. I think it’s mostly two (highly correlated) things: championships and stars. Building brand equity is a fascinating sports topic, and I think it’s a difficult one for teams (in small markets) to manage. Will the current popular strategy of cycles of tanking and competing yield enough winning and “temporary” stars to build brands?

 

The Bottom

The bottom of the list features the Marlins, White Sox, Indians, Athletics and Rays.  It is interesting that the bottom also includes teams from major markets such as the Bay Area, Chicago and Miami. The markets with two teams seem to yield dramatically different results within each market. I think this reveals something fundamental about fandom.  Fan bases are communities and many fans want to be a part of the most popular group. It is a simple theory but the end result is that the second team in a market will struggle to compete. Many fans are drawn to the bigger and more dominant community – Yankees, Cubs, Giants or Dodgers rather than the Mets, White Sox, A’s or Angels.

The case of the Marlins reveals another common problem for franchises. The Marlins’ finish reflects how the team struggles on multiple dimensions. Attendance is often in the bottom 5 of the league despite the team being located in a major metro area. Pricing is also below average for MLB. Why do the Marlins struggle? Lots of reasons: Florida weather, a short history (fandom is often generational), a history of small payrolls and bad teams, and Miami being a transient city.

The Indians are an interesting case as well. Cleveland is a passionate sports town. But when you look at the numbers, there is not a lot of support. An open question is how much of the problem is the Indians’ branding. The Indians have made moves to shift away from the Native American imagery but have retained the team name. I suspect that half measures might be the worst approach.

 

The Movers

In terms of year over year comparisons, there is a good amount of stability on the list.  This is a good sign since sports brands should evolve slowly.  Some notable movers on the list were the Blue Jays and Phillies moving up and the Diamondbacks and Indians dropping down. The Blue Jays illustrate an important feature of the model. When I calculate the brand “premiums” I use the most recent three seasons. This is intended to provide stable but evolving measures of brand equity. In the case of the Blue Jays, the improvement in ranking was mostly driven by attendance growth in 2016 and 2017. In the case of the Phillies the improvement was about growing attendance coupled with relatively high prices.
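A three-season window can be sketched as a simple average of the most recent season-level premiums. The equal weighting, the `brand_premium` name and the season values below are assumptions for illustration; the post does not say how the three seasons are combined.

```python
# Hypothetical season-level premiums for one club (made-up numbers).
season_premiums = {2014: -0.6, 2015: -0.2, 2016: 0.4, 2017: 0.8, 2018: 0.5}


def brand_premium(premiums: dict[int, float], window: int = 3) -> float:
    """Equal-weighted average of the premiums from the most recent `window` seasons."""
    recent_seasons = sorted(premiums)[-window:]
    return sum(premiums[season] for season in recent_seasons) / len(recent_seasons)


print(round(brand_premium(season_premiums), 3))
```

In this made-up example, the weak 2014 and 2015 seasons have already dropped out of the window, so growth in 2016 and 2017 drives the measure, much as the post describes for the Blue Jays.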

 

The 2019 Complete List


Player Analytics Fundamentals: Part 5 – Modeling 102

In Part 4 of the series we started talking about what should be in the analyst’s tool kit.  I advocated for linear regression to be the primary tool.  Linear regression is (relatively) easy to implement and produces equations that are (relatively) easy to understand.  I also made the point that linear regression is best suited for predicting continuous measures and used the example of predicting the number of touchdown passes thrown by a rookie QB.

But not everything we want to predict is going to be a continuous variable.  Since we are talking about predicting quarterback performance, maybe we prefer a metric that is more discrete such as whether a player becomes a starter.  Can we still use linear regression?  Maybe.

Let’s return to the example from last time.  The task was to predict professional (rookie year) success based on college level data.  We assumed that general managers can obtain data on the number of games won as a college player, whether the player graduated (or will graduate) and the player’s height.

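Our initial measure of pro success was touchdown passes. We then specified a regression model using the following equation.

$$\text{TD Passes} = \beta_0 + \beta_1 \cdot \text{College Wins} + \beta_2 \cdot \text{Graduated} + \beta_3 \cdot \text{Height}$$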

But let’s say that we don’t like the TD passes metric.  Maybe we don’t like it because we think TD passes are more related to wide receiver talent than to the quality of the QB.  Rather than use TDs as our dependent variable we want to use whether a player becomes a starter.  This is also an interesting metric as it captures whether the player was selected by a coaching staff to be the primary quarterback.  This is a nice feature as the metric includes some measure of human expertise.  I’ll leave criticism to the readers as an exercise.

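This leads us to the following equation:

$$\text{Starter} = \beta_0 + \beta_1 \cdot \text{College Wins} + \beta_2 \cdot \text{Graduated} + \beta_3 \cdot \text{Height}$$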

One issue we have to address before we estimate this model is how we define the term starter.  In a statistical model we need to convert the word or category of “starter” into a number.  In this case, the easy solution is to treat players that became starters as 1’s and players that did not as 0’s.  As a second exercise – what would we do if we had three categories (did not play, reserve, starter)?

Let’s pretend we estimated the preceding model and obtained the following equation:

We can use the equation to “score” or “rate” our imaginary prospects from last time (Lewis Michaels and Manny Trips). In terms of the input data, Lewis won 40 college games, graduated and is 5’10”. Plugging Michaels’s data into the equation gives us a score of .22. The analysis that we have performed is commonly termed a linear probability model. A simple interpretation of this result is that the expected probability of Michaels (or, more precisely, a prospect with Michaels’s statistics) becoming a starter is 22%.

So far so good.

Our second prospect is Manny Trips out of Duke. Manny won 10 games, failed to graduate and is 6’ tall. For Manny the prediction would be -12.8%. This is the big problem with using linear regression to predict binary (Yes/No) outcomes. How do we interpret a negative probability? Or a probability that is greater than 1?

So what do we do next? I think we have two options. We can ignore the problem. If the goal is just to rank prospects, then maybe we don’t care very much. In this case, we just care about the relative scores, not the actual predictions. If we are just using analytics to screen QB prospects or to provide another data point, then maybe our model is good enough. The level of investment in a modeling project should be based on how the model is going to be used. In many, if not most, sports applications I would lean toward simpler models.

Our second option is to move to a more complicated model. There are a host of models available for categorical data. We can use a binary logit or probit model for a binary outcome like the one above. If the categories have a natural ordering to them (never played, reserve, starter), then we can use an ordered logit. If there is no order to the categories, then we can use a multinomial logit. I’m still debating how much attention I should pay to these models. Having a tool to deal with categorical variables can be invaluable, but there is a cost. The mathematics become more complex, estimation of the model requires specialized software and interpretation of the model becomes less intuitive.
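For comparison with the linear probability model, here is a minimal sketch of how a binary logit turns a linear combination of the same inputs into a probability. The coefficients are hypothetical (they are not estimated from data and are not the ones behind the .22 and -12.8% figures), and the function name is illustrative. The point is only that the logistic transformation keeps every prediction strictly between 0 and 1.

```python
import math

# Hypothetical coefficients, chosen only to illustrate the functional form.
B0, B_WINS, B_GRAD, B_HEIGHT = -3.0, 0.05, 0.8, 0.0


def starter_probability(college_wins: int, graduated: int, height_inches: int) -> float:
    """Binary logit: P(starter) = 1 / (1 + exp(-index)), where the index is linear in the inputs."""
    index = B0 + B_WINS * college_wins + B_GRAD * graduated + B_HEIGHT * height_inches
    return 1.0 / (1.0 + math.exp(-index))


p_michaels = starter_probability(40, 1, 70)  # linear index is about -0.2
p_trips = starter_probability(10, 0, 72)     # linear index is about -2.5
print(round(p_michaels, 3), round(p_trips, 3))
```

Even extreme inputs, such as 200 college wins, produce a probability just below 1 rather than above it. The cost, as noted above, is that the coefficients no longer read directly as changes in probability.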

I think I will discuss the binary logit next time.

Player Analytics Fundamentals: Part 4 – Statistical Models

Today’s post introduces the topic of statistical modeling. This is, maybe, the trickiest part of the series to write. The problem is that mastering the technical side of statistical analysis usually takes years of education. And, more critically, developing the wisdom and intuition to use statistical tools effectively and creatively takes years of practice. The goal of this segment is to point people in the right direction more than to provide detailed instruction. That said – I can adjust if there is a call for more technical material. (If you want to start from the beginning, Parts 1, 2 and 3 are a click away.)

Let’s start with a simple point.  The primary tool for every analytics professional (sports or otherwise) should be linear regression.  Linear regression allows the analyst to quantify the relationship between some focal variable of interest (dependent measure or DV) and a set of variables that we think drive that variable (independent variables).  In other words, regression is a tool that can produce an equation that shows how some inputs produce an outcome of interest.  In the case of player analytics, this might be a prediction of future performance based on a player’s past statistics or physical attributes.

To make this more concrete, let’s say we want to do an analysis of rookie quarterback performance (we’ve been talking a bit about QB metrics so far in the series).  Selecting QBs involves significant uncertainty.  The transition from the college game to the pro game requires the QB to be able to deal with more complex offensive systems, more sophisticated defenses and more talented opposing players.  The task of the general manager is to identify prospects that can successfully make the transition.

Data and statistical analysis can potentially play a part in this type of decision. The premise is that observable data on college prospects can help predict rookie year performance. To begin, let’s assume that general managers can obtain data on the number of games won as a college player, whether the player graduated (or will graduate) and the player’s height. (We just might be foreshadowing a famous set of rules for drafting quarterbacks.)

The other key decision for a statistical analysis of rookie QB performance versus college career and physical data is a performance metric.  We could use the NFL passer rating formula that we have been discussing.  Or we could use something else.  For example, maybe the number of TD passes thrown as a rookie.  This metric is interesting as it captures something about playing time and ability to create scores.

Touchdowns are also a metric that “fits” linear regression. Linear regression is best suited to the analysis of quantitative variables that vary continuously. The number of touchdowns we observe in data will range from zero to whatever the rookie TD record is. In contrast, other metrics, such as whether the player becomes a starter or a Pro Bowler, are categorical variables. There are other techniques that are better for analyzing categorical variables. (If you are a stats jockey and are objecting to the last couple of statements, please see the note below.)

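The purpose of regression analysis is to create an equation of the following form:

$$\text{TD Passes} = \beta_0 + \beta_1 \cdot \text{College Wins} + \beta_2 \cdot \text{Graduated} + \beta_3 \cdot \text{Height}$$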

This equation says that TD passes are a function of college wins, graduation and height. The βs are the weights that are determined by the linear regression analysis. Specifically, linear regression determines the βs that best fit the data. This is the important point: the weights or βs are determined from the data. To illustrate how the equation works, let’s imagine that we estimated the regression model and obtained the following equation.

$$\text{TD Passes} = 1 + 0.1 \cdot \text{College Wins} + 5 \cdot \text{Graduated} + 0 \cdot \text{Height}$$

This equation says that we can predict rookie TD passes by plugging in each player’s data related to college wins, graduation and height. It also says that a history of winning is positively related to TDs and that graduation is also a positive. The coefficient for height is zero. This indicates that height is not a predictor of rookie TDs (I’m making these numbers up – height probably matters). One benefit of developing a model is that we let the data speak. Our “expert” judgment might be that height matters for quarterbacks. The regression results can help identify decision biases if the coefficients don’t match the experts’ predictions. I am neglecting the issue of significance for now – just to keep the focus on intuition.
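To illustrate the point that the βs come from the data, here is a small Python sketch using NumPy’s least-squares solver. The six quarterbacks are hypothetical, and their TD totals are generated with no noise from the made-up weights used in this post (an intercept of 1, 0.1 per college win, 5 for graduating and 0 for height), so the regression should recover those weights. With real, noisy data the estimates would only approximate the underlying relationship.

```python
import numpy as np

# Hypothetical college data for six quarterbacks (made up for illustration).
# Columns: college wins, graduated (1 = yes, 0 = no), height in inches.
college = np.array([
    [30, 1, 74],
    [12, 0, 73],
    [25, 1, 75],
    [32, 0, 74],
    [18, 1, 76],
    [35, 1, 72],
], dtype=float)

# Rookie TD totals generated from TD = 1 + 0.1*wins + 5*grad + 0*height, with no noise,
# so we can check that least squares recovers those weights from the data alone.
rookie_tds = 1 + 0.1 * college[:, 0] + 5 * college[:, 1] + 0 * college[:, 2]

# Prepend an intercept column and solve for the betas that minimize squared error.
X = np.column_stack([np.ones(len(college)), college])
betas, *_ = np.linalg.lstsq(X, rookie_tds, rcond=None)
print(np.round(betas, 3))  # order: intercept, wins, graduated, height
```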

Let’s say we have two prospects.  Lewis Michaels out of the University of Illinois who won 40 college games (hypothetical and unrealistic), graduated (in engineering) and is 5’10” (a Flutiesque prospect).  Our second prospect is Manny Trips out of Duke.  Manny won 10 games, failed to graduate and is 6’ tall.  Michaels would seem to be the better prospect based on the available data.  The statistical model allows us to predict how much better.

We make our predictions by simply plugging our player level data into the equation.  We would predict Lewis would throw 10 TDs in his rookie year (1+.1*40+5*1+0*70).  For Manny the prediction would be 2 TDs.  For now, I am just making up the coefficients (βs).  In a later entry I will estimate the model using some data on actual NFL rookie QB performance.
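The plug-in step is just arithmetic. This sketch hard-codes the made-up coefficients from the text (1, 0.1, 5 and 0); the function name is hypothetical, and height is entered in inches, as in the calculation above.

```python
def predict_rookie_tds(college_wins: int, graduated: int, height_inches: int) -> float:
    """Made-up equation from the text: TD = 1 + 0.1*wins + 5*graduated + 0*height."""
    return 1 + 0.1 * college_wins + 5 * graduated + 0 * height_inches


lewis_michaels = predict_rookie_tds(college_wins=40, graduated=1, height_inches=70)  # 5'10"
manny_trips = predict_rookie_tds(college_wins=10, graduated=0, height_inches=72)     # 6'0"
print(lewis_michaels, manny_trips)  # 10.0 2.0
```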

Regression has its shortcomings and many analysts love to object to regression analyses.  But for the most part, linear regression is a solid tool for analyzing patterns in data.  It’s also relatively easy to implement.  We can run regressions in Excel!  We shouldn’t underestimate how important it is to be able to do our analyses in standard tools like Excel.

I will extend our tool kit in a future entry. I briefly mentioned categorical variables such as whether or not a player is a starter. For these types of Yes/No outcomes (starter or not a starter), there is a tool called logistic regression that should be in our repertoire.

*One reason this note is tricky is that I’m trying to get the right balance and tone. I can already hear the objections. Let’s save these for now. For example, readers do not need to alert me to the fact that TDs are censored at zero. Or that there is a mass point at zero because many rookies don’t play. Or that TDs are counted in discrete units, so maybe a Poisson model is more appropriate. You get the idea. There are many ways to object to any statistical model. The real question isn’t whether a model is perfect. The real question should be whether the model provides value.

Moving towards Modeling & Lessons from Other Arenas: Sports Analytics Series Part 5

The material in this series is derived from a combination of my experiences in sports applications and my experiences in customer analysis and database marketing. In many respects, the development of an analytics function is similar across categories and contexts. For instance, a key issue in any analytics function is the design and creation of an appropriate data structure. Creating or acquiring the right kinds of analytics capabilities (statistical skills) is also a common need across industries.

A need to understand managerial decision-making styles is also common across categories. It’s necessary to understand both the level of interest in using analytics and the “technical level” of the decision makers. Less experienced data scientists and statisticians tend to use overly complicated methods. This can be a killer. If the models are too complex, they won’t be understood, and then they won’t be used. Linear regression with perhaps a few extensions (fixed effects, linear probability models) is usually the way to go. Because sports organizations have less history of using analytics, balancing complexity can be especially challenging.

A key distinction between many sports and marketing applications is the number of variables versus the number of observations. This distinction will also be an important issue when we shift to discussing modeling in a couple of weeks. When I use the term variables, I am referencing individual elements of data. For example, an element of data could be many different things, such as a player’s weight, the number of shots taken or the minutes played. We might also break variables into the categories of dependent variables (things to explain) versus independent variables (things to explain with). When I use the term observations, I am talking about “units of analysis” like players or games.

In many (most) business contexts we have many observations. A large company may have millions of customer accounts. There may, however, be relatively few explanatory variables. The firm may have only transaction history variables and limited demographics. Even in sports marketing, a team interested in modeling season ticket retention may only have information such as the number of tickets previously purchased, prices paid and a few other data points. In this same example the team may have tens of thousands of season ticket holders. If we think of this “information” as a database, we would have a row for every customer account (one per season ticket holder) and perhaps ten or twenty columns of variables related to each customer (past purchases and marketing activities).

One trend is that the number of explanatory variables is expanding in just about every category. In marketing applications we have much more purchase detail and often expanded demographics and psychographics.  However, the ratio of observations to columns usually still favors the observations.

In sports we (increasingly) face a very different data environment, especially in player selection tasks like drafting or free agent signings. The issue in player selection applications is that there are relatively few player-level observations. In particular, when we drill down into specific positions, we often find ourselves with only tens or hundreds of player histories (depending on how far back we want to go with the data). In contrast, we may have an enormous number of variables per player.

We have historically had many different types of “box score” stats, but now we have entered the era of player tracking and biometrics. Now we can generate player stats related to second-by-second movement or even detailed physiological data. In sports ranging from MMA to soccer to basketball, the number of variables has exploded.
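To see why this data environment is a problem, here is a small sketch with simulated data. The sizes (8 players, 20 variables), the random numbers and the use of plain least squares are assumptions for illustration. When there are fewer player histories than candidate variables, ordinary least squares can fit even pure noise perfectly, and many different coefficient vectors fit equally well.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sizes: a handful of player histories, many tracking variables each.
n_players, n_variables = 8, 20
X = rng.normal(size=(n_players, n_variables))  # simulated player variables
y = rng.normal(size=n_players)                 # pure noise: there is nothing real to explain

# With more columns than rows, least squares can reproduce the outcome exactly...
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("rank:", rank)
print("largest in-sample error:", np.abs(X @ coef - y).max())

# ...and the fit is not unique: moving along a null-space direction of X
# changes the coefficients without changing the fitted values.
_, _, vh = np.linalg.svd(X)
null_direction = vh[-1]
other_coef = coef + 5.0 * null_direction
print("largest coefficient change:", np.abs(other_coef - coef).max())
```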

A big question as we move forward into more modeling oriented topics is how do we deal with this situation?

A Short Course on Sports Analytics – Part 1

  1. Sports Analytics in Organizations

This fall the plan is to do something a little different with the blog.  Rather than data driven analyses of sports marketing topics, I want to spend some time talking about using analytics to support player and in-game decision making.  The “Moneyball” side of the sports analytics space.

The focus will mainly be at the level of the organization rather than at the level of specific research questions.  In other words, we will talk about providing effective analytics support within an organization, rather than presenting a series of analyses.  My hope is that this evolves to being something of a web based course on using analytics to drive decisions in sports.

I’ve spent a lot of time over the past few decades working on analytics projects (across multiple industries) and I’ve developed opinions about what firms do right and where mistakes are made.  Over the last few years, I’ve thought a lot about how analytics can be used by sports organizations.  Specifically, about how lessons from other industries can be applied, and instances where sports are just different.

The history of statistical analysis in sports goes way back, but obviously exploded with the publication of Moneyball.  A huge number of sports fans would love to be a General Manager but few people have the athletic ability to gain entry as a former player.  Using statistics to find ways to win is (maybe?) a more accessible route.

But this route is not without its complications.  Using stats and data to win games is an intriguing and challenging intellectual task.  What data should be collected?  How should the data be analyzed?  How should the analysis be included in the decision making structure?  These are all challenging questions that go beyond what a fan with some data can accomplish.

What I’m going to do in this series is talk about how to approach analytics from both a conceptual level and an operational level.  Conceptually, I will cover how humans make decisions in organizations.  At the operational level, we will discuss what types of analyses should be pursued.

What I won’t do in this series is talk about specific models. At least not very much. I may drop in a couple of analyses. This limitation is deliberate. It’s my feeling that the sports analytics space is littered with too many isolated projects and analyses. The goal here is to provide a structure for building an analytics function and some general guidance on how to approach several broad classes of analyses.

What will this series include? Some of the content will be based on whatever becomes top of mind or on the response I get from readers. But some things will definitely appear. There will be material about how analytics can best complement a human decision maker. I will also talk about how lessons from other industries can be helpful in the sports context. There are more similarities than differences between sports and “standard” businesses. But there are some important differences.

We will also talk about models and statistical analysis.  But this will be done in broad terms.  What I mean is that we will discuss classes of analysis rather than specific studies.  For example, we will discuss player selection analyses but the emphasis will be on how to approach the problem rather than the creation of a particular forecasting model.  There are a variety of ways to analyze players.  We can use simple models like linear regression or more complex models that yield probabilities.  We can also forgo the stats and use raw data to look for player comparisons.  We will discuss the implementation challenges and benefits of each approach.

This series is a work in progress.  I have a number of entries planned but I’m very open to questions.  Shoot me an email and I’ll be happy to respond in future entries or privately (time permitting).

Next: Understanding the Organization