NBA Foul Calls and Bayesian Item Response Theory
An improved version of this analysis is available here.
(Author’s note: many thanks to Robert ([@atlhawksfanatic](https://twitter.com/atlhawksfanatic) on Twitter) for pointing out some subtleties in the data set that I had missed. This post has been revised in line with his feedback. Robert has a very interesting post about how last-two-minute refereeing has changed over the last three years; I highly recommend you read it.)
I recently found a very interesting data set derived from the NBA’s Last Two Minute Report by Russell Goldenberg of The Pudding. Since 2015, the NBA has released a report reviewing every call and non-call in the final two minutes of every NBA game where the teams were separated by five points or less with two minutes remaining. This data set has extracted each play from the NBA-distributed PDF and augmented it with information from Basketball Reference to produce a convenient CSV. The Pudding has published two very interesting visual essays using this data that you should definitely explore.
The NBA is certainly marketed as a star-centric league, so this data set presents a fantastic opportunity to understand the extent to which the players involved in a decision impact whether or not a foul is called. We will also explore other factors related to foul calls.
```python
%matplotlib inline
import datetime
from warnings import filterwarnings

from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import pandas as pd
import pymc3 as pm
from scipy.special import expit
import seaborn as sns

blue, green, red, purple, gold, teal = sns.color_palette()

million_dollars_formatter = FuncFormatter(lambda value, _: '${:.1f}M'.format(value / 1e6))
pct_formatter = FuncFormatter(lambda prop, _: "{:.1%}".format(prop))

filterwarnings('ignore', 'findfont')
```
Loading and preprocessing the data
We begin by loading the data set from GitHub. For reproducibility, we load the data from the most recent commit as of the time this post was published.
```python
DATA_URI = 'https://raw.githubusercontent.com/polygraph-cool/last-two-minute-report/1b89b71df060add5538b70d391d7ad82a4c24db2/output/all_games.csv'

raw_df = (pd.read_csv(DATA_URI,
                      usecols=['committing_player', 'disadvantaged_player',
                               'committing_team', 'disadvantaged_team',
                               'seconds_left', 'review_decision', 'date'],
                      parse_dates=['date'])
            .where(lambda df: df.date >= datetime.datetime(2016, 10, 25))
            .dropna(subset=['date'])
            .drop('date', axis=1))

raw_df['review_decision'] = raw_df.review_decision.fillna("INC")
raw_df = (raw_df.dropna()
                .reset_index(drop=True))
```
We restrict our attention to decisions from the 2016-2017 NBA season, for which salary information is readily available from Basketball Reference.
```python
raw_df.head()
```

|   | seconds_left | committing_player | disadvantaged_player | review_decision | disadvantaged_team | committing_team |
|---|--------------|-------------------|----------------------|-----------------|--------------------|-----------------|
| 0 | 102.0 | Al-Farouq Aminu | George Hill | CNC | UTA | POR |
| 1 | 98.0 | Boris Diaw | Damian Lillard | CC | POR | UTA |
| 2 | 64.0 | Ed Davis | George Hill | CNC | UTA | POR |
| 3 | 62.0 | Rudy Gobert | CJ McCollum | INC | POR | UTA |
| 4 | 27.1 | CJ McCollum | Rodney Hood | CC | UTA | POR |
We have only loaded some of the data set’s columns; see the original CSV header for the rest.
The response variable in our analysis is derived from `review_decision`, which contains information about whether the incident was a call or non-call and whether, upon post-game review, the NBA deemed the (non)call correct or incorrect. Below we show the frequencies of each type of `review_decision`.
```python
ax = (raw_df.groupby('review_decision')
            .size()
            .plot(kind='bar'))

ax.set_ylabel("Frequency");
```
The possible values of `review_decision` are:

- `CC` for correct call,
- `CNC` for correct non-call,
- `IC` for incorrect call, and
- `INC` for incorrect non-call.
While `review_decision` provides information about both whether or not a foul was called and whether or not a foul was actually committed, this analysis will focus only on whether or not a foul was called. Including whether or not a foul was actually committed introduces some subtleties that are best left to a future post.
In this dataset, the “committing” player is the one that a foul would be called against, if a foul was called on the play, and the other player is “disadvantaged.”
We now encode the data. Since the committing player on one play may be the disadvantaged player on another play, we `melt` the raw data frame to have one row per player-play combination so that we can encode the players in a way that is consistent across columns.
```python
PLAYER_MAP = {
    "Jose Juan Barea": "JJ Barea",
    "Nene Hilario": "Nene",
    "Tim Hardaway": "Tim Hardaway Jr",
    "James Ennis": "James Ennis III",
    "Kelly Oubre": "Kelly Oubre Jr",
    "Taurean Waller-Prince": "Taurean Prince",
    "Glenn Robinson": "Glenn Robinson III",
    "Otto Porter": "Otto Porter Jr"
}

TEAM_MAP = {
    "NKY": "NYK",
    "COS": "BOS",
    "SAT": "SAS"
}
```
```python
long_df = (pd.melt(
        (raw_df.reset_index(drop=True)
               .rename_axis('play_id')
               .reset_index()),
        id_vars=['play_id', 'review_decision',
                 'committing_team', 'disadvantaged_team',
                 'seconds_left'],
        value_vars=['committing_player', 'disadvantaged_player'],
        var_name='player', value_name='player_name_')
    # fix inconsistent player names
    .assign(player_name=lambda df: (df.player_name_
                                      .str.replace('\.', '')
                                      .apply(lambda name: PLAYER_MAP.get(name, name))))
    .assign(team_=lambda df: (df.committing_team
                                .where(df.player == 'committing_player',
                                       df.disadvantaged_team)))
    # fix typos in team names
    .assign(team=lambda df: df.team_.apply(lambda team: TEAM_MAP.get(team, team)))
    .drop(['committing_team', 'disadvantaged_team', 'team_'], axis=1))

long_df['player_id'], player_map = long_df.player_name.factorize()
```
```python
long_df.head()
```

|   | play_id | review_decision | seconds_left | player | player_name_ | player_name | team | player_id |
|---|---------|-----------------|--------------|--------|--------------|-------------|------|-----------|
| 0 | 0 | CNC | 102.0 | committing_player | Al-Farouq Aminu | Al-Farouq Aminu | POR | 0 |
| 1 | 1 | CC | 98.0 | committing_player | Boris Diaw | Boris Diaw | UTA | 1 |
| 2 | 2 | CNC | 64.0 | committing_player | Ed Davis | Ed Davis | POR | 2 |
| 3 | 3 | INC | 62.0 | committing_player | Rudy Gobert | Rudy Gobert | UTA | 3 |
| 4 | 4 | CC | 27.1 | committing_player | CJ McCollum | CJ McCollum | POR | 4 |
After encoding, we pivot back to a wide data frame with one row per play.
```python
df = (long_df.pivot_table(index=['play_id', 'review_decision', 'seconds_left'],
                          columns='player', values='player_id')
             .rename(columns={
                 'committing_player': 'committing_id',
                 'disadvantaged_player': 'disadvantaged_id'
             })
             .rename_axis('', axis=1)
             .reset_index()
             .assign(foul_called=lambda df: 1 * (df.review_decision.isin(['CC', 'IC'])))
             .drop(['play_id', 'review_decision'],
                   axis=1))
```
In addition to encoding the players, we have included a column (`foul_called`) that indicates whether or not a foul was called on the play.
```python
df.head()
```

|   | seconds_left | committing_id | disadvantaged_id | foul_called |
|---|--------------|---------------|------------------|-------------|
| 0 | 102.0 | 0 | 300 | 0 |
| 1 | 98.0 | 1 | 124 | 1 |
| 2 | 64.0 | 2 | 300 | 0 |
| 3 | 62.0 | 3 | 4 | 0 |
| 4 | 27.1 | 4 | 6 | 1 |
In order to understand how foul calls vary systematically across players, we will use salary as a proxy for “star power.” The salary data we use was downloaded from Basketball Reference.
```python
SALARY_URI = 'http://www.austinrochford.com/resources/nba_irt/2016_2017_salaries.csv'

salary_df = (pd.read_csv(SALARY_URI, skiprows=1,
                         usecols=['Player', '2016-17'])
               .assign(player_name=lambda df: (df.Player
                                                 .str.split('\\', expand=True)[0]
                                                 .str.replace('\.', '')
                                                 # fix inconsistent player names
                                                 .apply(lambda name: PLAYER_MAP.get(name, name))),
                       salary=lambda df: (df['2016-17'].str
                                                       .lstrip('$')
                                                       .astype(np.float64)))
               .assign(log_salary=lambda df: np.log10(df.salary))
               .assign(std_log_salary=lambda df: (df.log_salary - df.log_salary.mean()) / df.log_salary.std())
               .drop(['Player', '2016-17'], axis=1)
               .groupby('player_name')
               .max()
               .select(lambda name: name in player_map)
               .assign(player_id=lambda df: (np.equal
                                               .outer(player_map, df.index)
                                               .argmax(axis=0)))
               .reset_index()
               .set_index('player_id')
               .sort_index())
```
Since NBA salaries span many orders of magnitude (LeBron James’ salary is just shy of $31M, while the lowest-paid player made just more than $200K), we will use log salaries, standardized to have mean zero and standard deviation one, in our model.
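To see why the log scale is appropriate here, consider a few hypothetical salaries spanning roughly the league's range (toy numbers, not values from the data frame above):

```python
import numpy as np

# Hypothetical salaries spanning roughly the league's range
salaries = np.array([2.0e5, 1.0e6, 5.0e6, 3.1e7])

# Raw salaries differ by a factor of more than 100, but log10 salaries
# occupy a narrow, roughly linear scale
log_salary = np.log10(salaries)

# Standardize to mean zero and standard deviation one (ddof=1 matches
# pandas' default .std())
std_log_salary = (log_salary - log_salary.mean()) / log_salary.std(ddof=1)
```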
```python
salary_df.head()
```

| player_id | player_name | salary | log_salary | std_log_salary |
|-----------|-------------|--------|------------|----------------|
| 0 | Al-Farouq Aminu | 7680965.0 | 6.885416 | 0.848869 |
| 1 | Boris Diaw | 7000000.0 | 6.845098 | 0.797879 |
| 2 | Ed Davis | 6666667.0 | 6.823909 | 0.771080 |
| 3 | Rudy Gobert | 2121288.0 | 6.326600 | 0.142129 |
| 4 | CJ McCollum | 3219579.0 | 6.507799 | 0.371293 |
We also produce a data frame associating players to teams, along with some useful per-player summaries.
```python
team_player_map = (long_df.groupby('team')
                          .player_id
                          .apply(pd.Series.drop_duplicates)
                          .reset_index(level=1, drop=True)
                          .reset_index()
                          .assign(name=lambda df: player_map[df.player_id],
                                  disadvantaged_rate=lambda tpm_df: (df.groupby('disadvantaged_id')
                                                                       .foul_called
                                                                       .mean()
                                                                       .ix[tpm_df.player_id]
                                                                       .values),
                                  disadvantaged_plays=lambda tpm_df: (df.groupby('disadvantaged_id')
                                                                        .size()
                                                                        .ix[tpm_df.player_id]
                                                                        .values))
                          .fillna(0))
```
```python
team_player_map.head()
```

|   | team | player_id | disadvantaged_plays | disadvantaged_rate | name |
|---|------|-----------|---------------------|--------------------|------|
| 0 | ATL | 114 | 8.0 | 0.000000 | Kyle Korver |
| 1 | ATL | 115 | 13.0 | 0.538462 | Dwight Howard |
| 2 | ATL | 116 | 44.0 | 0.272727 | Paul Millsap |
| 3 | ATL | 117 | 60.0 | 0.283333 | Dennis Schroder |
| 4 | ATL | 181 | 25.0 | 0.200000 | Kent Bazemore |
Modeling
Throughout this post, we will develop a series of models for understanding how foul calls vary across players, starting with a simple beta-Bernoulli model and working our way up to a hierarchical item-response theory regression model.
Before building models, we must introduce a bit of notation. The index \(i\) corresponds to a disadvantaged player, the index \(j\) to a committing player, and the index \(k\) to a play. With this notation, \(i(k)\) and \(j(k)\) are the indices of the disadvantaged and committing players involved in play \(k\), respectively. The binary variable \(y_k\) indicates whether or not a foul was called on play \(k\). All of our models use the likelihood
\[y_k \sim \textrm{Bernoulli}(p_{i(k), j(k)}).\]
Each model differs in its specification of the probability that a foul is called, \(p_{i, j}\).
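The indexing above can be sketched with a toy simulation (all probabilities and player indices below are hypothetical, chosen only to illustrate the likelihood's structure):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy dimensions: 3 disadvantaged players, 2 committing players, 8 plays
p = np.array([[0.20, 0.30],
              [0.40, 0.10],
              [0.25, 0.35]])              # p[i, j] = P(foul called)
i_k = np.array([0, 1, 2, 0, 1, 2, 0, 1])  # disadvantaged player on play k
j_k = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # committing player on play k

# y_k ~ Bernoulli(p[i(k), j(k)])
y = rng.binomial(1, p[i_k, j_k])
```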
Beta-Bernoulli model
One of the simplest possible models for this data focuses only on the disadvantaged player, so \(p_{i, j} = p_i\), and places independent beta priors on each \(p_i\). For simplicity, we begin with uniform priors, \(p_i \sim \textrm{Beta}(1, 1).\)
Even though this model is conjugate, we will use `pymc3` to perform inference with it, for consistency with the subsequent, non-conjugate models.
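Because the beta prior is conjugate to the Bernoulli likelihood, each player's posterior is available in closed form, which makes a handy sanity check on the MCMC output. A minimal sketch with toy counts (the counts below are illustrative, not taken from the data set):

```python
import numpy as np
from scipy import stats

# With a Beta(1, 1) prior, c_i foul calls in n_i plays give the posterior
# Beta(1 + c_i, 1 + n_i - c_i) for player i
calls = np.array([3, 0, 12])   # toy foul-call counts per player
plays = np.array([10, 4, 40])  # toy disadvantaged-play counts per player

posterior = stats.beta(1 + calls, 1 + plays - calls)

# Posterior means (c_i + 1) / (n_i + 2) are pulled toward the prior mean of 50%
post_mean = posterior.mean()
```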
```python
n_players = player_map.size
disadvantaged_id = df.disadvantaged_id.values

foul_called = df.foul_called.values
obs_rate = foul_called.mean()
```

```python
with pm.Model() as bb_model:
    p = pm.Beta('p', 1., 1., shape=n_players)

    y = pm.Bernoulli('y_obs', p[disadvantaged_id],
                     observed=foul_called)
```
Throughout this post, we will use the No-U-Turn Sampler (NUTS) for inference, tuning the sampler’s hyperparameters for the first two thousand samples and subsequently keeping the next two thousand samples for inference.
```python
N_TUNE = 2000
N_SAMPLES = 2000

SEED = 506421 # from random.org, for reproducibility
```
We now sample from the beta-Bernoulli model.
```python
def sample(model, n_tune, n_samples, seed):
    with model:
        full_trace = pm.sample(n_tune + n_samples, tune=n_tune, random_seed=seed)

    return full_trace[n_tune:]

bb_trace = sample(bb_model, N_TUNE, N_SAMPLES, SEED)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
  2%|▏ | 4793/200000 [00:02<01:47, 1818.84it/s]Median ELBO converged.
Finished [100%]: Average ELBO = -3,554.3
100%|██████████| 4000/4000 [01:03<00:00, 63.38it/s]
```
We use energy plots to diagnose possible problems with our samples.
```python
def energy_plot(trace):
    energy = trace['energy']
    energy_diff = np.diff(energy)

    fig, ax = plt.subplots(figsize=(8, 6))

    ax.hist(energy - energy.mean(), bins=30,
            lw=0, alpha=0.5,
            label="Energy")
    ax.hist(energy_diff, bins=30,
            lw=0, alpha=0.5,
            label="Energy difference")

    ax.set_xticks([])
    ax.set_yticks([])

    ax.legend()

energy_plot(bb_trace)
```
Since the energy and energy difference distributions are quite similar, there is no indication from this plot of sampling issues. For an in-depth treatment of Hamiltonian Monte Carlo algorithms and convergence diagnostics, consult Michael Betancourt’s excellent paper A Conceptual Introduction to Hamiltonian Monte Carlo.
We will use the widely applicable information criterion (WAIC) and binned residuals to check and compare our models. WAIC is a Bayesian measure of out-of-sample predictive accuracy based on in-sample data that is quite closely related to [leave-one-out cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Leave-one-out_cross-validation). It attempts to improve upon known shortcomings of the widely used deviance information criterion. (See Understanding predictive information criteria for Bayesian models for a review and comparison of various information criteria, including DIC and WAIC.) WAIC is easy to calculate with `pymc3`.
```python
def get_waic_df(model, trace, name):
    with model:
        waic = pm.waic(trace)

    return pd.DataFrame.from_records([waic], index=[name], columns=waic._fields)

waic_df = get_waic_df(bb_model, bb_trace, "Beta-Bernoulli")
```
```
/opt/conda/lib/python3.5/site-packages/pymc3/stats.py:145: UserWarning: For one or more samples the posterior variance of the
log predictive densities exceeds 0.4. This could be indication of
WAIC starting to fail see http://arxiv.org/abs/1507.04544 for details
  """)
```
We see that the WAIC calculation indicates difficulties with the beta-Bernoulli model, which we will soon confirm.
```python
waic_df
```

|                | WAIC | WAIC_se | p_WAIC |
|----------------|------|---------|--------|
| Beta-Bernoulli | 6021.491064 | 66.23822 | 238.488377 |
In addition to the WAIC value, we get an estimate of its standard error (`WAIC_se`) and the number of effective parameters in the model (`p_WAIC`). The number of effective parameters is an indication of model complexity.
The second diagnostic tool we use on our models are binned residuals, which show how wellcalibrated the model’s predicted probabilities are. Intuitively, if our model predicts that an event has a 35% chance of occurring and we can observe many repetitions of that event, we would expect the event to actually occur about 35% of the time. If the observed occurrences of the event differ substantially from the predicted rate, we have reason to doubt the quality of our model. Since we generally can’t observe each event many times, we instead group events into bins by their predicted probability and check that the average predicted probability in each bin is close to the rate at which the events in that bin are observed.
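The idea can be sketched with a handful of toy predictions (simplified here to point predictions rather than full posterior samples; all values are hypothetical):

```python
import numpy as np

bins = np.linspace(0, 1, 11)

# Toy predicted probabilities and observed binary outcomes
p_pred = np.array([0.05, 0.12, 0.18, 0.33, 0.38, 0.71])
y_obs = np.array([0, 0, 1, 0, 1, 1])

# Assign each event to a bin by its predicted probability, then compare
# the observed rate in each bin to the mean prediction there
bin_ix = np.digitize(p_pred, bins) - 1

residuals = {}
for b in np.unique(bin_ix):
    in_bin = bin_ix == b
    residuals[b] = y_obs[in_bin].mean() - p_pred[in_bin].mean()
```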
The binned predictions and residuals for the beta-Bernoulli model are shown below.
```python
BINS = np.linspace(0, 1, 11)
BINS_3D = BINS[np.newaxis, np.newaxis]

def binned_residuals(y, p):
    p_3d = p[..., np.newaxis]

    in_bin = (BINS_3D[..., :-1] < p_3d) & (p_3d <= BINS_3D[..., 1:])
    bin_counts = in_bin.sum(axis=(0, 1))

    p_mean = (in_bin * p_3d).sum(axis=(0, 1)) / bin_counts
    y_mean = (in_bin * y[np.newaxis, :, np.newaxis]).sum(axis=(0, 1)) / bin_counts

    return y_mean, p_mean, bin_counts
```
```python
def binned_residual_plot(bin_obs, bin_p, bin_counts):
    fig, (ax, resid_ax) = plt.subplots(ncols=2, sharex=True, figsize=(16, 6))

    ax.scatter(bin_p, bin_obs,
               s=300 * np.sqrt(bin_counts / bin_counts.sum()),
               zorder=5)
    ax.plot([0, 1], [0, 1], '--', c='k')

    ax.set_xlim(0, 1)
    ax.set_xticks(np.linspace(0, 1, 5))
    ax.xaxis.set_major_formatter(pct_formatter)
    ax.set_xlabel("Mean $p$ (binned)")

    ax.set_ylim(0, 1)
    ax.set_yticks(np.linspace(0, 1, 5))
    ax.yaxis.set_major_formatter(pct_formatter)
    ax.set_ylabel("Observed rate (binned)")

    resid_ax.scatter(bin_p, bin_obs - bin_p,
                     s=300 * np.sqrt(bin_counts / bin_counts.sum()),
                     zorder=5)

    resid_ax.hlines(0, 0, 1, 'k', '--')

    resid_ax.set_xlim(0, 1)
    resid_ax.set_xticks(np.linspace(0, 1, 5))
    resid_ax.xaxis.set_major_formatter(pct_formatter)
    resid_ax.set_xlabel("Mean $p$ (binned)")

    resid_ax.yaxis.set_major_formatter(pct_formatter)
    resid_ax.set_ylabel("Residual (binned)")
```
```python
bin_obs, bin_p, bin_counts = binned_residuals(foul_called, bb_trace['p'][:, disadvantaged_id])

binned_residual_plot(bin_obs, bin_p, bin_counts)
```
In these plots, the dashed black lines show how these quantities would be related for a perfect model. The area of each point is proportional to the number of events whose predicted probability fell in the relevant bin. From these plots, we get further confirmation that our simple beta-Bernoulli model is quite unsatisfactory, as many binned residuals exceed 5% in absolute value.
Below we plot the posterior mean and 90% credible interval for \(p\) for each player in the data set (grouped by team, for legibility), along with the player’s observed foul called percentage when disadvantaged. The area of the point for observed foul called percentage is proportional to the number of plays in which the player was disadvantaged.
```python
def to_param_df(player_df, trace, varnames):
    df = player_df

    for name in varnames:
        mean = trace[name].mean(axis=0)
        low, high = np.percentile(trace[name], [5, 95], axis=0)

        df = df.assign(**{
            '{}_mean'.format(name): mean[df.player_id],
            '{}_low'.format(name): low[df.player_id],
            '{}_high'.format(name): high[df.player_id]
        })

    return df

bb_df = to_param_df(team_player_map, bb_trace, ['p'])
```
```python
def plot_params(mean, interval, names, ax=None, **kwargs):
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 6))

    n_players = names.size

    ax.errorbar(mean, np.arange(n_players),
                xerr=np.abs(mean - interval),
                fmt='o',
                label="Mean with\n90% interval")

    ax.set_ylim(-1, n_players)
    ax.set_yticks(np.arange(n_players))
    ax.set_yticklabels(names)

    return ax
```
```python
def plot_p_params(rate, n_plays, league_mean, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()

    n_players = rate.size

    ax.scatter(rate, np.arange(n_players),
               c='k', s=20 * np.sqrt(n_plays),
               alpha=0.5, zorder=5,
               label="Observed")

    ax.vlines(league_mean, -1, n_players,
              'k', '--',
              label="League average")
```
```python
def plot_p_helper(mean, low, high, rate, n_plays, names, league_mean=None, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()

    mean = mean.values
    rate = rate.values
    n_plays = n_plays.values
    names = names.values

    argsorted_ix = mean.argsort()
    interval = np.row_stack([low, high])

    plot_params(mean[argsorted_ix], interval[:, argsorted_ix], names[argsorted_ix],
                ax=ax, **kwargs)
    plot_p_params(rate[argsorted_ix], n_plays[argsorted_ix], league_mean,
                  ax=ax, **kwargs)
```
```python
grid = sns.FacetGrid(bb_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_p_helper,
         'p_mean', 'p_low', 'p_high',
         'disadvantaged_rate', 'disadvantaged_plays', 'name',
         league_mean=obs_rate);

grid.set_axis_labels(r"$p$", "Player");

for ax in grid.axes:
    ax.set_xticks(np.linspace(0, 1, 5));
    ax.set_xticklabels(ax.get_xticklabels(), visible=True)
    ax.xaxis.set_major_formatter(pct_formatter);

grid.add_legend();
grid.fig.tight_layout();
```
These plots reveal an undesirable property of this model and its inferences. Since the prior distribution on \(p_i\) is uniform on the interval \([0, 1]\), all posterior estimates of \(p_i\) are pulled towards the prior expected value of 50%. This phenomenon is known as shrinkage. In extreme cases of players that were never disadvantaged, the posterior estimate of \(p_i\) is quite close to 50%. For these players, the league average foul call rate would seem to be a much more reasonable estimate of \(p_i\) than 50%. The league average foul call rate is shown as a dashed black line on the charts above.
There are several possible modifications of the beta-Bernoulli model that can cause shrinkage toward the league average. Perhaps the most straightforward is the empirical Bayesian method that sets the parameters of the prior distribution on \(p_i\) using the observed data. In this framework, there are many methods of choosing prior hyperparameters that make the prior expected value equal to the league average, therefore causing shrinkage toward the league average. We do not use empirical Bayesian methods in this post as they make it cumbersome to build the more complex models we want to use to understand the relationship between salary and foul calls. Empirical Bayesian methods are, however, an approximation to the fully hierarchical models we begin building in the next section.
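For concreteness, one such empirical Bayes recipe fixes the beta prior's mean at the observed league-wide rate; the prior strength \(\kappa\) below is an arbitrary illustrative choice, not a value estimated from the data:

```python
import numpy as np

league_rate = 0.251  # approximate observed foul call rate
kappa = 20.          # prior strength in "pseudo-plays" (arbitrary choice)

alpha = league_rate * kappa
beta = (1. - league_rate) * kappa

# The Beta(alpha, beta) prior mean equals the league average, so a player
# with few plays now shrinks toward roughly 25.1% rather than 50%
prior_mean = alpha / (alpha + beta)

calls, plays = 2, 4  # toy counts for a rarely disadvantaged player
post_mean = (alpha + calls) / (kappa + plays)
```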
Hierarchical logistic-normal model
A hierarchical logistic-normal model addresses some of the shortcomings of the beta-Bernoulli model. For simplicity, this model focuses exclusively on the disadvantaged player and assumes that the log-odds of a foul call for a given disadvantaged player are normally distributed. That is,
\[ \begin{align*} \log \left(\frac{p_i}{1 - p_i}\right) & \sim N(\mu, \sigma^2) \\ \eta_{k} & = \log \left(\frac{p_{i(k)}}{1 - p_{i(k)}}\right), \end{align*}\]
which is equivalent to
\[p_{i(k)} = \frac{1}{1 + \exp(-\eta_k)}.\]
We address the beta-Bernoulli model’s shrinkage problem by placing a normal hyperprior distribution on \(\mu\), \(\mu \sim N(0, 100).\) This shared hyperprior makes this model hierarchical. To complete the specification of this model, we place a half-Cauchy prior on \(\sigma\), \(\sigma \sim \textrm{HalfCauchy}(2.5)\).
```python
with pm.Model() as ln_model:
    μ = pm.Normal('μ', 0., 10.)
    Δ = pm.Normal('Δ', 0., 1., shape=n_players)
    σ = pm.HalfCauchy('σ', 2.5)

    p_player = pm.Deterministic('p_player', pm.math.sigmoid(μ + Δ * σ))

    η = μ + Δ[disadvantaged_id] * σ
    p = pm.Deterministic('p', pm.math.sigmoid(η))

    y = pm.Bernoulli('y_obs', p, observed=foul_called)
```
Throughout this post we use an offset parameterization for hierarchical models that significantly improves sampling efficiency. We now sample from this model.
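The two parameterizations define the same distribution over the log-odds; the offset (non-centered) version just expresses each player's log-odds as a standardized deviation that is shifted and scaled afterwards. A quick numerical sketch of their equivalence (the hyperparameter values are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(42)

mu, sigma = -1.1, 0.4  # illustrative hyperparameter values

# Centered parameterization: draw the per-player log-odds directly
centered = rng.normal(mu, sigma, size=100000)

# Offset parameterization: draw standardized offsets, then shift and scale;
# the implied distribution is identical, but the sampler explores parameters
# whose scale does not depend on sigma
delta = rng.normal(0., 1., size=100000)
offset = mu + delta * sigma
```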
```python
ln_trace = sample(ln_model, N_TUNE, N_SAMPLES, SEED)
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
  9%|▊ | 17001/200000 [00:12<02:13, 1372.87it/s]Median ELBO converged.
Finished [100%]: Average ELBO = -3,520.3
100%|██████████| 4000/4000 [02:27<00:00, 65.27it/s]
```
The energy plot for this model gives no cause for concern.
```python
energy_plot(ln_trace)
```
We now calculate the WAIC of the logistic-normal model, and compare it to that of the beta-Bernoulli model.
```python
waic_df = waic_df.append(get_waic_df(ln_model, ln_trace, "Logistic-normal"))
```
```python
def waic_plot(waic_df):
    fig, (waic_ax, p_ax) = plt.subplots(ncols=2, sharex=True, figsize=(16, 6))

    waic_x = np.arange(waic_df.shape[0])

    waic_ax.errorbar(waic_x, waic_df.WAIC,
                     yerr=waic_df.WAIC_se,
                     fmt='o')

    waic_ax.set_xticks(waic_x)
    waic_ax.xaxis.grid(False)
    waic_ax.set_xticklabels(waic_df.index)
    waic_ax.set_xlabel("Model")

    waic_ax.set_ylabel("WAIC")

    p_ax.bar(waic_x, waic_df.p_WAIC)

    p_ax.xaxis.grid(False)
    p_ax.set_xlabel("Model")
    p_ax.set_ylabel("Effective number\nof parameters")

waic_plot(waic_df)
```
The left-hand plot above shows that the logistic-normal model is a significant improvement in WAIC over the beta-Bernoulli model, which is unsurprising. The right-hand plot shows that the logistic-normal model has roughly 20% the number of effective parameters of the beta-Bernoulli model. This reduction is due to the partial pooling effect of the hierarchical prior. The hyperprior on \(\mu\) causes observations for one player to impact the estimate of \(p_i\) for all players; this sharing of information across players is responsible for the large decrease in the number of effective parameters.
Finally, we examine the binned residuals for the logistic-normal model.

```python
bin_obs, bin_p, bin_counts = binned_residuals(foul_called, ln_trace['p'])

binned_residual_plot(bin_obs, bin_p, bin_counts)
```
These binned residuals are much smaller than those of the beta-Bernoulli model, which is further confirmation that the logistic-normal model is preferable.
Below we plot the posterior distribution of \(\operatorname{logit}^{-1}(\mu)\), and we see that the observed foul call rate of approximately 25.1% lies within its 90% interval.
```python
ax, = pm.plot_posterior(ln_trace, ['μ'],
                        alpha_level=0.1, transform=expit, ref_val=obs_rate,
                        lw=0., alpha=0.75)

ax.xaxis.set_major_formatter(pct_formatter);
ax.set_title(r"$\operatorname{logit}^{-1}(\mu)$");
```
With this posterior for \(\operatorname{logit}^{-1}(\mu)\), we see the desired posterior shrinkage of each \(p_i\) toward the observed foul call rate.

```python
ln_df = to_param_df(team_player_map, ln_trace, ['p'])
```
```python
grid = sns.FacetGrid(ln_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_p_helper,
         'p_mean', 'p_low', 'p_high',
         'disadvantaged_rate', 'disadvantaged_plays', 'name',
         league_mean=obs_rate);

grid.set_axis_labels(r"$p$", "Player");

for ax in grid.axes:
    ax.set_xticks(np.linspace(0, 1, 5));
    ax.set_xticklabels(ax.get_xticklabels(), visible=True)
    ax.xaxis.set_major_formatter(pct_formatter);

grid.add_legend();
grid.fig.tight_layout();
```
The inferences in these plots are markedly different from those of the beta-Bernoulli model. Most strikingly, we see that estimates have been shrunk towards the league average foul call rate, and that players that were never disadvantaged have posterior foul call probabilities quite close to that rate. As a consequence of this more reasonable shrinkage, the range of values taken by the posterior \(p_i\) estimates is much smaller for the logistic-normal model than for the beta-Bernoulli model. Below we plot the top- and bottom-ten players by \(p_i\).
```python
fig, (top_ax, bottom_ax) = plt.subplots(nrows=2, sharex=True, figsize=(8, 10))

by_p = (ln_df.drop_duplicates(['player_id'])
             .sort_values('p_mean'))
p_top = by_p.iloc[-10:]

plot_params(p_top.p_mean.values, p_top[['p_low', 'p_high']].values.T,
            p_top.name.values,
            ax=top_ax);
top_ax.vlines(obs_rate, -1, 10,
              'k', '--',
              label=r"League average");

top_ax.xaxis.set_major_formatter(pct_formatter);

top_ax.set_ylabel("Player");

top_ax.set_title("Top ten");

p_bottom = by_p.iloc[:10]

plot_params(p_bottom.p_mean.values, p_bottom[['p_low', 'p_high']].values.T,
            p_bottom.name.values,
            ax=bottom_ax);
bottom_ax.vlines(obs_rate, -1, 10,
                 'k', '--',
                 label=r"League average");

bottom_ax.xaxis.set_major_formatter(pct_formatter);
bottom_ax.set_xlabel(r"$p$");

bottom_ax.set_ylabel("Player");

bottom_ax.legend(loc=1);
bottom_ax.set_title("Bottom ten");

fig.tight_layout();
```
Item-response (Rasch) model
The hierarchical logistic-normal model is certainly an improvement over the beta-Bernoulli model, but both of these models have focused solely on the disadvantaged player. It seems quite important to understand the contribution of not just the disadvantaged player, but also of the committing player in each play to the probability of a foul call. Item-response theory (IRT) provides generalizations of the logistic-normal model that can account for the influence of both players involved in a play. IRT originated in psychometrics as a way to simultaneously measure individual aptitude and question difficulty based on test-response data, and has subsequently found many other applications. We use IRT to model foul calls by considering disadvantaged players as analogous to test takers and committing players as analogous to questions. Specifically, we will use the Rasch model for the probability \(p_{i, j}\) that a foul is called on a play where player \(i\) is disadvantaged by committing player \(j\). This model posits that each player has a latent ability, \(\theta_i\), that governs how often fouls are called when they are disadvantaged, and a latent difficulty, \(b_j\), that governs how often fouls are not called when they are committing. The probability that a foul is called on a play where player \(i\) is disadvantaged and player \(j\) is committing is then a function of the difference between the corresponding latent ability and difficulty parameters,
\[ \begin{align*} \eta_k & = \theta_{i(k)} - b_{j(k)} \\ p_k & = \frac{1}{1 + \exp(-\eta_k)}. \end{align*} \]
In this model, a player with a large value of \(\theta_i\) is more likely to get a foul called when they are disadvantaged, and a player with a large value of \(b_j\) is less likely to have a foul called when they are committing. If \(\theta_{i(k)} = b_{j(k)}\), there is a 50% chance a foul is called on that play.
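A quick numeric illustration of the Rasch probability (the latent values below are made up for illustration, not estimates from this model):

```python
import numpy as np
from scipy.special import expit

# Hypothetical latent values
theta = 0.5   # disadvantaged player's ability
b = -0.3      # committing player's difficulty

# P(foul called) = 1 / (1 + exp(-(theta - b)))
p_foul = expit(theta - b)
```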
To complete the specification of this model, we place priors on \(\theta_i\) and \(b_j\). Similarly to \(\eta\) in the logisticnormal model, we place a hierarchical normal prior on \(\theta_i\),
\[ \begin{align*} \mu_{\theta} & \sim N(0, 100) \\ \sigma_{\theta} & \sim \textrm{HalfCauchy}(2.5) \\ \theta_i & \sim N(\mu_{\theta}, \sigma^2_{\theta}). \end{align*} \]
```python
with pm.Model() as rasch_model:
    μ_θ = pm.Normal('μ_θ', 0., 10.)
    Δ_θ = pm.Normal('Δ_θ', 0., 1., shape=n_players)
    σ_θ = pm.HalfCauchy('σ_θ', 2.5)
    θ = pm.Deterministic('θ', μ_θ + Δ_θ * σ_θ)
```
We also place a hierarchical normal prior on \(b_j\), though this prior must be subtly different from that on \(\theta_i\). Since \(\theta_i\) and \(b_j\) are latent variables, there is no natural scale on which they should be measured. If each \(\theta_i\) and \(b_j\) are shifted by the same amount, say \(\delta\), the likelihood does not change. That is, if \(\tilde{\theta}_i = \theta_i + \delta\) and \(\tilde{b}_j = b_j + \delta\), then
\[ \tilde{\eta}_{i, j} = \tilde{\theta}_i - \tilde{b}_j = \theta_i + \delta - (b_j + \delta) = \theta_i - b_j = \eta_{i, j}. \]
Therefore, if we allow \(\theta_i\) and \(b_j\) to be shifted by arbitrary amounts, the Rasch model is not identified. We identify the Rasch model by constraining the mean of the hyperprior on \(b_j\) to be zero,
\[ \begin{align*} \sigma_b & \sim \textrm{HalfCauchy}(2.5) \\ b_j & \sim N(0, \sigma^2_b). \end{align*} \]
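The shift invariance can be verified numerically with toy parameters (all values below are arbitrary):

```python
import numpy as np
from scipy.special import expit

rng = np.random.RandomState(1)

theta = rng.normal(size=5)  # toy latent abilities
b = rng.normal(size=5)      # toy latent difficulties
delta = 3.7                 # an arbitrary common shift

# Every foul call probability is unchanged by the shift, so the data
# cannot identify a common translation of theta and b
p = expit(theta[:, np.newaxis] - b[np.newaxis, :])
p_shifted = expit((theta + delta)[:, np.newaxis] - (b + delta)[np.newaxis, :])
```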
```python
with rasch_model:
    Δ_b = pm.Normal('Δ_b', 0., 1., shape=n_players)
    σ_b = pm.HalfCauchy('σ_b', 2.5)
    b = pm.Deterministic('b', Δ_b * σ_b)
```
We now specify the Rasch model’s likelihood and sample from it.
```python
committing_id = df.committing_id.values

with rasch_model:
    η = θ[disadvantaged_id] - b[committing_id]
    p = pm.Deterministic('p', pm.math.sigmoid(η))

    y = pm.Bernoulli('y_obs', p, observed=foul_called)

rasch_trace = sample(rasch_model, N_TUNE, N_SAMPLES, SEED)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -3,729.5: 11%|█▏| 22583/200000 [00:19<02:33, 1156.17it/s]Median ELBO converged.
Finished [100%]: Average ELBO = -3,037.1
100%|██████████| 4000/4000 [02:01<00:00, 32.81it/s]
```
Again, the energy plot for this model gives no cause for concern.
```python
energy_plot(rasch_trace)
```
Below we show the WAIC of our three models.
```python
waic_df = waic_df.append(get_waic_df(rasch_model, rasch_trace, "Rasch"))

waic_plot(waic_df)
```
The Rasch model represents a moderate WAIC improvement over the logistic-normal model, and unsurprisingly has many more effective parameters (since it added a nominal parameter, \(b_j\), per player).
The Rasch model also has reasonable binned residuals, with very few events having residuals above 5%.
```python
bin_obs, bin_p, bin_counts = binned_residuals(foul_called, rasch_trace['p'])

binned_residual_plot(bin_obs, bin_p, bin_counts)
```
For the Rasch model (and subsequent models), we switch from visualizing the per-player call probabilities to the latent parameters \(\theta_i\) and \(b_j\).
```python
μ_θ_mean = rasch_trace['μ_θ'].mean()

rasch_df = to_param_df(team_player_map, rasch_trace, ['θ', 'b'])
```
```python
def plot_params_helper(mean, low, high, names, league_mean=None, league_mean_name=None, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()

    mean = mean.values
    names = names.values

    argsorted_ix = mean.argsort()
    interval = np.row_stack([low, high])

    plot_params(mean[argsorted_ix], interval[:, argsorted_ix], names[argsorted_ix],
                ax=ax, **kwargs)

    if league_mean is not None:
        ax.vlines(league_mean, -1, names.size,
                  'k', '--',
                  label=league_mean_name)
```
```python
grid = sns.FacetGrid(rasch_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_params_helper,
         'θ_mean', 'θ_low', 'θ_high', 'name',
         league_mean=μ_θ_mean,
         league_mean_name=r"$\mu_{\theta}$");

grid.set_axis_labels(r"$\theta$", "Player");

grid.add_legend();
grid.fig.tight_layout();
```
```python
grid = sns.FacetGrid(rasch_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_params_helper,
         'b_mean', 'b_low', 'b_high', 'name',
         league_mean=0.);

grid.set_axis_labels(r"$b$", "Player");

grid.add_legend();
grid.fig.tight_layout();
```
Though these plots are voluminous and therefore difficult to interpret precisely, a few trends are evident. The first is that there is more variation in committing skill (\(b_j\)) than in disadvantaged skill (\(\theta_i\)). This difference is confirmed in the following histograms of the posterior expected values of \(\theta_i\) and \(b_j\).
def plot_latent_distributions(θ, b):
    fig, (θ_ax, b_ax) = plt.subplots(nrows=2, sharex=True, figsize=(8, 6))

    bins = np.linspace(0.9 * min(θ.min(), b.min()),
                       1.1 * max(θ.max(), b.max()),
                       75)

    θ_ax.hist(θ, bins=bins,
              alpha=0.75)

    θ_ax.xaxis.set_label_position('top')
    θ_ax.set_xlabel(r"Posterior expected $\theta$")
    θ_ax.set_yticks([])
    θ_ax.set_ylabel("Frequency")

    b_ax.hist(b, bins=bins,
              color=green, alpha=0.75)

    b_ax.xaxis.tick_top()
    b_ax.set_xlabel(r"Posterior expected $b$")
    b_ax.set_yticks([])
    b_ax.invert_yaxis()
    b_ax.set_ylabel("Frequency")

    fig.tight_layout()
plot_latent_distributions(rasch_df.θ_mean, rasch_df.b_mean)
The following plots show the top and bottom ten players in terms of both \(\theta_i\) and \(b_j\).
def top_10_plot(trace_df, μ_θ=0):
    fig = plt.figure(figsize=(16, 10))
    θ_top_ax = fig.add_subplot(221)
    b_top_ax = fig.add_subplot(222)
    θ_bottom_ax = fig.add_subplot(223, sharex=θ_top_ax)
    b_bottom_ax = fig.add_subplot(224, sharex=b_top_ax)

    # necessary for players that have been on more than one team
    trace_df = trace_df.drop_duplicates(['player_id'])

    by_θ = trace_df.sort_values('θ_mean')
    θ_top = by_θ.iloc[-10:]
    θ_bottom = by_θ.iloc[:10]

    plot_params(θ_top.θ_mean.values, θ_top[['θ_low', 'θ_high']].values.T,
                θ_top.name.values,
                ax=θ_top_ax)
    θ_top_ax.vlines(μ_θ, -1, 10,
                    'k', '--',
                    label=(r"$\mu_{\theta}$" if μ_θ != 0 else "League average"))

    plt.setp(θ_top_ax.get_xticklabels(), visible=False)
    θ_top_ax.set_ylabel("Player")
    θ_top_ax.legend(loc=2)
    θ_top_ax.set_title("Top ten")

    plot_params(θ_bottom.θ_mean.values, θ_bottom[['θ_low', 'θ_high']].values.T,
                θ_bottom.name.values,
                ax=θ_bottom_ax)
    θ_bottom_ax.vlines(μ_θ, -1, 10,
                       'k', '--',
                       label=(r"$\mu_{\theta}$" if μ_θ != 0 else "League average"))

    θ_bottom_ax.set_xlabel(r"$\theta$")
    θ_bottom_ax.set_ylabel("Player")
    θ_bottom_ax.set_title("Bottom ten")

    by_b = trace_df.sort_values('b_mean')
    b_top = by_b.iloc[-10:]
    b_bottom = by_b.iloc[:10]

    plot_params(b_top.b_mean.values, b_top[['b_low', 'b_high']].values.T,
                b_top.name.values,
                ax=b_top_ax)
    b_top_ax.vlines(0, -1, 10,
                    'k', '--',
                    label="League average");

    plt.setp(b_top_ax.get_xticklabels(), visible=False)
    b_top_ax.legend(loc=2)
    b_top_ax.set_title("Top ten")

    plot_params(b_bottom.b_mean.values, b_bottom[['b_low', 'b_high']].values.T,
                b_bottom.name.values,
                ax=b_bottom_ax)
    b_bottom_ax.vlines(0, -1, 10, 'k', '--')

    b_bottom_ax.set_xlabel(r"$b$")
    b_bottom_ax.set_title("Bottom ten")

    fig.tight_layout()

top_10_plot(rasch_df, μ_θ=μ_θ_mean)
We focus first on \(\theta_i\). Interestingly, the top ten players for the Rasch model contain many more top-tier stars than for the logistic-normal model, including John Wall, Russell Westbrook, and LeBron James. Turning to \(b\), it is interesting that while the top and bottom ten players contain many recognizable names (LaMarcus Aldridge, Harrison Barnes, Kawhi Leonard, and Ricky Rubio), the only truly top-tier player present is Anthony Davis.
Time remaining model
As basketball fans know, there are many factors other than the players involved that influence foul calls. Very often, sufficiently close NBA games end with intentional fouls, as the losing team attempts to stop the clock and force another offensive possession. Therefore, we expect to see an increase in the foul call probability as the game nears its conclusion.
n_sec = 121
sec = (df.seconds_left
         .round(0)
         .values
         .astype(np.int64))
fig, ax = plt.subplots(figsize=(8, 6))

(df.groupby(sec)
   .foul_called
   .mean()
   .plot(c='k', label="Observed foul call rate", ax=ax));

ax.set_xticks(np.linspace(0, 120, 5));
ax.invert_xaxis();
ax.set_xlabel("Seconds remaining in game");
ax.yaxis.set_major_formatter(pct_formatter);
ax.set_ylabel("Probability foul is called");
ax.legend(loc=2);
The plot above confirms this expectation, which we can use to improve our latent skill model. If \(t \in \{0, 1, \ldots, 120\}\) is the number of seconds remaining in the game, we model the latent contribution of \(t\) to the log-odds that a foul is called with a Gaussian random walk,
\[ \begin{align*} \lambda_0 & \sim N(0, 100) \\ \lambda_t & \sim N(\lambda_{t - 1}, \tau^{-1}_{\lambda}) \\ \tau_{\lambda} & \sim \textrm{Exp}(10^{-4}). \end{align*} \]
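To build intuition for this prior, we can simulate a few sample paths of the random walk with NumPy. This is a standalone sketch: the precision is fixed at an assumed value of \(10^4\) rather than drawn from its prior, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

n_sec = 121
τ_λ = 1e4                    # assumed precision; the model places an Exponential prior on it
σ_step = 1 / np.sqrt(τ_λ)    # per-second step standard deviation implied by the precision

# three sample paths of λ_0, ..., λ_120
λ0 = rng.normal(0., 10., size=3)
steps = rng.normal(0., σ_step, size=(3, n_sec - 1))
paths = np.concatenate([λ0[:, None], λ0[:, None] + steps.cumsum(axis=1)], axis=1)
```

Each path drifts smoothly away from its diffuse starting point; it is these small, precision-controlled steps that let the posterior trace a gradual change in foul-call log-odds over the final two minutes.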
This prior allows us to flexibly model the shape of the curve shown above. If \(t(k)\) is the number of seconds remaining during the \(k\)th play, we incorporate \(\lambda_{t(k)}\) into our model with
\[\eta_k = \lambda_{t(k)} + \theta_{i(k)} - b_{j(k)}.\]
This model is not identified until we constrain the mean of \(\theta\) to be zero, for reasons similar to those discussed above for the Rasch model.
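The non-identifiability is easy to check numerically: adding a constant \(c\) to every \(\theta_i\) while subtracting the same \(c\) from every \(\lambda_t\) leaves each \(\eta_k\), and hence each foul-call probability, unchanged. A toy example with made-up latent values:

```python
import numpy as np
from scipy.special import expit

λ_t, θ_i, b_j = 0.5, -0.2, 0.3   # made-up latent values for a single play
c = 1.7                           # an arbitrary shift

p = expit(λ_t + θ_i - b_j)
p_shifted = expit((λ_t - c) + (θ_i + c) - b_j)

# the likelihood cannot distinguish the shifted parameters from the originals
assert np.isclose(p, p_shifted)
```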
with pm.Model() as time_model:
    τ_λ = pm.Exponential('τ_λ', 1e-4)
    λ = pm.GaussianRandomWalk('λ', tau=τ_λ,
                              init=pm.Normal.dist(0., 10.),
                              shape=n_sec)

    Δ_θ = pm.Normal('Δ_θ', 0., 1., shape=n_players)
    σ_θ = pm.HalfCauchy('σ_θ', 2.5)
    θ = pm.Deterministic('θ', Δ_θ * σ_θ)

    Δ_b = pm.Normal('Δ_b', 0., 1., shape=n_players)
    σ_b = pm.HalfCauchy('σ_b', 2.5)
    b = pm.Deterministic('b', Δ_b * σ_b)

    η = λ[sec] + θ[disadvantaged_id] - b[committing_id]
    p = pm.Deterministic('p', pm.math.sigmoid(η))

    y = pm.Bernoulli('y_obs', p, observed=foul_called)
We now sample from the model.
time_trace = sample(time_model, N_TUNE, N_SAMPLES, SEED)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -1.0533e+05: 15%|█▌ | 30300/200000 [00:31<02:52, 982.03it/s]
Median ELBO converged.
Finished [100%]: Average ELBO = -3,104.5
100%|██████████| 4000/4000 [03:10<00:00, 20.99it/s]
The energy plot for this model is worse than the previous ones, but not too bad.
energy_plot(time_trace)
waic_df = waic_df.append(get_waic_df(time_model, time_trace, "Time"))
waic_plot(waic_df)
We see that the time remaining model represents an appreciable improvement over the Rasch model in terms of WAIC.
bin_obs, bin_p, bin_counts = binned_residuals(foul_called, time_trace['p'])
binned_residual_plot(bin_obs, bin_p, bin_counts)
The binned residuals for this model also look quite good, with very few samples appreciably exceeding a 1% difference.
We now compare the distributions of \(\theta_i\) and \(b_j\) for this model with those for the Rasch model.
time_df = to_param_df(team_player_map, time_trace, ['θ', 'b'])
plot_latent_distributions(time_df.θ_mean, time_df.b_mean)
The effect of constraining the mean of \(\theta\) to be zero is immediately apparent. The variation in \(\theta\) also remains lower than the variation in \(b\) in this model. We also see that the top and bottom ten players by \(\theta\)- and \(b\)-value remain largely unchanged from the Rasch model.
top_10_plot(time_df)
Basketball fans may find it amusing that under this model, Dwight Howard has joined the top ten in terms of \(\theta\) and Ricky Rubio is no longer the worst player in terms of \(b\).
While this model has not done much to change the rank-ordering of the most and least-skilled players, it does enable us to plot per-player foul probabilities over time, as below.
fig, (θ_ax, b_ax) = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(16, 6))

(df.groupby(sec)
   .foul_called
   .mean()
   .plot(c='k', alpha=0.5,
         label="Observed foul call rate",
         ax=θ_ax));

plot_sec = np.arange(n_sec)

θ_ax.plot(plot_sec, expit(time_trace['λ'].mean(axis=0)),
          c='k',
          label="Average player");

θ_best_id = time_df.ix[time_df.θ_mean.idxmax()].player_id
θ_ax.plot(plot_sec, expit(time_trace['θ'][:, θ_best_id].mean(axis=0)
                          + time_trace['λ'].mean(axis=0)),
          label=player_map[θ_best_id]);

θ_worst_id = time_df.ix[time_df.θ_mean.idxmin()].player_id
θ_ax.plot(plot_sec, expit(time_trace['θ'][:, θ_worst_id].mean(axis=0)
                          + time_trace['λ'].mean(axis=0)),
          label=player_map[θ_worst_id]);

θ_ax.set_xticks(np.linspace(0, 120, 5));
θ_ax.invert_xaxis();
θ_ax.set_xlabel("Seconds remaining in game");
θ_ax.yaxis.set_major_formatter(pct_formatter);
θ_ax.set_ylabel("Probability foul is called\nagainst average opposing player");
θ_ax.legend(loc=2);
θ_ax.set_title(r"Disadvantaged player ($\theta$)");

(df.groupby(sec)
   .foul_called
   .mean()
   .plot(c='k', alpha=0.5,
         label="Observed foul call rate",
         ax=b_ax));

b_ax.plot(plot_sec, expit(time_trace['λ'].mean(axis=0)),
          c='k',
          label="Average player");

b_best_id = time_df.ix[time_df.b_mean.idxmax()].player_id
b_ax.plot(plot_sec, expit(-time_trace['b'][:, b_best_id].mean(axis=0)
                          + time_trace['λ'].mean(axis=0)),
          label=player_map[b_best_id]);

b_worst_id = time_df.ix[time_df.b_mean.idxmin()].player_id
b_ax.plot(plot_sec, expit(-time_trace['b'][:, b_worst_id].mean(axis=0)
                          + time_trace['λ'].mean(axis=0)),
          label=player_map[b_worst_id]);

b_ax.set_xticks(np.linspace(0, 120, 5));
b_ax.invert_xaxis();
b_ax.set_xlabel("Seconds remaining in game");
b_ax.legend(loc=2);
b_ax.set_title(r"Committing player ($b$)");
Here we have plotted the probability of a foul call while being opposed by an average player (for both \(\theta\) and \(b\)), along with the probability curves for the players with the highest and lowest \(\theta\) and \(b\), respectively. While these plots are quite interesting, one weakness of our model is that the difference between each player's curve and the league average is constant (on the log-odds scale) over time. It would be interesting and useful to extend this model to allow player offsets to vary over time. Additionally, it would be interesting to understand the influence of the score on the foul call rate as the game nears its end. It seems quite likely that in close games the winning team is much less likely to commit fouls while the losing team is much more likely to commit intentional fouls.
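To make the constant-offset limitation concrete, here is a small sketch (with a made-up baseline curve and a hypothetical player offset; neither comes from the fitted model) showing that a fixed log-odds offset still produces a probability-scale gap that changes over the final two minutes:

```python
import numpy as np
from scipy.special import expit, logit

t = np.arange(121)
λ = -3. + 0.02 * (120 - t)   # made-up baseline log-odds, rising as the clock runs out
θ = 0.4                      # hypothetical constant player offset on the log-odds scale

p_avg = expit(λ)
p_player = expit(λ + θ)

# the log-odds gap is exactly θ at every second...
assert np.allclose(logit(p_player) - logit(p_avg), θ)

# ...but the probability-scale gap still varies over the two minutes
gap = p_player - p_avg
```

The extension suggested above would go further, letting the offset itself (not just its probability-scale consequence) change with \(t\).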
We now plot the per-player values of \(\theta_i\) and \(b_j\) under this model.
grid = sns.FacetGrid(time_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_params_helper,
         'θ_mean', 'θ_low', 'θ_high', 'name',
         league_mean=0.,
         league_mean_name="League average");
grid.set_axis_labels(r"$\theta$", "Player");
grid.add_legend();
grid.fig.tight_layout();
grid = sns.FacetGrid(time_df, col='team', col_wrap=2,
                     sharey=False,
                     size=4, aspect=1.5)
grid.map(plot_params_helper,
         'b_mean', 'b_low', 'b_high', 'name',
         league_mean=0.,
         league_mean_name="League average");
grid.set_axis_labels(r"$b$", "Player");
grid.add_legend();
grid.fig.tight_layout();
Salary model
Our final model uses salary as a proxy for “star power” to explore its influence on foul calls. We also (somewhat naively) impute missing salaries to the (log) league average.
std_log_salary = (salary_df.ix[np.arange(n_players)]
                           .std_log_salary
                           .fillna(0.)
                           .values)
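For readers reconstructing the preprocessing, standardizing log salaries and then imputing missing values at zero (the post-standardization mean) can be sketched as follows, using a hypothetical handful of salaries:

```python
import numpy as np
import pandas as pd

# hypothetical salaries in dollars; the third player's salary is missing
salary = pd.Series([2.5e6, 1.8e7, np.nan, 9.0e5])

log_salary = np.log(salary)
std_log_salary = ((log_salary - log_salary.mean()) / log_salary.std()).fillna(0.)
```

Imputing at zero after standardization is exactly imputing at the (log) league average, which is the naive choice described above.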
With \(s_i\) denoting the \(i\)-th player's standardized log salary, our model becomes
\[ \begin{align*} \theta_i & = \theta_{0, i} + \delta_{\theta} \cdot s_i \\ b_j & = b_{0, j} + \delta_b \cdot s_j \\ \eta_k & = \lambda_{t(k)} + \theta_{i(k)} - b_{j(k)}. \end{align*} \]
In this model, each player’s \(\theta_i\) and \(b_j\) parameters are linear functions of their standardized log salary, with (hierarchical) varying intercepts. The varying intercepts \(\theta_{0, i}\) and \(b_{0, j}\) are endowed with the same hierarchical normal priors as \(\theta_i\) and \(b_j\) had in the previous model. We place normal priors, \(\delta_{\theta} \sim N(0, 100)\) and \(\delta_b \sim N(0, 100)\), on the salary coefficients.
with pm.Model() as salary_model:
    τ_λ = pm.Exponential('τ_λ', 1e-4)
    λ = pm.GaussianRandomWalk('λ', tau=τ_λ,
                              init=pm.Normal.dist(0., 10.),
                              shape=n_sec)

    Δ_θ0 = pm.Normal('Δ_θ0', 0., 1., shape=n_players)
    σ_θ0 = pm.HalfCauchy('σ_θ0', 2.5)
    θ0 = pm.Deterministic('θ0', Δ_θ0 * σ_θ0)

    δ_θ = pm.Normal('δ_θ', 0., 10.)
    θ = pm.Deterministic('θ', θ0 + δ_θ * std_log_salary)

    Δ_b0 = pm.Normal('Δ_b0', 0., 1., shape=n_players)
    σ_b0 = pm.HalfCauchy('σ_b0', 2.5)
    b0 = pm.Deterministic('b0', Δ_b0 * σ_b0)

    δ_b = pm.Normal('δ_b', 0., 10.)
    b = pm.Deterministic('b', b0 + δ_b * std_log_salary)

    η = λ[sec] + θ[disadvantaged_id] - b[committing_id]
    p = pm.Deterministic('p', pm.math.sigmoid(η))

    y = pm.Bernoulli('y_obs', p, observed=foul_called)
salary_trace = sample(salary_model, N_TUNE, N_SAMPLES, SEED)
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -1.0244e+05: 15%|█▌ | 30998/200000 [00:32<02:57, 952.52it/s]
Median ELBO converged.
Finished [100%]: Average ELBO = -2,995.3
100%|██████████| 4000/4000 [03:45<00:00, 17.76it/s]
The energy plot for this model looks a bit worse than that for the time remaining model.
energy_plot(salary_trace)
The salary model also appears to be a slight improvement over the time remaining model in terms of WAIC.
waic_df = waic_df.append(get_waic_df(salary_model, salary_trace, "Salary"))
waic_plot(waic_df)
The binned residuals continue to look good for this model.
bin_obs, bin_p, bin_counts = binned_residuals(foul_called, salary_trace['p'])
binned_residual_plot(bin_obs, bin_p, bin_counts)
Based on the posterior distributions of \(\delta_{\theta}\) and \(\delta_b\), we expect to see a fairly strong relationship between a player’s (standardized log) salary and their latent skill parameters.
pm.plot_posterior(salary_trace, ['δ_θ', 'δ_b'],
                  lw=0., alpha=0.75);
The following plots confirm this relationship.
salary_df_ = to_param_df(team_player_map, salary_trace, ['θ', 'θ0', 'b', 'b0'])
fig, (θ_ax, b_ax) = plt.subplots(ncols=2, sharex=True, figsize=(16, 6))

salary = (salary_df.ix[np.arange(n_players)]
                   .salary
                   .fillna(salary_df.salary.mean())
                   .values)

θ_ax.scatter(salary[salary_df_.player_id], salary_df_.θ_mean,
             alpha=0.75);

θ_ax.xaxis.set_major_formatter(million_dollars_formatter);
θ_ax.set_xlabel("Salary");
θ_ax.set_ylabel(r"$\theta$");

b_ax.scatter(salary[salary_df_.player_id], salary_df_.b_mean,
             alpha=0.75);

b_ax.xaxis.set_major_formatter(million_dollars_formatter);
b_ax.set_xlabel("Salary");
b_ax.set_ylabel(r"$b$");
It is important to note here that these relationships are descriptive, not causal. Our original intent was to use salary as a proxy for the difficult-to-quantify notion of "star power." These plots suggest that the probability of getting a foul called when disadvantaged and of not having a foul called when committing are both positively related to salary, and therefore (a bit more dubiously) to star power.
Importantly, \(\theta_i\) and \(b_j\) should no longer be interpreted directly as measuring latent skill, which is presumably intrinsic to a player and not directly dependent on their salary. In fact, it seems at least plausible that NBA scouts, front offices, players, and agents are somewhat able to appreciate these latent skills, place a value on them, and thereby price them into contracts. It would be interesting future work to refine this model to give an econometric answer to this question.
Since we shouldn't interpret \(\theta_i\) and \(b_j\) as measures of latent skill in this model, we will not plot their per-player distributions.
Conclusion
We set out to explore the relationship between the players involved in a play and the probability that a foul is called, along with other factors related to that probability. Through a series of progressively more complex Bayesian item-response models, we have seen that
 foul call probability does vary appreciably across both disadvantaged and committing players,
 there is more variation in the latent skill of the committing player to avoid a foul call than in the latent skill of the disadvantaged player to draw one,
 the amount of time remaining in the game is strongly related to the probability of a foul call, and
 there is a positive relationship between player salary and the probability that a foul is called when they are disadvantaged and not called when they are committing. With a bit of a leap, we can say that the probability a foul is called is at least loosely related to the “star power” of the players involved.
In this post we have only scratched the surface of Bayesian item-response theory. For a more in-depth treatment, consult Bayesian Item Response Modeling.
This post is available as a Jupyter notebook here.