Now that we’ve seen how to choose priors and construct a likelihood for a logit model, let’s fit it in Stan. We’ll do this by first simulating data from the model with known parameters, then checking that we can recover those parameters from the fitted model. We’ll set it up as one would a conjoint survey: each product is fully described by its attribute vector \(X_{j}\), so no unobserved taste shock \(\xi\) affects choices.
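
Concretely, under the i.i.d. Gumbel taste shocks that motivate the logit, decision-maker \(i\)'s probability of choosing product \(j\) is the familiar softmax

\[
p_{ij} = \frac{\exp(X_{j}'\beta)}{\sum_{k}\exp(X_{k}'\beta)},
\]

where the sum runs over every product in \(i\)'s choice set, including an outside good whose attributes are all zero. This is exactly what the softmax() helper below computes.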

To illustrate how we might set up a problem where each decision-maker potentially faces a different number of choices, each decision-maker will face between 5 and 10 choices (plus an outside good whose attributes are all zero). As in a standard conjoint study, all choice attributes will be binary.

Simulating data

# Libraries used below for data manipulation and plotting
library(dplyr)
library(ggplot2)
library(ggthemes) # for theme_hc()

# Softmax helper and a Gumbel random number generator
set.seed(574)
softmax <- function(x) exp(x)/sum(exp(x))
rgumbel <- function(N, mu = 0, beta = 1) mu - beta * log(-log(runif(N)))

# Number of decision-makers
I <- 1000
# Number of attributes
P <- 10
# Preferences
beta_true <- rnorm(P)
# Number of choices per decision-maker (between 5 and 10)
decisionmakers <- data_frame(i = 1:I, 
                             choices = sample(5:10, I, replace = TRUE))

# Function that takes x (which contains a number of choices) and returns
# randomly generated choice attributes along with a binary column indicating
# which row was chosen

make_choices <- function(x) {
    X <- matrix(sample(0:1, P*x$choices, replace = TRUE), x$choices, P)
    X <- rbind(0, X) # Add outside good (all attributes zero)
    u <- as.numeric(X %*% beta_true)
    choice <- rmultinom(1, 1, softmax(u))
    data.frame(choice = choice,
               X = X)
}

# The fake data
decisionmakers_full <- decisionmakers  %>%
  group_by(i) %>% 
  do(make_choices(.)) %>% ungroup

# Get the indexes of the rows that correspond to each individual's
# first and last entries

indexes <- decisionmakers_full %>%
  mutate(row = 1:n()) %>% 
  group_by(i) %>% 
  summarise(start = first(row),
            end = last(row))

X <- as.matrix(decisionmakers_full %>% dplyr::select(X.1:X.10))
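
As a quick, optional sanity check, we can confirm that the simulation behaves as intended: every decision-maker should have exactly one chosen row in their choice set.

# Each decision-maker should have chosen exactly one alternative
decisionmakers_full %>% 
  group_by(i) %>% 
  summarise(n_chosen = sum(choice)) %>% 
  summarise(all_chose_once = all(n_chosen == 1))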

Now we have our data. Let’s choose priors using the method from the previous section (code copied across). We’ll do this for just a single decision-maker, since the dimensions and scale of the choice attributes are fairly similar across individuals.

library(MCMCpack) # contains the Dirichlet density function

# Negative log likelihood of a matrix of shares (each row a simplex)
dirichlet_n_log_lik <- function(alpha, share_matrix) {
  if(length(alpha) != ncol(share_matrix)) {
    stop("alpha is not the same length as the number of choices")
  }
  out <- rep(NA, nrow(share_matrix))
  for(i in 1:nrow(share_matrix)) {
    out[i] <- log(ddirichlet(t(share_matrix[i,]), alpha))
  }
  -sum(out)
}

# Implement Dirichlet regression
dirichlet_reg <- function(share_matrix) {
  # This finds values of par to minimize the negative log likelihood function
  out <- optim(par = rep(1, ncol(share_matrix)), 
               fn = dirichlet_n_log_lik, 
               share_matrix = share_matrix,
               method = "L-BFGS-B", lower = 0.001)
  out
}

# Get individual 12's X
XX <- X[decisionmakers_full$i==12,]

N_rep <- 100
beta_scale_grid <- c(0.1, 0.2, 0.5, 0.75, 1, 1.2, 1.5, 2)
loss <- rep(NA, length(beta_scale_grid))

count <- 0
for(sig in beta_scale_grid) {
  count <- count + 1
  out <- matrix(NA, N_rep, nrow(XX))
  
  # Simulate the choice shares implied by draws of beta from a N(0, sig) prior
  for(i in 1:N_rep) {
    beta <- rnorm(ncol(XX), 0, sig)
    out[i,] <- t(softmax(XX %*% beta))
  }
  
  # Fit a Dirichlet to the simulated shares
  reg <- dirichlet_reg(out)
  
  # Loss: squared distance of the fitted concentration parameters from 1
  # (alpha = 1 corresponds to uniform shares over the simplex)
  loss[count] <- sum((1 - reg$par)^2)
}



data_frame(`Prior scale` = beta_scale_grid,
           `Log of loss` = log(loss)) %>% 
  ggplot(aes(x = `Prior scale`, y = `Log of loss`)) +
  geom_line() +
  theme_hc() +
  labs(title = "Simulation-based prior scale choice")
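
If we’d rather read the chosen scale off programmatically than eyeball the plot, we can simply take the grid value with the smallest loss:

# Grid value that minimises the loss
beta_scale_grid[which.min(loss)]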

It should not surprise us that this suggests a similar prior scale for \(\beta\) as before, given that the dimensionality and scale of the attribute data are very similar.

Finally, let’s write out the Stan model and fit it to check that our parameter estimates line up with the generative values. Here’s the Stan model:

// saved as basic_logit.stan
data {
  int N; // number of rows (choice alternatives across all decision-makers)
  int I; // number of decision-makers
  int P; // number of attributes
  matrix[N, P] X;
  int start_index[I]; // first row for each decision-maker
  int end_index[I];   // last row for each decision-maker
  vector[N] choice;   // 1 if the row was chosen, 0 otherwise
}
parameters {
  vector[P] beta;
}
model {
  vector[N] probs;
  // priors 
  beta ~ normal(0, .75);
  
  // helper for likelihood
  for(i in 1:I) {
    probs[start_index[i]:end_index[i]] = softmax(X[start_index[i]:end_index[i]] * beta);
  }
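  // log likelihood: chosen rows contribute log(prob) and
  // non-chosen rows contribute log(1 - prob)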
  target += choice' *log(probs) + (1 - choice)' * log(1 - probs);
}

library(rstan)

compiled_model <- stan_model("basic_logit.stan")

data_list <- list(I = I, 
                  N = nrow(X), 
                  P = P, 
                  start_index = indexes$start,
                  end_index = indexes$end,
                  X = X,
                  choice = decisionmakers_full$choice
                  )

fitted_model <- sampling(compiled_model, data= data_list, 
                         cores = 4, iter = 1000)
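
Before comparing the estimates to the truth, it’s worth a quick glance at the fit summary to check that the Rhat values and effective sample sizes look reasonable:

print(fitted_model, pars = "beta", probs = c(0.025, 0.5, 0.975))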

Now let’s see how our estimates line up with the true generative values.

library(bayesplot)
dat <- as.data.frame(fitted_model, pars = "beta")
bayesplot::mcmc_intervals(dat) + 
  geom_point(aes(x= beta_true, y = names(dat)), colour = "red", size = 2)

Hey presto! We’ve fit the model and it recovers the parameter values we used to simulate the data.
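
If we want a numerical check to go with the plot, we can compare the posterior means directly with the generative values, using the dat data frame extracted above:

# Posterior means next to the true coefficients
round(cbind(true = beta_true, posterior_mean = colMeans(dat)), 2)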