The field of discrete choice modeling is concerned with modeling how people make choices between alternatives, typically as a function of the characteristics of the alternatives and the decision-maker. In doing so, we want to allow for various types of behavior we observe empirically. These are:

Ideally, our models of decision-makers’ choices should allow for these empirical behaviors. A very simple model of a decision being made by people indexed by \(i\) across choice-alternatives indexed by \(j\) combines a score that each individual assigns to each possible choice with a decision rule that maps the scores to the decision that will be made. It is common in discrete choice to call the score utility, \(u_{ij}\), and to use the decision rule “make the choice that provides the highest utility to the decision-maker”. Our job is to come up with functions for this score that describe what we see in the data and are consistent with what we know about human behavior.
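To make the decision rule concrete, here is a minimal Python sketch (NumPy assumed) that applies “choose the highest-utility alternative” to a matrix of utilities \(u_{ij}\). The function name `choose` and the utility values are hypothetical and exist only for illustration. Python indexes alternatives from 0, so index 1 is the second alternative.

```python
import numpy as np

def choose(utilities: np.ndarray) -> np.ndarray:
    """Apply the rule 'pick the alternative with the highest utility'.

    utilities has shape (n_individuals, n_alternatives); row i holds u_ij.
    Returns the (0-based) index j of the chosen alternative for each row.
    """
    return np.argmax(utilities, axis=1)

# Hypothetical utilities for two individuals over three alternatives.
u = np.array([[1.0, 3.0, 2.0],
              [0.5, 0.2, 0.9]])
print(choose(u))  # [1 2]
```

With fixed utilities like these, each individual makes the same choice every time the function is called, which is exactly the limitation the next paragraph addresses.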

If each individual \(i\) gives utility \(u_{ij}\) to good \(j\) and this value is fixed, then they will make the same choice whenever presented with the same options. This is not what we observe; rather people will tend to choose the things they like, but mix it up a bit. And so we typically divide utility into two additively separable components: a fixed part \(\mu_{ij}\) and random part \(\epsilon_{ij}\).

\[ u_{ij} = \mu_{ij} + \epsilon_{ij} \]
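The sketch below is a hypothetical illustration, not the chapter's own code. It simulates one person facing the same two alternatives many times, assuming \(\mu_{i1} = 1\), \(\mu_{i2} = 3\) and, purely for illustration, standard normal \(\epsilon_{ij}\) redrawn on each occasion. Without the random part, the person would pick the second alternative every time. With it, they pick the second alternative most of the time but occasionally switch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu = np.array([1.0, 3.0])    # fixed part mu_ij; index 0 = choice 1, index 1 = choice 2
n_occasions = 10_000         # times the person faces the same choice set

# Random part epsilon_ij, redrawn on every occasion. Standard normal is an
# illustrative assumption; the text chooses the distribution later.
eps = rng.normal(loc=0.0, scale=1.0, size=(n_occasions, mu.size))
u = mu + eps                 # u_ij = mu_ij + epsilon_ij

choices = np.argmax(u, axis=1)                      # highest-utility rule
shares = np.bincount(choices, minlength=mu.size) / n_occasions

print(np.argmax(mu))  # without noise: always index 1
print(shares)         # mostly index 1, occasionally index 0
```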

The fixed part of utility \(\mu_{ij}\) is the same each time person \(i\) is presented with choice \(j\). The random part \(\epsilon_{ij}\) may not be. If we combine this simple model with the decision rule “choose the good \(j\) that provides the highest utility”, then we have a model that describes the sort of behavior that we observe: people tend to make choices that they value highly (i.e. choices with a high value of \(\mu_{ij}\)), but sometimes make different choices. To make the model tractable, and to make statements about the probability of \(i\) making choice \(j\), we need to propose a distribution for \(\epsilon_{ij}\). If there were no limit on computing power, we could propose any distribution for this random component. In practice, we use one of two distributions: the normal (Gaussian) distribution, which gives rise to Probit models of choice, or the Gumbel distribution, which gives rise to the Logit models of choice we cover in this chapter. Of these two, Logit models are less computationally expensive to estimate, but they make slightly stronger assumptions.
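Part of the computational appeal of Logit is that, when the \(\epsilon_{ij}\) are independent draws from a standard Gumbel distribution, the probability of choosing \(j\) has a closed form, \(\exp(\mu_{ij}) / \sum_k \exp(\mu_{ik})\). Probit probabilities with more than two alternatives, by contrast, generally require numerical integration. The sketch below assumes iid standard Gumbel errors and hypothetical values of \(\mu_{ij}\), and compares that formula with a brute-force simulation of the max-utility rule. `logit_probs` is a hypothetical helper name.

```python
import numpy as np

def logit_probs(mu: np.ndarray) -> np.ndarray:
    """Closed-form choice probabilities under iid standard Gumbel errors.

    P(choose j) = exp(mu_j) / sum_k exp(mu_k), computed stably by
    subtracting the maximum before exponentiating.
    """
    z = np.exp(mu - mu.max())
    return z / z.sum()

rng = np.random.default_rng(seed=2)
mu = np.array([1.0, 3.0])   # hypothetical fixed utilities for two alternatives
n = 200_000

# Monte Carlo check: draw Gumbel errors and apply the highest-utility rule.
eps = rng.gumbel(loc=0.0, scale=1.0, size=(n, mu.size))
sim = np.bincount(np.argmax(mu + eps, axis=1), minlength=mu.size) / n

print(logit_probs(mu))  # approx. [0.119 0.881]
print(sim)              # should be close to the closed form
```

The two printed vectors should agree to about two decimal places. The seed and number of draws are arbitrary choices.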

Let’s illustrate with an example. Say consumer 1 is evaluating two choices. Choice 1 has \(\mu_{11} = 1\) and choice 2 has \(\mu_{12} = 3\). The random component \(\epsilon_{ij}\) is normally distributed with a mean of 0 and a standard deviation of 1. We illustrate the resulting distribution of utilities by drawing random values for \(\epsilon_{ij}\). The marginal distributions of these draws are shown on the axes, and the orange region contains the draws where \(u_{11} > u_{12}\).
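The example can be reproduced numerically. Suppose the two random components are independent standard normal draws. The text gives their distribution but not their dependence, so independence is an assumption here. Under it, \(u_{11} - u_{12}\) is normal with mean \(-2\) and variance 2, so \(P(u_{11} > u_{12}) = \Phi(-2/\sqrt{2}) \approx 0.079\). The sketch below estimates the share of draws falling in the orange region by simulation and checks it against that formula. Plotting is omitted, and the seed and number of draws are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=3)
mu_11, mu_12 = 1.0, 3.0
n_draws = 100_000

# Draw the random components: independent N(0, 1) for each choice (assumed).
eps = rng.normal(0.0, 1.0, size=(n_draws, 2))
u_11 = mu_11 + eps[:, 0]
u_12 = mu_12 + eps[:, 1]

# Share of draws in the region where choice 1 wins (u_11 > u_12).
p_sim = np.mean(u_11 > u_12)

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Analytic check: u_11 - u_12 ~ N(mu_11 - mu_12, 2), so
# P(u_11 > u_12) = Phi((mu_11 - mu_12) / sqrt(2)).
p_exact = std_normal_cdf((mu_11 - mu_12) / math.sqrt(2.0))

print(round(p_exact, 4))  # 0.0786
print(p_sim)              # close to p_exact
```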