The field of discrete choice modeling is concerned with modeling how people make choices between alternatives, typically as a function of the characteristics of the alternatives and the decision-maker. In doing so, we want to allow for various types of behavior we observe empirically. These are:

- People tend to make choices that suit their own preferences. If Jane likes honey-roasted turkey sandwiches for lunch, we should expect to see Jane eat honey-roasted turkey sandwiches more often than foods that she dislikes.
- Decision-makers vary in their preferences. Some people like honey-roasted turkey sandwiches, others do not.
- Decision-makers’ preferences themselves might be affected by the context in which decisions are made. Jane mightn’t value a BLT sandwich after watching a re-run of *Babe* (a film about a cute pig that herds sheep).
- When presented with precisely the same possible choices several times, many people alter their choice. Though you might work in the same office building every day and can visit many possible cafes for lunch, you probably “mix it up” a bit, rather than eating precisely the same thing every day.
- People make choices due to systematic effects that we can observe, for example their own demographics or the attributes of the possible choices. But they also make choices due to systematic effects that we do not observe. And they make choices for completely random (non-systematic) reasons.

Ideally our models of decision-makers’ choices should allow for these empirical behaviors. A very simple model of a decision being made by people indexed by \(i\) across choice-alternatives indexed by \(j\) combines a *score* that each individual assigns to each possible choice, and a *decision rule* that maps the scores to the decision that will be made. It is common in discrete choice to call the score *utility* \(u_{ij}\), and use the decision rule “make the choice that provides the highest utility to the decision-maker”. Our job is to come up with functions for this score that describe what we see in the data, and jell with what we know about human behavior.

If each individual \(i\) gives utility \(u_{ij}\) to good \(j\) and this value is fixed, then they will make the same choice whenever presented with the same options. This is *not* what we observe; rather, people tend to choose the things they like, but mix it up a bit. And so we typically divide utility into two additively separable components: a fixed part \(\mu_{ij}\) and a random part \(\epsilon_{ij}\).

\[ u_{ij} = \mu_{ij} + \epsilon_{ij} \]

The fixed part of utility \(\mu_{ij}\) is the same each time person \(i\) is presented with choice \(j\). The random part \(\epsilon_{ij}\) mightn’t be. If we combine this simple model with the decision rule “choose the good \(j\) that provides you with the highest utility” then we have a model that describes the sort of behavior that we observe: people tend to make choices that they value highly (i.e., choices with a high value of \(\mu_{ij}\)), but sometimes make different choices. To make the model tractable, and to make statements about the probability of \(i\) making choice \(j\), we need to propose a distribution for \(\epsilon_{ij}\). If there were no limit on computing power, we could propose any distribution for this random component. Yet practically we use two distributions: the normal (Gaussian) distribution, which gives rise to Probit models of choice, or the Gumbel distribution, which gives rise to the Logit models of choice we cover in this chapter. Of these two, Logit models are less computationally expensive to estimate, but make slightly stronger assumptions.
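
To see why the Gumbel distribution leads to Logit, it can help to simulate the decision rule directly. The sketch below is only an illustration under stated assumptions: it assumes the \(\epsilon_{ij}\) are independent standard Gumbel draws, the values in `mu` are made up, and the function names are hypothetical. It compares the simulated share of times each alternative has the highest utility with the familiar closed form \(\exp(\mu_{ij}) / \sum_k \exp(\mu_{ik})\).

```python
import math
import random


def gumbel_draw(rng):
    """One standard Gumbel draw via the inverse CDF: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_choice_shares(mu, n_draws=100_000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(mu)
    for _ in range(n_draws):
        # u_ij = mu_ij + eps_ij, with eps_ij assumed i.i.d. standard Gumbel
        utilities = [m + gumbel_draw(rng) for m in mu]
        best = max(range(len(mu)), key=lambda j: utilities[j])
        counts[best] += 1
    return [c / n_draws for c in counts]


def logit_probabilities(mu):
    """Closed-form Logit probabilities: exp(mu_j) / sum_k exp(mu_k)."""
    m = max(mu)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in mu]
    total = sum(weights)
    return [w / total for w in weights]


mu = [1.0, 3.0, 2.0]  # hypothetical fixed utilities for three alternatives
shares = simulated_choice_shares(mu)
probs = logit_probabilities(mu)
for j, (s, p) in enumerate(zip(shares, probs), start=1):
    print(f"choice {j}: simulated share {s:.3f}, logit probability {p:.3f}")
```

The simulated shares and the closed-form probabilities should agree up to simulation noise. That closed form is what makes Logit cheap to estimate: no simulation or numerical integration is needed to evaluate a choice probability.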

Let’s illustrate with an example. Say consumer 1 is evaluating two choices. Choice 1 has \(\mu_{11} = 1\) and choice 2 has \(\mu_{12} = 3\). The random component \(\epsilon_{ij}\) is normally distributed with a mean of 0 and standard deviation of 1. We illustrate the distributions by drawing random values for \(\epsilon_{ij}\). The marginal distributions of these draws are illustrated on the axes; the orange region illustrates the draws where \(u_{11} > u_{12}\).
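
The draws described above can be reproduced with a short simulation. This is a minimal sketch rather than the code behind the figure: it assumes the two random components are drawn independently, and the function names, number of draws and seed are arbitrary choices. Under that independence assumption, \(\epsilon_{11} - \epsilon_{12}\) is normal with mean 0 and variance 2, so the share of draws with \(u_{11} > u_{12}\) can be checked against \(1 - \Phi(2/\sqrt{2})\), roughly 0.079.

```python
import math
import random


def simulate_share_choice_1(mu_1=1.0, mu_2=3.0, sigma=1.0,
                            n_draws=200_000, seed=1):
    """Fraction of draws in which choice 1 beats choice 2 (u_11 > u_12)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        u_1 = mu_1 + rng.gauss(0.0, sigma)  # u_11 = mu_11 + eps_11
        u_2 = mu_2 + rng.gauss(0.0, sigma)  # u_12 = mu_12 + eps_12
        if u_1 > u_2:
            wins += 1
    return wins / n_draws


def exact_probability_choice_1(mu_1=1.0, mu_2=3.0, sigma=1.0):
    """P(u_11 > u_12) when the two errors are independent normals.

    eps_11 - eps_12 ~ Normal(0, 2 * sigma^2), so the probability is
    1 - Phi((mu_2 - mu_1) / (sigma * sqrt(2))).
    """
    z = (mu_2 - mu_1) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


simulated = simulate_share_choice_1()
exact = exact_probability_choice_1()
print(f"simulated share: {simulated:.4f}, exact probability: {exact:.4f}")
```

So even though choice 2 has the higher fixed utility, consumer 1 still picks choice 1 in about 8 percent of draws, which is the “mix it up” behavior the random component is meant to capture.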