Multiple Response Data Formats
In the binary format, each category of the multiple response question is represented as a separate variable. Typically, 1 indicates that an item was selected and 0 that it was not.
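A minimal sketch of binary coding, assuming a hypothetical question with four options labelled "Brand A" to "Brand D". Each respondent's set of selections becomes one row with a 0/1 value per option. The names `OPTIONS` and `to_binary` are illustrative only.

```python
# Hypothetical answer options (not taken from any real questionnaire).
OPTIONS = ["Brand A", "Brand B", "Brand C", "Brand D"]

def to_binary(selected, options=OPTIONS):
    """Return one variable per option: 1 if it was selected, 0 otherwise."""
    chosen = set(selected)
    return [1 if option in chosen else 0 for option in options]

# Three hypothetical respondents; the last selected nothing.
responses = [{"Brand B", "Brand D"}, {"Brand A"}, set()]
binary_rows = [to_binary(r) for r in responses]
for row in binary_rows:
    print(row)
# [0, 1, 0, 1]
# [1, 0, 0, 0]
# [0, 0, 0, 0]
```

The row length always equals the number of options, no matter how many were selected.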
Alternatively, each variable can contain any of the question's categories, each identified by its own code. For example, if the question lists 20 brands but nobody selected more than five, the data can be stored as five variables: the first contains one of the respondent's selections, the second another, and so on. There is no widely used term for this way of storing multiple response data; it is sometimes referred to as max-multi data.
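The 20-brand example can be sketched as follows, assuming brands are coded 1 to 20 and unused slots are left empty (`None`). Storing the codes in ascending order is an arbitrary choice made here for determinism; the format itself does not prescribe an order. The helper names are hypothetical.

```python
N_OPTIONS = 20  # brands in the question, as in the example above
N_SLOTS = 5     # most brands any respondent selected

def to_max_multi(selected_codes, n_slots=N_SLOTS):
    """Store 1-based option codes in n_slots variables; unused slots are None."""
    codes = sorted(selected_codes)  # ordering is an assumption, not a rule
    if len(codes) > n_slots:
        raise ValueError("more selections than available slots")
    return codes + [None] * (n_slots - len(codes))

def max_multi_to_binary(slots, n_options=N_OPTIONS):
    """Convert one max-multi row back into n_options 0/1 variables."""
    binary = [0] * n_options
    for code in slots:
        if code is not None:
            binary[code - 1] = 1
    return binary

row = to_max_multi({2, 5})
print(row)  # [2, 5, None, None, None]
binary = max_multi_to_binary(row)
print(len(binary), sum(binary))  # 20 2
```

Five variables hold what the binary format would spread over 20, and the conversion back to binary needs only the list of codes.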
Pros and cons of the different formats
The binary format is the standard and, in most situations, the preferable format. This is because it:
- Makes data analysis easier (as it is easier to compute new variables).
- Makes it possible to have missing values. For example, in a tracking study, if a brand is in the questionnaire in some waves but not others, this can be reflected with a missing value in the waves where it was not available. By contrast, with max-multi coding it is impossible to determine from the data whether an option was not selected or was not available.
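Both advantages can be illustrated with a small sketch, assuming hypothetical tracking data in binary format where `None` marks a brand that was not in that wave's questionnaire. Derived variables are simple row-wise operations, and the missing values let a brand's share be computed only among respondents who were actually asked about it. All names and data here are invented for illustration.

```python
# Hypothetical binary data: columns are Brand A, Brand B, Brand C.
wave_1 = [[1, 0, 1], [0, 0, 1]]        # all three brands asked
wave_2 = [[1, 0, None], [0, 1, None]]  # Brand C dropped in wave 2

def n_selected(row):
    """New variable: number of brands selected, ignoring missing values."""
    return sum(v for v in row if v is not None)

def any_of(row, columns):
    """New variable: 1 if any of the given columns was selected, else 0."""
    return int(any(row[c] == 1 for c in columns))

def share_selecting(rows, column):
    """Share selecting a brand among respondents who were offered it."""
    asked = [r[column] for r in rows if r[column] is not None]
    return sum(asked) / len(asked) if asked else None

print([n_selected(r) for r in wave_1 + wave_2])  # [2, 1, 1, 1]
print([any_of(r, [0, 1]) for r in wave_2])       # [1, 1]
print(share_selecting(wave_1, 2), share_selecting(wave_2, 2))  # 1.0 None

# In max-multi form the first wave-2 respondent would be just [1, None]:
# nothing in that row says whether Brand C was declined or never offered.
```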
The max-multi format's advantages are:
- It uses less memory (because it requires fewer variables). In the 1970s this made max-multi the standard format. Now, however, it is generally only sensible for questions with extremely large code frames (e.g., a list of 6,000 brands of car).
- It is easier to create when data are entered manually (e.g., it is much easier for a data-entry operator to type 2, 5 to indicate that the second and fifth categories were selected than to enter a long run of 1s and 0s).
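A hedged sketch of both points: a typed entry such as "2, 5" can be converted into binary variables afterwards, so manual max-multi entry does not rule out binary analysis. The second part compares variable counts for a 6,000-option code frame, where the maximum of 10 selections per respondent is an assumed figure, not data. `parse_entry` is a hypothetical helper.

```python
def parse_entry(text, n_options):
    """Turn a typed entry such as "2, 5" into n_options 0/1 variables."""
    binary = [0] * n_options
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate blank entries and trailing commas
        code = int(part)
        if not 1 <= code <= n_options:
            raise ValueError(f"code {code} is outside 1..{n_options}")
        binary[code - 1] = 1
    return binary

print(parse_entry("2, 5", 6))  # [0, 1, 0, 0, 1, 0]

# Variables per respondent in each format (10 is an assumed maximum).
n_options, max_selected = 6000, 10
print("binary:", n_options, "max-multi:", max_selected)  # binary: 6000 max-multi: 10
```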