Coding Text Variables

From Market Research
Jump to: navigation, search

Coding refers to the process of creating a categorical variable

Text data

The following table shows the first ten responses to an open-ended question asking people why, in their opinion, people preferred different brands of colas.

People who drink Pepsi Max probably prefer sugar - free drinks
Big or large people drink Pepsi light and normal size people drink Pepsi Max
I prefer the flavour of Diet Coke to Pepsi and that is what I base my decision on
???????????????
Probably the diet or light colas appeal to those who want to reduce their sugar intake, which traditionally I suppose, is women. I found that once I stared to drink the light colas, I really disliked the taste of the other 'full strength' ones. If Coke or Pepsi are on special, I will buy the cheaper, because after the first drink, I adjust to the differences between the brands, and I like both.
Pepsi is full of sugar and Im a diabetic....
Pepsi gives the impression of being a "home brand" type drink and coke zero is used by people who want to impress
traditionalists versus rebels
health conscious
People who drink Coke Zero are health consitious

Developing a code frame

Reading through the text data we can identify some themes. For example, we can classify the responses as falling into one or more of the following responses. Such a list of categories created from text data is known as a code frame. The number of each category is typically called the code when doing coding but it is the same as the value of a variable.

Value Label
1 Sugar-free
2 Taste
3 Weight/Health
4 Price
5 Image
6 Other

The creation of a code frame is a highly subjective process. There's a good chance that if you read through the list of text data above you would have come up with a different code frame. For example, in the above code frame Sugar-free is distinct from Weight/Health, but an argument can be made that they are the same, with Sugar-free being a feature that ladders to Weight/Health. One approach to resolving this is to get a few different people to independently code the data and then resolve discrepancies by discussion, but such a rigorous approach is extremely rare outside of small academic studies.

Coding

The final stage of coding, which is itself called coding, involves assigning each response to one or more codes (one if it is to be interpreted as a single response question and more if it is interpreted as a multiple response question). Again, as with creation of the code frame this is a subjective process and greater accuracy can be obtained by getting multiple people to code the same data (but, such rigor is extremely rare in commercial studies).

People who drink Pepsi Max probably prefer sugar - free drinks 1
Big or large people drink Pepsi light and normal size people drink Pepsi Max 3
I prefer the flavour of Diet Coke to Pepsi and that is what I base my decision on 2
??????????????? Missing Value
Probably the diet or light colas appeal to those who want to reduce their sugar intake, which traditionally I suppose, is women. I found that once I stared to drink the light colas, I really disliked the taste of the other 'full strength' ones. If Coke or Pepsi are on special, I will buy the cheaper, because after the first drink, I adjust to the differences between the brands, and I like both. 1, 2, 4
Pepsi is full of sugar and Im a diabetic.... 3
Pepsi gives the impression of being a "home brand" type drink and coke zero is used by people who want to impress 5
traditionalists versus rebels 5
health conscious 3
People who drink Coke Zero are health consitious 3

Previous page

Recoding Variables

Next page

Creating New Variables