Weighting is used to adjust the results of a study to bring them more in line with what is known about a population. For example, if a sample contains 40% males and the population contains 49% males, weighting can be used to correct for this discrepancy.
Consider the following data showing 10 people’s favorite celebrity:
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Brad Pitt
Tiger Woods
Tiger Woods
In this sample of ten, 80% of people nominated Brad Pitt as their favorite celebrity. The whole point of a market research study is to draw conclusions about the population rather than just the sample. So, rather than saying “80% of the sample nominate Brad Pitt as their favorite celebrity”, we should conduct research so that we can say “Brad Pitt is the most popular celebrity, with 80% of people nominating him as their favorite” (we should also communicate the precision, as discussed in Determining The Sample Size).
Now, consider the impact of some additional information about the gender of our ten respondents:
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Brad Pitt      Female
Tiger Woods    Male
Tiger Woods    Male
From this data we can see that the sample is unrepresentative in terms of gender, with eight of the ten respondents being female (80%) and two being male (20%), compared to the true representation in the world of about 50% men and 50% women. Furthermore, in this sample gender seems to be the sole determinant of preference: every woman nominated Brad Pitt and every man nominated Tiger Woods. As the sample is not representative in terms of gender, and gender is correlated with our measure of favorite celebrity, any estimate of the favorite celebrity will only be valid if we take into account the over-representation of women in the sample.
We can improve our estimate by weighting. A weight is computed for every respondent in the sample by dividing the correct proportion for the respondent's group by the observed proportion for that group. The correct proportion of males in our population is 50% and the observed proportion is 20%, so the weight for each male is 50%/20% = 2.5. Likewise, the correct proportion of females is 50% and the observed proportion is 80%, so the weight for each female is 50%/80% = 0.625. Thus, our data becomes:
Favorite Celebrity    Weight
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Brad Pitt             0.625
Tiger Woods           2.5
Tiger Woods           2.5
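The weight calculation can be sketched in a few lines of Python. This is only an illustration of the target-divided-by-observed rule using the ten respondents from the example; the variable names are hypothetical and the 50/50 targets are the ones assumed in the text.

```python
from collections import Counter

# Gender of the ten respondents in the example.
genders = ["Female"] * 8 + ["Male"] * 2

# Population targets assumed in the text: about 50% women and 50% men.
targets = {"Female": 0.5, "Male": 0.5}

# Observed proportion of each gender in the sample.
counts = Counter(genders)
n = len(genders)
observed = {gender: count / n for gender, count in counts.items()}

# Weight for a group = target proportion / observed proportion.
group_weight = {gender: targets[gender] / observed[gender] for gender in observed}

# Every respondent receives the weight of the group they belong to.
weights = [group_weight[gender] for gender in genders]

for gender in sorted(group_weight):
    print(f"{gender}: {group_weight[gender]:.3f}")
```

Note that the weights sum to the sample size (8 × 0.625 + 2 × 2.5 = 10), which is a useful sanity check when weights are computed this way.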
We now estimate the proportion of people who have Brad Pitt as their favorite celebrity by summing the weights of the respondents who prefer Brad Pitt and dividing this by the sum of all of the respondents’ weights:
(8 × 0.625) / (8 × 0.625 + 2 × 2.5) = 5 / 10 = 50%
After weighting, the estimate is 50% rather than the 80% observed in the raw sample.
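As a rough sketch, the weighted estimate for this example could be computed as follows. The function name `weighted_share` is hypothetical; the answers and weights are the ones from the example.

```python
# Answers and weights for the ten respondents in the example.
favorites = ["Brad Pitt"] * 8 + ["Tiger Woods"] * 2
weights = [0.625] * 8 + [2.5] * 2


def weighted_share(answers, weights, answer):
    """Sum of weights of respondents giving `answer`, divided by the sum of all weights."""
    matching = sum(w for a, w in zip(answers, weights) if a == answer)
    return matching / sum(weights)


brad_share = weighted_share(favorites, weights, "Brad Pitt")
tiger_share = weighted_share(favorites, weights, "Tiger Woods")

print(f"Brad Pitt: {brad_share:.0%}")      # Brad Pitt: 50%
print(f"Tiger Woods: {tiger_share:.0%}")   # Tiger Woods: 50%
```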
The approach described here for computing a weight is a relatively simple case, but the basic idea can be extended to deal with much more complicated cases (e.g., it is routine to simultaneously weight by geography, gender and age).
Weighting using standard analysis software
Weights can be obtained in one of two ways:
- Have them computed using data processing software. Most companies that collect data offer the creation of weights as a service, and it is commonplace to have weights included in the data file.
- Compute them manually. That is, you need to:
  - Compute the observed proportion of people that have selected each answer.
  - Obtain the targets (i.e., the results believed to apply in the population; in the example above, the targets were that 50% of the population was male and 50% was female).
  - Compute the weights by dividing the targets by the observed proportions (note that this is only one approach to weighting; more complicated approaches, such as rim weighting, cannot be done this way).
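The manual steps above could be wrapped in a small helper along the following lines. This is a minimal sketch, not a standard routine: the function name `weights_from_targets` is hypothetical, the ten-person sample is made up to match the 40% male figure from the introduction, and the 51% female target assumes gender has only two categories.

```python
from collections import Counter


def weights_from_targets(values, targets):
    """Return one weight per respondent: target proportion / observed proportion."""
    counts = Counter(values)
    # A category with a target but no respondents (or the reverse) cannot be weighted.
    if set(counts) != set(targets):
        raise ValueError("targets must cover exactly the categories seen in the sample")
    n = len(values)
    # Step 1: observed proportion of each answer.
    observed = {category: count / n for category, count in counts.items()}
    # Steps 2 and 3: divide each target by the matching observed proportion.
    return [targets[value] / observed[value] for value in values]


# Hypothetical sample with 40% males, weighted to a 49% male target.
sample = ["Male"] * 4 + ["Female"] * 6
weights = weights_from_targets(sample, {"Male": 0.49, "Female": 0.51})

male_weight_total = sum(w for value, w in zip(sample, weights) if value == "Male")
weighted_male_share = male_weight_total / sum(weights)
print(f"{weighted_male_share:.2f}")
```

After weighting, the weighted share of males matches the 49% target. This helper only handles a single variable with known targets for every category; it does not implement rim weighting.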
When weighting does and does not work
Consider the situation where a survey was well managed but, due to some anticipated quirk of how the data was collected, it over-represents males. If gender is known to relate to other variables of interest in the survey, then it follows both that:
- The data needs to be weighted; otherwise, any analyses will be misleading (as they will be biased by the misrepresentation of gender).
- The weighting will fix the data.
However, what if the gender problem is one of many unknown problems in the data? Imagine that the survey is also biased towards low-income people, towards people who like to do surveys, and towards people who do not get out of doors much. Fixing the gender problem by weighting will not fix any of these other problems. Furthermore, in general we will not know that we have these other problems, as we will have no data to check for representativeness.
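A small, entirely hypothetical example illustrates the point. The sample below is invented for illustration only: it over-represents women and also over-represents low-income people. Weighting by gender brings the gender split to the assumed 50/50 target, but the low-income share stays high.

```python
# Hypothetical sample, invented for illustration: (gender, income band).
sample = (
    [("Female", "Low")] * 6
    + [("Female", "High")] * 2
    + [("Male", "Low")] * 2
)

# Assumed gender targets, as in the earlier example.
gender_targets = {"Female": 0.5, "Male": 0.5}

n = len(sample)
observed = {
    gender: sum(1 for g, _ in sample if g == gender) / n
    for gender in gender_targets
}

# Weight each respondent by gender only: target / observed.
weights = [gender_targets[g] / observed[g] for g, _ in sample]


def weighted_share(flags, weights):
    """Weighted proportion of respondents whose flag is True."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)


male_share = weighted_share([g == "Male" for g, _ in sample], weights)
unweighted_low_income = sum(1 for _, income in sample if income == "Low") / n
low_income_share = weighted_share([income == "Low" for _, income in sample], weights)

print(f"Weighted male share: {male_share:.1%}")
print(f"Low income, unweighted: {unweighted_low_income:.1%}")
print(f"Low income, weighted by gender: {low_income_share:.1%}")
```

In this made-up sample the gender weighting restores the 50% male share, yet the low-income share moves from 80% to 87.5%, so the income bias is not corrected and here even gets slightly worse.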
The distinction between these two cases is crucial. Where the data is observed to be unrepresentative, weighting will only fix the data if there is good reason to believe that this is the only way in which the data is unrepresentative.