Checking Representativeness

From Market Research

Checking the representativeness of a study involves working out whether the data from the study is consistent with what is known about the population that the study seeks to understand.

Basic process for checking representativeness

  1. Collect secondary data on your population (e.g., age, gender, place-of-residence, brands purchased).
  2. Ensure that the questionnaire is written so that the survey data can be easily compared with the secondary data.
  3. Compare the secondary data with the results of the survey.
  4. Choose to either:
    • Weight the data.
    • Discard the data.
    • Use the data as it is.


The table below shows the proportion of respondents in each age band in a survey of the phone market. A quick examination of the data reveals that something is wrong. For example, only 3.5% of the sample are aged 30 to 34, whereas roughly nine times as many (32.0%) are aged 20 to 24. This suggests that the sample may not be representative of the population at large.

Age             Survey
15 and under      0.1%
16-19 yrs        10.2%
20-24 yrs        32.0%
25-29 yrs        14.8%
30-34 yrs         3.5%
35-44 yrs         4.9%
45-54 yrs        27.7%
55-64 yrs         5.0%
65 and over       1.8%
Total             100%
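The "quick examination" above can be made concrete with a couple of lines of arithmetic: confirm that the age bands account for the whole sample, then compare the two bands directly. This is a minimal Python sketch; the figures are copied from the table, and the names `survey_pct`, `total` and `ratio` are illustrative only.

```python
# Survey age distribution (percent of respondents), copied from the table above.
survey_pct = {
    "15 and under": 0.1,
    "16-19 yrs": 10.2,
    "20-24 yrs": 32.0,
    "25-29 yrs": 14.8,
    "30-34 yrs": 3.5,
    "35-44 yrs": 4.9,
    "45-54 yrs": 27.7,
    "55-64 yrs": 5.0,
    "65 and over": 1.8,
}

# The bands should account for the whole sample (allowing for rounding).
total = sum(survey_pct.values())
assert abs(total - 100.0) < 0.5

# How many times larger the 20-24 band is than the 30-34 band.
ratio = survey_pct["20-24 yrs"] / survey_pct["30-34 yrs"]

print(f"Total: {total:.1f}%")
print(f"20-24 vs 30-34: {ratio:.1f}x")
```

Running it prints a total of 100.0% and a ratio of 9.1, i.e. the 20 to 24 band is roughly nine times the size of the 30 to 34 band.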

To be confident that a sample is or is not consistent with what is known about a population, we need other data about that population. Ideally this will come from a source of known reliability, such as sales statistics or government studies of the market. However, it can also come from other surveys that there is good reason to believe are representative. As an example, the following table shows the age distribution of the population that the survey is meant to represent.

Age             Population #  Population %
15 and under      10,670,000          3.3%
16-19 yrs         20,914,000          6.5%
20-24 yrs         25,619,000          8.0%
25-29 yrs         25,107,000          7.8%
30-34 yrs         24,605,000          7.7%
35-44 yrs         48,225,000         15.0%
45-54 yrs         47,261,000         14.7%
55-64 yrs         46,316,000         14.4%
65 and over       72,623,000         22.6%
Total            321,340,000          100%

Placing the sample results side by side with the known data for the market quickly reveals the scale of the problem with this study. In the first row of the table below, 0.1% of the sample were aged 15 and under, compared with 3.3% of the population, so the true proportion in the population is approximately 23.8 times that observed in the study. (Because all the numbers shown are rounded, reproducing this calculation from the table gives a different answer; for example, 3.3% / 0.1% = 33.) Reading down the right-most column, no age band has a ratio close to 1.0, which is roughly what is required for the study to be representative (small deviations are acceptable due to sampling error).

Age             Survey (A)  Population #  Population % (B)  (B)/(A)
15 and under          0.1%    10,670,000              3.3%     23.8
16-19 yrs            10.2%    20,914,000              6.5%      0.6
20-24 yrs            32.0%    25,619,000              8.0%      0.2
25-29 yrs            14.8%    25,107,000              7.8%      0.5
30-34 yrs             3.5%    24,605,000              7.7%      2.2
35-44 yrs             4.9%    48,225,000             15.0%      3.1
45-54 yrs            27.7%    47,261,000             14.7%      0.5
55-64 yrs             5.0%    46,316,000             14.4%      2.9
65 and over           1.8%    72,623,000             22.6%     12.4
Total                 100%   321,340,000              100%      1.0
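The comparison can be scripted so that it is easy to repeat for each variable in the secondary data. The Python sketch below computes the (B)/(A) ratio for each band and then, to put a number on "small deviations are acceptable due to sampling error", applies a Pearson chi-square goodness-of-fit test. The chi-square test is a standard technique swapped in here for illustration, not necessarily the method used by the author. It assumes a simple random sample, and because the survey's sample size is not reported, `N_RESPONDENTS = 1000` is a purely hypothetical value. The function names are also illustrative. Because the inputs are the rounded percentages, some ratios differ from the table (the first row comes out as 33.0 rather than 23.8).

```python
# Survey shares (A) and population shares (B), in percent, from the table above.
survey_pct = {
    "15 and under": 0.1, "16-19 yrs": 10.2, "20-24 yrs": 32.0,
    "25-29 yrs": 14.8, "30-34 yrs": 3.5, "35-44 yrs": 4.9,
    "45-54 yrs": 27.7, "55-64 yrs": 5.0, "65 and over": 1.8,
}
population_pct = {
    "15 and under": 3.3, "16-19 yrs": 6.5, "20-24 yrs": 8.0,
    "25-29 yrs": 7.8, "30-34 yrs": 7.7, "35-44 yrs": 15.0,
    "45-54 yrs": 14.7, "55-64 yrs": 14.4, "65 and over": 22.6,
}


def representativeness_ratios(sample, population):
    """Return population share / sample share, i.e. (B)/(A), per category."""
    return {k: population[k] / sample[k] for k in sample}


def chi_square_statistic(sample, population, n):
    """Pearson chi-square goodness-of-fit statistic for a sample of size n.

    Observed counts are rebuilt from the sample percentages and expected
    counts from the population percentages.
    """
    stat = 0.0
    for k in sample:
        observed = sample[k] / 100 * n
        expected = population[k] / 100 * n
        stat += (observed - expected) ** 2 / expected
    return stat


ratios = representativeness_ratios(survey_pct, population_pct)
for band, r in ratios.items():
    print(f"{band:<13} {r:5.1f}")

N_RESPONDENTS = 1000  # HYPOTHETICAL: the study's sample size is not reported
CHI2_CRIT_DF8_5PCT = 15.507  # chi-square critical value, 9 bands - 1 = 8 df, alpha = 0.05
stat = chi_square_statistic(survey_pct, population_pct, N_RESPONDENTS)
print(f"chi-square = {stat:.1f}; exceeds 5% critical value: {stat > CHI2_CRIT_DF8_5PCT}")
```

With the assumed 1,000 respondents the statistic is about 1,294, far above the critical value of 15.5. The statistic grows in proportion to the sample size, so a smaller assumed sample shrinks it proportionally; at 200 respondents it is still roughly 260.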

A similar comparison then needs to be performed for each of the other variables for which secondary data is available. Once all the comparisons have been done, a decision needs to be made. The options are:

  • Conclude that the data from the sample is consistent with what is known about the population. In this example such a conclusion is clearly not warranted.
  • Weight the data, which involves adjusting how the data is analyzed to account for the differences between the sample and the population. This option only makes sense if the deviations between the survey's results and the population are considered 'sensible'. Typically, results as disparate as those shown above would indicate that the study was fundamentally flawed. However, there are scenarios in which such discrepancies may be considered plausible. For example, women are commonly more likely to respond to surveys than men, so if a survey over-represented women and all the discrepancies between the survey and the secondary data could be explained by this phenomenon, then weighting would be a sensible remedy.
  • Discard the data. That is, conclude that the discrepancies are so great that the survey cannot be trusted. The discrepancies shown in the example above would justify such a conclusion: if the survey reflects the population's age so inaccurately, it is reasonable to assume it is similarly inaccurate for other questions in the survey where there is no way of checking. In practice, surveys are extremely rarely discarded, as most people take the view that it is better to have a bad survey than no survey.
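To illustrate the weighting option, the Python sketch below uses cell weighting (post-stratification on a single variable, age): every respondent in a band receives the weight population share / sample share, which is the same quantity as the (B)/(A) ratio. This is one common way of weighting, assumed here for illustration rather than taken from the text. The per-band "yes" percentages in `yes_pct_by_band` are HYPOTHETICAL figures invented purely to show how weighting can shift an estimate; they are not survey results.

```python
# Survey shares and population shares by age band (percent), from the tables above.
survey_pct = {
    "15 and under": 0.1, "16-19 yrs": 10.2, "20-24 yrs": 32.0,
    "25-29 yrs": 14.8, "30-34 yrs": 3.5, "35-44 yrs": 4.9,
    "45-54 yrs": 27.7, "55-64 yrs": 5.0, "65 and over": 1.8,
}
population_pct = {
    "15 and under": 3.3, "16-19 yrs": 6.5, "20-24 yrs": 8.0,
    "25-29 yrs": 7.8, "30-34 yrs": 7.7, "35-44 yrs": 15.0,
    "45-54 yrs": 14.7, "55-64 yrs": 14.4, "65 and over": 22.6,
}

# Cell weight for each age band: population share / sample share.
weights = {band: population_pct[band] / survey_pct[band] for band in survey_pct}

# HYPOTHETICAL: percent of each age band answering "yes" to some question.
# Invented for illustration only.
yes_pct_by_band = {
    "15 and under": 60, "16-19 yrs": 55, "20-24 yrs": 50,
    "25-29 yrs": 40, "30-34 yrs": 35, "35-44 yrs": 30,
    "45-54 yrs": 25, "55-64 yrs": 20, "65 and over": 10,
}


def estimate(shares, band_weights=None):
    """Overall % answering yes: band results combined by (weighted) band share."""
    if band_weights is None:
        band_weights = {band: 1.0 for band in shares}  # unweighted analysis
    num = sum(shares[b] * band_weights[b] * yes_pct_by_band[b] for b in shares)
    den = sum(shares[b] * band_weights[b] for b in shares)
    return num / den


unweighted = estimate(survey_pct)
weighted = estimate(survey_pct, weights)
print(f"Unweighted: {unweighted:.1f}%  Weighted: {weighted:.1f}%")
```

With these made-up figures the estimate falls from 38.4% unweighted to 28.7% weighted, because weighting shrinks the over-represented 20 to 24 band and inflates the under-represented older bands. Note that some weights are very large (33 for the youngest band), which is part of why results this disparate call the whole study into question.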
