Latent Class Analysis
Latent class analysis is essentially an improved version of Cluster Analysis, and it is used for the same purposes. In survey analysis, this mainly means finding segments. Latent class analysis improves on cluster analysis in two important ways:
- It is able to handle many different types of data (e.g., rankings, ratings, Choice Modeling).
- It automatically addresses Missing Values.
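One common way for latent class models to accommodate missing values is to base each respondent's likelihood only on the variables that the respondent actually answered. The sketch below illustrates this idea for a hypothetical two-segment model with three yes/no items. The segment sizes, the item probabilities and the function name `posterior` are all invented for illustration, and this is not necessarily how Q, DataCracker or any other program implements it.

```python
# Hypothetical two-segment model with three yes/no items (all numbers invented).
CLASS_SIZES = [0.6, 0.4]

# P(item = 1 | segment), one row per segment.
ITEM_PROBS = [
    [0.9, 0.8, 0.2],  # segment 0
    [0.3, 0.2, 0.7],  # segment 1
]


def posterior(responses):
    """Return P(segment | observed responses); None marks a missing value."""
    joint = []
    for size, probs in zip(CLASS_SIZES, ITEM_PROBS):
        likelihood = size
        for answer, p in zip(responses, probs):
            if answer is None:
                continue  # a missing item contributes nothing to the likelihood
            likelihood *= p if answer == 1 else 1 - p
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]


complete = posterior([1, 1, 0])    # all three items answered
partial = posterior([1, None, 0])  # second item missing
print(complete, partial)
```

A respondent with a missing item is still assigned segment probabilities, just with less certainty than a respondent who answered everything. A respondent with no answers at all simply receives the segment sizes.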
The output below has been created by Q (and is almost identical to that created by DataCracker). Seven variables from the Mobiles study have been used in the analysis: six are numeric variables measuring attitudes (see Cluster Analysis for a discussion of this data), and the seventh is a categorical variable measuring the average monthly bill. Q has automatically identified two segments. The first segment contains 60% of the market and consists of people with relatively lower average bills whose attitudes reveal that they are relatively cost focused. The second segment predominantly contains people with higher monthly spends who are less concerned with price but are more likely to be surprised by the size of their bill.
Selection of the number of segments
A trade-off needs to be made when selecting the number of segments. The more segments one has, the greater the extent to which the analysis reflects the diversity observed in the data. This suggests that having a large number of segments is desirable. However, the more segments one has, the greater the risk that the diversity identified is meaningless, reflecting only the properties of the specific data used in the analysis rather than the true diversity in the world at large. This suggests that it is preferable to have fewer segments.
Various heuristics have been developed that attempt to find the sweet spot in this trade-off. The most widely used is the Bayesian Information Criterion (BIC), which computes a value for each possible number of segments and recommends the segmentation with the lowest BIC. It should be kept in mind that heuristics such as this are just rules of thumb, and it is not appropriate to assume that the number of segments identified by the BIC is in any sense optimal. While caution should be exercised when choosing a greater number of segments than suggested by the BIC, there is often good reason to select a smaller number of segments, as this is often more practical in terms of implementing the segmentation.
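As a rough illustration of how the BIC trades off fit against complexity, the sketch below uses the standard formula, BIC = -2 × log-likelihood + (number of parameters) × ln(sample size), and picks the number of segments with the lowest value. The log-likelihoods, parameter counts and sample size are hypothetical numbers invented for this example. They are not results from the Mobiles study, and the names `bic`, `fits` and `best_k` are illustrative only.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: lower values indicate a better fit/complexity trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits: number of segments -> (log-likelihood, number of parameters).
# The fit always improves as segments are added, but by less and less each time.
fits = {
    1: (-5230.0, 10),
    2: (-5010.0, 21),
    3: (-4980.0, 32),
    4: (-4965.0, 43),
}
n_obs = 700  # hypothetical sample size

scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_k = min(scores, key=scores.get)

for k in sorted(scores):
    print(f"{k} segments: BIC = {scores[k]:.1f}")
print("Lowest BIC at", best_k, "segments")
```

In these made-up numbers the log-likelihood keeps improving beyond two segments, but the penalty for the extra parameters outweighs the gain, so the BIC favors two segments. As noted above, this is a rule of thumb rather than proof that two is the right number.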
Although latent class analysis has been in use for more than 60 years, it has only become popular in recent years, and there are still relatively few programs designed with the goal of applying latent class analysis to general market research data. The main software options for conducting latent class analysis are:
- Latent Gold. This is the gold standard for latent class software. The program and the documentation assume that the user is highly technical.
- Q. Its latent class routines are primarily for segmentation (whereas Latent Gold can be used for other applications of latent class analysis).
- DataCracker. This is an entry-level program, designed for people who want to push a button and not get involved in modifying technical assumptions.
- Packages and macros developed for SAS and R.