## Cluster sampling


In general, you must preserve random selection in order to select a sample which is representative of the entire population.

But what is required to do simple random sampling or systematic random sampling?

### Cluster sampling

... sampling in which sampling units (that is, households) at some point in the selection process are collections, or clusters, of population elements (or households)

When simple random sampling or systematic random sampling is not possible, cluster sampling is one of the most common alternatives. One reason to choose cluster sampling over simple or systematic random sampling is that there is no list of ALL households in the population, it is impossible to create such a list, and households are not arranged in any order. Such situations are very common, in both stable and emergency-affected populations.

Another way to think of cluster sampling is:

Cluster sampling is just a way to randomly choose smaller and smaller geographic areas until you get to a small enough area so that you can find or create a list of all households in order to do simple or systematic random sampling. For example, you may first choose districts from a list of all districts in the country. But at the district level, authorities don't have lists of all households and there are too many households in each district to create a list of households. As a result, within selected districts, you have to choose smaller geographic units, such as villages, which are small enough that local authorities already have a list of households or you can make a new list of all households.
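The staged selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `multistage_sample`, the nested dictionary layout, and the made-up district, village, and household names are all hypothetical, and every unit at each stage is chosen with equal probability. Real surveys often select clusters with probability proportional to population size, which this sketch does not attempt.

```python
import random


def multistage_sample(districts, n_districts, n_villages, n_households, seed=None):
    """Pick districts, then villages within them, then households within those.

    `districts` maps district name -> {village name -> list of household ids}.
    This layout is an assumption made for the example.
    """
    rng = random.Random(seed)
    sample = []
    # Stage 1: choose districts from the list of all districts.
    for district in rng.sample(sorted(districts), n_districts):
        villages = districts[district]
        # Stage 2: choose villages, which are small enough to list households.
        for village in rng.sample(sorted(villages), n_villages):
            # Stage 3: simple random sampling from the village household list.
            for household in rng.sample(villages[village], n_households):
                sample.append((district, village, household))
    return sample


# Hypothetical country: 5 districts, 10 villages each, 40 households per village.
districts = {
    f"district_{i}": {
        f"village_{i}_{j}": [f"hh_{i}_{j}_{k}" for k in range(40)]
        for j in range(10)
    }
    for i in range(5)
}

sample = multistage_sample(districts, n_districts=2, n_villages=3, n_households=5, seed=1)
print(len(sample))  # 2 districts x 3 villages x 5 households = 30
```

Note that a household list is only needed for the villages that were actually selected, not for the whole country.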

Imagine that the blue egg below is a population (or sampling universe) from which you want to choose a random sample. The dots are households. The tiny dots are households NOT selected for the survey while the larger dots are households which were selected for the survey sample. If you had a list of households in this population and selected 30 using simple random sampling (for example, using a random number table), you might end up sampling the households represented by the larger dots. Note that these households are relatively evenly spread throughout the population.

Now imagine that the area with the population is 1500 kilometres across. Selecting 30 households very far apart from one another will require a long trip after each household to get to the next household. Can you think of a major advantage of cluster sampling in such a situation?
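To get a feel for the travel involved, here is a rough simulation. Everything in it is an assumption made for illustration: the population of 200 villages with 50 households each, the square 1500 km area, the choice of 3 clusters of 10 households, and the greedy "go to the nearest unvisited household" route, which is not an optimal route.

```python
import math
import random

rng = random.Random(42)
SIDE_KM = 1500  # width of the area, as in the text

# Hypothetical population: 200 villages, each with 50 households
# located within about 1 km of the village centre.
villages = []
for _ in range(200):
    cx, cy = rng.uniform(0, SIDE_KM), rng.uniform(0, SIDE_KM)
    villages.append(
        [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)) for _ in range(50)]
    )
all_households = [hh for village in villages for hh in village]


def route_km(points):
    """Length of a greedy route: start at the first point, then always
    travel to the nearest household not yet visited."""
    remaining = list(points)
    current = remaining.pop(0)
    total = 0.0
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        total += math.dist(current, nearest)
        remaining.remove(nearest)
        current = nearest
    return total


# 30 households by simple random sampling from the full list.
srs = rng.sample(all_households, 30)
# 30 households as 3 randomly chosen villages x 10 households each.
cluster = [hh for village in rng.sample(villages, 3) for hh in rng.sample(village, 10)]

srs_km = route_km(srs)
cluster_km = route_km(cluster)
print(f"simple random sample: about {srs_km:.0f} km of travel")
print(f"cluster sample:       about {cluster_km:.0f} km of travel")
```

In this setup the cluster sample needs only two long trips between villages, while the simple random sample needs a long trip after almost every household.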


Which of the following are good reasons to do cluster sampling instead of simple or systematic random sampling? (You may select more than one answer.)

- a) Because my collaborators want to.
- b) Because there is no list of households available and the households are not arranged in any order on the ground.
- c) Because I forgot my random number table at home.
- d) Because simple and systematic random sampling can never be done in emergency-affected populations.

The correct answer is b). The only good reason to do cluster sampling is that you have to. This is usually because there is no list of households available and the households are not arranged in any order on the ground, or because transport between households is very difficult. The other options are not valid reasons to do cluster sampling.

Ah, but alas, there are always disadvantages. Cluster sampling usually requires a greater sample size than simple or systematic random sampling to achieve the same precision (not to worry, we talk about precision in a few pages). Also, the calculation of this precision (usually in the form of confidence intervals) is more complex and can be done correctly by only a few computer programmes.
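One common way to express how much larger the sample must be is the design effect. The sketch below uses the standard approximation, design effect = 1 + (m - 1) x ICC, where m is the number of households sampled per cluster and ICC is the intracluster correlation (how alike households in the same cluster are). These symbols, the function names, and the example values are assumptions introduced here, not taken from the text, and the approximation assumes clusters of roughly equal size.

```python
import math


def design_effect(households_per_cluster, icc):
    """Approximate design effect for clusters of roughly equal size."""
    return 1 + (households_per_cluster - 1) * icc


def cluster_sample_size(n_srs, households_per_cluster, icc):
    """Households needed with cluster sampling to match the precision of a
    simple random sample of n_srs households (always rounded up)."""
    deff = design_effect(households_per_cluster, icc)
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(n_srs * deff, 9))


# Hypothetical example: 30 households per cluster and ICC = 0.05 give a
# design effect of 1 + 29 * 0.05 = 2.45, so a simple random sample of 100
# corresponds to about 245 households under cluster sampling.
print(cluster_sample_size(100, 30, 0.05))
```

When households within a cluster are no more alike than households in general (ICC = 0), the design effect is 1 and no extra sample is needed; the more alike they are, the larger the penalty.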

So, in sum, for cluster sampling (vs. simple random or systematic random sampling):