Unit Author: Professor Nick J Fox
Having successfully completed the work in this unit, you will be able to:
- distinguish between random and non-random methods of sample selection
- describe the advantages of random sample selection
- identify different methods of sample selection
- match different methods of sample selection to research design
- describe the factors influencing sample size
- describe how to estimate the appropriate sample size.
Sampling is a crucial issue in social research design.
When undertaking a quantitative approach, researchers are striving to be able to generalise their findings to the wider world (achieve external validity). It is therefore essential that both the sampling method used and the sample size are appropriate, in order that results are representative of the population under investigation. This is particularly important if inferential statistics (statistics that generalise findings to a population) are to be used to identify significant associations or differences.
In qualitative research, issues of sampling are different, although no less important. Here, the objective of research is more concerned with internal validity – with providing an accurate representation of reality – than with the external validity that will enable generalisation to other settings.
The different approaches to sampling in qualitative and quantitative research can be summarised as:
[table id=27 /]
In this unit, we will concentrate on the different sampling techniques that can be used in social research to achieve the desired validity objectives. (For more on validity, see Unit 4.) We will first examine random and non-random methods of sample selection, which are typically used in quantitative and qualitative research respectively. In the second part of this unit, we will discuss issues and techniques of sample size, including statistical significance and power.
Why do we need to select a sample anyway?
In some circumstances it is not necessary to select a sample. If the subjects of your study are very rare (for example, former Presidents of the United States), you might want to study every case you can find! More generally, however, the potential subjects of your study will be more numerous, and you will not be able to include everyone.
If you were to include everybody who is eligible for your study (for example, every meat eater in the UK), this would be defined as a census. In many instances, a census is simply too large to handle: it would take too long and cost too much money. In such cases, it will be necessary to use a sample to carry out the research.
Before looking at different sampling techniques it is important to differentiate between two terms: population and sample.
A population refers to the whole collection of ‘units’ (e.g. people, schools, cities) that you wish to make a statement about, whereas the sample refers to the group of units selected from the population from which information will be collected. For example, imagine you wish to carry out a study investigating the height of seven-year-old boys in the UK. The population would be all seven-year-old boys in the UK, whereas the sample would be a group of seven-year-old boys from geographical locations across the nation from whom you could gather data.
There are a number of different sampling techniques, but overall they can be split into two main groups: probability and non-probability sampling, or as they are more commonly referred to: random and non-random sampling methods.
1. Probability or random sampling
Random sampling methods allow generalisations from sample to population, and permit the use of inferential statistics.
There are five different types of random sampling:
[table id=28 /]
(adapted from Henry, 1990: 27)
Before we look at examples of these sample designs in more detail, it is important to clarify the difference between random sampling and randomisation.
The term ‘random’ may imply to you that it is possible to take some sort of haphazard or ad hoc approach, for example stopping the first 20 people you meet in the street for inclusion in your study. This is not random in the true sense of the word. To be a ‘random’ sample, every individual in the population under study must have an equal probability of being selected.
For random sampling to take place, you must first have defined your potential subjects or population. The group from which you will draw your sample (in practice, usually a list of the individuals concerned) is known as the ‘sampling frame’.
So, for instance, you may be interested in doing a study of men over 50 who are vegetarians, or of people living in a defined geographical region who migrated to the UK from overseas during childhood. Having defined your sampling frame, random sampling is a way of selecting a sample of people who will be representative of this population.
Randomisation, by contrast, is a way of ensuring control over confounding variables. So, for example, in a Randomised Controlled Trial (RCT), potential subjects are randomly allocated to either the intervention (treatment) group or the control group. By randomly allocating subjects to the groups, confounding variables (i.e. variables you haven’t thought of or controlled for) will tend to be distributed equally between the groups, and so will be less likely to influence the outcome or dependent variables in either group. This gives the researcher greater confidence in identifying real associations between an independent variable (the cause) and a dependent variable (the effect or outcome measure).
The two basic random sampling techniques are simple random sampling and systematic random sampling; the stratified, cluster and multi-stage designs described below build on these.
1.1 Simple random sampling
If selections are made purely by chance, this is known as simple random sampling. So, for instance, if we had a population containing 5,000 people, we could allocate every individual a different number. If we wanted to achieve a sample size of 200, we could do so by pulling 200 of the 5,000 numbers out of a hat. Another way of selecting the numbers would be to use a table of random numbers, which can be found in the appendices of most statistics textbooks.
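In software, the equivalent of pulling numbers out of a hat is a pseudo-random draw without replacement. The sketch below is a minimal Python illustration, assuming the 5,000 individuals have already been numbered 1 to 5,000; the function name `simple_random_sample` is hypothetical, and the `seed` argument is there only so that a draw can be reproduced.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size, all equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like pulling numbered slips from a hat
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

sample = simple_random_sample(5000, 200, seed=1)
print(len(sample), sample[:5])
```

Every one of the 5,000 numbers has the same chance of ending up in the sample, which is exactly the property that distinguishes a random sample from a haphazard one.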
Simple random sampling, although valid as a methodology, is a very laborious way of getting a representative sample. A simpler and quicker way is to use systematic random sampling.
1.2 Systematic random sampling
Systematic random sampling is a more commonly employed method. After numbers are allocated to everybody in the sampling frame, the first individual is picked using a random number table, and subsequent subjects are then selected using a fixed sampling interval, i.e. every nth person.
Assume, for example, that we wanted to carry out a survey of men in a certain town, to investigate their eating habits. As we cannot interview everyone, we will need to select a representative sample. If there are 8,000 men on the electoral register, and we require a sample of 400, we would need to:
- calculate the sampling interval by dividing 8,000 by 400, which gives an interval of 20 (a sampling fraction of 1 in 20);
- select a random number between 1 and 20 using a table of random numbers;
- if this number were 13, we would select the individual allocated number 13 and then every 20th person after that (i.e. #33, #53 and so on);
- this will give us a total sample size of 400 as required.
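The four steps above can be sketched in Python as follows. This is an illustrative sketch rather than a standard routine: the function name is hypothetical, it assumes individuals are numbered 1 to N, and it assumes N is an exact multiple of the sample size (otherwise some random starts would yield one extra case). Passing `start=13` reproduces the worked example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Systematic random sample of ID numbers 1..population_size.

    Assumes population_size is an exact multiple of sample_size.
    """
    interval = population_size // sample_size        # e.g. 8,000 // 400 = 20
    if start is None:
        # random start between 1 and the interval, inclusive
        start = random.Random(seed).randint(1, interval)
    # the random start, then every nth person after it
    return list(range(start, population_size + 1, interval))

ids = systematic_sample(8000, 400, start=13)
print(ids[:3], len(ids))  # [13, 33, 53] 400
```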
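Care needs to be taken when using a systematic random sampling method, in case there is some bias in the way that lists of individuals are compiled. For example, if a list follows a repeating pattern that coincides with the sampling interval, the sample may systematically over- or under-represent certain kinds of individual.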
1.3 Stratified random sampling
Stratified sampling is a way of ensuring that particular strata or categories of individuals are represented in the sampling process.
For example, if we know that approximately 4% of our sampling frame is made up of a particular ethnic minority group, there is a chance that simple or systematic random sampling would give us very few, or even no, members of that group in our sample. If we wanted to ensure that our sample was representative of the sampling frame, we would employ a stratified sampling method:
- First, we would split the population into the different strata, in this case, separating out those individuals with the relevant ethnic background.
- We would then use random sampling techniques to sample each of the ethnic groups separately, using the same sampling interval in each group.
- This would ensure that the final sample included the minority group we wanted to represent, on a pro-rata basis to the actual population.
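A minimal Python sketch of proportionate stratified sampling, using an invented sampling frame of 10,000 people of whom 4% (400) belong to the minority group. Within each stratum it applies systematic sampling with the same interval of 20, so each group is represented pro rata. The frame, its layout and the function name are all hypothetical.

```python
import random

def proportionate_stratified_sample(frame, stratum_of, interval, seed=None):
    """Systematic sample within each stratum, using the same interval in every stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        start = rng.randrange(interval)            # random starting position in this stratum
        sample.extend(members[start::interval])    # then every interval-th member
    return sample

# Hypothetical frame of 10,000 people: IDs 0-399 belong to the minority group (4%)
frame = [(i, "minority" if i < 400 else "majority") for i in range(10_000)]
sample = proportionate_stratified_sample(frame, stratum_of=lambda p: p[1],
                                         interval=20, seed=3)
minority = sum(1 for _, group in sample if group == "minority")
print(len(sample), minority)  # 500 20
```

The minority group always contributes exactly 20 of the 500 (4%), whatever random start is drawn, whereas an unstratified sample would only do so on average.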
If, however, we actually want to compare the results of our minority group with those of the larger group, this form of proportionate stratified sampling would make it difficult to do so: the number of minority-group respondents, although pro rata to the population, would not be large enough to demonstrate statistical differences.
If we really want to be able to compare the survey results of the minority individuals with those of the larger group, then it is necessary to use a disproportionate sampling method.
With disproportionate sampling, the strata selected are not selected pro-rata for their size in the wider population.
For instance, if we are interested in comparing the views and behaviour of particular minority groups with those of larger groups, it is necessary to oversample the smaller categories in order to achieve sufficient statistical power, i.e. in order to be able to demonstrate statistically significant differences between groups.
Then, if during data analysis we wish to make inferences from the total sample to the wider population, we will need to re-weight the sub-categories (the different ethnic groups) so that they reflect their proportions in the population.
For example, suppose we wanted to compare the views and satisfaction levels of women who gave birth in a birth pool with those of women who gave birth ‘conventionally’ in a bed. A systematic random sample, although representative of all women giving birth, would not produce enough women who gave birth in water to allow the groups to be compared, unless the total sample was so big that it would take many years to collate. We would also end up interviewing more women who had given birth conventionally than we needed. In this case it would be necessary to oversample those women giving birth in water, using disproportionate stratified random sampling.
The important thing to note here is that random sampling is still taking place within each stratum or category.
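The sketch below illustrates disproportionate stratified sampling followed by re-weighting, again with an invented frame (400 minority and 9,600 majority members) and a hypothetical function name. It draws 200 people at random from each stratum, then gives each respondent a design weight equal to the stratum's population size divided by the number sampled from it. This is one common way of re-weighting, offered as an assumption rather than the only method: after weighting, the minority group again counts for 4% of the total, even though it makes up half of the raw sample.

```python
import random

def disproportionate_stratified_sample(strata, n_per_stratum, seed=None):
    """Simple random sample of a fixed size from each stratum, plus design weights."""
    rng = random.Random(seed)
    sample, weights = [], {}
    for name, members in strata.items():
        n = n_per_stratum[name]
        sample.extend((name, person) for person in rng.sample(members, n))
        # design weight: how many population members each respondent stands for
        weights[name] = len(members) / n
    return sample, weights

strata = {"minority": list(range(400)), "majority": list(range(400, 10_000))}
sample, weights = disproportionate_stratified_sample(
    strata, {"minority": 200, "majority": 200}, seed=5)

unweighted_share = sum(1 for s, _ in sample if s == "minority") / len(sample)
weighted_total = sum(weights[s] for s, _ in sample)
weighted_share = sum(weights[s] for s, _ in sample if s == "minority") / weighted_total
print(weights)                            # {'minority': 2.0, 'majority': 48.0}
print(unweighted_share, weighted_share)   # 0.5 0.04
```

Random selection still happens inside each stratum; only the sampling fractions differ, and the weights undo that difference when estimating for the whole population.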
1.4 Cluster sampling
Cluster sampling is a method frequently employed in national surveys where it is uneconomic to carry out interviews with individuals scattered across the country. Cluster sampling allows individuals to be selected in geographic batches. So, for instance, before selecting at random, the researcher may decide to focus on certain towns, electoral wards or neighbourhoods.
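As a minimal illustration, assuming clusters are themselves chosen by simple random sampling and that everyone in a chosen cluster is included (as in Stage 3 of the example below, where all students in the selected classes were eligible), cluster selection might be sketched in Python like this. The wards, residents and function name are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 50 wards of 100 residents each
clusters = {f"ward_{w:02d}": [f"ward_{w:02d}_res_{r:03d}" for r in range(100)]
            for w in range(50)}
selected = cluster_sample(clusters, n_clusters=5, seed=7)
print(len(selected), sum(len(m) for m in selected.values()))  # 5 500
```

Fieldwork is then concentrated in five wards instead of being scattered across all fifty, which is where the cost saving comes from.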
The following paper is an example of a piece of research that used a cluster sampling method:
Eaton, D.K. (2006) Youth risk behavior surveillance – United States, 2005. Journal of School Health, 77 (7) 353-372.
The aim of the study was to assess the kinds of risky behaviours manifested by school students, sampling across schools in the entire US. Their sampling frame was all public and private secondary schools, and the aim was to achieve a nationally representative sample.
Stage 1: Geographical areas were classified into strata based on their degree of urbanisation and ethnic mix, and a representative sample of 57 of the 1,261 areas was selected.
Stage 2: From this sample, 203 schools were selected to provide a sample representative in terms of school size.
Stage 3: One or two classes were selected from each of the 203 schools and all students from these classes were eligible to take part in the study.
The authors argued that – via this process – the final sample of 13,953 students was representative of the entire population of school students.
1.5 Multi-stage sampling
Multi-stage sampling is similar to cluster sampling, but individuals within the selected cluster units are selected at random. Obviously, care must be taken to ensure that the cluster units selected are generally representative of the population and are not strongly biased in any way.
An example of a piece of research that used a multi-stage sampling method is:
Hughes et al (1997) Young people, alcohol and designer drinks: quantitative and qualitative study. BMJ, 314: 414-
The aim of the study was to examine the appeal of designer alcoholic drinks to young people aged 12 to 17 years.
Stage 1: From a list of all postcode sectors in the Argyll and Clyde Health Board, rural parts of the area, islands and those with fewer than 500 households were excluded.
Stage 2: A random sample of 30 postcode sectors was chosen.
Stage 3: From each of the 30 postcode sectors, 40 people aged 12 to 17 years were identified, using a random procedure which stratified for age and sex.
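The three stages of this design can be mirrored in a short Python sketch. The postcode-sector data are invented, the function name is hypothetical, and Stage 3 is simplified to a simple random sample within each sector rather than the age- and sex-stratified procedure the authors describe.

```python
import random

def multistage_sample(sectors, n_sectors, n_per_sector, seed=None):
    """Three-stage sample: screen sectors, sample sectors, then sample people within them."""
    rng = random.Random(seed)
    # Stage 1: exclude rural/island sectors and those with fewer than 500 households
    eligible = [s for s in sectors if not s["rural"] and s["households"] >= 500]
    # Stage 2: simple random sample of the eligible postcode sectors
    chosen = rng.sample(eligible, n_sectors)
    # Stage 3: simple random sample of young people within each chosen sector
    # (the published study stratified this step by age and sex; omitted here)
    return {s["name"]: rng.sample(s["young_people"], n_per_sector) for s in chosen}

# Hypothetical frame: 100 postcode sectors, each listing 60 young people aged 12-17
sectors = [
    {
        "name": f"PS{i:03d}",
        "rural": i % 5 == 0,
        "households": 400 + 10 * i,
        "young_people": [f"PS{i:03d}-{j:02d}" for j in range(60)],
    }
    for i in range(100)
]
sample = multistage_sample(sectors, n_sectors=30, n_per_sector=40, seed=11)
print(len(sample), sum(len(people) for people in sample.values()))  # 30 1200
```

Unlike the cluster sketch, not everyone in a selected unit is included: a second random draw is made inside each chosen sector.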
SAQ 5.1 Random sampling
What kind of sample selection is being used in each of the following studies?
[table id=29 /]
2. Non-random sampling
Non-random or non-probability sampling refers to sampling methods that do not adhere to the principles of probability sampling, i.e. not everyone in the population (or a stratum) has an equal chance of being selected. Non-random sampling techniques are used in market research and in commissioned studies such as political opinion polling, when specific groups are being targeted for attention.
Within social research, non-random sampling methods are often used in qualitative research, where the aim is not to quantify and generalise to a wider population, but to gain in-depth understanding of, and give meaning to, a social process. Because the techniques used in qualitative inquiry are often very labour-intensive (three-hour interviews, for example), a large sample may be impossible to achieve. For both these reasons, the aim in sampling for qualitative studies is to maximise the range of responses provided, rather than to obtain a representative sample.
There are three main techniques used to generate non-random samples: quota sampling, convenience sampling and snowball sampling.
2.1 Quota sampling
Quota sampling is a technique whereby the researcher decides in advance on certain key characteristics that they will use to stratify the sample. Interviewers are often set sample quotas in terms of age and sex. So, for example, with a sample of 200 people, the researcher may decide that 50% should be male and 50% female, and that 40% should be aged 40 years or over and 60% aged 39 years or under.
The difference from a stratified sample is that the respondents in a quota sample are not randomly selected within the strata. The respondents may be selected simply because they are accessible to the interviewer (for instance, every shopper aged between 20 and 50 in a particular shopping street is approached for interview until a quota is reached). Because random sampling is not employed, it is not possible to apply inferential statistics and generalise the findings to a wider population. Market research often uses quota sampling (e.g. sampling all people in an age group walking down a street), throwing doubt on the validity of such findings if generalised.
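The following Python sketch, with invented passers-by and a hypothetical function name, shows why a quota sample is not random: people are accepted simply in the order in which they happen to come along, until each quota cell is full. It assumes interlocking quota cells built from the age and sex percentages in the example above (e.g. 40 men aged 40 or over), which is only one way of setting quotas.

```python
import random

def fill_quotas(passers_by, quotas):
    """Accept people in the order they pass by until every quota cell is full."""
    remaining = dict(quotas)
    accepted = []
    for person in passers_by:
        cell = (person["sex"], person["age_group"])
        if remaining.get(cell, 0) > 0:        # this person's cell still has room
            accepted.append(person)
            remaining[cell] -= 1
            if not any(remaining.values()):   # all quotas met: stop interviewing
                break
    return accepted, remaining

# Interlocking quotas for 200 people: 50% male/50% female, 40% aged 40+/60% aged 39 or under
quotas = {("male", "40+"): 40, ("male", "39-"): 60,
          ("female", "40+"): 40, ("female", "39-"): 60}

# Hypothetical stream of shoppers walking down one street
rng = random.Random(2)
street = [{"sex": rng.choice(["male", "female"]),
           "age_group": rng.choice(["40+", "39-"])} for _ in range(2000)]

sample, unfilled = fill_quotas(street, quotas)
print(len(sample), unfilled)
```

Every quota is met, yet nothing ties the result to a sampling frame: who is interviewed depends only on who happens to walk down that street first.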
2.2 Convenience or opportunistic sampling
Selecting respondents purely because they are easily accessible is known as convenience sampling. Although quantitative researchers generally frown upon this technique, it is an acceptable approach when using a qualitative design, since generalisability is not a main aim of qualitative approaches.
In fact, qualitative data are often collected using a convenience or opportunistic sampling approach, for instance where the researcher selects volunteers from among his or her work colleagues. However, as this is a rather haphazard method, many qualitative researchers instead use purposive sampling to identify specific groups of people who exhibit the characteristics of the social process or phenomenon under study. For example, a researcher might seek out for interview people who have recently been bereaved, students from working-class backgrounds, or people who have recently experienced long-term unemployment.
Occasionally it is useful to deliberately include people who exhibit the required characteristics in the extreme. Close examination of extreme cases can sometimes be very illuminating when trying to formulate a theory. However, caution is needed with this approach, to ensure that these extreme cases are not used as a basis for generalisation.
2.3 Snowball or network sampling
Snowball sampling gets its name from an analogy with a snowball that, rolling down a hill, picks up more snow and more momentum as it goes. Snowballing can be used where it is hard to find respondents for a study (for instance, people engaged in criminal or sub-cultural activities, or with rare or socially undesirable characteristics). For example, imagine you wish to explore the safe sex practices of sex workers working in a particular city. Once a sex worker had participated in the study, they could then approach their associates to see if they would also be interested in participating in the research. A snowball sampling method may be particularly useful where a research topic is sensitive, or the target group is hard to reach.
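Recruitment through a snowball can be pictured as spreading wave by wave through a network of contacts. In this minimal Python sketch, the referral network and participant labels are invented, everyone approached is assumed to agree to take part, and the function name is hypothetical.

```python
from collections import deque

def snowball_sample(referrals, initial_contacts, max_size):
    """Recruit wave by wave; each participant refers the researcher to their contacts."""
    recruited = []
    queue = deque(initial_contacts)
    approached = set(initial_contacts)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)  # assumes everyone approached agrees to take part
        for contact in referrals.get(person, []):
            if contact not in approached:  # never approach the same person twice
                approached.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: who each participant can put the researcher in touch with
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["H"]}
print(snowball_sample(referrals, initial_contacts=["A"], max_size=10))
# ['A', 'B', 'C', 'D', 'E', 'F']
```

Note that G and H are never reached from the initial contact A: who ends up in a snowball sample depends heavily on where the snowball starts, which is one reason it cannot be treated as representative.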
2.4 Theoretical sampling
In addition to these three approaches, there is a further approach to non-random sampling that is used in qualitative social inquiry. In theoretical sampling, the sample is chosen on the basis of an emerging ‘theory’. It is often used in association with ‘grounded theory’ – an approach to qualitative data analysis that aims to generate new theoretical understanding (see Unit 9).
The idea in theoretical sampling is that the researcher selects the subjects, collates and analyses the data to produce an initial theory, and then uses this theory to guide further sampling and data collection, from which further theory is developed. For example, a study might set out to discover whether people in same-sex marriages experience prejudice and stigma in the workplace. To answer this question, people in such partnerships, but working in different sectors (manufacturing, education, public service), would be interviewed. Sampling would be guided by a speculative theory or hypothesis: that prejudice and stigma might vary according to occupational background. The findings from this initial round of data collection would then be used to refine the theory and to guide further sampling.
Clearly this kind of approach is very different from selecting a random sample with the aim of generalisation or representativeness.
SAQ 5.2 Non-random sampling
[table id=30 /]
So far, this unit has introduced you to random and non-random methods of sampling. Key points to remember when deciding on sample selection are:
- In quantitative research, use a random method where possible. Random selection does not mean haphazard, but that everybody in your sampling frame has an equal opportunity of being included in your study.
- In qualitative research, non-random sampling may be more appropriate, and may be exploited to test hypotheses or develop theory, by selecting respondents with specific characteristics.
However, as well as correctly using the most appropriate sampling technique, it is also important to consider another aspect of sampling: sample size.
3. Calculating sample size (quantitative studies)
Common sense suggests that the larger a sample is, the more likely it is to be representative of the population from which it is drawn. In any sample, random variation or unknown confounding variables may make it unrepresentative of the population. However, as sample size increases, the chance of this decreases, until eventually, of course, the sample comprises 100 per cent of the population!
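This intuition can be checked with a small simulation. The Python sketch below invents a population of 10,000 adult heights, repeatedly draws random samples of different sizes, and reports how widely the sample means scatter around the population mean. The population values and the function name are hypothetical; the general pattern, that sampling error shrinks roughly in proportion to one over the square root of the sample size, is a standard result.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

rng = random.Random(42)
# Hypothetical population: 10,000 adult heights in cm (mean 170, SD 10)
population = [rng.gauss(170, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    spread = spread_of_sample_means(population, n, repeats=500, rng=rng)
    print(f"n = {n:>4}: typical error of the sample mean is about {spread:.2f} cm")
```

Each tenfold increase in sample size should cut the typical error by roughly a factor of three (the square root of 10), so gains in precision become progressively more expensive.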
So how large should a sample be?
At first glance, many pieces of research seem to choose a sample size merely on the basis of what ‘looks’ about right, or perhaps simply for reasons of convenience: ten seems a bit small, and a hundred would be difficult to obtain, so 40 is a happy compromise! Unfortunately, a lot of published research uses precisely this kind of logic. Choosing the correct size of sample is not a matter of preference; it is a crucial element of the research process. Without it, you may well spend months investigating a problem with a tool that is either completely useless or too expensive in terms of time and other resources.
To identify the correct sample size for a study, we need first to think a little about the relationship between the conclusions of a study and the ‘reality’ it purports to describe, in terms of Type I and Type II errors.
3.1 Type I and Type II errors
Most (but not all) quantitative studies aim to test a hypothesis, and we looked at the logic of hypothesis testing in Unit 2. A hypothesis is a kind of ‘truth claim’ about some aspect of the world: the employment patterns of men and women, the gendering of care, climate change activism, or whatever. Research sets out to try to prove this truth claim right (or more properly, to disprove the null hypothesis – a truth claim phrased as a negative).
For example, let us think about the following hypothesis for a quantitative study:
‘Concern over climate change causes teenagers to adopt a vegetarian diet.’
And the related null hypothesis:
‘Concern over climate change does not cause teenagers to adopt a vegetarian diet.’
Let us imagine that we have undertaken a survey exploring teenagers’ attitudes to climate change and their dietary choices. Whether or not we find an association in our sample, there are four possible outcomes of the research.
- We find an association in our sample, and there really is such an association between climate change concerns and diet in the wider population of teens. This is a correct result.
- We find no association in our sample, and there really is no such association between climate change concerns and diet in the wider population. This is a correct result.
- We find an association in our sample, but in fact there is no such association between climate change concerns and diet in the wider population. This is a Type I error or false positive.
- We find no association in our sample, but in fact there is such an association between climate change concerns and diet in the wider population. This is a Type II error or false negative.
Clearly we want one of the first two outcomes to be the case. In either of the others, the research has given a false result, which could have negative consequences: either developing policy in the belief that climate change is a factor (as opposed to reasons such as ethics or health concerns) (Type I error), or ignoring the effects of climate change on diet (Type II error).
3.2 Statistical significance and statistical power
Statistical significance assesses the likelihood of committing a Type I error. This is the error of finding an association that does not exist (rejecting a true null hypothesis).
It is controlled by the alpha value of the test, also known as the level of significance and usually quoted as a p level (e.g. p < 0.05). Alpha is the probability of committing a Type I error when the null hypothesis is in fact true; thus an alpha of 0.05 indicates a five per cent (or one in 20) chance of committing a Type I error, and an alpha of 0.01 indicates a one per cent (or one in 100) chance.
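A simulation can make this concrete. The Python sketch below repeatedly compares two groups drawn from the same population (so the null hypothesis is true by construction) and counts how often a test at alpha = 0.05 nonetheless declares a significant difference. For simplicity it uses a two-sample z-test with a known standard deviation, which is an assumption; real analyses would more often use a t-test. The function name is hypothetical.

```python
import random
import statistics

def z_test_significant(group_a, group_b, sigma, z_crit=1.96):
    """Two-sided two-sample z-test, assuming the population SD (sigma) is known."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    return abs(diff / standard_error) > z_crit   # 1.96 corresponds to alpha = 0.05

rng = random.Random(123)
trials = 4000
false_positives = 0
for _ in range(trials):
    # The null hypothesis is true by construction: both groups share one population
    a = [rng.gauss(0, 1) for _ in range(50)]
    b = [rng.gauss(0, 1) for _ in range(50)]
    if z_test_significant(a, b, sigma=1):
        false_positives += 1

print(f"Type I error rate: {false_positives / trials:.3f}")  # should be near 0.05
```

Roughly one comparison in 20 comes out "significant" even though there is nothing to find, which is exactly what an alpha of 0.05 allows.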
Statistical power assesses the likelihood of committing a Type II error. This is the error of missing an association that does exist (accepting a false null hypothesis).
The likelihood of committing a Type II error is the beta value of a statistical test, and the value (1 – beta) is the statistical power of the test. Conventionally, power is set at 80% or 0.8, meaning that a study has an 80 per cent (4 out of 5) chance of detecting a difference or association of the expected size, if one actually exists. A power of only 0.5 or 50% means an evens chance of detecting such an effect: you might as well flip a coin as spend all that time on a piece of research!
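Power can be approximated with a simple formula. The Python sketch below assumes a two-sided comparison of two group means, measures effect size as the difference in means divided by the standard deviation (a standardised difference, one common choice), and uses the normal approximation, ignoring the tiny chance of rejecting in the wrong direction. The function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # e.g. 1.96 for alpha = 0.05
    # expected value of the test statistic if the true standardised difference is effect_size
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(expected_z - z_crit)

for n in (20, 32, 64, 200):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, a standardised effect of 0.5 gives power of about 0.81 with 64 people per group, but only about 0.52 with 32 per group: halving the sample turns a well-powered study into a coin toss.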
All research should seek to avoid both Type I and Type II errors, which lead to incorrect inferences about the world beyond the study. In practice, there is a trade-off: for a given sample size, reducing the likelihood of a Type I error by adopting a stricter level of significance (e.g. 0.01 rather than 0.05) reduces the statistical power of the test, increasing the likelihood of a Type II error, and vice versa.
Fortunately the chances of gaining a statistically significant result and obtaining acceptable statistical power both increase as sample size increases.
3.3 How to maximise statistical significance and statistical power
When a researcher uses a statistical test of inference, she is testing her results against what would be expected by chance alone if the null hypothesis were true. If the test gives a positive result (usually described as ‘achieving statistical significance’), she can be reasonably confident (at the 95 or 99 per cent level, for example) that her results are ‘true’ and reflect the circumstances in the wider population. Increasing sample size allows a stricter level of significance (e.g. p = 0.01 rather than 0.05) to be used without losing power, making it less likely that a chance finding is mistaken for a real one.
However, if the test does not give significant results (non-significant or NS), then there are two possible reasons.
- There actually is no association or ‘effect’ of the independent variable on the dependent variable. Although this is a negative finding, it is still important to discover.
- There is an association or effect, but the study did not have sufficient power to discover it. This is a disastrous outcome, which means the research has been a waste of time.
What can be done to avoid this latter outcome?
The power of a study depends upon three things.
- The size of the association, known as the effect size (ES). Effects can vary greatly in size. Large effects are so obvious that no one would think to research them (for instance, the difference in height between adults and children under seven). Medium effects, such as the effect of gender on adult height, are visible to the naked eye, although there is still considerable variability. Many social research effects are small: these are the hardest to detect, so for a given sample size a study of a small effect has less power.
- The level of statistical significance (alpha) chosen to detect an effect. The stricter the level of significance chosen (e.g. p = 0.01 rather than 0.05), the harder it is to detect an effect, and so the lower the power of the study.
- The sample size. The larger the sample, the more representative of the underlying population it is likely to be, and the greater the power of the study.
Effect size cannot be changed, and the significance level cannot be relaxed (e.g. from 0.05 to 0.10) without increasing the risk of false positives. So sample size is the key to increasing power: in particular, studies seeking to reveal small effect sizes need large samples.
This might imply that you should always aim for the biggest sample possible. But this can be wasteful of time and resources, and could even mean some research does not get done because the study is over-powered and therefore too costly to fund.
3.4 Selecting a sample size in quantitative research
Before starting research, we need to know:
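- The level of statistical significance required (i.e. the importance of avoiding a Type I error)
- The statistical power required (i.e. the importance of avoiding a Type II error; conventionally 0.8)
- The effect size.
Knowing these, you can calculate sample size.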
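Under some simplifying assumptions, these three quantities combine in a standard approximate formula for a two-group comparison: n per group = 2 × ((z for alpha/2 + z for power) / effect size)². The Python sketch below applies it, assuming a two-sided comparison of two group means and a standardised effect size (difference in means divided by the standard deviation); the function name is hypothetical. The values 0.2, 0.5 and 0.8 are Cohen's conventional "small", "medium" and "large" standardised effects. Calculations based on the t distribution, and published tables, give slightly larger figures, so treat the output as a rough guide rather than a substitute for expert advice.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # protection against a Type I error
    z_beta = z.inv_cdf(power)            # protection against a Type II error
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):   # conventionally 'small', 'medium', 'large'
    print(d, n_per_group(d))
# 0.2 393
# 0.5 63
# 0.8 25
```

Detecting a small effect needs more than six times as many people as a medium one, which is why studies of small social effects so often turn out to be underpowered.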
The following table shows sample sizes for some common statistical tests. Degrees of freedom relate to the number of groups in a study (e.g. a comparison of men and women = 2 groups); usually, degrees of freedom = number of groups – 1. You may be surprised how big some of these samples need to be.
Table 5.1 Necessary sample sizes for statistical tests where alpha (p) = 0.05
[table id=31 /]
Statistical power is important for you to consider when you plan a quantitative research study. If in doubt, you should get expert statistical advice before planning your research protocol.
- A Type I error is the error of falsely rejecting a true null hypothesis. The likelihood of committing a Type I error is known as alpha. The conventional level for alpha or p is usually 0.05 or 0.01.
- A Type II error is the error of failing to reject a false null hypothesis (i.e. wrongly accepting a false null hypothesis). The likelihood of committing a Type II error is known as beta. Statistical power (1 – beta) is conventionally set at 0.8 or 80%.
- There is a trade-off between committing a Type I error and a Type II error, but historically science has placed the emphasis on avoiding Type I errors.
- Increasing the sample size reduces the risk of a Type II error, and allows a stricter significance level to be used to guard against a Type I error, but remember that too large a sample is costly and unethical.
- To calculate statistical power, you need to estimate the effect size.
SAQ 5.3: Sample Size
Which of the following statements are true?
[table id=32 /]
Reflective Exercise 5.1
[table id=33 /]
For details of sampling techniques
Emmel, N. (2013) Sampling and Choosing Cases in Qualitative Research: A Realist Approach. London: Sage.
Rea, L.M. and Parker, R.A. (2014) Designing and Conducting Survey Research: A Comprehensive Guide. Chichester: John Wiley.
For details of power calculations
Campbell, M.J. et al. (1995) Estimating sample sizes for binary, ordered categorical and continuous outcomes in two group comparisons. BMJ, 311: 1145-8.
Dattalo, P. (2008) Determining Sample Size: Balancing Power, Precision, and Practicality. Oxford University Press.
Answers to SAQ 5.1
- Stratified random sample. The sample has been selected to ensure that two different strata (couples with and without children) are represented proportionately within the overall sample.
- Disproportionate stratified random sample. This sample is stratified to ensure that children in each category are selected; however, the two groups are equal in size and thus do not reflect the relative sizes of the sub-populations of those fostered and those awaiting fostering.
- Quota. The sample is not randomly selected but the respondents are selected to meet certain criteria.
- Systematic random sample.
- Cluster sample because the children are selected only from certain classes.
Answers to SAQ 5.2
- As a sampling technique it is not representative, because: a) there may be interviewer bias in selecting respondents (e.g. an interviewer may assume that a person is younger than required for the study and therefore not approach her/him); b) respondents may be atypical, as only those who are available and in the interviewer’s vicinity are typically selected; c) self-reported characteristics (e.g. social class, political beliefs) may be inaccurate, as respondents may not answer truthfully; and d) it may be very hard to achieve quotas for some categories.
- Snowball sampling may be needed if the study population is hard to reach or may be resistant to participating in research. Effectively it uses early respondents in a study to help recruit subsequent respondents to the study sample.
- Data collection and data analysis need to go on simultaneously, so that analysis of early data informs the next phase of data collection and so on. This requires some flexibility in the study design and timetable.
Answers to SAQ 5.3
- False. Internal validity is concerned with the appropriateness of the design; it cannot assess the validity of the data generated.
- True. External validity indicates the capacity of findings to be generalised to a population.
- False. In many circumstances, missing an effect is very serious (for example, a factor causing a disease or the side-effect of a new drug).
- False. Both are important to assure external validity of a study.
- False. Most clinical effects are small.
- False. Increasing the level of significance (e.g. from p = 0.05 to p = 0.01) reduces the power of a study.
- False. For a correlation test with a small effect size, you need at least 618 in the sample. It required very large epidemiological samples to demonstrate the link between smoking and mortality.