Unit Author: Professor Nick J Fox

Learning objectives

Having successfully worked through this unit, you will be able to:

  • define the principal types of reliability and validity as applied to scientific research
  • describe how to increase the reliability and validity of a study or of particular measures
  • apply this understanding to your own research, to maximise validity and reliability
  • recognise the realist underpinning to these conceptions of reliability and validity.

Introduction

In the last unit we identified the different research designs that are typically used in social research, recognising that a design should match the kind of question that is being asked.

However, before we move on to discuss specific research methods in detail, we need to also recognise the need for high quality research. To achieve this, our research design needs not only to be appropriate, but also adequate, and this means making sure it is of a suitable quality to achieve its goal.

These issues have been applied in social research (in particular, within a realist epistemology – see unit 10) in terms of two concepts: validity and reliability. These two concepts are often regarded as the main criteria by which we can evaluate a piece of social research.


1. What do we mean by reliability and validity?

The validity of a study is the extent to which the findings are accurate.

The reliability of a study is the extent to which the findings are consistent or repeatable.

Both validity and reliability are important aspects of research: to lack reliability or validity is as good as saying a piece of research is un-trustworthy or has no worth.

Let’s try to get to grips with these two ideas.

Imagine that we are in a room filled with pieces of scientific equipment: some are mechanical devices that look like they were designed to measure physical properties of objects. There is also an array of electronic gadgetry with all sorts of computer screens and so on. There is a digital recorder and microphone. And in a cupboard there are piles of questionnaires and other tests used by social scientists to measure attitudes or personality.

You want to conduct an investigation of something or other – let’s take a simple example: the heights of men and women working in your office. Now most of us would have some sort of idea what kind of instrument we would use to generate data for this study. But imagine also that we don’t know anything about ‘height’ or how to measure it. So we wander round this room of research equipment, and randomly select some instruments. We discover a digital thermometer, and we decide that we will try to use this in our study.

What is the consequence of this decision? Well, without beating around the bush, we have obviously chosen an instrument that is going to be of little use in generating useable data. Or to put it more technically, we are not going to be able to generate valid data.

Our chosen instrument is not going to possess validity for measuring height, because its purpose is to measure body temperature. Because of this, our study will be invalid: no-one in their right senses will accept that we are measuring what we are claiming to measure, or that it can tell us anything about the relative heights of men and women working in your office.

Let us go back to the room and choose again. Unfortunately, however much we look, there is nothing as simple as a ruler there: nor anything else which could be used to measure heights: heart rate monitors – yes, GPS devices – yes, even some calipers for measuring cranial capacity, but no rulers! But in the cupboard we come across a questionnaire that has this question on it: “How tall are you?”

Perhaps we can use a questionnaire. We could ask your colleagues how tall they are, and from that we can gain the data we need.


SAQ 4.1 A height questionnaire

You decide that you will measure height using a questionnaire.

What do you think of this idea? Write down any thoughts you have about problems of using this way of measuring height. In particular, think about how you would ask the question.

How did you get on? Here are our thoughts. First, we could argue that ‘how tall are you?’ is a simple and straightforward question that people will easily understand. This claim is an evaluation of the validity of the question for measuring height.

You may also have commented upon some problems over the consistency of the data that you have generated by using this measuring tool. Unlike a ruler, which gives relatively consistent or reliable data, regardless of who is using it, asking a question about height could introduce all sorts of random errors into the measurements. Some people will have little idea of their height – and will guess. Other people may intentionally over-estimate their height, while others may under-estimate it, for various reasons. And perhaps it makes a difference who is doing the questioning: a male questioner might gain different results from a female: men might over-estimate their heights when talking to a female questioner!

The consistency of the data generated by an instrument is also known as its reliability. This is related to validity, inasmuch as an instrument that is unreliable cannot give valid results. This is because if we produce inconsistent data, then it cannot be accurate either. Also, if data is accurate (valid) then, by definition, it is also consistent (reliable).

(However, as we will see in the coming sections, a reliable instrument does not necessarily produce valid results.)


2. The relationship between validity and reliability

This introductory consideration of validity and reliability of a height questionnaire starts to unpack the two concepts. The relationship between validity and reliability can be represented diagrammatically, as shown in Figure 4.1:

Figure 4.1 The relationship between validity and reliability

This diagram shows a two-by-two table of the measurements obtained in four different hypothetical studies. In each of the cells, the black circle represents the ‘true’ value of something or other. Let’s move on from height to a more interesting social research topic: the number of times in the course of a month that people eat their ‘5-a-day’ fruit and vegetable target. The crosses mark a series of measurements which we have undertaken using some kind of instrument (maybe a questionnaire, an interview or even an observation) which we have adopted for our study. Like on a dart board, the distance from the ‘bulls-eye’ indicates how successful the study has been at getting at the ‘truth’ of the event being studied.

In cell 1, we can see that the crosses are clustered around the circle – each data reading is consistent with previous readings, and the instrument is giving accurate readings. Let’s imagine that in this study, each research participant was observed in their daily life over a one-month period. The observers record the type and quantity of all fruit and vegetables consumed by the participants, using software which calculates when the target is met. This study has both reliability and validity: this observer method both records accurate (valid) data, and works consistently (reliably).

Looking at cell 2, here there is consistency (reliability) of measurements: all the crosses are clustered tightly, meaning there are no random errors in measurement due to the instrument or the operator. But the crosses are all highly inaccurate: they consistently give readings which are far too high. Let’s imagine this data had been collected by asking study participants to respond ‘Yes’ or ‘No’ to a daily SMS text prompt asking them if they had met their ‘5-a-day’ target. It is likely that participants would prefer to answer ‘Yes’ to this question, since it is the ‘desirable’ response.

This is a systematic error, which deviates from the true value, tending to give a reading that is too high (or low) by a certain amount. So in this situation, the instrument has reliability, but no validity.

In cell 3, the crosses are spread around the circle fairly evenly, but the spread is far too wide: some are too high, some are too low. Although we could take an average, individual readings diverge randomly from the true reading. If we were to use a single reading, we would have little idea of the true number of times the daily target was achieved. Let’s say that these results were obtained by a researcher checking the grocery receipts of research participants.

This is a very poor study because the tool used is unreliable: although the shopping receipts might give a general indication of how much fruit and vegetables are bought each week, we have no idea who is consuming it, on what days, and how much is thrown out uneaten. The calculations of how often the ‘5-a-day’ targets were met may be too high or too low. Because the reliability of the data is low, the study cannot be considered to have adequate validity either.

In cell 4, the crosses once again have diverged due to some random error in measurement. But furthermore, they are highly inaccurate and skewed. Even an average would not get close to the true value in the sample. This instrument is neither reliable nor valid – perhaps this data was gathered by asking research participants to complete a retrospective questionnaire – that is, recalling how many times over the past month they met their ‘5-a-day’ target. It is likely that they would all have to guess, and probably over-estimate a lot of the time. It is both an inaccurate and an inconsistent way of collecting data and a useless study!

These are the four situations that you might find yourself in when doing research. Obviously, only cell 1 provides trustworthy data, achieving both reliability and validity. In the next sections, we will consider how to ensure we end up in cell 1, with both a reliable and a valid study.
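The four cells can be imitated with a small simulation. The sketch below is only an illustration under assumed numbers: the ‘true’ value, the size of the systematic error (bias) and the size of the random error (spread) are all hypothetical, and the function names are our own. Each reading is modelled as the true value plus a fixed bias plus random noise, so that low bias stands for validity and low spread for reliability.

```python
import random
import statistics

def simulate_readings(true_value, bias, spread, n=200, seed=0):
    """Return n simulated readings: true value + systematic bias + random error."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, spread) for _ in range(n)]

def describe(readings, true_value):
    """Summarise readings as (systematic error, random scatter)."""
    systematic_error = statistics.mean(readings) - true_value
    random_scatter = statistics.stdev(readings)
    return systematic_error, random_scatter

TRUE_DAYS = 12  # hypothetical 'true' number of days the target was met

# Hypothetical bias and spread for each cell of the two-by-two table.
cells = {
    "cell 1: reliable and valid": dict(bias=0, spread=1),
    "cell 2: reliable, not valid": dict(bias=10, spread=1),
    "cell 3: unreliable, no systematic error": dict(bias=0, spread=8),
    "cell 4: neither reliable nor valid": dict(bias=10, spread=8),
}

for label, params in cells.items():
    readings = simulate_readings(TRUE_DAYS, **params)
    error, scatter = describe(readings, TRUE_DAYS)
    print(f"{label}: systematic error {error:+.1f}, scatter {scatter:.1f}")
```

The readings are not clipped to a realistic range of days, so the output should be read as positions on the ‘dart board’ rather than as plausible data.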


3. Reliability

To recap what we have learnt so far about reliability:

  1. it is also known as consistency
  2. it is affected by random error
  3. without reliability, a study will not possess validity.

There are four aspects to reliability, and we will consider each in turn. To illustrate each kind, let us think about a specific study with the research question: ‘What is the effect of television advertising on the purchase of a new brand of ‘breakfast biscuits’?’. A feasible design to answer such a question might involve a structured questionnaire, completed by a trained interviewer in the respondent’s home.

3.1 Internal reliability

Technically, this can be defined as assessing the responses of an instrument to equivalent stimuli. For this kind of reliability, what is needed is an instrument or measure that will consistently give the same results when testing a number of equivalent events. In our example, we will have internal reliability if the questionnaire provides consistent results across the board, when used to measure similar events.

In surveys, internal reliability is often considered in terms of split-half reliability of a questionnaire. Respondents might be asked early on ‘Have you recently bought any breakfast biscuits after seeing them on television adverts?’ In a later part of the questionnaire, they are asked ‘Do you think that your choice of breakfast biscuits was influenced by recent television adverts?’ Given these are actually the same question phrased differently, we would expect consistent answers, and if we get consistent responses then that indicates the internal reliability of the question. Using split-half reliability requires that each question in the first half of a questionnaire is repeated in a slightly different form in the second half. Internal reliability of a questionnaire is often reported using a statistic known as Cronbach’s alpha: a high value indicates consistency.
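To make the idea of Cronbach’s alpha more concrete, here is a minimal sketch of how it could be calculated for the two paired questions. It assumes, purely for illustration, that respondents answered each question on a 1 to 5 agreement scale rather than ‘yes’ or ‘no’; the scores and the function name are hypothetical. The formula used is the standard one: alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list per question, each holding the scores of the
    same respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each question across respondents.
    item_variances = [statistics.variance(item) for item in item_scores]
    # Variance of each respondent's total score across all questions.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores from six respondents (1 = strongly disagree, 5 = strongly agree).
bought_after_advert = [5, 4, 2, 1, 3, 4]    # early question
influenced_by_advert = [5, 3, 2, 1, 3, 5]   # later, re-phrased question

alpha = cronbach_alpha([bought_after_advert, influenced_by_advert])
print(f"Cronbach's alpha: {alpha:.2f}")
```

Because the two sets of answers move together, alpha comes out high; if respondents answered the two versions inconsistently, the value would fall.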

3.2 Test/re-test reliability

As the name implies, this kind of reliability is concerned with measuring something, and then measuring the same thing again at a later time: it can therefore also be described as repeatability. For instance, in the case of the study of TV advertising impact, we would need to get a reliable indication from each respondent of how many times they had seen an advert. We could gain a sense of a measure’s repeatability by asking respondents how many times they had seen the advert, and then checking back with them (using exactly the same question) a day later to see if they give the same response. If responses are very similar, we can be assured that the question is reliable.
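One simple way to judge whether responses are ‘very similar’ is to compare the two sets of answers directly. The sketch below uses hypothetical counts from seven respondents and a hypothetical function name; it reports the share of respondents whose two answers agree (within a chosen tolerance) and the average size of the difference. Researchers often report a correlation coefficient instead; this is just one possible summary.

```python
def retest_summary(first, second, tolerance=0):
    """Compare two rounds of answers from the same respondents.

    Returns (share of respondents within tolerance, mean absolute difference).
    """
    if len(first) != len(second) or not first:
        raise ValueError("both rounds must contain the same respondents")
    differences = [abs(a - b) for a, b in zip(first, second)]
    agreement = sum(d <= tolerance for d in differences) / len(differences)
    mean_abs_diff = sum(differences) / len(differences)
    return agreement, mean_abs_diff

# Hypothetical answers to 'How many times have you seen the advert?'
day_one = [3, 0, 5, 2, 8, 1, 4]
day_two = [3, 1, 5, 2, 7, 1, 4]   # same respondents, one day later

exact, average_gap = retest_summary(day_one, day_two)
close, _ = retest_summary(day_one, day_two, tolerance=1)
print(f"identical answers: {exact:.0%}, within one viewing: {close:.0%}")
print(f"average difference: {average_gap:.2f} viewings")
```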

3.3 Inter-observer reliability

Imagine two people are carrying out interviews in houses on the same street. If at the end of the day their questionnaire results diverge widely, then we have to ask if they are obtaining reliable data. In such circumstances, we may not possess inter-observer reliability.

Sometimes this kind of error is not the ‘fault’ of the observer, but is a consequence of who the observer is. Earlier we mentioned the situation of people asked about their heights, and the possible different responses to male and female interviewers. The gender, race and age of an interviewer may have a significant effect on how people reply: in designing a study this source of error may need to be controlled by ensuring different interviewers do not affect the kinds of responses.
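Agreement between observers can also be put into numbers. The sketch below uses Cohen’s kappa, a statistic not discussed in this unit but commonly used for this purpose: it compares the observed agreement between two raters with the agreement that would be expected by chance. It assumes a slightly different set-up from the street example, namely that both interviewers independently code the same set of recorded answers; the data and function name are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same cases into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)

# Hypothetical coding of eight recorded answers to
# 'Did the respondent buy the advertised biscuits?'
interviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
interviewer_2 = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no"]

print(f"kappa: {cohen_kappa(interviewer_1, interviewer_2):.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance, so low values would suggest a problem with inter-observer reliability.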

3.4 Intra-observer reliability

Finally, consider the situation in which the researcher her/himself produces some kind of bias in reporting data. For example, the interviewer might believe that organic foods are healthier than processed breakfast biscuits. Consciously or unconsciously, this personal belief could lead to bias, such as under-recording the purchase of the advertised brand.

Such subjective bias is a problem in much research, because researchers may have particular commitments to proving or disproving particular hypotheses. It may be tempting to report measurements in a direction that will support the researcher’s desired outcome, or even to exclude cases that do not fit the hypothesis, for spurious reasons. One solution available in experimental or clinical research is the double blind study, in which neither the people collecting the data nor the participants know if an active drug or a placebo has been given, ruling out the possibility of observer bias. However, this design is rare in social research: it would be hard to blind an interviewer to the purpose of the research, for example.

So, these four types of reliability are all concerned with repeatability and consistency. Not all will be applicable in all circumstances, but you need to consider all four: good internal reliability alone is no use without good inter-observer reliability too.

To confirm that you have understood these four types of reliability, now complete the following exercise.


SAQ 4.2

You attend a research seminar given by Dr. Wolff of the University of Cambridge, who has conducted a study of attitudes to road traffic, finding that 62% of people interviewed were concerned about the health aspects of traffic and wanted more public transport. What type of reliability is being claimed in the responses that Dr. Wolff gives to questions at the seminar?

[table id=24 /]


4. Validity

Reading texts on validity can be confusing, as they refer to many different kinds of validity. We will look at all the various types of validity in this section, although not all will necessarily be relevant to your current research. At the end of this unit, you will be asked to work through the different types, to establish which ones you need to consider in your present research.

As we can see from Figure 4.2, we need to look at two kinds of validity: the validity of the overall study, and that of the instruments (indicators) that we use.

Figure 4.2 Types of validity

4.1 Study validity

Measures of study validity ask two questions:

  1. Does the study measure what it set out to measure (internal validity)?
  2. Can the findings be generalised (external validity)?

Study validity is thus related to research design. We will explore this kind of validity in relation to the question used as an example earlier: ‘What is the effect of television advertising on the purchase of a new brand of ‘breakfast biscuits’?’

4.1.1 Internal validity

Internal validity is the extent to which a study’s measurements truthfully reflect the variables being explored. In this research, the study has internal validity if the data that are collected (viewing of television advert, subsequent purchase decisions) accurately reflect the real incidence of advert viewing and consideration/purchase of the particular brand of breakfast biscuit.

Clearly the extent to which there is internal validity depends on whether appropriate and adequate indicators (questions) have been chosen. Think back to the start of the unit – the choice of an instrument to measure body temperature is obviously useless for measuring height, and any data collected will be meaningless. We will come back to this issue when we look at instrument validity later.

4.1.2 External validity

A study may measure what it sets out to measure, and thus have internal validity as far as the subjects or samples tested go. But the point of research is to find out what goes on, not in the laboratory or among the sample of population tested, but in the ‘real world’ beyond the research. So findings of research need to be generalisable. The extent to which findings can be generalised from a sample to the real world setting is a study’s external validity. This is dependent upon the study achieving a statistically representative sample.

Once again consider the example of TV adverts and subsequent purchases. In this study, interviews identified the numbers of times respondents saw the breakfast biscuit TV ads, and also whether they subsequently bought the product. But the point of the study is not that on one Friday in May 2014, in Manchester, someone saw a TV ad and then the next day went out to buy the product. Rather, the aim is to generalise, so that we may conclude that there is or there is not an association between these two variables. We want to ensure it is not a ‘fluke’ result, due to the specific circumstances of a few respondents who by chance were very susceptible to this TV ad.

However, to generalise in this way, we need to be assured that the research design is adequate: for instance, that our sample is representative of the population to which we want to generalise, and that the design has not introduced biases that will prevent generalisation of our findings.
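The importance of a representative sample can be illustrated with a short simulation. Everything in the sketch below is invented for the purpose: an imaginary population of 10,000 people in which 30% are heavy television viewers, with heavy viewers assumed to be more likely to buy the biscuits. It compares the purchase rate in a random sample with the rate in a convenience sample drawn only from heavy viewers.

```python
import random

def purchase_rate(people):
    """Proportion of people in a group who bought the product."""
    return sum(person["bought"] for person in people) / len(people)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heavy viewers buy more often than others.
population = []
for _ in range(10_000):
    heavy_viewer = rng.random() < 0.3
    chance_of_buying = 0.40 if heavy_viewer else 0.10
    population.append(
        {"heavy_viewer": heavy_viewer, "bought": rng.random() < chance_of_buying}
    )

# A random sample of the whole population ...
random_sample = rng.sample(population, 500)
# ... versus a sample recruited only among heavy viewers.
heavy_viewers = [p for p in population if p["heavy_viewer"]]
convenience_sample = rng.sample(heavy_viewers, 500)

population_rate = purchase_rate(population)
random_rate = purchase_rate(random_sample)
convenience_rate = purchase_rate(convenience_sample)

print(f"population:         {population_rate:.1%}")
print(f"random sample:      {random_rate:.1%}")
print(f"convenience sample: {convenience_rate:.1%}")
```

The random sample should land close to the population figure, while the convenience sample overstates it, however accurately each individual was measured. This is a problem of external rather than internal validity.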


SAQ 4.3 Internal and external validity

[table id=25 /]


4.2 Instrument validity

Instrument validity assesses how good instruments are for their intended purpose. Measures of instrument validity ask:

Do the instruments (e.g. interview schedule, questionnaire) used in a study accurately measure what they are supposed to measure?

We looked in Unit 2 at some of the problems of choosing an appropriate indicator to measure a concept, and really the issue here is one of validity. Somehow or another, we need to be convinced that the indicator which we choose is adequate and appropriate to do the job of measuring what we wish to measure.

As we will see in a moment, the ability of an indicator to measure what it is intended to measure is a matter of judgment. As researchers, we need not only to convince ourselves of the validity of our indicators, but also those who will judge our research, which includes our colleagues, the public, editors and reviewers of journals or examiners! No indicator is absolutely valid, but some are more established than others, and we will look in turn at the three ways in which we can make claims concerning the validity of an instrument or indicator. Obviously, the more established and accepted an indicator, the easier it is to justify its use, and thus determine the validity of a study.

So, in addition to internal and external study validity, there are a number of kinds of validity associated with the instruments or indicators used in a study. We can validate a measure or indicator in one of three ways.

4.2.1 Face and content validity

The first and most straightforward route to establishing instrument validity is by the choice of an instrument that possesses face validity (a so-called ‘gold standard’). This, simply, is a validity achieved because ‘on the face of it’ the indicator measures what it claims to measure by definition. A good example is the thermometer, which has face validity for measuring temperature, because it has been established for so long. Similarly, it could be argued that a question which asks people ‘how did you just vote’ when they emerge from an election polling booth (an ‘exit poll’), has face validity for the concept of ‘voting behaviour’. On the face of it, this indicator is very closely associated with the concept it purports to measure (although, of course, some people may still lie or refuse to answer).

When a new instrument is developed, there is usually a period during which its ‘face’ validity may be debated by the social research community. If this is the case, then other forms of validity may be required (see below).

Related to face validity is content validity, which is particularly relevant to indexes (multiple indicators). Once again, this kind of validity is achieved by the consensus of the social research community. What is accepted is that a range of indicators, between them, adequately reflect all the important aspects of a concept. For example, a recent index of social class developed by Savage et al. (2013) might be accepted as valid because it asks questions concerning income, education, leisure activities and occupations of friends and associates. There is general acceptance in sociological circles that each of these four aspects reflects elements of the concept or construct of ‘social class’.

Note that each of the elements must possess face validity in themselves: the content validity of the index is achieved because each is accepted as valid.

One final comment is worth making concerning face validity. As mentioned, this kind of validity is accepted because the indicator defines the concept: it is a ‘gold standard’ against which the concept is measured. Occasionally, the indicator actually plays a part in defining a construct. A classic example is ‘intelligence’ in psychology: a concept that is embodied by the tests that purport to measure it. As such, intelligence tests have face validity, but it is a rather pointless kind of validity because the construct of ‘intelligence’ is – by definition – that which is measured by the tests. In such circumstances, the validity of an instrument has to be sought elsewhere, in criterion or construct validity.

4.2.2 Criterion validity

Researchers may choose to seek validity for their instruments by external criteria, either because the social research community is unwilling to grant face validity, or in order to establish a construct more firmly. Two closely related kinds of criterion validity exist, concurrent validity and predictive validity. Both of these assess the adequacy of an indicator to measure what it claims to measure by reference to some other indicator which has face validity but which, for some reason or other, cannot be used.

Consider a study of people’s food consumption, which uses a self-recorded ‘food diary’ to measure what a person eats. The validity of this indicator could be challenged on the basis that people may be untruthful or forget what they actually eat. So we could attempt to validate the food diary by also using direct observation of respondents via CCTV for a week. This is an example of concurrent criterion validity.

The other kind of criterion validity is predictive validity, in which the indicator under question is validated in relation to some future outcome. We mentioned intelligence tests earlier: these can be validated by their ability to predict future achievements in examinations. Similarly, we might take polling by political parties amongst a sample of electors concerning their attitudes on key issues (e.g. crime, economy, public services), as predictive of the outcome of a forthcoming election.

Once criterion validity has been accepted for an instrument, it can be adopted without further validation, though it still may not be accepted as having face validity until many other studies have also used it.

4.2.3 Construct validity

The final kind of instrument validity is known as construct validity, and researchers may have to have recourse to this form of validity if face or criterion validities are not accepted by the social research community.

You should recall from Unit 2 on concepts and indicators that a ‘construct’ is a term used for a high-level or abstract concept: such as ‘socioeconomic status’ (SES) – a conceptualisation of social class based upon occupation. To understand construct validation, let us imagine that we want to test a theory that looks at the likelihood of different people possessing a construct that we call the ‘Not In My Back Yard’ syndrome (or NIMBY), which can be defined as ‘the inclination of a person to feel negatively about new road developments in the vicinity of their house’.

Clearly, no face or criterion validity exists for NIMBY-testing instruments, as it is a new construct. In order to explore which kinds of people score highly as NIMBYs, we need to operationalize this construct. It may not be adequate to simply ask people how they feel about a new road development, as this may not discriminate NIMBYs from those who are just resistant to any change. We might establish instead a ‘NIMBY Index’ comprising various indicators.

The way we then proceed to validate this NIMBY-Index is to consider the kinds of other variables with which we would theoretically expect the NIMBY-Index to correlate (convergent construct validity), and those with which, again at the level of theorising, we would expect it not to correlate (divergent or discriminant construct validity).

For example, upon what kinds of variables might people who are high NIMBY scorers also score highly? Well, we might theorise that they would be environmentally conscious, or that they are concerned about road pollution. Or maybe that they are home owners who have large mortgages (and are thus concerned about the value of their property). These are all theoretical correlations, based on the ‘common-sense’ associations of NIMBY with other things we know about people. Consequently, convergent construct validity may be argued for the NIMBY-Index based upon these correlations, and we can argue that our questionnaire tests NIMBY and not something else.

Similarly, NIMBY is unlikely to be associated with what we will argue (from ‘common-sense’ or previous studies) are theoretically un-related constructs, such as gender, attitudes to vegetarianism or distance travelled to work. If our NIMBY-Index questionnaire does not show any correlation with these variables, we can argue we have discriminant construct validity: we are measuring NIMBY and not some other construct.
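In practice, these arguments are usually checked by calculating correlations. The following sketch uses invented scores for eight respondents on the hypothetical NIMBY-Index, on a measure of concern about road pollution (which we theorised should correlate with it) and on distance travelled to work (which we theorised should not). The variable names, the data and the helper function are all assumptions for illustration; the calculation is the ordinary Pearson correlation coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical scores for the same eight respondents.
nimby_index = [2, 5, 7, 8, 3, 9, 4, 6]
pollution_concern = [3, 5, 6, 9, 2, 8, 5, 6]     # expected to correlate
commute_km = [12, 8, 10, 12, 14, 12, 8, 13]      # expected not to correlate

convergent_r = pearson_r(nimby_index, pollution_concern)
discriminant_r = pearson_r(nimby_index, commute_km)
print(f"NIMBY-Index vs pollution concern: r = {convergent_r:.2f}")
print(f"NIMBY-Index vs commuting distance: r = {discriminant_r:.2f}")
```

A strong correlation in the first case and a correlation close to zero in the second would support the claim that the index measures NIMBY and not something else. How strong is ‘strong enough’ remains a matter of judgement and theoretical argument.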

So, unlike the other kinds of instrument validity, construct validity depends on the theoretical framework in which a construct and its indicators reside. It will be necessary to argue at this level for all new constructs in social sciences.

One final comment on construct validity: while such arguments are needed for new constructs, once a construct and its indicators become very well established, the indicators are also accepted and ascribed face validity. Such theoretical arguments are an essential part of the social research process whereby paradigms become established.


SAQ 4.4

You are at another of Professor Wolff’s seminars. This time she is discussing her study of children who walk to school. What types of validity does Professor Wolff use when she replies to the following questions?

[table id=26 /]


5. An epistemological interlude

As you have read this unit, you may have been surprised at the extent to which we casually talk about ideas such as ‘truth’, ‘accuracy’ and ‘trustworthiness’. If you are at all familiar with the kinds of debates in the social sciences concerning epistemology, then you are right to wonder if we have been too willing to use such concepts, which seem more appropriate when talking about chemistry or biological science.

In the first unit of this course, we considered issues of ontology (the nature of social ‘reality’), and epistemology (what we can know about this ‘reality’), and these are important issues for validity and reliability.

In a nutshell, social scientists have not all been willing to accept the kind of ‘positivist’ model of the world used by many natural scientists, in which empirical (based on observation) research inquiry is used to create data revealing the ‘laws’ and regularities that can explain processes and cause-and-effect associations. (In the social sciences, this perspective is also sometimes described as realism.) These sceptical social scientists have questioned both the view that there is a stable reality governed by underlying mechanisms (a matter of ontology), and the view that empirical observation can supply the means to know this reality (epistemology).

Constructionism has by contrast argued that the social world is mutable and unstable, mainly because it is produced by human concepts or constructs generated in language (for example, ‘race’ is not an objective description of human difference but a concept constructed by humans to make arbitrary and sometimes value-laden distinctions between peoples). For constructionist research, the object of study is not an underlying reality, but the multiple human constructions that produce it (hence the term ‘constructionism’).

Furthermore, constructionists argue that research is unable to provide a single trustworthy description of what is going on in the social world, because all research observation is context-dependent. So, for example, Kitzinger (1987) documented how historically psychology described lesbianism first as psychopathology, then deviance, and then positive life-style choice, as the interpretive frameworks surrounding psychological theory changed over time.

From this kind of perspective, what should we think about issues of reliability and validity? If the social world is mutable and continually in flux, then we should not expect findings to be repeatable or consistent (the principles behind reliability), nor should we aim to assess a study’s ‘accuracy’ in terms of how it reveals the single ‘truth’ of social reality (the basis for validity). Additionally, constructionism regards many research designs (including experiments, RCTs and surveys) as imbued with researchers’ and other contextual values that skew findings irretrievably. It follows that from a constructionist epistemology, reliability and validity have to be re-thought not as absolutes to be achieved in the pursuit of the perfect project, but as concepts that problematise the very project of doing research (Lather, 1993).

These issues are clearly important for all who doubt the realist approach to social science research. We leave it to individual readers to assess their own positions vis-a-vis the analysis of validity and reliability set out in this unit, and you may wish to return to these matters in Unit 1.


Summary

In this unit, we have explored what a researcher needs to do to make claims about the accuracy and trustworthiness of her study. To do this requires both that a study has validity and that indicators are also valid and reliable. The key points to remember from this unit are:

  • Studies need to be reliable and valid.
  • Reliability can be summarized as the extent to which measurement of similar things provides consistent results.
  • Validity can be summarized as the extent to which measurements of different things distinguish between them.
  • A study that is not reliable cannot be valid.
  • A study that does not use valid instruments cannot be valid.
  • Valid instruments do not guarantee a valid study.
  • These conceptions of repeatability and truthfulness depend upon an underlying realist ontology and epistemology.

Reflective exercise 4.1

Write down a research question (either your own, or one from a paper you have recently read). For each of the following types of reliability and validity, write down what you need to do to establish them (those marked * will not necessarily be relevant to your study, but you should justify why they are not applicable).

Reliability

  • Internal/split half
  • Test/re-test*
  • Inter-observer*
  • Intra-observer

Study Validity

  • Internal
  • External

Instrument Validity

  • Face*
  • Content*
  • Criterion: concurrent*
  • Criterion: predictive*
  • Construct*

References

Lather, P. (1993) Fertile obsession: validity after poststructuralism. The Sociological Quarterly, 34 (4): 673-693.

Savage, M. et al. (2013) A new model of social class? Findings from the BBC’s Great British Class Survey experiment. Sociology, 47: 219-250.

Further reading

Drost, E.A. (2011) Validity and reliability in social science research. Education Research and Perspectives, 38 (1): 105-123.

Oluwatayo, J.A. (2012) Validity and reliability issues in educational research. Journal of Educational and Social Research, 2 (2): 391-400.


Answers to SAQ 4.2

  1. Test/re-test
  2. Intra-observer
  3. Internal (Split half)
  4. Inter-observer

Answers to SAQ 4.3

  1. Respondents may not always be willing to reply honestly about their voting intentions, because of the way certain parties, controversial policies or their supporters are regarded, and may fear being judged by the interviewer for admitting to such an allegiance. This can be partially addressed by an interviewer stating at the beginning of the interview that they themselves have no party allegiance, and that the interview is anonymous and confidential. A non-face-to-face approach such as postal or online questionnaires might reduce this bias. Indirect questioning (for instance, to elicit respondents’ general political views) may also be a useful technique.
  2. Certain subjects or samples may be more liable to dropping out of a study than others. Avoid this by matching subjects/samples on a range of variables, and if one of a pair drops out/is lost, exclude the paired subject/sample.
  3. A ‘before’ test can sensitise respondents to a topic, even before an intervention. So in this case, just having answered questions about their diet may lead to respondents thinking about making healthy choices, which will be reflected in the ‘after’ test. Consequently the effect of the advert will not be separable from the effect of being asked the initial questions. It therefore will be hard to generalise from this study about the effects of the advert on the wider population: a threat to external validity. The solution is not to use a ‘before’ questionnaire.

Answers to SAQ 4.4

  1. Concurrent criterion: the measure is compared with an indicator possessing face validity.
  2. Predictive: the findings were supported by children’s subsequent results in the standard attainment tests conducted by the school.
  3. Face: the researcher used a monitor accepted as measuring blood pressure, under clinical conditions.
  4. Content: the elements included in the Savage et al. index (see Unit 2) are accepted as covering the key dimensions of social class.
  5. Convergent construct: Professor Wolff argues that her assessment measures well-being because it correlates highly with other variables known to be associated with well-being.
