**Glossary**

**Action research **usually employs a qualitative approach. The essence of action research is that it is problem centred and problem solving. Action research often includes a number of stages where the results of the research are fed back to the participants to improve practice.

**Anonymity **is the protection of the identity of research subjects such that even the researcher cannot identify the respondent to a questionnaire. Questionnaires in an anonymous survey do not have an identification number and cannot be linked back to an individual. Anonymity should not be confused with **confidentiality **where individuals can be identified by the researcher.

**ANOVA (Analysis of Variance) **is a test of statistical significance for assessing the difference between two or more sample means.

**Association **between two variables represents some sort of relationship. An association can be a causal one or it might be spurious. Associations can be positive or negative.

**Bar chart **is a graphical display of discrete data, either nominal or ordinal data. Usually frequency is placed on the Y-axis and the categories run along the X-axis. Bar charts can be stacked or un-stacked.

**Bias **is a deviation of the results from the truth. This can either be due to random error or, more likely, due to systematic error. The latter could be caused by, for example, sampling or poor questionnaire design.

**Blind study **is one in which subjects in an experiment do not know whether they have received an intervention or a placebo (control). Blind studies are commonly undertaken in controlled trials, to avoid the possibility of a placebo effect if a subject knows they are getting an intervention. They are rare in social research.

**Case **is a unit of analysis. Usually this takes the form of an individual subject but it could be a different unit of analysis altogether such as a household or an entire city.

**Categorical data **see nominal data

**Chi-square (χ²) test **is a non-parametric test of statistical significance. It is usually applied to cross-tabulated nominal data. It is used as a measure of association between two nominal variables.

**Closed question **is one where the possible answers have been defined in advance, so the respondents’ answers are restricted to the pre-coded responses offered. A pilot study should be carried out to decide on the correct pre-codes.

**Cluster analysis **is a statistical technique for identifying groupings of observations or respondents with strongly correlated characteristics.

**Coding **is the process by which responses to questionnaires or other data are assigned a numerical value or code so that the data can be transferred to a computer for data analysis. See also ‘pre-codes’, ‘closed questions’, ‘open-ended questions’ and ‘re-coding’.

**Cohort design **is a longitudinal design where the same individuals are interviewed or observed repeatedly over time. Respondents usually share a common characteristic.

**Concept **is an abstract idea or mental construct representing some event or object in reality.

**Conceptual baggage** is a notion in qualitative analysis, particularly **grounded theory**, referring to a researcher’s previously held knowledge and understanding, preconceptions and biases about a particular research topic. Grounded theory suggests that, if themes and categories are to emerge from the data, then researchers must identify their own conceptual baggage to avoid imposing analytical categories from a pre-figured frame of reference or theory.

**Confidentiality **is the protection of the identity of research subjects so that identities cannot be revealed in the research findings and the only person who can link a respondent’s completed questionnaire to a name and address is the researcher. A questionnaire with just a coded identification number is confidential. This should not be confused with **anonymity**, where not even the researcher can identify the subjects.

**Confounding variable **is one that systematically varies with the independent variable and also has a causal effect on the dependent variable. The influence of a confounding variable may be difficult to identify, since it is sometimes difficult to separate out the independent variable from any confounding variables in real life. Confounding is from the Latin for ‘to mix together’.

**Construct validity **is the extent to which the measurement corresponds to the theoretical concepts (constructs) concerning the object of the study. There are two kinds of construct validity: convergent and divergent.

**Constructionism **is a research **epistemology** based on the premise that the events that people experience in their lives are not direct indicators of some underlying social reality, but are a result of individual and collective interpretations. Because each interpretation has equal value, none of these representations (including that of the researcher) can be claimed to be ‘the truth’. (See also **realism**.)

**Content analysis **is the systematic examination of text or conversational transcripts to identify and group common themes, and to develop categories for analysis.

**Content validity **is the extent to which a set of operations or measures together operationalise all aspects of a concept.

**Contingency table **is a table of frequency counts (and percentages) of two variables, with one variable on each axis. The categories of the row variable are listed down the side of the table and the categories of the column variable run across the top. Where each category from each variable meets, the frequency is shown in a box or cell.
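
As a rough illustration of how a contingency table feeds a chi-square test, the sketch below computes the expected cell counts and the chi-square statistic for a small two-by-two table. The counts and the function name `chi_square_statistic` are hypothetical, chosen only for illustration; looking up the resulting statistic against a significance level is left out.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for observed counts.

    `table` is a list of rows; each row is a list of cell frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts: rows are two groups, columns are two answers.
observed_counts = [[30, 10],
                   [20, 40]]
chi2, dof = chi_square_statistic(observed_counts)
print(round(chi2, 2), dof)
```

A larger statistic means the observed cell counts sit further from what would be expected if the row and column variables were unrelated.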

**Control group **is the group in an experiment which is not exposed to the intervention or independent variable. The control group exists to provide a baseline comparison for the intervention group so as to measure the influence of the independent variable.

**Correlation **is the degree to which two variables change together. A correlation may be linear or curvilinear. Correlations may also be positive or negative (negative correlations are also known as inverse correlations).
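
The glossary does not name a particular coefficient, so the sketch below assumes Pearson's product-moment coefficient, which captures linear correlation only. The function name `pearson_r` and the paired values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations for five respondents.
hours_studied = [1, 2, 3, 4, 5]
test_score = [2, 4, 5, 4, 5]
r = pearson_r(hours_studied, test_score)
print(round(r, 3))
```

A value near +1 indicates a strong positive linear correlation, a value near -1 a strong negative (inverse) one, and a value near 0 little linear relationship.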

**Criterion validity **is the extent to which measurement correlates with an external indicator of the phenomenon. There are two types of criterion validity, concurrent and predictive: (i) concurrent validity is a comparison against another external measurement at the same point in time; (ii) predictive validity is the extent to which the measurement can act as a predictor of the criterion.

**Cross-sectional design **is analogous to a snap-shot. A cross-sectional design is one which focuses on a single fixed period in time, and can provide a description of respondents that differ on a number of variables.

**Deduction **is the process of inference from the General to the Particular. Deductive reasoning begins with a general theory and generates hypotheses for testing.

**Delphi technique **is a method for obtaining expert or consensus opinion on a particular topic, by using multiple ‘rounds’ or waves of questions whereby the results from the previous rounds are continually fed back to the same respondents to bring about a group consensus.

**Dependent variable **is also known as the outcome variable. The value of a dependent variable is dependent on other independent variables and its value will change as the independent variable or intervention changes. Statistical techniques can be used to predict the value of the dependent variable.

**Descriptive design **is one that seeks to describe the distribution of variables for a particular topic. Descriptive studies can be quantitative, for instance, a survey, but they do not involve the use of a deliberate intervention. However, it is possible to carry out correlational analysis of the existing variables in a descriptive study.

**Descriptive statistics **are used to describe and summarise variables within a data set including describing relationships between variables. They do not seek to generalise the findings from the sample to the wider population, unlike inferential statistics.

**Emergent understanding **is a concept in qualitative analysis, whereby as the researcher analyses the data, codes, categories and themes are developed. Some kind of understanding of ‘what the data means’ will begin to emerge as the data is progressively categorised.

**Empiricism **is the view that gathering of data and observations is sufficient to explain the social world.

**Epistemology **is a key concept underpinning the philosophy of science and the practice of social research. Epistemology is the study of how social entities can be known by researchers. (See also **ontology** and **methodology**.)

**Error **can be due to two sources: random error and systematic error. Random error is due to chance, whilst systematic error is due to an identifiable source such as sampling bias or response bias.

**Ethnography **is a qualitative research approach and is used to study other cultures. The ethnographic approach was first developed by anthropologists. The term ‘ethnography’ comes from the Greek and means ‘writing culture’.

**Ethnomethodology **is an arm of sociology first developed in the 1940s. It is based on a critique of how the social world is constructed. In ethnomethodology, nothing is taken for granted and the *minutiae* of daily life are broken down and examined in detail.

**Experimental design **is one in which there is direct control over the use of an intervention. In a classic experimental design, the subjects are randomised into intervention and control groups and the dependent variable is assessed before and after the intervention.

**External validity **relates to the extent to which the findings from a study can be generalised (from the sample) to a wider population (and be claimed to be representative).

**Extraneous variable **is a variable other than the independent variable that may have some influence on the dependent variable and may be a potential confounding variable if it is not controlled for.

**Face validity **is the extent to which the measure or instrument being used appears to measure what it is supposed to. For example, a thermometer might be said to possess face validity.

**Focus groups **are a method of collecting qualitative data from a group of people. They take the form of a group discussion, ideally with 6-8 respondents. The group discussion is directed by a moderator.

**Frequency distribution **is the number of cases in each category or value for a single variable.
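
A frequency distribution can be tallied in a few lines. The sketch below is a minimal example using hypothetical answers to a single closed question; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical responses to one closed question.
responses = ["yes", "no", "yes", "unsure", "yes", "no"]

# Count the number of cases in each category.
frequency = Counter(responses)

# Express each count as a percentage of all cases.
percentages = {
    category: 100 * count / len(responses)
    for category, count in frequency.items()
}

for category, count in frequency.most_common():
    print(category, count, round(percentages[category], 1))
```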

**Grounded theory **is a technique for analysing qualitative data and generating concepts and theories, inductively, using a constant comparative method. This approach was developed by Glaser and Strauss in 1967.

**Hawthorne Effect **is the change that occurs in a subject’s behaviour or attitude as a result of being included in a study and being placed under observation. The term derives from industrial psychological studies that were carried out at the Hawthorne plant of the Western Electric Corporation in Illinois in the 1920s and were reported by Mayo. He found that whatever experimental environmental conditions were tried out on the workers, productivity always went up. He realised that it was the effect of actually being under study that resulted in a change of behaviour and so increased productivity.

**Histogram **is a graphical display of interval data. It is similar to a bar chart in layout, with each line or bar proportionately representing each value. The values however can be continuous rather than discrete. See ‘bar chart’.

**Hypothesis **is a statement about the relationship between the dependent and the independent variables to be studied. Traditionally the null hypothesis is assumed to be correct, until research demonstrates that the null hypothesis is incorrect. See ‘null hypothesis’.

**Incidence **can be defined as the number of new occurrences of a phenomenon (for example, arrests for drug use) in a defined population in a specified period. An incidence rate would be the rate at which new cases of the phenomenon occur in a given population. See ‘prevalence’.

**Independent variable **is one that ‘causes’ or precedes the dependent variable. The independent variable takes the form of the intervention or treatment in an experiment and is manipulated to demonstrate change in the dependent variable.

**Indexing **is a process of collating indicators to create a single index of a particular phenomenon such as mental health, quality of life, daily functioning, etc.

**Indicator **is a measure: the operationalised form of a concept. In research, concepts need to be tightly defined so that they can be measured.

**Induction **is the process of inference from the Particular to the General. Inductive reasoning begins with empirical observations that form the basis of theory building.

**Inferential statistics **are statistics that are used to make generalisations from a sample to a population.

**Instrument validity **is the extent to which an instrument or indicator measures what it purports to measure.

**Internal validity **is the extent to which a study measures what it claims to measure, and depends on both the design and the instruments used.

**Interval data **is measured on an interval scale where the distance between each value is equal and the distance between values is the same anywhere on the scale. Interval level data does not possess a true zero, unlike ratio data.

**Intervening variable **occurs in the causal pathway between the independent variable and the dependent variable. It is statistically associated with both the independent and the dependent variable.

**Intervention **is the independent variable in an experimental or quasi-experimental design. Those subjects selected to receive the intervention are placed in the ‘intervention’ group.

**Longitudinal **study is one in which groups of people are interviewed repeatedly over a period of time. Where the same group of people are followed up over time this is known as a cohort study. However, if a group of different people are interviewed in each wave of a survey this is known as a trend design.

**Matching **is a technique used in experimental design to control for extraneous variables. Subjects in the control group may be matched to subjects in the intervention on certain factors such as age or education. Matching can be in pairs of individuals or in groups.

**Mean **is a measure of central tendency. It is calculated by summing all the individual values and dividing this figure by the total number of individual cases to produce a mean average. It is a descriptive statistic that can only be applied to interval (or ratio) data.

**Median **is a measure of central tendency. It is the mid-point or middle value where all the values are placed in order. It is less susceptible to distortion by extreme values than the mean, and is a suitable descriptive statistic for both ordinal and interval data.

**Methodology **describes the practical research designs that researchers use to gain knowledge of the social or natural world, based within a specific **epistemology**.

**Mode **is a measure of central tendency. It is the most frequently occurring or most common value in a set of observations. It can be used for any measurement level but is most suited for describing nominal or categorical data.
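
The three measures of central tendency (mean, median and mode) can be compared on one small data set. The sketch below uses Python's standard `statistics` module; the income figures are hypothetical and include one extreme value to show how the mean is pulled upwards while the median is not.

```python
import statistics

# Hypothetical weekly incomes (in thousands), with one extreme value.
incomes = [18, 20, 21, 21, 24, 95]

mean_income = statistics.mean(incomes)      # sum of values / number of cases
median_income = statistics.median(incomes)  # middle value once ordered
mode_income = statistics.mode(incomes)      # most frequently occurring value

print(round(mean_income, 2), median_income, mode_income)
```

Here the single value of 95 lifts the mean well above most of the observations, while the median and mode both stay at 21.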

**Multivariate analysis **is a statistical technique to describe and explain the simultaneous variation of several variables; in particular, the effect of one or more independent variables on two or more dependent variables.

**Naturalism **is an approach to qualitative research which sets itself against the **positivist **approach of much quantitative research. It acknowledges that the world, especially the social world, is constantly changing, and accepts that observer bias is a fact of life. It offers an alternative to positivist notions of validity and reliability.

**Nominal **data, also known as categorical data, is a set of unordered categories. Each category is represented by a different numerical code but the codes or numbers are allocated on an arbitrary basis and have no numerical meaning. See also ‘ordinal’ and ‘interval data’.

**Non-parametric statistics**, unlike parametric statistics, do not make any assumptions about the underlying distribution of data. Non-parametric statistics are therefore suitable for skewed data and nominal and ordinal levels of measurement.

**Null hypothesis **is the counterpart of the alternative (research) hypothesis. It usually assumes that there is no relationship between the dependent and independent variables. The null hypothesis is assumed to be correct, until research demonstrates that it is incorrect. This process is known as falsification.

**Ontology **is the study of the nature of being, and deals with questions concerning what entities exist or can be said to exist. (See also **epistemology** and **methodology**.)

**Open-ended question **is one that allows the respondent the freedom to give their own answer to a question, rather than forcing them to select one from a limited choice. Open-ended questions are commonly used in in-depth interviews, but they can also be used in quantitative structured interviews.

**Ordinal data **is composed of a set of categories that can be placed in an order. Each category is represented by a numeric code, which in turn represents the same order as the data. However, the numbers do not represent the distance between each category. For instance, a variable describing social class may be coded as follows: professional = 5; managerial = 4; skilled non-manual = 3; skilled manual = 2; unskilled = 1. The code 4 cannot be interpreted as being twice that of code 2.

**Panel study **is another term for a longitudinal or cohort study, where individuals are interviewed repeatedly over a period of time.

**Paradigm **is a set of beliefs, values and assumptions shared by a given community, sometimes known as a ‘world view’ or ‘super theory’. Kuhn (1922-1996) developed the idea of paradigm shifts.

**Parametric statistics **are based on the assumption that the data follows a normal distribution, i.e. the data when plotted follows a bell-shaped curve. Examples of parametric statistics are t-tests and analysis of variance (ANOVA).

**Participant observation **is a methodology that is based on direct observation of a setting, with the researcher as a participant in this process, immersed in the setting rather than being separate from what is going on. (See also **ethnography**.)

**Pie chart **is a circular diagram (like a pie) in which each category is represented proportionately by a segment of the pie. A pie chart is useful for presenting nominal data.

**Positivism **is an approach to research that is concerned with proving the truth or falsity of a statement. It does this using observation or experimental design to prove or disprove the statement. Logical positivism is the fundamental concept underlying modern science.

**Population **is a term used in research, which refers to *all* the potential subjects or units of interest who share the same characteristics that would make them eligible for entry into a study. The population of potential subjects is also known as the sampling frame.

**Power **Statistical power is a measure of the extent to which a study is capable of discerning differences or associations that exist within the population under investigation, and is of critical importance whenever a hypothesis is tested by statistics. Conventionally studies should reach a power level of 0.8, such that 4 times out of 5 a false null hypothesis will be correctly rejected by a study. Statistical power may be most easily increased by increasing the sample size.

**Prevalence **is the number of events in a population at any given point in time. See ‘incidence’.

**Prospective study **is one that is planned from the beginning and takes a forward looking approach. Subjects are followed over time and interventions can be introduced as appropriate.

**Qualitative **research deals with the human experience and is based on analysis of words rather than numbers. Qualitative research methods seek to explore rich information, often collected from a fairly small sample, and includes methods such as in-depth interviews, focus groups, action research and ethnographic studies.

**Quantitative **research is concerned with numerical measurement and numerical data. Quantitative research tends to be based on larger sample sizes in order to produce results that can be generalised to a wider population.

**Quasi-experimental designs **are ‘natural experiments’ in which there are intervention and control groups, but the researcher has no control over who receives the intervention and who does not. In these studies, subjects may be matched, as it is not possible to randomise them.

**Questionnaire **is a set of questions used to collect data. Questionnaires can be administered face to face by an interviewer, over the telephone or by self-completion (online or postal). Questionnaires can include closed and open-ended questions.

**Quota sample **is a form of non-random sampling and one that is commonly used in market research. The sample is designed to meet certain quotas, set usually to obtain certain numbers by age, sex and social class. The sample selected within each quota is selected by convenience, rather than random methods.

**Randomisation **is the random assignment of subjects to intervention and control groups. Randomisation is a way of ensuring that chance dictates who receives which treatment. In this way all extraneous variables should be controlled for. Random allocation does not mean haphazard allocation.

**Random error **is non-systematic bias that can negate the influence of the independent variable. Reliability is affected by random error.

**Randomised controlled trial (RCT) **is an experimental design in which subjects are randomly allocated to either the intervention or the control group.

**Ratio level data **is similar to interval data in that there is an equal distance between each value but there is a true zero on the scale. An example of ratio data would be age.

**Realism **The premise of realism is that social researchers can aspire to know and understand an underlying social reality that exists independently of human concepts. (See also **Constructionism**.)

**Re-coding **is the process of altering the codes assigned to a particular variable, usually by aggregating categories. For instance, continuous interval data such as age may be re-coded into age bands, thus making it ordinal data. Re-coding allows data to be analysed and compared in different ways than in its original state.
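
Re-coding age into bands might look like the sketch below. The band boundaries, the numeric codes and the function name `recode_age` are all hypothetical; a real study would choose bands to suit its own analysis.

```python
from bisect import bisect_right

# Hypothetical lower bounds of each band after the first.
BAND_LOWER_BOUNDS = [18, 35, 50, 65]
BAND_LABELS = ["under 18", "18-34", "35-49", "50-64", "65 and over"]


def recode_age(age):
    """Return (ordinal code, label) for an age given in whole years."""
    # bisect_right counts how many lower bounds the age has reached.
    index = bisect_right(BAND_LOWER_BOUNDS, age)
    return index + 1, BAND_LABELS[index]


ages = [17, 18, 34, 35, 64, 65, 80]
recoded = [recode_age(age) for age in ages]
for age, (code, label) in zip(ages, recoded):
    print(age, code, label)
```

The codes 1 to 5 preserve the order of the bands but not the distance between them, which is why the re-coded variable is ordinal rather than interval.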

**Regression **analysis is a statistical procedure for studying the relationship between one dependent variable and a number of independent variables. It is a form of multivariate analysis used to show the correlation between interval data.

**Reliability **is concerned with the extent to which a measure gives consistent results. It is also a pre-condition for validity.

**Representativeness **is the extent to which a sample of subjects is representative of the wider population. If a sample is not representative, then the findings may not be generalisable.

**Response rate **is the proportion of people who have participated in a study or completed a question. It is calculated by dividing the total number of people who have participated by the number who were approached or asked to participate.

**Retrospective design **is one that looks backwards over time, often using data already collected by others. It usually takes the form of correlational research identifying relationships between independent and dependent variables.

**Sample **is a group or subset of the chosen population. A sample can be selected by random or non-random methods. Findings from a representative sample can be generalised to the wider population.

**Sampling frame **is the pool of potential subjects that share similar criteria for entry into a study. The sampling frame is also known as the ‘population’.

**Scatter plot **or **scattergram **is a graphic visualisation of data, showing the distribution of two variables in relation to each other. One variable is plotted along the horizontal axis and the other along the vertical axis.

**Semi-structured interviews **fall between structured and unstructured interviews: questions are specified, but the interviewer can explore in detail particular topics that emerge during the interview.

**Significance level **is usually stated as the ‘p’ value. A significance level is commonly set at either 0.01 (a one in a hundred chance of being incorrect) or 0.05 (a one in twenty chance of being incorrect).

**Significance tests **are used to assess the extent to which a finding could have occurred due to chance.

**Snowballing **is a non-probability method of sampling commonly employed in qualitative research. Recruited subjects nominate other potential subjects for inclusion in the study.

**Spurious correlation **is an apparent correlation between two variables when there is no causal link between them. Spurious relationships are often accounted for by a third confounding variable. Once this third variable is controlled, the correlation between the two variables disappears.

**SPSS **(Statistical Package for the Social Sciences) is a popular and easy-to-use software package for data analysis.

**Standard deviation **is a summary measure of dispersion. It is a summary of how closely clustered or dispersed the values are around the mean. For data that is normally distributed, approximately 68% of all cases lie within one standard deviation either side of the mean and approximately 95% of all cases are within two standard deviations either side of the mean.
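
The sketch below computes a standard deviation and the share of cases lying within one standard deviation of the mean. The scores are hypothetical and too few to be normally distributed, so the share is not expected to match the 68% figure; the example assumes the population form of the standard deviation (`statistics.pstdev`), whereas `statistics.stdev` gives the sample form.

```python
import statistics

# Hypothetical scores for eight respondents.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean_score = statistics.mean(scores)
sd_score = statistics.pstdev(scores)  # population standard deviation

# Proportion of cases within one standard deviation either side of the mean.
lower, upper = mean_score - sd_score, mean_score + sd_score
share_within_one_sd = sum(lower <= s <= upper for s in scores) / len(scores)

print(mean_score, sd_score, share_within_one_sd)
```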

**Stratified sample **is one where the sample is divided into a number of different strata based on certain criteria such as age or sex or ethnic group. The sample selection within each stratum is however based on a random or probability method. A stratified sample is a way of ensuring that the sample is representative rather than leaving it to chance.

**Structured interviews **contain closed questions with pre-coded answers, in order to produce quantitative data.

**Survey **is a method of collecting large scale quantitative data but does not use an experimental design. With a survey there is no control over who receives the intervention or when. Instead a survey design can examine the real world and describe existing relationships. A survey can be either simply descriptive or correlational.

**Thematic analysis **is a method used to make sense of qualitative data. The researcher immerses themselves in the data, generating themes, coding the data, testing and interrogating their understanding.

**Theoretical sampling **is a sampling method used in qualitative research, whereby the sample is selected on the basis of the theory and the needs of the emerging theory. It does not seek to be representative.

**Type I error **is the error of falsely rejecting a true null hypothesis and thereby accepting that there is a statistical difference when one does not exist. The chance of committing a Type I error is known as alpha and is expressed as a ‘p’ value.

**Type II error **is the error of failing to reject a false null hypothesis or wrongly accepting a false null hypothesis. The likelihood of committing a Type II error is known as beta. The conventional level of statistical power (1 - beta) is usually set at 80% or 0.8.

**T-test **is a test of statistical significance for assessing the difference between two sample means. It can only be used if the data follow a normal distribution and if the two sets of data have similar variances.
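
As a minimal sketch, the t statistic for two independent samples can be computed as below. It assumes the pooled-variance (Student's) form of the test, which matches the requirement that the two sets of data have similar variances. The function name `pooled_t_statistic` and the two samples are hypothetical, and converting the statistic to a ‘p’ value is left out.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(sample_b)

    # Pool the two variances, weighting each by its degrees of freedom.
    dof = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof

    # Standard error of the difference between the two means.
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    return difference / standard_error, dof


# Hypothetical scores for an intervention group and a control group.
intervention = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
t_value, degrees_of_freedom = pooled_t_statistic(intervention, control)
print(round(t_value, 3), degrees_of_freedom)
```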

**Unstructured interviews **are an in-depth method of qualitative data collection. Featuring open-ended questions, they allow the flexibility to explore topics, and for respondents to follow their own trains of thought and lines of reasoning.

**Validity **is the extent to which a study measures what it purports to measure. There are many different types of validity.

A **variable** is an operationalised concept. A variable is a phenomenon that varies and must be measurable. An outcome variable is known as the dependent variable and the causal (or explanatory) variable is known as the independent variable. The independent variable has a causal effect on the dependent variable.

**Weighting **is a correction factor that is applied to data in the analysis phase to make the sample representative. Weighting is also used to correct for non-response, when the respondents are known to be biased in a systematic way.
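
One simple way to derive such correction factors, sketched below, is to divide each group's share of the population by its share of the sample. This is only one weighting scheme among several, and the group names, counts and population shares are hypothetical.

```python
def group_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    sample_total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / sample_total)
        for group, count in sample_counts.items()
    }


# Hypothetical sample that over-represents women relative to the population.
sample_counts = {"women": 60, "men": 40}
population_shares = {"women": 0.5, "men": 0.5}

weights = group_weights(sample_counts, population_shares)

# After weighting, each group's weighted count matches its population share.
weighted_counts = {g: weights[g] * sample_counts[g] for g in sample_counts}
print(weights, weighted_counts)
```

Over-represented groups receive a weight below 1 and under-represented groups a weight above 1, so the weighted sample total stays the same as the unweighted one.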
