Unit Author: Professor Nick J Fox

Learning objectives

Having successfully completed the work in this unit, you will be able to:

  • explain the purpose of qualitative data analysis
  • offer some principles of qualitative data analysis
  • describe the main approaches to qualitative data analysis
  • describe what is meant by grounded theory
  • undertake a basic thematic analysis.


The aim of all data analysis is to provide researchers and their audiences with a means to draw conclusions and/or inferences from a piece of research. The sheer quantity of data collected in a study may prevent these conclusions from being immediately obvious: data analysis must provide a means to reach valid conclusions by finding a way to summarise the data.

In quantitative data analysis, a large quantity of data is typically displayed either graphically (for example as a chart showing percentages or changes over time), or in terms of some numerical representations of quantity or distribution (for example, a ‘P’ value or a confidence interval).

In both cases, what has been achieved is the reduction in the overall amount of data to a simple chart or statistic that summarises the data and allows a conclusion to be drawn that is acceptable to the research community as a legitimate reflection of what the data ‘says’. This permits researchers to disseminate the summary findings rather than the mass of raw data, based on accepted analytical approaches.

Qualitative data analysis does not permit this simple reduction of data to a chart or a statistic, but the objectives of qualitative data analysis are the same:

  • Data reduction: finding ways to make the vast mass of data collected by qualitative techniques more manageable; and
  • Data display: finding a means to share the data that is accepted by the research community, policy-makers and public as valid.

In addressing these two objectives, qualitative data analysis is more problematic than quantitative analysis for two reasons.

First, there is no immediate way to represent qualities, in contrast to the simple arithmetical functions of means and distributions in quantitative analysis.

Second, qualitative data is typically more extensive, comprising possibly millions of words spoken by interviewees during the course of a study, or video streams documenting behaviour and interactions.  How can we possibly display even a hundredth of this data in a way that can be comprehended? And how may we be confident that a researcher has provided us with a summary that is representative of the whole set of data?

This unit will address this problem. But, if this seems daunting, then rest assured that the methods of qualitative data analysis are based upon simple strategies. The methods can be summarised as:

  1. Sense-making: drawing out the main messages being conveyed by the data, be it an interview, an observation or a document that we are analysing. Typically, this involves coding data and organising this coded data into themes or other patterns and associations;
  2. Sampling the data to provide examples that in some way represent the broader dataset;
  3. Writing up the data analysis to draw out and illustrate the key themes.

1. Thematic approaches to qualitative data analysis

Although there are also ‘holistic’ approaches (for example ‘phenomenology’, which tries to gain an overall picture of a respondent’s world-view), we will focus here on the most common approach to analysing qualitative data: thematic analysis.  Thematic data analysis may be carried out by ‘hand’, using a basic ‘cut-and-paste’ method (this is how qualitative data analysis always used to be done: coded data was literally cut up and then pasted on to large sheets of paper to organise it into themes).  A more sophisticated version of this uses a basic software solution such as a word-processor (e.g. Word) or spreadsheet (e.g. Excel) to manage the data within themes.  This is the most straightforward way if you only have a small amount of data to analyse.

However, thematic analysis is now more typically undertaken using bespoke computer applications, and most researchers now use specialist qualitative data analysis packages, known generically as Computer Assisted/Aided Qualitative Data Analysis Software (CAQDAS).  Well-known packages are Atlas-TI, NVivo and Ethnograph.  CAQDAS packages typically provide functionality for data management, coding, subsequent thematic analysis, and even complex data modelling.  CAQDAS comes into its own when there is a large mass of data to analyse, or when there are multiple researchers working on the same project.

2. Interpretation in qualitative data analysis

Qualitative data analysis is not merely a mechanical task.  Rather, it requires an act of interpretation on the part of the researcher. This occurs in two ways:

  • through the meanings the researcher attaches to what respondents have said; and
  • in how the data is written-up: how the summary data is presented and what ‘story’ the researcher develops to link together the themes that emerge.

Because interpretation is required in qualitative analysis, it raises issues of epistemology. Epistemology is concerned with questions about what we can know about the world around us. This is a key issue in social research (see Unit 1).

3. Preparing data for analysis

There are some basic procedures that need to be undertaken before data is ready for analysis.

3.1 Anonymising data

In almost all cases, it is important that research data presented in a report or for publication is anonymised, so that readers cannot identify individual respondents.  Occasionally, anonymisation is pointless as the data refers to specific individuals, but this is very rare.

It is possible to go through a completed report and alter names. However, it is best if anonymisation occurs prior to analysis. This means that if additional materials are added to a report, there are no problems with matching original and pseudonymous tags to quotations, forgetting to anonymise quotations and so forth.

It is therefore good practice to establish an anonymised coding frame as soon as transcripts of interviews are to be produced. The simplest way to do this is to allocate a unique number to each respondent, and use this to identify the transcript of an interview or other record. When producing a final draft, pseudonyms can then be attached to the numbered extracts. Alternatively, appropriate pseudonyms can be used for transcripts.

Of course, you need to ensure you can trace back from anonymised transcript to the respondent, for example, if you wanted to check the accuracy of the transcription with a respondent.

When creating pseudonyms, make sure that these are not so obvious that they could be linked back to respondents.

Pseudonyms should also be used for identifiable aspects of a setting, including names of organisations, locations and so on.
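
As a minimal sketch of the numbering approach described above, the following Python fragment allocates a unique identifier to each respondent and replaces real names in a transcript. The respondent names, the `R01`-style identifier format and the function names are illustrative assumptions rather than features of any particular package, and the lookup linking identifiers to respondents would need to be stored securely and separately from the transcripts.

```python
import re

def build_lookup(respondent_names):
    """Allocate a unique identifier (R01, R02, ...) to each respondent."""
    return {name: f"R{i:02d}" for i, name in enumerate(respondent_names, start=1)}

def anonymise(text, lookup):
    """Replace each real name in the text with its identifier."""
    # Longest names first, so a full name is replaced before any shorter
    # name that happens to be contained within it.
    for name in sorted(lookup, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", lookup[name], text)
    return text

# Hypothetical respondents, invented purely for illustration.
lookup = build_lookup(["Asha Patel", "Joan Brown"])
transcript = "Int.: How do you shop, Asha Patel?\nAsha Patel: Joan Brown and I go together."
anonymised = anonymise(transcript, lookup)

# Reverse lookup, kept apart from the transcripts, so that an anonymised
# transcript can be traced back to the respondent if necessary.
trace_back = {identifier: name for name, identifier in lookup.items()}
```

A simple replacement like this only catches names that appear exactly as listed; nicknames, misspellings and indirect identifiers (such as a workplace) would still need to be checked by reading the transcript.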

3.2 Preparing the text

Make sure that interview transcripts clearly differentiate between respondent and interviewer contributions, and use a standard code for the interviewer (e.g. ‘Int.’).

If you are using a manual approach such as ‘cut and paste’, the following tips on transcript preparation should be followed:

  • Ensure that the document name is cross-referenced, so you can check back on the identity of the respondent or source.
  • Format text with wide margins, if you are going to print out and annotate the transcripts by hand. Double spacing will also aid coding.
  • Add line numbers (these can be added by word-processing software; CAQDAS packages will add continuous numbering automatically). If you are using cut and paste, make sure you leave the page number attached to the text.

If using a CAQDAS package, you will need to follow its instructions for importing transcripts.
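
If you are preparing transcripts outside a word-processor or CAQDAS package, continuous line numbering can be sketched in a few lines of Python. The transcript text and the function name `number_lines` are hypothetical; a real project would read the text from a file.

```python
def number_lines(transcript, width=3):
    """Prefix each line of a transcript with a continuous, zero-padded number."""
    lines = transcript.splitlines()
    return [f"{n:0{width}d}  {line}" for n, line in enumerate(lines, start=1)]

# Invented example, using the standard 'Int.' code for the interviewer
# and an anonymised identifier for the respondent.
transcript = "Int.: Where do you buy food?\nR01: Mostly from the local shop."
numbered = number_lines(transcript)
print("\n".join(numbered))
```

Numbering every line in this way means that any extract used later can be cross-referenced to its exact position in the original transcript.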

4. Thematic analysis

In their introduction to thematic analysis of qualitative data, Marshall and Rossman (2015: 214) suggest that data analysis is the process of

‘… bringing order, structure and interpretation to a mass of collected data … a search for general statements about relationships and underlying themes …’

Thematic analysis may be based either on themes identified prior to the analysis process, or on categories that become clear to the researcher only as the analysis proceeds (as in ‘grounded theory’, which we consider later in the unit).

For example, in a study of the work done by a health visitor, a researcher’s literature review revealed that the work done could be divided into casework with individual clients, community development work, public health activities and administration. These four aspects of workload can be directly adopted as the ‘themes’ that the researcher uses to analyse interview data once it has been collected from respondents. This would be an example of a pre-figured strategy for thematic analysis, and is often used in approaches such as ‘framework analysis’ (Gale et al., 2013).

On the other hand, the researcher might decide that s/he cannot decide the themes in advance. For example, a researcher conducted interviews with British Asian women concerning healthy eating. No research had been conducted on this topic previously. Only when the data was read through could the researcher begin to identify what themes were common to the interviewees. She found emerging themes related to cultural preferences, individual preferences, knowledge and affordability. This is an example of an emergent or intuitive strategy (the intuition refers to the researcher’s capacity to discern the important themes in the collected data). We will look more closely at the latter kind of analysis when we consider grounded theory.

4.1 Stages in thematic data analysis

Marshall and Rossman (2015) suggest that thematic analysis can be divided into six phases:

  1. organise the data
  2. generate categories or themes
  3. code the data
  4. test emergent understandings of the data
  5. search for alternative explanations of the data
  6. write up the data analysis.

We will look at each of these in turn.

4.1.1. Organising the data

The first and crucial step in thematic analysis is that of familiarisation with the data. This cannot be skipped without great risk to the validity of the analysis, and there is no shortcut to reading the data transcripts, possibly a number of times. This work precedes any efforts to identify themes or test theories, but during the reading process, some broader understanding of the data will begin to emerge.

There is, of course, the practical issue of organising data, which may take the form of a mass of documents: not only transcripts of interviews but also field notes, documents, photographs, diagrams and scribbled ideas or notes. It is important that you find some way of archiving this material, and this probably requires indexing every document, with a key index so you can quickly lay your hands on individual documents. Where these can be entered into a computer-aided qualitative data analysis package, this can provide a neat way to archive all your data, but it may still be the case that some kind of physical archive is also required. However, the main task in this phase of analysis is reading the data. It is during this process that you can start to understand the data.

During reading, you may wish to take notes, perhaps using index cards to remind yourself of something that strikes you. In some software packages, this is known as a memo. Usually these notes are not formal themes but more general insights into what is being said, or examples of something unexpected or remarkable.

There are three kinds of reading, according to Mason (2002: 148-150): literal, interpretive and reflexive. Literal reading concerns itself with the structure of transcript, document or other data, simply focusing on how it is constituted (for example, by noting that a document is written for internal consumption within an organisation and takes the form of a confidential report).

Interpretive reading will ‘… involve you in constructing or documenting a version of what you think the data mean or represent, or what you think you can infer from them’ (ibid: 149). You may focus on the respondent’s own interpretations or impose your own meanings, depending on your epistemological stance and your research question.

Reflexive reading ‘… will locate you as part of the data you have generated, and will seek to explore your role and perspective in the process of generation and interpretation of data’ (ibid).

4.1.2. Generating categories and themes

This is the main way to ‘reduce’ the complexity of the data.  There is a hierarchy of codes, categories and themes that progressively group data:

  • Codes are simply ways to group together similar bits of data (for instance, whenever an interviewee refers to a family member, the code ascribed could be ‘relative’).
  • Categories (sometimes known in CAQDAS as ‘code families’ and ‘supercodes’) are higher-level groupings of data that combine related codes to help focus your research area (e.g. ‘parental messages about food’).
  • Themes draw together categories, to define the main ideas emerging from the data (e.g. ‘food as reward and celebration in the family’). They can be understood as high-level categories that provide an overall structure to the data. You should aim to identify about four or five themes from the data, to give a manageable number for writing up the report.

The process of category generation involves noting patterns in the data, perhaps relating to the topics raised by interviewees, or to how they talk about them. For example, as you read an interview, you may be reminded of something said by another interviewee, or notice a recurrent theme in what one interviewee has to say.

If you pre-select the categories and themes that you will use to analyse your data, these will then provide the structure of the analysis and any subsequent research reports.  If your categories and themes emerge as you analyse the data, it is important that they are adequate to fully describe all the relevant data: it may take a considerable time to find the right categories to sufficiently explain the data.

Categories should be internally consistent and externally divergent (Marshall and Rossman 2015: 225). This means that each category should link together things that are similar to each other, while the categories themselves should be distinct from one another.
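
The hierarchy of codes, categories and themes can be pictured as a simple nested structure. The Python sketch below is illustrative only: the theme, the category ‘parental messages about food’ and the code ‘relative’ come from the examples above, while the category ‘family occasions’, the other codes and both function names are invented for the purpose of the example.

```python
# Hypothetical hierarchy: theme -> categories -> codes.
hierarchy = {
    "food as reward and celebration in the family": {
        "parental messages about food": ["relative", "treats"],
        "family occasions": ["festival meals", "relative"],
    },
}

def codes_for_theme(hierarchy, theme):
    """Return the sorted set of codes grouped under a theme."""
    return sorted({code for codes in hierarchy[theme].values() for code in codes})

def shared_codes(hierarchy, theme):
    """List codes that appear in more than one category of a theme.

    Overlap of this kind may be worth reviewing when judging whether
    categories are sufficiently distinct from one another.
    """
    seen, shared = set(), set()
    for codes in hierarchy[theme].values():
        for code in set(codes):
            (shared if code in seen else seen).add(code)
    return sorted(shared)

theme = "food as reward and celebration in the family"
all_codes = codes_for_theme(hierarchy, theme)
overlapping = shared_codes(hierarchy, theme)
```

Whether an overlapping code signals a problem is a matter of interpretation; the check simply draws the analyst’s attention to places where categories may need refining.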

4.1.3. Coding data

The purpose of coding is two-fold:

  1. to categorise the data according to the interpretive framework you are applying, and
  2. to cross-reference the transcripts, to enable examples of the data to be easily identified, to be used in the write-up of the qualitative data analysis.

CAQDAS packages have the advantage over manual approaches that once (1) has been completed, it is easy to recover data for (2).

Coding is a straightforward process once the codes and categories have been agreed.

During manual coding, a code can be written alongside the passage, and highlighting in different colours can be used to identify categories or themes.

In computer-aided analysis, coding and higher-level categorisation is done electronically. Software packages have the advantage that it is easy to tag a phrase or even a single word with multiple codes.  For example, interviewee A commented:

‘… I usually buy food from the local shop that sells Asian ingredients, but this is more expensive and I may go to the supermarket for some food …’

This phrase could be tagged with two separate codes: culture and affordability, enabling the phrase to provide evidence for two different aspects of the analysis.
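
The idea of tagging one passage with several codes, and then retrieving passages by code, can be sketched in Python as follows. This is not how any particular CAQDAS package stores its data; the line references, the second segment and the function name `index_by_code` are assumptions made for illustration.

```python
from collections import defaultdict

# Each coded segment records its source, a line reference, the text and its codes.
segments = [
    {
        "respondent": "A",
        "line": 12,  # hypothetical line reference
        "text": "I usually buy food from the local shop that sells Asian "
                "ingredients, but this is more expensive and I may go to "
                "the supermarket for some food",
        "codes": {"culture", "affordability"},
    },
    {
        "respondent": "B",  # invented second segment, for illustration only
        "line": 40,
        "text": "I cook what my children will eat",
        "codes": {"individual preferences"},
    },
]

def index_by_code(segments):
    """Build a cross-reference from each code to the segments tagged with it."""
    index = defaultdict(list)
    for seg in segments:
        for code in seg["codes"]:
            index[code].append((seg["respondent"], seg["line"], seg["text"]))
    return index

index = index_by_code(segments)
# The same passage from interviewee A is retrievable under both codes.
for respondent, line, text in index["affordability"]:
    print(f"{respondent} (line {line}): {text}")
```

Because the index keeps the respondent and line reference with each extract, a quotation used in the write-up can always be traced back to its place in the transcript.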

4.1.4. Testing emergent understandings

As codes, categories and themes are developed, some kind of understanding of ‘what the data means’ will begin to emerge as the data is progressively categorised.

Part of your understanding of a dataset will derive from the emergence or application of relevant theoretical constructs (for instance, ‘patriarchy’ or ‘stigma’) to the data. What this means is that you will tentatively attempt to ‘explain’ your data from within a context of theory; if the theoretical constructs you apply seem to fit the data well, then this will help to refine your understanding.   (As we will see later, in ‘grounded’ data analysis, the theoretical constructs may be novel rather than derived from established theories.)

Marshall and Rossman (2015) suggest that in this phase of qualitative data analysis, a researcher should:

  • search the data to challenge the emergent understanding,
  • seek out negative instances that might challenge this understanding, and
  • start to draw categories of data together to establish the main themes.

4.1.5. Search for alternative explanations

During data analysis, a researcher should not commit too quickly to one explanation of the data, but should play ‘devil’s advocate’, seeking alternative understandings of the data, and even trying to undermine the understanding or theories that are being used for analysis.

4.1.6. Write up the data analysis

With your analytical framework of categories and themes established, you can easily use this to create a format for the findings section of a research report.

Each of your four or five main themes will form a sub-heading within the findings section, and within each of these you can describe the different elements that comprise the theme.  Together, these sub-sections should cover all the relevant issues that have emerged from the data analysis.

Each sub-theme can then be illustrated with examples from the data.  These should be readily accessible from your coded transcript, whether this has been coded and categorised manually or in CAQDAS.  Arguably, writing up a qualitative study can be regarded as a narrative that provides the reader with an organised and summarised account of the data.  Part of telling a ‘believable story’ will depend upon illustrating your write-up with representative data.
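
One rough aid when choosing illustrative extracts is to check how many different respondents contributed data to each code, so that a quotation is not presented as typical when it reflects a single voice. The Python sketch below is an assumption about how such a check might be done, not a procedure taken from the texts cited in this unit; the respondent identifiers and coded pairs are invented.

```python
from collections import Counter

# Hypothetical (respondent, code) pairs produced during coding.
coded = [
    ("R01", "affordability"),
    ("R02", "affordability"),
    ("R01", "culture"),
    ("R03", "affordability"),
    ("R01", "culture"),  # the same respondent coded twice under one code
]

def respondents_per_code(coded):
    """Count how many distinct respondents contributed to each code."""
    # Converting to a set removes repeated (respondent, code) pairs first.
    return Counter(code for _, code in set(coded))

counts = respondents_per_code(coded)
```

A count of this kind is only a prompt for judgement: a theme voiced by one respondent may still be analytically important, but the write-up should make clear how widely it was shared.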

These six phases of qualitative data analysis are fairly generic, and provide the basis for most approaches. On occasions (for example when documenting a ‘case study’), thematic analysis may be replaced by a more narrative structure (for instance, an account of a child’s day at school, or of an event such as a political uprising or protest).  Here, categorisation, themes and theory will not be so relevant. However, for most studies, this form of thematic approach will provide a good basis for qualitative data analysis.

SAQ 9.1 Stages of qualitative data analysis

Read the following article, which reports a qualitative research study.

Ellis-Sloan, K. (2014) Teenage Mothers, Stigma and Their ‘Presentations of Self’. Sociological Research Online, 19 (1) 9.  http://www.socresonline.org.uk/19/1/9.html


5. Grounded theory

This is an approach to qualitative analysis that has been very popular at times, and which you may come across in the social science literature.  Grounded theory is not in itself a methodology, but a particular approach to data.  It was established by Glaser and Strauss (1967) and applied by many qualitative researchers in the 1970s, 80s and 90s.  Grounded theory was instrumental in countering claims that qualitative research was ‘woolly’ and unsystematic, as it claimed to adopt a range of systematic measures to reduce researcher bias during data analysis.  It was also based on a realist epistemology, concerned with the possibility of knowing the ‘truth’ about the world.

The main principles of grounded theory are as follows:

  • theory should be grounded in the data gathered in a study rather than imposed from a previously existing framework;
  • theory can be refined by further data collection, so in a grounded theory approach data collection and analysis should be concurrent and iterative (in order to achieve the ‘constant comparison’ of data against theory);
  • theoretical sampling (non-random sampling that selects cases to supply a wide range of responses) is used to ensure that data is collected that can assist in the development of the grounded theory;
  • a researcher using the grounded theory approach needs to be aware of her ‘conceptual baggage’, which could bias the emergent theory;
  • theory emerges through immersion in the data and the development of a coding framework that is entirely based on the structure of the data.

The techniques of grounded theory are very similar to those that have already been described in thematic analysis, but avoiding categories imposed from a pre-figured frame of reference or theory.  Indeed, grounded theory researchers are encouraged not to conduct a literature review before commencing research, in case this adds to the conceptual baggage! Immersion in the field is the only way to gain a theoretical framework that is ‘true’ to the participants in the setting. As noted, collection and analysis of data should proceed hand-in-hand, so even after one interview, analysis might begin: categories and early theoretical constructs then inform the shaping of subsequent data collection.

The writing of grounded theory is a critical element of the process, as it is here that the theoretical framework that supplies the understanding of the data is elucidated. Typically, large examples of the raw data are included, to demonstrate that theory is truly grounded in the data: there is a sense in which the data will ‘speak for itself’ and the role of the qualitative data writer is simply to organise this in a comprehensible way.

5.1 Criticisms of grounded theory

The main criticisms of grounded theory are that:

  • In many cases, qualitative data analysis is not primarily concerned with theory generation, but is an opportunity to apply existing theory to a setting. (This is particularly relevant to case studies, where the intention is not to generalise but to document what goes on in a setting in great detail.)
  • Grounded theory assumes that a researcher can set aside her own interpretations, to allow the ‘truth’ of the situation to emerge. It is hard to see how this is possible, as the whole process of analysis is tied to the researcher’s interpretations and even to the concepts she uses to make sense of a transcript.
  • Grounded theory assumes there is a single truth. However, there may be many different interpretative frameworks in use by participants in a field setting, and it is these that should be described, not a single ‘reality’ that an external researcher ‘discovers’.

These latter two criticisms are made by constructionists and postmodernists, who argue that there are always multiple viewpoints, with multiple interpretations, and the researcher’s job is to present them clearly.  For a recent detailed assessment of grounded theory, see Gibson and Hartman (2013).


In this unit, we have looked at the principles involved in qualitative data analysis.  We have shown that, just as in quantitative analyses that use charts or statistics, the main objective is to summarise and reduce data.

While this assessment provides the basis for understanding the processes, there is nothing like getting your hands dirty with some real data.  You may not have access to a CAQDAS package, but don’t forget that small data sets can be analysed using cut and paste or a program like Word.  So if you are starting with a small research project, the next step will be to test out the principles we have described here.

However, I hope you have also seen just how tied up with interpretation is the process of qualitative data analysis.  This poses issues of how we know the world (epistemology): is the outcome of a thematic analysis an accurate and truthful description of the events studied, or a concocted though believable version generated by the interpretive work of the analyst?  These alternatives may be described as ‘realist’ and ‘constructionist’ epistemological stances. We looked at this in detail in Unit 1.

Now please complete the following reflective exercise (log book), which picks up on these issues.

Reflective exercise 9.1: Interpretation, bias and ‘conceptual baggage’

Even in qualitative research that sets out to be ‘grounded’ in the data, researchers face the challenge of their own biases affecting their interpretations.  Arguably, simply setting a research question introduces bias into the process of analysis, as it will affect which features in a transcript you treat as relevant to your question. For that reason, it is important to identify your own biases or ‘conceptual baggage’ before you begin analysis (perhaps before you begin data collection!).

In this exercise, we want you to think about the baggage you might bring to a research topic.


Answers to SAQ 9.1

1. What approach to data analysis did the author take?

Answer: this was a thematic analysis, which progressively analysed the data into codes and eventually themes.

2. What are the main themes in the analysis?

Answer:  Experiences of stigma; defences against stigma: the latter was sub-divided into contraception use, reactions to pregnancy and decisions concerning the pregnancy.

3. What theoretical constructs are used by the author?

Answer: stigma; presentation of self; impression management; ‘face-work’.

4. Does the author discuss any issues in writing up the data analysis?

Answer: The writing up was guided by further reading, and this enabled further refinement of the themes.


Gale, N. et al. (2013) Using the framework method for the analysis of qualitative data in multi-disciplinary health research.  BMC Medical Research Methodology, 13: 117.  http://www.biomedcentral.com/1471-2288/13/117

Gibson, B. and Hartman, J. (2013) Rediscovering Grounded Theory.  London: Sage.

Glaser, B. and Strauss, A. (1967) The Discovery of Grounded Theory.

Marshall, C. and Rossman, G.B. (2015) Designing Qualitative Research. 6th Edition. Sage.

Further reading

Mason, J. (2002) Qualitative Researching. 2nd Edition. London: Sage. Chapter 8.

Ritchie, J. and Lewis, J. (2012) Qualitative Research Practice. 2nd Edition. London: Sage. Chapters 10 and 11.
