Journey to my Ph.D.: Qualitative Data Analysis: Card Sorting

So you have some qualitative data -- maybe from interviews, maybe from an observation session -- and you want to do some data analysis. You know what you are looking for (e.g. causes of miscommunication or misunderstanding, resolution strategies), but you have few or no hypotheses about which themes your data contains. One qualitative data analysis technique you can use is card sorting.

This blog post is written in collaboration with my colleague Justin Smith. It is based on what we have found to be most efficient in our research group (Developer Liberation Front) and during my time at Microsoft Research this past summer, working with Tom Zimmermann in the Empirical Software Engineering group. For those who are interested in how to do a card sort, or in how others do card sorting, I'm going to walk through the process based on our experiences.

First things first: before you can do a card sort, even before you look at your data, you should have an idea of what you're looking for in your data. For example, in the study I am working on, I'm interested in what makes tool output difficult to interpret; more specifically, I want to identify the areas where there is miscommunication, risk of miscommunication, or misunderstanding, and see what causes each.

Recommendation #1: Come up with some criteria that can be used when extracting quotes. Your criteria should be based on whatever it is you're searching for in the transcript. For example, with Justin's study, we wanted to find implicit and explicit questions developers need answered when resolving security vulnerabilities; therefore, one simple criterion was that the text should be an explicit question posed by the developer. The more defined and specific your target is, the easier it will be to extract data for your card sort.
Recommendation #2: Have at least two people (including yourself) extract quotes, all using the criteria you put together. This helps validate your criteria and increases the validity of the quotes you extract. If you're using two people, both should extract quotes from all transcripts; once that's done, the two of you should sit down and determine where you agree and disagree. Two is usually enough, but if you decide to use three people, we recommend having each pair work on a different set of data (e.g. persons 1 and 2 on one set, persons 1 and 3 on another...); once finished, each pair works out its disagreements. How well the extractors agreed is something you will want to report when you attempt to publish your findings :) (it's called inter-rater reliability).
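If you want a number to report for that agreement, here is a minimal sketch in Python. It assumes each extractor made a yes/no decision ("extract this as a quote?") on the same list of transcript segments; the decisions below are made up for illustration. It computes plain percent agreement and Cohen's kappa, which is one common inter-rater statistic and not necessarily the one we reported in our own studies.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of segments on which the two raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of segments")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every segment
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for ten transcript segments (True = "extract this quote").
rater_1 = [True, True, False, False, True, False, True, False, False, True]
rater_2 = [True, False, False, False, True, False, True, True, False, True]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```

Kappa is lower than raw agreement because it discounts the matches you would expect from two people guessing at the same rates.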

Once you have your set of quotes from your transcript, you need to put them on notecards. Advice from someone who's been there: although physical, paper notecards are a necessity for a card sort, we highly recommend also keeping an electronic copy of your quotes (I put mine in Excel).

For our first card sorts, we manually made physical notecards. They looked nice and the card sort itself went okay, but keeping track of themes and sub-themes during and after the card sort was not trivial, and it sometimes led to confusion and backtracking.

A pointer regarding using spreadsheets to store your data:

Anything you would include on your notecard should be a column in the spreadsheet; for example, the columns in my spreadsheet are Participant (P1, P2...), Tool Being Used, and Quote. I also have a column for a unique identifier for each quote (Card Number) and the Emergent Themes from each round of the card sort. It might also be beneficial to include a Timestamp column; this way, you can have an approximate location in your media to find the quote if needed.
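For anyone who prefers a script to Excel, the same layout can be sketched as a CSV file with Python's standard csv module. The column names follow the ones above; the "Theme (Round N)" labels, the tool names, and the quotes are placeholders I made up for illustration.

```python
import csv
import io

# One column per thing you would put on a notecard, plus one theme column per round.
COLUMNS = [
    "Card Number", "Participant", "Tool Being Used", "Timestamp", "Quote",
    "Theme (Round 1)", "Theme (Round 2)",
]

# Hypothetical cards; a real sheet would have one row per extracted quote.
cards = [
    {"Card Number": "1", "Participant": "P1", "Tool Being Used": "Tool A",
     "Timestamp": "00:04:12", "Quote": "I don't know what this warning means.",
     "Theme (Round 1)": "", "Theme (Round 2)": ""},
    {"Card Number": "2", "Participant": "P2", "Tool Being Used": "Tool B",
     "Timestamp": "00:10:47", "Quote": "Where is this value coming from?",
     "Theme (Round 1)": "", "Theme (Round 2)": ""},
]


def write_cards(rows, handle):
    """Write the cards with a header row so the file opens cleanly in Excel."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def search_quotes(rows, keyword):
    """Case-insensitive search over the Quote column."""
    keyword = keyword.lower()
    return [row for row in rows if keyword in row["Quote"].lower()]


# Round-trip through an in-memory file; swap in open("cards.csv", "w", newline="")
# to write a real file.
buffer = io.StringIO()
write_cards(cards, buffer)
loaded = list(csv.DictReader(io.StringIO(buffer.getvalue())))

hits = search_quotes(loaded, "warning")
print([row["Card Number"] for row in hits])
```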

There are a number of advantages to having your quotes in a spreadsheet, specifically Excel:

You don't have to worry (as much) about water or a random fire ruining your data. It's also harder to lose your data if it's electronically stored.
Spreadsheets are searchable; paper notecards are not.
Card sorts are often done in iterations, and you want to be able to report anything that happens (e.g. cards moving from one theme to another); this is MUCH easier if you record the themes for each round of the card sort in a spreadsheet.
Organizing paper notecards can be tedious and error-prone (e.g. trying to find a card and messing up an entire pile); with an electronic copy you can easily organize your data. If you do decide to organize your paper notecards, you at least have a record of how the data was arranged in your electronic copy.
Making notecards is a time-consuming process, especially by hand. If you instead have your quotes in an Excel spreadsheet with the columns labeled as described earlier, you can use Mail Merge to create your notecards electronically :). Details regarding how to do this can be found in the following supplementary blog post.
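As an illustration of the reporting point, here is a small Python sketch that lists the cards whose theme changed between two rounds. It assumes each card is a row with one theme column per round; the column labels and theme names are hypothetical.

```python
def moved_cards(rows, before="Theme (Round 1)", after="Theme (Round 2)"):
    """Return (card number, old theme, new theme) for every card that changed theme."""
    return [
        (row["Card Number"], row[before], row[after])
        for row in rows
        if row[before] != row[after]
    ]


# Hypothetical rows, as they might be read from the spreadsheet.
rows = [
    {"Card Number": 1, "Theme (Round 1)": "unclear wording", "Theme (Round 2)": "unclear wording"},
    {"Card Number": 2, "Theme (Round 1)": "unclear wording", "Theme (Round 2)": "missing context"},
    {"Card Number": 3, "Theme (Round 1)": "too much output", "Theme (Round 2)": "too much output"},
]

for card, old, new in moved_cards(rows):
    print(f"card {card}: {old} -> {new}")
```

Doing the same check with paper piles means remembering where every card used to be.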

Now, it's time to complete the card sort. This can be done in one phase; however, we recommend doing it in multiple phases. At a minimum, phase 1 is a preliminary sort into themes and phase 2 sorts those themes into high-level themes. This is particularly useful for large datasets, where you can wind up with a large number of themes after the first phase; typically there are common themes amongst those separate themes, which warrants another phase of sorting. You may also want to include a validation phase once you have determined all the low-level emergent themes (after phase 1). This phase is to ensure that all quotes have been sorted into the best possible theme.
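To make the two phases concrete, here is a rough Python sketch of how the results could be recorded: phase 1 maps each card to a low-level theme, and phase 2 maps each low-level theme to a high-level theme. The theme names are invented, and the "UNGROUPED" bucket is my own assumption for flagging themes that phase 2 has not placed yet.

```python
from collections import defaultdict

# Phase 1 (hypothetical): card number -> low-level theme.
phase1_themes = {
    1: "unclear wording",
    2: "unclear wording",
    3: "missing context",
    4: "too much output",
    5: "missing context",
}

# Phase 2 (hypothetical): low-level theme -> high-level theme.
phase2_groups = {
    "unclear wording": "message content",
    "missing context": "message content",
    "too much output": "presentation",
}


def roll_up(card_themes, theme_groups):
    """Build {high-level theme: {low-level theme: [card numbers]}}."""
    hierarchy = defaultdict(lambda: defaultdict(list))
    for card, theme in sorted(card_themes.items()):
        # Themes not yet placed in phase 2 land in an explicit bucket.
        high_level = theme_groups.get(theme, "UNGROUPED")
        hierarchy[high_level][theme].append(card)
    return {high: dict(low) for high, low in hierarchy.items()}


hierarchy = roll_up(phase1_themes, phase2_groups)
for high_level, low_level in hierarchy.items():
    total = sum(len(cards) for cards in low_level.values())
    print(f"{high_level} ({total} cards)")
    for theme, cards in low_level.items():
        print(f"  {theme}: cards {cards}")
```

Anything that shows up under "UNGROUPED" is a prompt to revisit that theme in the next session.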

Recommendation #1: Include others in the card sort process; this lessens the bias behind the themes you find (and, from our experience, helps you come up with distinct themes with clear definitions, which is super important when working with qualitative data). One thing to be aware of is that the more cooks in the kitchen, the more time it might take to complete (more potential for disagreement and need for discussion), so plan accordingly. For our most recent card sort, 2 hours was the most we could manage in one sitting, so with a little over 300 notecards we did four 2-hour sessions. Previously I've had shorter sessions; the longer time for this study, we believe, is a product of the type of data we're working with (non-interview).
Recommendation #2: As you're doing your card sort, keep track of important information as it changes (e.g. how you define your themes/sub-themes). Also, keep track of quotes that you and your sorters believe best represent each theme. Doing these things will make reporting your findings much easier.

Once you've done all this, you're ready to start thinking about what your paper is going to look like and where the interesting stories are in your themes (the fun part). With that being said, good luck fellow qualitative researchers! :)

Thanks again to Justin for helping me put this together and the Developer Liberation Front and Tom for the experiences!

Sunday, November 22, 2015