Processing Qualtrics output

From Lupyan Lab
Jump to: navigation, search

This example walks through compiling survey output from Qualtrics (in .csv format) in R. You can download the raw .R and .Rmd files used to make this report here. The .zip file also includes scripts for the same data manipulation in python (and accompanying ipython notebook report).

Contents

Step 1: Figure out which column headers are important

Qualtrics data is messy! There are two rows of possibly informative column names (R uses the first by default). Figure out which one has the information you need.

df <- read.csv("testloopmerge.csv")
head(df)
##                  V1                   V2        V3                    V4
## 1        ResponseID          ResponseSet      Name ExternalDataReference
## 2 R_8Dkb6ocEVBXkw85 Default Response Set Anonymous                      
## 3 R_8iREUUkJ6gx4eFf Default Response Set Anonymous                      
## 4 R_3gwo3YRvxJFKv1X Default Response Set Anonymous                      
## 5 R_9YteNghOWEw6WFf Default Response Set Anonymous                      
## 6 R_cV12rIO0Vclrj2R Default Response Set Anonymous                      
##             V5        V6     V7                  V8                  V9
## 1 EmailAddress IPAddress Status           StartDate             EndDate
## 2                             1 2014-02-14 14:09:49 2014-02-14 14:10:11
## 3                             1 2014-02-14 14:18:02 2014-02-14 14:18:23
## 4                             1 2014-04-28 22:46:08 2014-04-28 22:46:21
## 5                             1 2014-06-19 08:03:24 2014-06-19 08:04:04
## 6                             1 2014-06-24 13:10:13 2014-06-24 13:10:30
##        V10
## 1 Finished
## 2        1
## 3        1
## 4        1
## 5        1
## 6        1
##                                                                    Q1
## 1 This is a survey to test Loop & Merge functionality in / Qualtrics.
## 2                                                                   1
## 3                                                                   1
## 4                                                                   1
## 5                                                                   1
## 6                                                                   1
##         Q2.1.       Q2.2.           sight.1.           hearing.1.
## 1 urldemo (1) urldemo (2) sensetag (1)-Sight sensetag (1)-Hearing
## 2           4           2                                        
## 3           4           1                                        
## 4                                                                
## 5           4           4                                        
## 6           3           1                                        
##             touch.1.           taste.1.           smell.1.
## 1 sensetag (1)-Touch sensetag (1)-Taste sensetag (1)-Smell
## 2                                                         
## 3                                                         
## 4                                                         
## 5                                                         
## 6                                                         
## ...

In this example, the first row is the most informative, which means we don't want the second row. This is a bit harder in R than it is in python, but it simply requires reading the file in twice.

header <- read.csv("testloopmerge.csv", header = TRUE, nrows = 1, skip = 0)
header <- names(header)

df <- read.csv("testloopmerge.csv", header = FALSE, col.names = header, skip = 2)
head(df)
##                  V1                   V2        V3 V4 V5 V6 V7
## 1 R_8Dkb6ocEVBXkw85 Default Response Set Anonymous NA NA     1
## 2 R_8iREUUkJ6gx4eFf Default Response Set Anonymous NA NA     1
## 3 R_3gwo3YRvxJFKv1X Default Response Set Anonymous NA NA     1
## 4 R_9YteNghOWEw6WFf Default Response Set Anonymous NA NA     1
## 5 R_cV12rIO0Vclrj2R Default Response Set Anonymous NA NA     1
## 6 R_9SkOofHGhfCTR9H Default Response Set Anonymous NA NA     1
##                    V8                  V9 V10 Q1 Q2.1. Q2.2. sight.1.
## 1 2014-02-14 14:09:49 2014-02-14 14:10:11   1  1     4     2       NA
## 2 2014-02-14 14:18:02 2014-02-14 14:18:23   1  1     4     1       NA
## 3 2014-04-28 22:46:08 2014-04-28 22:46:21   1  1    NA    NA       NA
## 4 2014-06-19 08:03:24 2014-06-19 08:04:04   1  1     4     4       NA
## 5 2014-06-24 13:10:13 2014-06-24 13:10:30   1  1     3     1       NA
## 6 2014-06-24 13:12:09 2014-06-24 13:12:29   1  1     2    NA       NA
##   hearing.1. touch.1. taste.1. smell.1. sight.2. hearing.2. touch.2.
## 1         NA       NA       NA       NA       NA         NA       NA
## 2         NA       NA       NA       NA       NA         NA       NA
## 3         NA       NA       NA       NA       NA         NA       NA
## 4         NA       NA       NA       NA       NA         NA       NA
## 5         NA       NA       NA       NA       NA         NA       NA
## 6         NA       NA       NA       NA       NA         NA       NA
##   taste.2. smell.2. sight.3. hearing.3. touch.3. taste.3. smell.3.
## 1       NA       NA       NA         NA       NA       NA       NA
## 2       NA       NA       NA         NA       NA       NA       NA
## 3       NA       NA       NA         NA       NA       NA       NA
## 4       NA       NA       NA         NA       NA       NA       NA
## 5       NA       NA       NA         NA       NA       NA       NA
## 6       NA       NA       NA         NA       NA       NA       NA
## ...

Step 2: Determine which columns you want to keep

Qualtrics usually outputs way more columns than you need. The next step is to figure out which columns you want to keep. There are a number of ways to do this, but if you know regular expressions, you can make your job a lot easier and more flexible.

The first thing we'll do is set the column V1 to keep track of which person the responses came from. plyr has a useful function called rename that makes it easy to rename columns

library(plyr)
df <- rename(df, c(V1 = "subj_id"))

colnames(df)
##   [1] "subj_id"           "V2"                "V3"               
##   [4] "V4"                "V5"                "V6"               
##   [7] "V7"                "V8"                "V9"               
##  [10] "V10"               "Q1"                "Q2.1."            
##  [13] "Q2.2."             "sight.1."          "hearing.1."       
##  [16] "touch.1."          "taste.1."          "smell.1."         
##  [19] "sight.2."          "hearing.2."        "touch.2."         
##  [22] "taste.2."          "smell.2."          "sight.3."         
##  [25] "hearing.3."        "touch.3."          "taste.3."         
##  [28] "smell.3."          "sight.4."          "hearing.4."       
##  [31] "touch.4."          "taste.4."          "smell.4."         
##  [34] "sight.5."          "hearing.5."        "touch.5."         
##  [37] "taste.5."          "smell.5."          "sight.6."         
##  [40] "hearing.6."        "touch.6."          "taste.6."         
##  [43] "smell.6."          "sight.7."          "hearing.7."       
##  [46] "touch.7."          "taste.7."          "smell.7."         
##  [49] "sight.8."          "hearing.8."        "touch.8."         
##  [52] "taste.8."          "smell.8."          "sight.9."         
##  [55] "hearing.9."        "touch.9."          "taste.9."         
##  [58] "smell.9."          "sight.10."         "hearing.10."      
##  [61] "touch.10."         "taste.10."         "smell.10."        
##  [64] "sight.11."         "hearing.11."       "touch.11."        
##  [67] "taste.11."         "smell.11."         "sight.12."        
##  [70] "hearing.12."       "touch.12."         "taste.12."        
##  [73] "smell.12."         "sight.13."         "hearing.13."      
##  [76] "touch.13."         "taste.13."         "smell.13."        
##  [79] "sight.14."         "hearing.14."       "touch.14."        
##  [82] "taste.14."         "smell.14."         "sight.15."        
##  [85] "hearing.15."       "touch.15."         "taste.15."        
##  [88] "smell.15."         "sight.16."         "hearing.16."      
##  [91] "touch.16."         "taste.16."         "smell.16."        
##  [94] "sight.17."         "hearing.17."       "touch.17."        
##  [97] "taste.17."         "smell.17."         "sight.18."        
## [100] "hearing.18."       "touch.18."         "taste.18."        
## [103] "smell.18."         "sight.19."         "hearing.19."      
## [106] "touch.19."         "taste.19."         "smell.19."        
## [109] "sight.20."         "hearing.20."       "touch.20."        
## [112] "taste.20."         "smell.20."         "sight.21."        
## [115] "hearing.21."       "touch.21."         "taste.21."        
## [118] "smell.21."         "sight.22."         "hearing.22."      
## [121] "touch.22."         "taste.22."         "smell.22."        
## [124] "sight.23."         "hearing.23."       "touch.23."        
## [127] "taste.23."         "smell.23."         "sight.24."        
## [130] "hearing.24."       "touch.24."         "taste.24."        
## [133] "smell.24."         "sight.25."         "hearing.25."      
## [136] "touch.25."         "taste.25."         "smell.25."        
## [139] "sight.26."         "hearing.26."       "touch.26."        
## [142] "taste.26."         "smell.26."         "sight.27."        
## [145] "hearing.27."       "touch.27."         "taste.27."        
## [148] "smell.27."         "sight.28."         "hearing.28."      
## [151] "touch.28."         "taste.28."         "smell.28."        
## [154] "sight.29."         "hearing.29."       "touch.29."        
## [157] "taste.29."         "smell.29."         "sight.30."        
## [160] "hearing.30."       "touch.30."         "taste.30."        
## [163] "smell.30."         "sight.31."         "hearing.31."      
## [166] "touch.31."         "taste.31."         "smell.31."        
## [169] "sight.32."         "hearing.32."       "touch.32."        
## [172] "taste.32."         "smell.32."         "sight.33."        
## [175] "hearing.33."       "touch.33."         "taste.33."        
## [178] "smell.33."         "sight.34."         "hearing.34."      
## [181] "touch.34."         "taste.34."         "smell.34."        
## [184] "sight.35."         "hearing.35."       "touch.35."        
## [187] "taste.35."         "smell.35."         "sight.36."        
## [190] "hearing.36."       "touch.36."         "taste.36."        
## [193] "smell.36."         "sight.37."         "hearing.37."      
## [196] "touch.37."         "taste.37."         "smell.37."        
## [199] "sight.38."         "hearing.38."       "touch.38."        
## [202] "taste.38."         "smell.38."         "LocationLatitude" 
## [205] "LocationLongitude" "LocationAccuracy"  "X"

Next we'll use regular expressions to select the columns we want. The pipe character can be read as "or".

(keep_cols <- grep("^subj_id|sight|hearing|touch|taste|smell", colnames(df), 
    value = TRUE))
##   [1] "subj_id"     "sight.1."    "hearing.1."  "touch.1."    "taste.1."   
##   [6] "smell.1."    "sight.2."    "hearing.2."  "touch.2."    "taste.2."   
##  [11] "smell.2."    "sight.3."    "hearing.3."  "touch.3."    "taste.3."   
##  [16] "smell.3."    "sight.4."    "hearing.4."  "touch.4."    "taste.4."   
##  [21] "smell.4."    "sight.5."    "hearing.5."  "touch.5."    "taste.5."   
##  [26] "smell.5."    "sight.6."    "hearing.6."  "touch.6."    "taste.6."   
##  [31] "smell.6."    "sight.7."    "hearing.7."  "touch.7."    "taste.7."   
##  [36] "smell.7."    "sight.8."    "hearing.8."  "touch.8."    "taste.8."   
##  [41] "smell.8."    "sight.9."    "hearing.9."  "touch.9."    "taste.9."   
##  [46] "smell.9."    "sight.10."   "hearing.10." "touch.10."   "taste.10."  
##  [51] "smell.10."   "sight.11."   "hearing.11." "touch.11."   "taste.11."  
##  [56] "smell.11."   "sight.12."   "hearing.12." "touch.12."   "taste.12."  
##  [61] "smell.12."   "sight.13."   "hearing.13." "touch.13."   "taste.13."  
##  [66] "smell.13."   "sight.14."   "hearing.14." "touch.14."   "taste.14."  
##  [71] "smell.14."   "sight.15."   "hearing.15." "touch.15."   "taste.15."  
##  [76] "smell.15."   "sight.16."   "hearing.16." "touch.16."   "taste.16."  
##  [81] "smell.16."   "sight.17."   "hearing.17." "touch.17."   "taste.17."  
##  [86] "smell.17."   "sight.18."   "hearing.18." "touch.18."   "taste.18."  
##  [91] "smell.18."   "sight.19."   "hearing.19." "touch.19."   "taste.19."  
##  [96] "smell.19."   "sight.20."   "hearing.20." "touch.20."   "taste.20."  
## [101] "smell.20."   "sight.21."   "hearing.21." "touch.21."   "taste.21."  
## [106] "smell.21."   "sight.22."   "hearing.22." "touch.22."   "taste.22."  
## [111] "smell.22."   "sight.23."   "hearing.23." "touch.23."   "taste.23."  
## [116] "smell.23."   "sight.24."   "hearing.24." "touch.24."   "taste.24."  
## [121] "smell.24."   "sight.25."   "hearing.25." "touch.25."   "taste.25."  
## [126] "smell.25."   "sight.26."   "hearing.26." "touch.26."   "taste.26."  
## [131] "smell.26."   "sight.27."   "hearing.27." "touch.27."   "taste.27."  
## [136] "smell.27."   "sight.28."   "hearing.28." "touch.28."   "taste.28."  
## [141] "smell.28."   "sight.29."   "hearing.29." "touch.29."   "taste.29."  
## [146] "smell.29."   "sight.30."   "hearing.30." "touch.30."   "taste.30."  
## [151] "smell.30."   "sight.31."   "hearing.31." "touch.31."   "taste.31."  
## [156] "smell.31."   "sight.32."   "hearing.32." "touch.32."   "taste.32."  
## [161] "smell.32."   "sight.33."   "hearing.33." "touch.33."   "taste.33."  
## [166] "smell.33."   "sight.34."   "hearing.34." "touch.34."   "taste.34."  
## [171] "smell.34."   "sight.35."   "hearing.35." "touch.35."   "taste.35."  
## [176] "smell.35."   "sight.36."   "hearing.36." "touch.36."   "taste.36."  
## [181] "smell.36."   "sight.37."   "hearing.37." "touch.37."   "taste.37."  
## [186] "smell.37."   "sight.38."   "hearing.38." "touch.38."   "taste.38."  
## [191] "smell.38."
df <- subset(df, select = keep_cols)

Step 3: melt data from wide to long format

melt is a common operation, available in much the same format in R and python. It takes wide data and makes it long, that is, it takes the column names and transposes them to their own column.

library(reshape2)
df <- melt(df, id.vars = "subj_id", variable.name = "qualtrics_col", value.name = "likert")
df <- df[!is.na(df$likert), ]  # remove the null rows

Step 4: Extract useful information from qualtrics_col

Looking at the default names that Qualtrics assigns for each column, we see that there are multiple bits of information in there that we want to extract. Regular expressions make extracting this information into new columns very easy.

qualtrics_re <- "^([a-z]+).([0-9]+).$"
df$sense <- sub(qualtrics_re, "\\1", df$qualtrics_col)
df$row <- as.numeric(sub(qualtrics_re, "\\2", df$qualtrics_col))

Step 5: Turn row into something meaningful

We often use Loop & Merge in our Qualtrics surveys, so a really critical step in the data manipulation process is turning the uninformative row numer into the meaningful variables that it represents. This requires keeping the .csv used for the Loop & Merge process handy.

loopmerge <- read.csv("senses-demo.csv", stringsAsFactors = FALSE)
head(loopmerge)
##   row  category
## 1   1 alligator
## 2   2     apple
## 3   3       bee
## 4   4    bottle
## 5   5     camel
## 6   6     chair
df <- merge(df, loopmerge, all.x = TRUE)  # in R, the two input frames are referred to as x, y

Step 6: Clean up

Now we have a few things to clean up before we're done.

library(dplyr)
df <- select(df, -qualtrics_col, -row)  # get rid of the two unnecessary columns
df <- df[, c("subj_id", "category", "sense", "likert")]  # rearrange the columns
df <- arrange(df, subj_id, category, sense)  # sort rows

Step 7: Write to file

All done! Write to a file, and you have nicely formatted data to analyze.

write.csv(df, file = "testloopmerge-r.csv", row.names = FALSE)
Personal tools
Namespaces
Variants
Actions
Navigation
HowTos
Troubleshooting
Scripts
Toolbox