How to Flatten Survey Data in Google Sheets
Updated: Feb 25, 2022
This sequence of steps lets you easily “flatten” a data set in Google Sheets
Warning: this one is a little complicated... Here we go!
Ideally, data would always come in flat. Meaning, each row is unique, and each row represents a unique object or event. But alas, it doesn't. Surveys are often the culprit. You pop open the data, you see questions sprawled out horizontally as column headers, respondents as row titles in column A, and responses in a massive, difficult-to-query table. It works fine if you just want to just compare answers to a single question, but it gets complex if you try to do anything more than that.
I dealt with this in an NBA betting pool I ran last year:
In the tab “surveyData_raw” you can see the data exactly as it comes in from the Google Form. The first question of the survey asks respondents to identify themselves, and the subsequent questions ask them to place bets on the day’s games. They are not required to bet on each game, so you’ll see some people intentionally leave questions blank. My goal is to turn this ugly table into a simpler list of all bets from the day, with name, created_at, game_ID and selection as the fields for each bet object.
In order to do that, I go through a series of steps:
Step 1: identify the row number for each response (tab “transform 1” column D)
SUBSTITUTE ( ADDRESS ( MATCH ( searchValue , entireColumnRange , 0 ) , 1 , 4 ) , "A" , "" )
Step 2: join each row of raw survey data into a single string, attaching the respondent name and response date/time to each segment of the string (tab “transform 1”, column E)
JOIN ( objectDelimiter , ARRAYFORMULA ( date & fieldDelimiter & name & fieldDelimiter & FILTER ( rawData_headersRange , rawData_responsesRange* <> "" ) & fieldDelimiter & FILTER ( rawData_responsesRange* , rawData_responsesRange* <> "" ) ) )
This is the most complex step of the process. What we’re doing essentially is creating a string where bets are separated by “;” and fields are divided by “|”. With that in mind, it’s critical that your objectDelimiter (I used “;”) does not match your fieldDelimiter (I used “|”). And just as important, these two delimiters must be characters that you will not find anywhere else within the data. Secondly, I want to call out that I’ve placed an asterisk on “rawData_responseRowRange*” above because in the file I am actually using the INDIRECT formula in order to dynamically adjust this range to look at the proper row for each respondent, factoring in the rowValue we generated in step 1. Ok, moving on…
Step 3: split all of the strings into individual bets and transpose them into a single column (tab “transform 2”, cell B4)
TRANSPOSE ( SPLIT ( JOIN ( objectDelimiter , stringRange ) , objectDelimiter ) )
Step 4: Split each bet string into its 4 fields
SPLIT ( string , fieldDelimiter )
For aesthetics, I then pulled the data from columns C:F in tab “transform 2” into the front “surveyData_flat” tab. An unnecessary step, but I’m a little neurotic so...
Thanks for reading. Leave comments below