Updated: Feb 25, 2022
Consider adapting database function "Extract. Transform. Load" as a framework for structuring your next model or Sheets-based tool
Note: I am not an engineer. ETL - “Extract Transform Load” - refers to a specific suite of database functions, which I only understand in concept. So don’t quote me to any of your developer friends (keep this among us lay folk)
A few years back, a member of our company's tech team came to me for budget approval on an “ETL upgrade". I had no idea what it meant. Conversation ensued:
Me: "What’s ETL?" Him: "It’s the software we use to EXTRACT external data we need for the product, TRANSFORM it into a uniform structure and then LOAD it into our database or dashboards, in the format we like" Me: "Sounds important. Spend as much money as possible" Him: "Great!" ** crisp high five **
Soon thereafter I was building something in Google Sheets, and realized that “ETL” was essentially the same framework I had been taught to use when building models, just with a catchy title. I was taught: keep your raw data untouched and separate; create new tabs to combine and group your raw data into flat tables; and then pull your key data points out of those flat tables into summary tabs upfront. “Extract. Transform. Load.”
The more I tinkered, the more I liked it. Eventually, I landed on a small tweak “ETL(r)”: “Extract. Transform. Load. (Reference)”. This is how everything I build in Sheets is structured:
Extract: I begin with raw data, which I leave completely untouched in “data” tabs
Transform: Then, I create my “transform” tabs, where I join raw data tables and group data at the appropriate level in order to easily pull out key insights
Load: Finally, I fill out my summary tabs upfront.
Reference: Throughout the building process, I create “Reference” tabs where I manually enter assumptions or small lists that I reference with formulas
Here’s an example, from a brief consulting project I did for a micro-brewery whose growth was stalled. The founder asked me to take a look at his sales data and see if anything jumped out.
Note: In scrubbing and simplifying this model to publish here, I also killed all the insights. Just take my word for it, there was good stuff here.
In the black data tabs, I’ve “EXTRACTED” two chunks of raw data: a sales ledger (list of transactions) and a log of visits to retailers by “BeerCo’s” reps.
For this simplified example, I have only one green “TRANSFORM” tab, which groups revenue totals, pricing and rep visit data at the retailer level.
In the blue summary tab, I’ve opted to “LOAD” all my insights into a single tab, organized by the questions being explored. I could’ve also structured it as one-question-per-tab, and then had 4 summary tabs. Just a matter of personal preference.
Leave comments, questions and likes below.