Betting on the future of data modeling
Learnings from the evolution of BI
After nearly a decade of working in data modeling tools, I'm excited to be working on the next generation of model-based BI. This blog explores how I've experienced the evolution of business intelligence tools while helping hundreds of customers. I'll dive into what previous generations of data modeling got right (and wrong), and how we're using these learnings at Omni to bet on the future of data modeling and the next generation of BI.
A very brief history of modern BI
My involvement in the BI world began when I joined a 40-employee Looker. I taught myself SQL to get the job, but prior to that, the only database I’d ever interacted with was an Excel sheet. Much has been written on the extensive history of the data space (e.g., The Rise of the Semantic Layer: Metrics On-The-Fly, What happened to the Semantic Layer?), so I won’t attempt to replicate that. Instead, I’ll reflect on the world as I understood it upon joining Looker in 2014.
In those days, the competition fell into two very different categories:
- The first wave of BI - legacy data modeling tools (BOBJ, MicroStrategy, OBIEE) that were highly configurable and feature-rich, with robust governance layers to protect expensive data warehouses from their users. However, they came with extraordinarily heavy upkeep and high costs.
- The second wave of BI - tools developed in response to those legacy monoliths: affordable workbook BI and SQL-based dashboarding solutions. These democratized access to data by allowing anyone with a database connection or a CSV to build beautiful dashboards, but they tended to create data chaos and mistrust.
Looker was impeccably timed to position something in between: a tool that enabled everyone in an organization to interact with data, built on a light(er)weight data model that didn’t require years of training to understand and edit. The solution was elegant, and made the particularly spot-on bets that a) SQL would remain the language of data, and b) databases would become steadily cheaper and faster.
The product was built on three core pillars:
- Web-based architecture for easy sharing and collaboration - especially compared to the workbook and legacy tools of the time
- A data model as the single source of truth - with LookML as a modern differentiator to legacy systems
- Computation in-database - a stark contrast to OLAP cubes, proprietary engines, and desktop exports
At the time, there was a "big data revolution" - a push for businesses to collect everything in a bid to become more data-driven - and we were defining a new genre of data tools by providing non-technical people with fast, easy access to the data they needed to make decisions. Looker arrived just in time to help prove "the data-driven enterprise" was possible. We were well-positioned to ride the third wave of BI.
While all of this was happening, dbt was born and became, to many, synonymous with data modeling. Similarly, its success was rooted in simplifying a historically heavy and painful process - suddenly anyone who could write (or generate) a SELECT statement could persist transformations in a database. In just a few years, dbt became the nearly universal tool of data organizations for in-database materialized transformations.
The undertow in the third wave
While Looker inspired a generation of BI tools built on similar approaches to data modeling, the approach was not without flaws.
One of the biggest hurdles to adoption remains the need to learn, and then write, quite a lot of LookML before actually getting value. The learning curve is steep. SQL, meanwhile, is viewed as the lingua franca of data, and for many of the analysts I worked with, learning a second language felt like too much extra work just to get a few dashboards up. This made it hard to demonstrate the promise of self-service before that learning curve had been climbed.
Looker was centered around enabling self-service for non-technical people, but the analyst experience suffered and became increasingly restrictive. Let’s walk through a common workflow:
In Looker, asking a straightforward question about the tables and fields in an Explore can be done easily. But to ask a next-level question that requires additional modeling, users must leave the Explore page and write some SQL/LookML. To do so, they enter developer mode, add a derived table to the model, and join it back to an Explore. Sometimes, they must create a new Explore dedicated to just this table or use case. Even then, they can't simply use this derived table in content; it must be promoted out of developer mode and into a now (more) bloated shared model.
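To make that workflow concrete, here's a minimal sketch of what "adding a derived table and joining it back to an Explore" looks like in LookML. The view, table, and field names (`qbr_pipeline_summary`, `opportunities`, `rep_id`) are hypothetical, invented for illustration:

```lookml
# Hypothetical one-off derived table, built for a single dashboard
view: qbr_pipeline_summary {
  derived_table: {
    sql: SELECT rep_id, SUM(amount) AS total_pipeline
         FROM opportunities
         GROUP BY 1 ;;
  }
  dimension: rep_id {
    type: number
    primary_key: yes
    sql: ${TABLE}.rep_id ;;
  }
  measure: total_pipeline {
    type: sum
    sql: ${TABLE}.total_pipeline ;;
  }
}

# The derived table must then be joined into a shared Explore
explore: opportunities {
  join: qbr_pipeline_summary {
    sql_on: ${opportunities.rep_id} = ${qbr_pipeline_summary.rep_id} ;;
    relationship: many_to_one
  }
}
```

Even for a question this narrow, the table and its join land in the shared model, where every other user of the `opportunities` Explore will see them.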
In cases where the logic is reusable and interesting to other people, this is all well and good. Yet too often I saw that one-off questions generated single-purpose derived tables and/or Explores which lived in the Looker model, cluttering up already busy field pickers forevermore. To share a more personal example:
A few years ago, I had a memorably frustrating experience while building the dashboards for Looker's Inside Sales team's QBRs. I needed to add a few derived tables and columns that were critical for the dashboards, but realistically they were only useful for this single, very specific purpose. After some negotiation and reviews, I was ultimately allowed to add what I needed because the dashboards were important. I clearly remember feeling that it just didn't make sense for me to be cluttering up everyone else's environment for a one-off.
This is not an uncommon challenge, and the result of all this is that many organizations adopt strict policies to protect the model: only certain people can touch LookML, fewer still can approve pull requests, and the rules about what can be included are fiercely enforced. While these best practices help keep models clean and Looker Explores uncluttered, they push away data scientists and analysts doing complex analysis. Instead, these users seek out the more flexible, less governed greener pastures of Python notebooks and Excel sheets to do their work in peace, usually losing any acceleration they might have enjoyed from their Looker model in the process.
All of this is to say that what we do in data is reusable and there’s value in capturing it, but sometimes data work really is one-off. By forcing absolutely everything through a shared, governed data model 100% of the time, we actually dilute the value of that model and slow everyone down.
So what about dbt?
The simplicity of the SQL-based layer enabled dbt’s transformation layer to annex data modeling. Where LookML enables self-service at the cost of up-front learning and configuration, dbt makes it simple to add a new table into the database for any purpose. By meeting people where they are and speaking their language, dbt captured the hearts of analysts everywhere.
To be clear, I have a lot of love for dbt and how incredibly easy it's made it to transform data. Omni's CEO Colin has written about how we use dbt for transformation. But I do worry that the ease of creating a new model can be a bit of a double-edged sword, leading to hundreds or thousands of models for potentially very specific use cases. Used in isolation, it can leave people dependent on expensive and over-taxed data engineering resources to answer questions, and can result in rather static access to data. So while dbt is a powerful tool for transformation, this, too, can lead to chaos.
Betting on the future of data modeling
So why did I leave Looker to work on a new BI tool based on yet another data model?
I have the opportunity to help build the tool I want to use: one that lets me curate intuitive data experiences for others (after all, data work is inextricably service work), yet remains flexible enough for my deeper and more creative explorations.
Our team shares a few core understandings that drive what we are building:
You have to meet people where they are.
The workforce has become incredibly data-savvy, and there’s no single right way to interact with data. In Omni, you can choose to query through a point & click interface or just write SQL; you can build a data model via the UI, SQL, or YAML. You can even use Omni’s model to generate dbt models. The idea behind all of this is to accelerate and empower the people who understand the data. We want to speak their language, whatever that might be, and really listen to people when they tell us how they want to work.
This also means providing first-class customer support. I got to watch (and sometimes help) Margaret Rosas build Looker’s Department of Customer Love. I strive to emulate what it was at its peak in the way my team will interact with and support the people who use Omni.
Some things just don’t need to be re-used.
… but that shouldn't preclude you from benefiting from the modeling you've already done. In Omni, I can seamlessly transition between modes of interaction: letting my Omni data model generate the base parts of my query (I never want to write another date extraction), adding fields from the UI, then slipping into SQL when it feels more flexible or natural. Meanwhile, I can promote the pieces of my work that are useful to others to a shared model, while keeping my one-offs out of the way. By keeping the interaction and modeling layers close, I'm never so far away that I can't tie the valuable pieces of exploratory work back in.
It has to look and feel good.
Omni is being built with performance and usability at the forefront. We've learned how important it is to build elegance and empathy into our application. We're investing early in design and visualization, and I think our fantastic designers, Sarah and Jared, and visualization engineer, Dirk, have already made Omni look and feel great. We're using modern technologies for caching and query acceleration (thanks, DuckDB team!) that make Omni fast. And we intend to stay this way - many of us have experienced the challenges of supporting an aging codebase, and we are committed to building for the long run.
For me, there’s an underlying sense of unfinished business. I used to feel I could never work for a competitor of Looker, yet to me, Omni feels like a continuation of the story. We got a lot right the first time, but we made our share of mistakes, and the landscape has evolved substantially. I’m thrilled to be building for the next generation with this team, and 10 years of learning behind us.
If you’d like to see it in action, we’d love to have you explore Omni.