Transactional data models are designed to support day-to-day business operations by organizing data in a normalized table structure. As normalized data structures avoid data redundancy as much as possible, they ensure that the typical CRUD (Create, Read, Update, and Delete) operations of your application can be easily supported. However, when it comes to consuming this data analytically (e.g. in a dashboard widget), these models can be suboptimal because analytical queries generate an inherently different load.

Transactional Data Model Overview

Transactional data models are commonly used in applications to capture and store data related to business transactions. These models consist of multiple linked tables, each storing discrete and uniquely identifiable entity values. The purpose of a transactional data model is to maintain data integrity and ensure consistency in the format of the stored data.

In a transactional data model, each table represents a specific entity or concept, such as customers, orders, products, or transactions. These tables are designed to store all the relevant information about these entities, such as their attributes and relationships with other entities.
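
To make this concrete, the sketch below sets up a small normalized schema (customers, products, orders and order items) in SQLite. The table and column names are purely illustrative, not a prescribed model.

```python
# A minimal sketch of a normalized (transactional) schema, using SQLite.
# All table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT
);

CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT,
    unit_price  REAL NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    ordered_at  TEXT NOT NULL  -- ISO-8601 timestamp
);

CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES orders(order_id),
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    quantity      INTEGER NOT NULL
);
""")
```

Each entity lives in exactly one table, and relationships are expressed through foreign keys, which keeps CRUD operations simple and consistent.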

Transactional data models are optimized for transactional workloads, and are well-suited for consistently capturing and processing real-time data.

Transactional vs. Analytical Load

Transactional and analytical workloads have different data access patterns.

In short, a typical transactional load (OLTP or Online Transaction Processing) manipulates a small number of rows, touching most or all of their columns (e.g. creating a row, retrieving one or a few rows, updating a row).

Analytical loads (OLAP or Online Analytical Processing), on the other hand, analyze a small subset of columns across a large number of rows (e.g. summing numeric values from a column, grouped by a category column and filtered by a date column).
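
Continuing the hypothetical schema sketched above, the queries below show both shapes: the transactional statements touch a single row with (almost) all of its columns, while the analytical query scans many rows but only a few columns.

```python
# Typical OLTP statements: a handful of rows, (almost) all columns.
conn.execute(
    "INSERT INTO orders (order_id, customer_id, ordered_at) VALUES (?, ?, ?)",
    (1001, 42, "2024-05-01T10:15:00Z"),
)
conn.execute("SELECT * FROM orders WHERE order_id = ?", (1001,))

# Typical OLAP query: many rows, few columns, aggregated, grouped and filtered.
conn.execute("""
    SELECT p.category,
           SUM(oi.quantity * p.unit_price) AS revenue
    FROM   order_items oi
    JOIN   orders   o ON o.order_id   = oi.order_id
    JOIN   products p ON p.product_id = oi.product_id
    WHERE  o.ordered_at >= '2024-01-01'
    GROUP  BY p.category
""")
```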

Performing analytical queries on a transactional data model can be challenging and resource-intensive due to the need for multiple join operations. Typical analytical queries slice and dice all sorts of metrics, so optimizing your transactional data source for this load isn't always straightforward: it is hard, if not impossible, to predict every query that will need to be handled.

  • Complexity: The normalized structure requires extensive join operations, which complicates the design of insightful dashboards: dashboard designers need to fully understand the normalized structure (the sketch after this list illustrates this).
  • Performance: Multiple (large) table joins can lead to performance bottlenecks. For customer-facing analytics, it's of the utmost importance that your insights are rendered within an acceptable time (ideally sub-second, at most a few seconds; beyond that, a typical application user tends to abandon elements that are still loading).
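
As an illustration (again using the hypothetical schema above), answering a single dashboard question such as "revenue per country per month" already requires joining all four tables, and every extra slicing dimension tends to pull in another join.

```python
# One analytical question, four joins; every additional filter or grouping
# dimension on the normalized model tends to add yet another join.
conn.execute("""
    SELECT c.country,
           strftime('%Y-%m', o.ordered_at) AS order_month,
           SUM(oi.quantity * p.unit_price) AS revenue
    FROM   order_items oi
    JOIN   orders    o ON o.order_id    = oi.order_id
    JOIN   customers c ON c.customer_id = o.customer_id
    JOIN   products  p ON p.product_id  = oi.product_id
    GROUP  BY c.country, order_month
""")
```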

Common pitfall: Pre-aggregated datasets

From time to time, it might be tempting to throw something together to get insights visualized as soon as possible. This typically leads to cut corners, such as pre-aggregating data and exposing the result to Luzmo, which often results in hard-to-scale solutions for the following reasons:

  • Pre-aggregated datasets are hard (or even impossible) to extend. For example, you'll need to think carefully about the impact of bringing in another dimension column. Such extensions might turn measures that were fully additive into semi-additive ones, which can only be aggregated along certain dimensions (see the sketch after this list). This could mean that existing insights in your application suddenly no longer show the expected data.
  • To work around the extensibility issue, you could create multiple isolated datasets, but this results in a less maintainable setup. Additionally, these datasets typically cannot be linked to each other (i.e. they cannot interactively filter each other in a dashboard), and understanding where to find which insight becomes harder over time as the number of tables grows.
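
The hypothetical example below shows how a pre-aggregated table limits which questions can still be answered correctly: revenue stays fully additive, but a daily distinct-customer count is only additive along some dimensions.

```python
# Hypothetical pre-aggregated table: one row per day and country.
conn.executescript("""
CREATE TABLE daily_sales_agg (
    sales_date       TEXT,
    country          TEXT,
    revenue          REAL,     -- fully additive: safe to SUM over any dimension
    active_customers INTEGER   -- semi-additive: additive across countries (each
                               -- customer has one country), NOT across dates
);
""")

# Fine: revenue can be rolled up across both dimensions.
conn.execute("SELECT country, SUM(revenue) FROM daily_sales_agg GROUP BY country")

# Wrong: summing daily distinct-customer counts over time double-counts
# returning customers; answering this correctly requires the row-level data.
conn.execute("SELECT SUM(active_customers) FROM daily_sales_agg WHERE country = 'BE'")
```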

One of the few good reasons to pre-aggregate your data before exposing it to Luzmo is when you have extremely large amounts of data and the granularity of (historical) data is less important. A typical use case is querying historical data performantly, where sales from several years ago might not require the same order-level granularity as insights on recent sales.


To ensure easy-to-consume and performant customer-facing insights, we strongly recommend investing the proper resources into designing and setting up a scalable analytical data model. In the next article, we'll introduce you to a simple but well-proven analytical data structure: the "Star schema".
