As an analyst, I’ve frequently worked with extensive datasets, designing data models, creating reports, and identifying trends. Tools like Power BI, Tableau, Looker, and SQL are integral to these tasks. However, the true foundation of efficient analysis lies in structuring data effectively. One proven method for organizing large datasets is the star schema.
A star schema simplifies data management by organizing it into a central fact table, surrounded by related dimension tables. This design streamlines querying in SQL and enhances performance in visualization tools like Power BI, Tableau, and Looker.
In this article, I’ll outline how to design a star schema for an e-commerce platform, with a focus on setting up dimension tables linked through SessionID
and ensuring dimensions are descriptive and not metric-based. We’ll see how this schema improves SQL queries and dashboard efficiency while delivering actionable insights into sales performance, customer behavior, and marketing impact.
What is a Star Schema?
A star schema is a database design that organizes data into a central fact table, containing measurable data, and multiple dimension tables, which provide descriptive context. The name “star schema” comes from the visual structure resembling a star, where the fact table is at the center, linked to dimension tables.
In e-commerce, the Orders Table serves as the fact table, holding transaction data. Key dimensions include Visitors, Marketing Campaigns, Page Views, A/B Testing, Referrers, and Events tables, all linked using SessionID
.
Step 1: Define the Fact Table – The Orders Table
The Orders Table is the core of the star schema, representing transactional data. Key columns include:
- Order ID: Unique identifier for each order.
- SessionID: Unique identifier linking the order to other tables.
- Order Date: Date of the transaction.
- Order Amount: Total value of the order.
- Payment Method: Method of payment.
- Shipping Address: Delivery location.
- Order Status: Current status (e.g., shipped, delivered).
This table is crucial for calculating metrics like total sales, average order value, and conversion rates.
Step 2: Define the Dimension Tables
Dimension tables describe the context of the transactions in the Orders Table, enabling detailed analysis. Each dimension table is keyed by SessionID
for consistency.
1. Visitors Table
- SessionID: Unique session identifier.
- Device Type: Type of device (desktop, mobile, tablet).
- Geography: Visitor’s location (country, city).
2. Marketing Campaigns Table
- SessionID: Identifies sessions influenced by campaigns.
- Campaign ID: Unique campaign identifier.
- Campaign Name: Descriptive campaign name.
- Campaign Type: Medium (e.g., email, social media).
3. Page Views Table
- SessionID: Identifies visitor interactions.
- Page Name: Describes the page viewed.
- Entry Time: Timestamp when the page was accessed.
4. A/B Testing Table
- SessionID: Links test results to visitor sessions.
- Test ID: Identifier for A/B test.
- Variant: Test variant (e.g., control, version A).
5. Referrers Table
- SessionID: Tracks visitor entry points.
- Referrer Type: Source of the visit (e.g., organic search, paid ads).
6. Events Table
- SessionID: Tracks events during a session.
- Event Type: Describes actions (e.g., add to cart, checkout).
- Event Timestamp: Time of the event.
Step 3: Connecting the Tables
The star schema uses SessionID
as the foreign key to connect the Orders Table with dimension tables. For example:
- SessionID in the Orders Table links to the Visitors, Page Views, and Marketing Campaigns tables.
- Event-based tables like the Events and A/B Testing tables are similarly connected using
SessionID
.
Step 4: Understanding the Customer Journey
With all tables linked, the customer journey can be analyzed step by step:
- A user starts a session (Visitors Table).
- The session’s page views are logged (Page Views Table).
- The user interacts with a marketing campaign (Marketing Campaigns Table).
- Events such as adding products to the cart are recorded (Events Table).
- If part of an experiment, the A/B Testing Table captures test results.
- The session concludes with an order (Orders Table), and the Referrers Table logs how the user arrived.
Step 5: Generating Reports
With this schema, insightful reports are easily generated by querying the Orders Table and joining it with dimension tables:
- Marketing Impact: Which campaign contributed most to conversions?
- A/B Testing: How did variants affect purchase behavior?
- Page Analysis: What pages drive engagement or drop-offs?
Conclusion
Implementing a star schema for your e-commerce data enables efficient data organization, streamlined analysis, and powerful visualization capabilities. By structuring your data into a central Orders Table surrounded by detailed dimension tables, you can capture the entire customer journey, from initial website interaction to final purchase.
This schema not only simplifies SQL queries but also boosts the performance of visualization tools like Power BI and Looker, which thrive on well-organized data models. By leveraging the star schema, you create a scalable framework that supports deeper insights into customer behavior, marketing strategies, and sales performance.
Feeding Reports in Power BI and Looker
Tools like Power BI and Looker integrate seamlessly with the star schema by connecting to your database and visualizing insights based on queries. Here’s how these tools can be used to generate impactful reports:
- Data Integration: Import tables from your star schema into Power BI or Looker. Use their built-in relationships feature to define connections between the fact and dimension tables.
- Pre-defined Metrics: Define calculated fields in the tool (e.g., conversion rates, ROI).
- Real-Time Dashboards: Create interactive dashboards for tracking performance, with slicers and filters to drill down into specific segments.