Time Series Analysis is the compass that guides us through the shifting sands of time, illuminating hidden signals and guiding us to data-driven decisions.”
Time-series aggregations are essential for analyzing and understanding trends, patterns, and historical behavior in time-ordered data, Data scientists and Analysts often encounter challenges when performing such aggregations, especially when dealing with raw transaction-level data. What makes them difficult is that window functions require the data to be formatted correctly.
Lets understand this :
When analyzing sales data to find insights like average sales in the last 14 days or the biggest purchase in the last 6 months, we face a challenge. The data is usually stored at the transaction level, which means each sale is recorded separately. To calculate such metrics using traditional methods, we would need to reformat the data, which is time-consuming and complex.
Here’s where time-series aggregation comes to the rescue! It allows us to create summarized views of the data without changing its original format. So, we can calculate metrics like average sales or the biggest purchase without going through the hassle of rearranging the whole dataset, it also comes in handy if we want to add multiple metrics at one time:
- Average sales in the past 14 days
- Biggest purchase in last 6 months
- Count Distinct product types in last 90 days
If you wanted to use window functions, each metric would need to be built independently with several steps.
A better technique called common table expressions (CTEs) helps us define these historical windows in a simple way. With CTEs, we can pre-calculate the results for each metric, making it easier to analyze different aspects of the data at once.
Consider a table Transaction
TransactionID | CustomerID | Product Type | Purchase Amt | Transaction Date |
---|---|---|---|---|
1 | 101 | Electronics | 500.00 | 2023-07-15 |
2 | 102 | Clothing | 250.50 | 2023-07-16 |
3 | 103 | Groceries | 120.75 | 2023-07-17 |
4 | 101 | Books | 35.20 | 2023-07-18 |
5 | 104 | Electronics | 800.00 | 2023-07-19 |
6 | 105 | Clothing | 150.00 | 2023-07-20 |
7 | 102 | Electronics | 450.30 | 2023-07-21 |
8 | 106 | Groceries | 65.50 | 2023-07-22 |
WITH BASIC_OFFSET_14DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST14DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST14DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST14DAY
FROM
Transaction A
INNER JOIN Transaction B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -14, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
),
BASIC_OFFSET_90DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST90DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST90DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST90DAY
FROM
Transaction A
INNER JOIN Transaction B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -90, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
),
BASIC_OFFSET_180DAY AS (
SELECT
A.CustomerID,
A.TransactionDate,
AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST180DAY,
MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST180DAY,
COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
FROM
Transaction A
INNER JOIN Transaction B ON A.CustomerID = B.CustomerID
AND 1 = 1
WHERE
B.TransactionDate >= DATEADD(day, -180, A.TransactionDate)
AND B.TransactionDate <= A.TransactionDate
GROUP BY
A.CustomerID,
A.TransactionDate
)
SELECT
src.*,
BASIC_OFFSET_14DAY.AVG_PURCHASEAMOUNT_PAST14DAY,
BASIC_OFFSET_14DAY.MAX_PURCHASEAMOUNT_PAST14DAY,
BASIC_OFFSET_14DAY.COUNT_DISTINCT_TRANSACTIONID_PAST14DAY,
BASIC_OFFSET_90DAY.AVG_PURCHASEAMOUNT_PAST90DAY,
BASIC_OFFSET_90DAY.MAX_PURCHASEAMOUNT_PAST90DAY,
BASIC_OFFSET_90DAY.COUNT_DISTINCT_TRANSACTIONID_PAST90DAY,
BASIC_OFFSET_180DAY.AVG_PURCHASEAMOUNT_PAST180DAY,
BASIC_OFFSET_180DAY.MAX_PURCHASEAMOUNT_PAST180DAY,
BASIC_OFFSET_180DAY.COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
FROM
Transaction src
LEFT OUTER JOIN BASIC_OFFSET_14DAY ON BASIC_OFFSET_14DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_14DAY.CustomerID = src.CustomerID
LEFT OUTER JOIN BASIC_OFFSET_90DAY ON BASIC_OFFSET_90DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_90DAY.CustomerID = src.CustomerID
LEFT OUTER JOIN BASIC_OFFSET_180DAY ON BASIC_OFFSET_180DAY.TransactionDate = src.TransactionDate
AND BASIC_OFFSET_180DAY.CustomerID = src.CustomerID;
TransactionID | CustomerID | Product Type | Purchase Amt | Transaction Date | AVG_PURCHASEAMOUNT_PAST14DAY | MAX_PURCHASEAMOUNT_PAST14DAY | COUNT_DISTINCT_TRANSACTIONID_PAST14DAY | AVG_PURCHASEAMOUNT_PAST90DAY | MAX_PURCHASEAMOUNT_PAST90DAY | COUNT_DISTINCT_TRANSACTIONID_PAST90DAY | AVG_PURCHASEAMOUNT_PAST180DAY |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 101 | Electronics | 500.00 | 2023-07-15 | 0.00 | NULL | 0 | 500.00 | 500.00 | 1 |
2 | 2 | 102 | Clothing | 250.50 | 2023-07-16 | 250.50 | 250.50 | 1 | 250.50 | 250.50 | 1 |
3 | 3 | 103 | Groceries | 120.75 | 2023-07-17 | 120.75 | 120.75 | 1 | 120.75 | 120.75 | 1 |
4 | 4 | 101 | Books | 35.20 | 2023-07-18 | 35.20 | 35.20 | 1 | 535.20 | 500.00 | 2 |
5 | 5 | 104 | Electronics | 800.00 | 2023-07-19 | 800.00 | 800.00 | 1 | 800.00 | 800.00 | 1 |
6 | 6 | 105 | Clothing | 150.00 | 2023-07-20 | 150.00 | 150.00 | 1 | 400.50 | 250.50 | 2 |
7 | 7 | 102 | Electronics | 450.30 | 2023-07-21 | 325.40 | 250.50 | 2 | 700.80 | 500.00 | 3 |
8 | 8 | 106 | Groceries | 65.50 | 2023-07-22 | 65.50 | 65.50 | 1 | 185.25 | 120.75 | 2 |
Let’s break down the query step by step to understand what it does:
- WITH Clause (Common Table Expressions – CTEs): The query starts with three Common Table Expressions (CTEs):
BASIC_OFFSET_14DAY
,BASIC_OFFSET_90DAY
, andBASIC_OFFSET_180DAY
. CTEs allow us to define temporary result sets that we can reference later in the main query. - CTE – BASIC_OFFSET_14DAY: This CTE calculates aggregated metrics for each customer for the past 14 days. It computes the average purchase amount (
AVG_PURCHASEAMOUNT_PAST14DAY
), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST14DAY
), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST14DAY
) for each customer within a 14-day window. It uses a self-join on theTransaction
table to match transactions for a specific customer that fall within the 14-day period before each transaction’s date. - CTE – BASIC_OFFSET_90DAY: Similar to the previous CTE, this one calculates aggregated metrics for each customer for the past 90 days. It computes the average purchase amount (
AVG_PURCHASEAMOUNT_PAST90DAY
), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST90DAY
), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST90DAY
) for each customer within a 90-day window. - CTE – BASIC_OFFSET_180DAY: This CTE calculates aggregated metrics for each customer for the past 180 days. It computes the average purchase amount (
AVG_PURCHASEAMOUNT_PAST180DAY
), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST180DAY
), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
) for each customer within a 180-day window. - Main Query: The main query selects data from the
Transaction
table (aliased assrc
) and joins it with the three CTEs using LEFT OUTER JOINS. The goal is to combine the original transaction data with the aggregated metrics calculated for the 14-day, 90-day, and 180-day periods. - SELECT Statement: The SELECT statement retrieves columns from both the original
Transaction
table (src
) and the three CTEs. It includes all columns from theTransaction
table and the aggregated metrics from each CTE. - LEFT OUTER JOIN: The main query performs LEFT OUTER JOINS between the
Transaction
table and each CTE. This ensures that all rows from theTransaction
table are included in the final result, even if there are no matches in the CTEs for a specific customer and transaction date. - ON Clause: The ON clauses in the LEFT OUTER JOINS specify the conditions for matching rows between the
Transaction
table and each CTE. The matching is done based on theTransactionDate
andCustomerID
columns to align the transaction data with the respective aggregated metrics.
In summary, time-series aggregation and common table expressions simplify the process of analyzing sales data, enabling us to quickly get valuable insights without reformatting the entire dataset. It’s like having a neat summary of the important details without having to go through each transaction one by one.