Time Series Analysis is the compass that guides us through the shifting sands of time, illuminating hidden signals and guiding us to data-driven decisions.”

Time-series aggregations are essential for analyzing and understanding trends, patterns, and historical behavior in time-ordered data, Data scientists and Analysts often encounter challenges when performing such aggregations, especially when dealing with raw transaction-level data. What makes them difficult is that window functions require the data to be formatted correctly.

Lets understand this :

When analyzing sales data to find insights like average sales in the last 14 days or the biggest purchase in the last 6 months, we face a challenge. The data is usually stored at the transaction level, which means each sale is recorded separately. To calculate such metrics using traditional methods, we would need to reformat the data, which is time-consuming and complex.

Here’s where time-series aggregation comes to the rescue! It allows us to create summarized views of the data without changing its original format. So, we can calculate metrics like average sales or the biggest purchase without going through the hassle of rearranging the whole dataset, it also comes in handy if we want to add multiple metrics at one time:

  • Average sales in the past 14 days
  • Biggest purchase in last 6 months
  • Count Distinct product types in last 90 days

If you wanted to use window functions, each metric would need to be built independently with several steps.

A better technique called common table expressions (CTEs) helps us define these historical windows in a simple way. With CTEs, we can pre-calculate the results for each metric, making it easier to analyze different aspects of the data at once.

Consider a table Transaction

TransactionIDCustomerIDProduct TypePurchase AmtTransaction Date
1101Electronics500.002023-07-15
2102Clothing250.502023-07-16
3103Groceries120.752023-07-17
4101Books35.202023-07-18
5104Electronics800.002023-07-19
6105Clothing150.002023-07-20
7102Electronics450.302023-07-21
8106Groceries65.502023-07-22

WITH BASIC_OFFSET_14DAY AS (
  SELECT 
    A.CustomerID, 
    A.TransactionDate, 
    AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST14DAY, 
    MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST14DAY, 
    COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST14DAY
  FROM 
    Transaction A 
    INNER JOIN Transaction B ON A.CustomerID = B.CustomerID 
    AND 1 = 1 
  WHERE 
    B.TransactionDate >= DATEADD(day, -14, A.TransactionDate) 
    AND B.TransactionDate <= A.TransactionDate 
  GROUP BY 
    A.CustomerID, 
    A.TransactionDate
), 
BASIC_OFFSET_90DAY AS (
  SELECT 
    A.CustomerID, 
    A.TransactionDate, 
    AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST90DAY, 
    MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST90DAY, 
    COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST90DAY
  FROM 
    Transaction A 
    INNER JOIN Transaction B ON A.CustomerID = B.CustomerID 
    AND 1 = 1 
  WHERE 
    B.TransactionDate >= DATEADD(day, -90, A.TransactionDate) 
    AND B.TransactionDate <= A.TransactionDate 
  GROUP BY 
    A.CustomerID, 
    A.TransactionDate
), 
BASIC_OFFSET_180DAY AS (
  SELECT 
    A.CustomerID, 
    A.TransactionDate, 
    AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST180DAY, 
    MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST180DAY, 
    COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST180DAY
  FROM 
    Transaction A 
    INNER JOIN Transaction B ON A.CustomerID = B.CustomerID 
    AND 1 = 1 
  WHERE 
    B.TransactionDate >= DATEADD(day, -180, A.TransactionDate) 
    AND B.TransactionDate <= A.TransactionDate 
  GROUP BY 
    A.CustomerID, 
    A.TransactionDate
) 
SELECT 
  src.*, 
  BASIC_OFFSET_14DAY.AVG_PURCHASEAMOUNT_PAST14DAY, 
  BASIC_OFFSET_14DAY.MAX_PURCHASEAMOUNT_PAST14DAY, 
  BASIC_OFFSET_14DAY.COUNT_DISTINCT_TRANSACTIONID_PAST14DAY, 
  BASIC_OFFSET_90DAY.AVG_PURCHASEAMOUNT_PAST90DAY, 
  BASIC_OFFSET_90DAY.MAX_PURCHASEAMOUNT_PAST90DAY, 
  BASIC_OFFSET_90DAY.COUNT_DISTINCT_TRANSACTIONID_PAST90DAY, 
  BASIC_OFFSET_180DAY.AVG_PURCHASEAMOUNT_PAST180DAY, 
  BASIC_OFFSET_180DAY.MAX_PURCHASEAMOUNT_PAST180DAY, 
  BASIC_OFFSET_180DAY.COUNT_DISTINCT_TRANSACTIONID_PAST180DAY 
FROM 
  Transaction src 
  LEFT OUTER JOIN BASIC_OFFSET_14DAY ON BASIC_OFFSET_14DAY.TransactionDate = src.TransactionDate 
  AND BASIC_OFFSET_14DAY.CustomerID = src.CustomerID 
  LEFT OUTER JOIN BASIC_OFFSET_90DAY ON BASIC_OFFSET_90DAY.TransactionDate = src.TransactionDate 
  AND BASIC_OFFSET_90DAY.CustomerID = src.CustomerID 
  LEFT OUTER JOIN BASIC_OFFSET_180DAY ON BASIC_OFFSET_180DAY.TransactionDate = src.TransactionDate 
  AND BASIC_OFFSET_180DAY.CustomerID = src.CustomerID;
TransactionIDCustomerIDProduct TypePurchase AmtTransaction DateAVG_PURCHASEAMOUNT_PAST14DAYMAX_PURCHASEAMOUNT_PAST14DAYCOUNT_DISTINCT_TRANSACTIONID_PAST14DAYAVG_PURCHASEAMOUNT_PAST90DAYMAX_PURCHASEAMOUNT_PAST90DAYCOUNT_DISTINCT_TRANSACTIONID_PAST90DAYAVG_PURCHASEAMOUNT_PAST180DAY
11101Electronics500.002023-07-150.00NULL0500.00500.001
22102Clothing250.502023-07-16250.50250.501250.50250.501
33103Groceries120.752023-07-17120.75120.751120.75120.751
44101Books35.202023-07-1835.2035.201535.20500.002
55104Electronics800.002023-07-19800.00800.001800.00800.001
66105Clothing150.002023-07-20150.00150.001400.50250.502
77102Electronics450.302023-07-21325.40250.502700.80500.003
88106Groceries65.502023-07-2265.5065.501185.25120.752

Let’s break down the query step by step to understand what it does:

  1. WITH Clause (Common Table Expressions – CTEs): The query starts with three Common Table Expressions (CTEs): BASIC_OFFSET_14DAY, BASIC_OFFSET_90DAY, and BASIC_OFFSET_180DAY. CTEs allow us to define temporary result sets that we can reference later in the main query.
  2. CTE – BASIC_OFFSET_14DAY: This CTE calculates aggregated metrics for each customer for the past 14 days. It computes the average purchase amount (AVG_PURCHASEAMOUNT_PAST14DAY), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST14DAY), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST14DAY) for each customer within a 14-day window. It uses a self-join on the Transaction table to match transactions for a specific customer that fall within the 14-day period before each transaction’s date.
  3. CTE – BASIC_OFFSET_90DAY: Similar to the previous CTE, this one calculates aggregated metrics for each customer for the past 90 days. It computes the average purchase amount (AVG_PURCHASEAMOUNT_PAST90DAY), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST90DAY), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST90DAY) for each customer within a 90-day window.
  4. CTE – BASIC_OFFSET_180DAY: This CTE calculates aggregated metrics for each customer for the past 180 days. It computes the average purchase amount (AVG_PURCHASEAMOUNT_PAST180DAY), the maximum purchase amount (MAX_PURCHASEAMOUNT_PAST180DAY), and the count of distinct transactions (COUNT_DISTINCT_TRANSACTIONID_PAST180DAY) for each customer within a 180-day window.
  5. Main Query: The main query selects data from the Transaction table (aliased as src) and joins it with the three CTEs using LEFT OUTER JOINS. The goal is to combine the original transaction data with the aggregated metrics calculated for the 14-day, 90-day, and 180-day periods.
  6. SELECT Statement: The SELECT statement retrieves columns from both the original Transaction table (src) and the three CTEs. It includes all columns from the Transaction table and the aggregated metrics from each CTE.
  7. LEFT OUTER JOIN: The main query performs LEFT OUTER JOINS between the Transaction table and each CTE. This ensures that all rows from the Transaction table are included in the final result, even if there are no matches in the CTEs for a specific customer and transaction date.
  8. ON Clause: The ON clauses in the LEFT OUTER JOINS specify the conditions for matching rows between the Transaction table and each CTE. The matching is done based on the TransactionDate and CustomerID columns to align the transaction data with the respective aggregated metrics.

In summary, time-series aggregation and common table expressions simplify the process of analyzing sales data, enabling us to quickly get valuable insights without reformatting the entire dataset. It’s like having a neat summary of the important details without having to go through each transaction one by one.

By Apoorva