• azurecoder

Kusto Lightning Fact 6: series-fir() - Generating Moving Averages for Time Series data

A key initiative and a first step in introducing time series analytics into a data model is to generate moving averages. Indeed, the most methodology ARIMA, AutoRegressive IntegratedMovingAverages.


In this post we'll focus on the moving average part and in subsequent posts we'll focus on regression.


Moving averages are incredibly useful in that they allow us to compress and smooth out the volume of data effectively averaging a group of close data points into a single one so the period that the moving average is calculated is all important.


It gives for a much smoother understanding of data over time without the peaks, troughs and anomalies.


We'll consider our sales dataset again which will pick up from the previous post which described the use of make-series to create a series capability using KQL.

let series = sales
| where getyear(Order_Date) == 2010 and (getmonth(Order_Date) == 1
  or getmonth(Order_Date) == 2 or getmonth(Order_Date) == 3) and (Country == "United Kingdom")
| summarize UnitsSold = sum(Units_Sold) by Country, Timestamp = Order_Date
| order by Timestamp asc; 
let min_t = toscalar(series | summarize min(Timestamp));
let max_t = toscalar(series | summarize max(Timestamp));
series
| make-series NumSold=sum(UnitsSold) default=0 on Timestamp in range(min_t, max_t, 1d) by Country
| extend NumSoldMA=series_fir(NumSold, repeat(1, 8))
| render timechart

The query above looks at the the first 3 months of 2010 for United Kingdom "units sold" going through the motions of creating a series like we did in the previous post. However, you'll see the addition of the series_fir() (finite impulse response) function. This taks our newly created series NumSold a period to roll up data points repeat(1, 8) - or 8 data points and creates a a moving average. The original series is in blue and the new moving average series is in red as in the below graph. You should be able to see the different immediately with the overall up and down trends captured without the spikes. Lowering the repeat period will increase the size of the spikes and troughs in our moving average series.

One key aspect of generating this new moving average series is being able to capture the trend. If we then use that trend to generate the difference between the real numbers in the data we're left a bit of a random walk showing how the series varies from the moving average.

let series = sales
| where getyear(Order_Date) == 2010 and (Country == "United Kingdom")
| summarize UnitsSold = sum(Units_Sold) by Country, Timestamp = Order_Date
| order by Timestamp asc; 
let min_t = toscalar(series | summarize min(Timestamp));
let max_t = toscalar(series | summarize max(Timestamp));
series
| make-series NumSold=sum(UnitsSold) default=0 on Timestamp in range(min_t, max_t, 1d) by Country
| extend NumSoldMA=series_fir(NumSold, repeat(1, 8))
| extend Residuals=series_subtract(NumSold, NumSoldMA)
| render timechart

We've now extended our dataset with a new column called Residuals which is a simple subtraction of our 2 time series. The original one and the one we created from the moving average. The data in the above query is extended to over a year which is looking at at a 10-day moving average so the peaks and troughs will be closer to the actual peaks and troughs as you can see below. The original series is in blue and moving average series in red. The new series which looks like a random walk is in green and shows how the actual changes in data vary from our calculated moving average.

In the next couple of posts we'll look at using regression in time series and continue on with the concept of seasonality.


Keep reading!

257 views0 comments

Recent Posts

See All