df <- tibble(x = 1:252, phi = rnorm(252, mean = 0, sd = 1.5)) |>
mutate(y = 0.07 * x + 0.03 + phi)
ggplot(df, aes(x, y)) +
geom_line() +
ggtitle(label = 'Compose a linear trend with a stochastic component')
François de Ryckel
October 21, 2022
October 22, 2023
When studying time-series, we usually consider trend and seasonality. With that in mind, we define time-series either in an additive way \[y_t = T_t + S_t + R_t\] or in a multiplicative way \[y_t = T_t \cdot S_t \cdot R_t\]
with
Note that, using log, we can easily transform a multiplicative time-series into an addidiative one.
\[log(y_t) = log(T_t \cdot S_t \cdot R_t) = log(T_t) + log(S_t) + log(R_t)\]
Because decomposition used averages (for seasonal) and noving average (for trend), decomposition is not robust to outlier (aka, it is very sensitive to outlier)
Perhaps before to decompose time-series, we can try to compose some dummy time-series in both additive or multiplicative ways.
df <- tibble(x = 1:252, phi = rnorm(252, mean = 0, sd = 1.5)) |>
mutate(y = 0.07 * x + 0.03 + phi)
ggplot(df, aes(x, y)) +
geom_line() +
ggtitle(label = 'Compose a linear trend with a stochastic component')
We could also create a seasonal time-series with a stochastic component.
Let’s go back to the milk example which looks like this.
# A tibble: 6 × 2
month milk_prod_per_cow_kg
<date> <dbl>
1 1962-01-01 265.
2 1962-02-01 252.
3 1962-03-01 288
4 1962-04-01 295.
5 1962-05-01 327.
6 1962-06-01 314.

If we transform our tibble into a time-series df, then we can use the decompose() function.
ts_milk <- ts(milk$milk_prod_per_cow_kg, start = c(1962, 1, 1), frequency = 12)
milk_dec <- decompose(ts_milk, type = 'additive', filter = NULL)
plot(milk_dec)
And now, we can use our decomposed time-series to detrend or remove seasonality
The timtk package can achieve the same output in a more direct way as timetk fit the tidyverse framework.
frequency = 12 observations
trend = 60 observations per 5 years
It should be noted that the timetk package use a Seasonal-Trend-Loess decomposition (STL). If we want to get the values, we use the tk_stl_diagnostics(date, value) function.
(From FPP 3rd ed. )While loess is a method for estimating nonlinear relationships. The STL method was developed by R. B. Cleveland et al. (1990).
STL has several advantages over classical decomposition:
frequency = 12 observations per 1 year
trend = 60 observations per 5 years
# A tibble: 6 × 6
month observed season trend remainder seasadj
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1962-01-01 265. -8.44 272. 1.81 273.
2 1962-02-01 252. -27.2 272. 7.15 280.
3 1962-03-01 288 15.5 273. -0.726 273.
4 1962-04-01 295. 22.4 274. -1.18 273.
5 1962-05-01 327. 50.0 275. 2.42 277.
6 1962-06-01 314. 37.3 276. 0.815 276.
There are so many packet in R to calculate moving averages. We’ll keep using the timetk package