Challenge Objective

The goal is to produce an 8-week revenue forecast using ARIMA, Prophet, Exponential Smoothing, and TBATS models, experimenting with different parameter settings and feature engineering approaches along the way.

Libraries

# Modeling
library(tidymodels)
library(modeltime)

# Core
library(tidyverse)
library(timetk)
library(lubridate)

Import Artifacts

First, import the challenge_02_artifacts.

challenge_02_artifacts <- read_rds("challenge_03_data/challenge_02_artifacts.rds")

Interactively explore the artifacts, which include the following:

  • raw and processed datasets at the weekly level
  • transformation parameters
  • training/test splits for modeling
  • workflow modeling recipes

challenge_02_artifacts %>% View("artifacts")
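For a quick non-interactive look at the same structure, str() works as well (a sketch; the depth shown by max.level may need adjusting):

# Print the top two levels of the artifact list
challenge_02_artifacts %>% str(max.level = 2)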

Data Preparation

Extract the individual components from the artifacts.

# Processed Data
data_prepared_tbl <- challenge_02_artifacts$processed_data$data_prepared_tbl
forecast_tbl      <- challenge_02_artifacts$processed_data$forecast_tbl

# Train/Test Splits
splits            <- challenge_02_artifacts$train_test_splits$splits

# Inversion Parameters
std_mean <- challenge_02_artifacts$transformation_params$standardize_params$std_mean
std_sd   <- challenge_02_artifacts$transformation_params$standardize_params$std_sd

Train/Test Splits

Preview training data.

training(splits)
## # A tibble: 83 x 10
##    purchased_at revenue lag_revenue_lag8 lag_revenue_lag8_r~ lag_revenue_lag8_r~
##    <date>         <dbl>            <dbl>               <dbl>               <dbl>
##  1 2018-06-03    -1.37             NA                  NA                     NA
##  2 2018-06-10    -1.13             NA                  NA                     NA
##  3 2018-06-17    -1.06             NA                  NA                     NA
##  4 2018-06-24    -1.24             NA                  NA                     NA
##  5 2018-07-01    -1.55             NA                  NA                     NA
##  6 2018-07-08    -0.805            NA                  NA                     NA
##  7 2018-07-15    -1.42             NA                  NA                     NA
##  8 2018-07-22    -0.689            NA                  NA                     NA
##  9 2018-07-29    -1.02             -1.37               NA                     NA
## 10 2018-08-05    -1.20             -1.13               -1.20                  NA
## # ... with 73 more rows, and 5 more variables: lag_revenue_lag8_roll_16 <dbl>,
## #   lag_revenue_lag8_roll_24 <dbl>, lag_revenue_lag8_roll_36 <dbl>,
## #   event_november_sale <dbl>, event_product_launch <dbl>

Visualize the train/test split.

splits %>% 
    tk_time_series_cv_plan() %>% 
    plot_time_series_cv_plan(.date_var = purchased_at, .value = revenue)

Modeling

The modeling process has three sections: ARIMA, Prophet, and Exponential Smoothing.

ARIMA

We’ll start by modeling the revenue with ARIMA.

Model 1 - Basic Auto ARIMA

model_fit_1_arima_basic <- arima_reg() %>% 
    set_engine("auto_arima") %>% 
    fit(revenue ~ purchased_at, 
        data = training(splits))
## frequency = 13 observations per 1 quarter
model_fit_1_arima_basic
## parsnip model object
## 
## Series: outcome 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.7828
## s.e.   0.0825
## 
## sigma^2 estimated as 0.5224:  log likelihood=-89.71
## AIC=183.41   AICc=183.56   BIC=188.22

Observations:

  • Auto ARIMA selected orders (0, 1, 1): no autoregressive lags (p = 0), one difference (d = 1), and one moving-average (lagged error) term (q = 1); the sketch below shows how to extract these orders programmatically
  • With this weekly dataset, the frequency was auto-detected as 13 observations per quarter
  • This is a non-seasonal model, since no seasonal terms appear in the output
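A minimal sketch, assuming a parsnip version that provides extract_fit_engine() and that the forecast package is installed:

# Pull the underlying forecast::Arima fit out of the parsnip wrapper
model_fit_1_arima_basic %>% 
    extract_fit_engine() %>% 
    forecast::arimaorder()  # named vector of p, d, q (plus seasonal orders if any)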

Model 2 - Add Product Events

Next, I’ll repeat the model but add event regressors for the November sale and product launch:

Observations:

  • The AIC improved from 183.41 (Model 1) to 128.99 (Model 2); the sketch after the output compares the two fits directly
  • The event coefficients add value and are not close to zero
  • This is still not a seasonal model

model_fit_2_arima_xregs <- arima_reg() %>% 
    set_engine("auto_arima") %>% 
    fit(revenue ~ purchased_at + event_november_sale + event_product_launch, 
        data = training(splits))
## frequency = 13 observations per 1 quarter
model_fit_2_arima_xregs
## parsnip model object
## 
## Series: outcome 
## Regression with ARIMA(1,1,2) errors 
## 
## Coefficients:
##           ar1      ma1      ma2  event_november_sale  event_product_launch
##       -0.8403  -0.1553  -0.4298               1.2105                1.3999
## s.e.   0.1028   0.1542   0.1402               0.2182                0.1581
## 
## sigma^2 estimated as 0.2555:  log likelihood=-58.49
## AIC=128.99   AICc=130.11   BIC=143.43
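To compare the two fits side by side instead of reading the AIC off each printout, a small sketch (assuming extract_fit_engine(); forecast's Arima objects store the AIC in the aic element):

# In-sample AIC for Models 1 and 2
list(model_1 = model_fit_1_arima_basic, 
     model_2 = model_fit_2_arima_xregs) %>% 
    purrr::map_dbl(~ extract_fit_engine(.x)$aic)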

Model 3 - Add Seasonality + Product Events

Reviewing the ACF diagnostics of the differenced series (checked numerically in the sketch after the plot):

  • The ACF chart shows a spike at lag 4
  • The PACF chart shows spikes at lags 1 and 3

training(splits) %>% 
    plot_acf_diagnostics(.date_var = purchased_at, .value = diff_vec(revenue))
## diff_vec(): Initial values: -1.36527984183316
## Max lag exceeds data available. Using max lag: 82
## Warning: `gather_()` was deprecated in tidyr 1.2.0.
## Please use `gather()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
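The lag-4 spike can also be checked numerically. A sketch, assuming diff_vec() pads the differenced series with a leading NA (hence na.pass):

# Numeric autocorrelations of the differenced revenue series
training(splits) %>% 
    pull(revenue) %>% 
    diff_vec(silent = TRUE) %>% 
    acf(lag.max = 8, plot = FALSE, na.action = na.pass)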

For the third Auto ARIMA model, I’ll build on the second model and add seasonal_period = 4 to try to capture the lag-4 ACF spike as seasonality.

Observation:

  • Even with seasonal_period = 4, Auto ARIMA still selects a non-seasonal model; the output is identical to Model 2

model_fit_3_arima_sarimax <- arima_reg(seasonal_period = 4) %>% 
    set_engine("auto_arima") %>% 
    fit(revenue ~ purchased_at 
                + event_november_sale 
                + event_product_launch, 
        data = training(splits))

model_fit_3_arima_sarimax
## parsnip model object
## 
## Series: outcome 
## Regression with ARIMA(1,1,2) errors 
## 
## Coefficients:
##           ar1      ma1      ma2  event_november_sale  event_product_launch
##       -0.8403  -0.1553  -0.4298               1.2105                1.3999
## s.e.   0.1028   0.1542   0.1402               0.2182                0.1581
## 
## sigma^2 estimated as 0.2555:  log likelihood=-58.49
## AIC=128.99   AICc=130.11   BIC=143.43

Model 4 - Force Seasonality w/ Regular ARIMA

Now, I’ll force seasonality with a regular ARIMA (not Auto ARIMA) by specifying a monthly seasonal period (4 weeks per month).

Observations:

  • This is now a seasonal model, since we forced the seasonal terms
  • Forcing seasonality did not improve the model: the AIC rose from 128.99 to 133.2

model_fit_4_arima_sarimax <- arima_reg(seasonal_period = 4, 
          # Non-seasonal terms
          non_seasonal_ar = 2, 
          non_seasonal_differences = 1, 
          non_seasonal_ma = 2, 
          # Seasonal terms
          seasonal_ar = 1, 
          seasonal_differences = 0, 
          seasonal_ma = 1) %>% 
    set_engine("arima") %>% 
    fit(revenue ~ purchased_at + 
                  event_november_sale + 
                  event_product_launch, 
        data = training(splits))

model_fit_4_arima_sarimax
## parsnip model object
## 
## Series: outcome 
## Regression with ARIMA(2,1,2)(1,0,1)[4] errors 
## 
## Coefficients:
##           ar1     ar2      ma1      ma2    sar1     sma1  event_november_sale
##       -0.4448  0.2099  -0.5349  -0.2587  0.7479  -0.6177               1.1829
## s.e.   0.3813  0.1773   0.3754   0.2754  0.3449   0.4081               0.2182
##       event_product_launch
##                     1.3866
## s.e.                0.1646
## 
## sigma^2 estimated as 0.2595:  log likelihood=-57.6
## AIC=133.2   AICc=135.7   BIC=154.86

Model 5 - Use Fourier Terms + Product Events instead of Seasonality

Next, I’ll try to improve the Auto ARIMA model by adding a Fourier series with period = 4 to capture the strong ACF relationship at lag 4.

Observations:

  • The AIC (129.4) is essentially unchanged from Model 2 (128.99)

model_fit_5_arima_xreg_fourier <- arima_reg() %>% 
    set_engine("auto_arima") %>% 
    fit(revenue ~ purchased_at 
                + event_november_sale 
                + event_product_launch +     
                fourier_vec(purchased_at, period = 4), 
        data = training(splits))
## frequency = 13 observations per 1 quarter
model_fit_5_arima_xreg_fourier
## parsnip model object
## 
## Series: outcome 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##           ar1      ma1   drift  event_november_sale  event_product_launch
##       -0.3001  -0.9231  0.0263               1.1360                1.4002
## s.e.   0.1579   0.1768  0.0050               0.2445                0.1778
##       fourier_vec_purchased_at_period_4
##                                  0.0489
## s.e.                             0.0701
## 
## sigma^2 estimated as 0.2503:  log likelihood=-57.7
## AIC=129.4   AICc=130.91   BIC=146.25
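To see what this regressor feeds the model, a quick sketch plotting the period-4 sine wave that fourier_vec() generates (type = "sin" is the default; a matching cosine is available with type = "cos"):

# Visualize the period-4 Fourier term over the training dates
training(splits) %>% 
    select(purchased_at) %>% 
    mutate(fourier_sin_4 = fourier_vec(purchased_at, period = 4)) %>% 
    plot_time_series(purchased_at, fourier_sin_4, .smooth = FALSE)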

Investigate - Modeltime Workflow

Model Table

Consolidate Models 1-5 (the ARIMA models) into a modeltime table.

model_tbl_arima <- modeltime_table(
    model_fit_1_arima_basic,
    model_fit_2_arima_xregs,
    model_fit_3_arima_sarimax,
    model_fit_4_arima_sarimax,
    model_fit_5_arima_xreg_fourier
)

model_tbl_arima
## # Modeltime Table
## # A tibble: 5 x 3
##   .model_id .model   .model_desc                                  
##       <int> <list>   <chr>                                        
## 1         1 <fit[+]> ARIMA(0,1,1)                                 
## 2         2 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS          
## 3         3 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS          
## 4         4 <fit[+]> REGRESSION WITH ARIMA(2,1,2)(1,0,1)[4] ERRORS
## 5         5 <fit[+]> REGRESSION WITH ARIMA(1,1,1) ERRORS

Calibration Table

Calibrate models on the testing split.

calibration_tbl <- 
    model_tbl_arima %>% 
    modeltime_calibrate(testing(splits))

calibration_tbl
## # Modeltime Table
## # A tibble: 5 x 5
##   .model_id .model   .model_desc                          .type .calibration_da~
##       <int> <list>   <chr>                                <chr> <list>          
## 1         1 <fit[+]> ARIMA(0,1,1)                         Test  <tibble [8 x 4]>
## 2         2 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS  Test  <tibble [8 x 4]>
## 3         3 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS  Test  <tibble [8 x 4]>
## 4         4 <fit[+]> REGRESSION WITH ARIMA(2,1,2)(1,0,1)~ Test  <tibble [8 x 4]>
## 5         5 <fit[+]> REGRESSION WITH ARIMA(1,1,1) ERRORS  Test  <tibble [8 x 4]>
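Each .calibration_data entry holds the test-set predictions and residuals for one model. To inspect them, a sketch using tidyr::unnest():

# Out-of-sample predictions and residuals for Model 5
calibration_tbl %>% 
    filter(.model_id == 5) %>% 
    unnest(.calibration_data)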

Test Accuracy

Calculate the test accuracy.

Observations:

  • The best model so far is Model 5, with the lowest MAE (0.221) and the highest R-squared (0.120) in the table; see the formatted table sketch after the output

calibration_tbl %>% 
    modeltime_accuracy()
## Warning: A correlation computation is required, but `estimate` is constant
## and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be
## returned.
## # A tibble: 5 x 9
##   .model_id .model_desc             .type   mae  mape  mase smape  rmse      rsq
##       <int> <chr>                   <chr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1         1 ARIMA(0,1,1)            Test  0.263 108.  0.580  34.3 0.373 NA      
## 2         2 REGRESSION WITH ARIMA(~ Test  0.321  84.9 0.709  43.8 0.409  1.14e-4
## 3         3 REGRESSION WITH ARIMA(~ Test  0.321  84.9 0.709  43.8 0.409  1.14e-4
## 4         4 REGRESSION WITH ARIMA(~ Test  0.327  88.8 0.722  44.2 0.424  5.30e-2
## 5         5 REGRESSION WITH ARIMA(~ Test  0.221 100.  0.488  30.1 0.351  1.20e-1
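For easier scanning, the same metrics can be rendered as a formatted table (a sketch; table_modeltime_accuracy() is interactive by default):

calibration_tbl %>% 
    modeltime_accuracy() %>% 
    table_modeltime_accuracy(.interactive = FALSE)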

Test Forecast

Forecast the testing data using the ARIMA models.

calibration_tbl %>% 
    modeltime_forecast(new_data = testing(splits), 
                   actual_data = data_prepared_tbl) %>% 
    plot_modeltime_forecast()

ARIMA Forecast Review

  • On the test set (out-of-sample), Model 5, which uses events and the Fourier series, performed best: it had the lowest MAE and explained the most variance
  • The global trend is increasing overall, but the models don’t capture it well; most forecasts show no upward trend
  • The local trend in the most recent data is steadier and less steep, and most models seem to capture it

Prophet

Next, I’ll experiment with the Prophet algorithm.

Model 6 - Basic Prophet

model_fit_6_prophet_basic <- prophet_reg() %>% 
    set_engine("prophet") %>% 
    fit(revenue ~ purchased_at, 
        data = training(splits))
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_6_prophet_basic
## parsnip model object
## 
## PROPHET Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 0
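The printout shows Prophet’s defaults: 25 changepoints, a changepoint range of 0.8, additive seasonality, and prior scales of 0.05/10/10. If these ever need tuning, they map onto prophet_reg() arguments. A sketch of an equivalent explicit specification (not fit here; argument names per modeltime’s prophet_reg()):

# Spec exposing the knobs behind the printed defaults
prophet_spec_explicit <- prophet_reg(
    changepoint_num          = 25,     # n.changepoints
    changepoint_range        = 0.8,    # changepoint.range
    season                   = "additive",
    prior_scale_changepoints = 0.05,
    prior_scale_seasonality  = 10,
    prior_scale_holidays     = 10
) %>% 
    set_engine("prophet")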

Model 7 - Turn on yearly seasonality

Next, I’ll try turning yearly seasonality on.

model_fit_7_prophet_yearly <- prophet_reg(
    seasonality_yearly = TRUE
) %>% 
    set_engine("prophet") %>% 
    fit(revenue ~ purchased_at, 
        data = training(splits))
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_7_prophet_yearly
## parsnip model object
## 
## PROPHET Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'TRUE'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 0

Model 8 - Product Events

Next, I’ll stop forcing yearly seasonality and add the event regressors instead.

model_fit_8_prophet_events <- prophet_reg(
) %>% 
    set_engine("prophet") %>% 
    fit(revenue ~ purchased_at 
        + event_november_sale
        + event_product_launch, 
        data = training(splits))
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_8_prophet_events
## parsnip model object
## 
## PROPHET w/ Regressors Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 2

Model 9 - Events + Fourier Series

Next, I’ll try another model that includes the events plus a Fourier series with period = 4, mirroring the ARIMA models.

model_fit_9_prophet_events_fourier <- prophet_reg(
) %>% 
    set_engine("prophet") %>% 
    fit(revenue ~ purchased_at 
        + event_november_sale
        + event_product_launch
        + fourier_vec(purchased_at, period = 4), 
        data = training(splits))
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
model_fit_9_prophet_events_fourier
## parsnip model object
## 
## PROPHET w/ Regressors Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 3

Investigate - Modeltime Workflow

Model Table

Consolidate the Prophet models (Models 6-9) into a modeltime table.

model_tbl_prophet <- modeltime_table(
    model_fit_6_prophet_basic,
    model_fit_7_prophet_yearly,
    model_fit_8_prophet_events,
    model_fit_9_prophet_events_fourier
)

model_tbl_prophet
## # Modeltime Table
## # A tibble: 4 x 3
##   .model_id .model   .model_desc          
##       <int> <list>   <chr>                
## 1         1 <fit[+]> PROPHET              
## 2         2 <fit[+]> PROPHET              
## 3         3 <fit[+]> PROPHET W/ REGRESSORS
## 4         4 <fit[+]> PROPHET W/ REGRESSORS

Calibration Table

Next, calibrate the models using testing data.

calibration_tbl <- model_tbl_prophet %>% 
    modeltime_calibrate(testing(splits))

calibration_tbl
## # Modeltime Table
## # A tibble: 4 x 5
##   .model_id .model   .model_desc           .type .calibration_data
##       <int> <list>   <chr>                 <chr> <list>           
## 1         1 <fit[+]> PROPHET               Test  <tibble [8 x 4]> 
## 2         2 <fit[+]> PROPHET               Test  <tibble [8 x 4]> 
## 3         3 <fit[+]> PROPHET W/ REGRESSORS Test  <tibble [8 x 4]> 
## 4         4 <fit[+]> PROPHET W/ REGRESSORS Test  <tibble [8 x 4]>

Test Accuracy

Calculate the accuracy.

Observations:

  • Among the Prophet models, Model 9 has the lowest MAE and explains the most variance (R-squared)
  • Model 9 performs close to ARIMA Model 5, but Model 5 is still better; both models include the Fourier series and events

calibration_tbl %>% 
    modeltime_accuracy()
## # A tibble: 4 x 9
##   .model_id .model_desc           .type   mae  mape  mase smape  rmse     rsq
##       <int> <chr>                 <chr> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1         1 PROPHET               Test  0.423  166. 0.934  45.3 0.550 0.00563
## 2         2 PROPHET               Test  0.696  149. 1.54   63.6 0.845 0.0120 
## 3         3 PROPHET W/ REGRESSORS Test  0.256  110. 0.567  33.5 0.372 0.00563
## 4         4 PROPHET W/ REGRESSORS Test  0.235  105. 0.520  31.5 0.357 0.0843

Test Forecast

Finally, forecast the calibrated models for testing data.

calibration_tbl %>% 
    modeltime_forecast(new_data = testing(splits), 
                       actual_data = data_prepared_tbl) %>% 
    plot_modeltime_forecast()

Prophet Forecast Review

  • Models 8 and 9 performed best on the test forecast; Model 9 was slightly better, with a lower MAE and the highest explained variance
  • Most models captured the global trend (upward) but not the local trend (flatter and less steep)

Exponential Smoothing

Next, I’ll experiment with exponential smoothing.

Model 10 - ETS

First, I’ll try the automated ETS model:

Observations:

  • The ETS parameters are (A, A, N); the sketch after the output extracts them programmatically
  • The error component is additive and exponentially smoothed (there is an alpha parameter)
  • The trend component is additive (there is a beta parameter)
  • This is not a seasonal model: the third parameter is N (none)

model_fit_10_ets <- exp_smoothing() %>% 
    set_engine("ets") %>% 
    fit(revenue ~ purchased_at, 
        data = training(splits))
## frequency = 13 observations per 1 quarter
model_fit_10_ets
## parsnip model object
## 
## ETS(A,A,N) 
## 
## Call:
##  forecast::ets(y = outcome, model = model_ets, damped = damping_ets,  
## 
##  Call:
##      alpha = alpha, beta = beta, gamma = gamma) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = -1.1407 
##     b = 0.0268 
## 
##   sigma:  0.6873
## 
##      AIC     AICc      BIC 
## 310.4166 311.1958 322.5108
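To extract the smoothing parameters programmatically, a sketch (forecast’s ets fits expose them through coef()):

# Alpha, beta, and initial states from the underlying ets fit
model_fit_10_ets %>% 
    extract_fit_engine() %>% 
    coef()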

Model 11 - TBATS

Next, I’ll experiment with a TBATS model, setting seasonal_period_1 = 4 (based on the ACF) and seasonal_period_2 = 13 (the quarterly frequency). Note that this engine does not accept external regressors.

model_fit_11_tbats <- seasonal_reg(
    seasonal_period_1 = 4, 
    seasonal_period_2 = 13
) %>% 
    set_engine("tbats") %>% 
    fit(revenue ~ purchased_at, 
        data = training(splits))

model_fit_11_tbats
## parsnip model object
## 
## BATS(1, {0,0}, -, -)
## 
## Call: forecast::tbats(y = outcome, use.parallel = use.parallel)
## 
## Parameters
##   Alpha: 0.2029992
## 
## Seed States:
##           [,1]
## [1,] -1.013346
## 
## Sigma: 0.714492
## AIC: 314.9573
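Note that the chosen model prints as BATS(1, {0,0}, -, -): both supplied seasonal periods were dropped, so the fit reduced to a non-seasonal BATS. For an in-sample comparison against ETS, a sketch (assuming the aic element on ets fits and AIC on tbats fits):

# In-sample AIC: ETS vs TBATS
c(ets   = extract_fit_engine(model_fit_10_ets)$aic, 
  tbats = extract_fit_engine(model_fit_11_tbats)$AIC)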

Investigate - Modeltime Workflow

Model Table

Consolidate the exponential smoothing models (Models 10-11) into a modeltime table.

model_tbl_exp_smooth <- modeltime_table(
    model_fit_10_ets, 
    model_fit_11_tbats
) 

model_tbl_exp_smooth
## # Modeltime Table
## # A tibble: 2 x 3
##   .model_id .model   .model_desc         
##       <int> <list>   <chr>               
## 1         1 <fit[+]> ETS(A,A,N)          
## 2         2 <fit[+]> BATS(1, {0,0}, -, -)

Calibration Table

Next, calibrate the models using the testing set.

calibration_tbl <- model_tbl_exp_smooth %>% 
    modeltime_calibrate(new_data = testing(splits))

calibration_tbl
## # Modeltime Table
## # A tibble: 2 x 5
##   .model_id .model   .model_desc          .type .calibration_data
##       <int> <list>   <chr>                <chr> <list>           
## 1         1 <fit[+]> ETS(A,A,N)           Test  <tibble [8 x 4]> 
## 2         2 <fit[+]> BATS(1, {0,0}, -, -) Test  <tibble [8 x 4]>

Test Accuracy

Calculate the accuracy.

calibration_tbl %>% 
    modeltime_accuracy()
## Warning: A correlation computation is required, but `estimate` is constant
## and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be
## returned.
## # A tibble: 2 x 9
##   .model_id .model_desc          .type   mae  mape  mase smape  rmse      rsq
##       <int> <chr>                <chr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1         1 ETS(A,A,N)           Test  0.333  139. 0.737  39.3 0.444  0.00563
## 2         2 BATS(1, {0,0}, -, -) Test  0.265  108. 0.585  34.5 0.374 NA

Test Forecast

Finally, forecast the calibrated models on the testing data.

calibration_tbl %>% 
    modeltime_forecast(new_data = testing(splits), 
                       actual_data = data_prepared_tbl) %>% 
    plot_modeltime_forecast()

Exponential Smoothing Forecast Review

  • The TBATS model appears better than ETS, with a lower MAE, but its forecast is nearly constant, so R-squared could not be computed
  • Both exponential smoothing models perform worse than ARIMA Model 5 and Prophet Model 9

Forecast Future Data

Forecast the future timeframe.

Model Table

Combine the three previous modeltime tables into a single modeltime table:

  • model_tbl_arima
  • model_tbl_prophet
  • model_tbl_exp_smooth

model_tbl <- combine_modeltime_tables(
    model_tbl_arima,
    model_tbl_prophet,
    model_tbl_exp_smooth
)

model_tbl
## # Modeltime Table
## # A tibble: 11 x 3
##    .model_id .model   .model_desc                                  
##        <int> <list>   <chr>                                        
##  1         1 <fit[+]> ARIMA(0,1,1)                                 
##  2         2 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS          
##  3         3 <fit[+]> REGRESSION WITH ARIMA(1,1,2) ERRORS          
##  4         4 <fit[+]> REGRESSION WITH ARIMA(2,1,2)(1,0,1)[4] ERRORS
##  5         5 <fit[+]> REGRESSION WITH ARIMA(1,1,1) ERRORS          
##  6         6 <fit[+]> PROPHET                                      
##  7         7 <fit[+]> PROPHET                                      
##  8         8 <fit[+]> PROPHET W/ REGRESSORS                        
##  9         9 <fit[+]> PROPHET W/ REGRESSORS                        
## 10        10 <fit[+]> ETS(A,A,N)                                   
## 11        11 <fit[+]> BATS(1, {0,0}, -, -)
# Refit the combined table on the training data so every model is freshly trained before calibration.
model_tbl <- model_tbl %>%
    modeltime_refit(training(splits))
## frequency = 13 observations per 1 quarter
## frequency = 13 observations per 1 quarter
## frequency = 13 observations per 1 quarter
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 13 observations per 1 quarter

Calibrate the Table

Calibrate the models on the testing data.

calibration_tbl <- model_tbl %>% 
    modeltime_calibrate(testing(splits))

Calculate the Accuracy

Calculate the accuracy metrics.

calibration_tbl %>% 
    modeltime_accuracy()
## Warning: A correlation computation is required, but `estimate` is constant
## and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be
## returned.
## Warning: A correlation computation is required, but `estimate` is constant
## and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be
## returned.
## # A tibble: 11 x 9
##    .model_id .model_desc            .type   mae  mape  mase smape  rmse      rsq
##        <int> <chr>                  <chr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
##  1         1 ARIMA(0,1,1)           Test  0.263 108.  0.580  34.3 0.373 NA      
##  2         2 REGRESSION WITH ARIMA~ Test  0.321  84.9 0.709  43.8 0.409  1.14e-4
##  3         3 REGRESSION WITH ARIMA~ Test  0.321  84.9 0.709  43.8 0.409  1.14e-4
##  4         4 REGRESSION WITH ARIMA~ Test  0.327  88.8 0.722  44.2 0.424  5.30e-2
##  5         5 REGRESSION WITH ARIMA~ Test  0.221 100.  0.488  30.1 0.351  1.20e-1
##  6         6 PROPHET                Test  0.423 166.  0.934  45.3 0.550  5.63e-3
##  7         7 PROPHET                Test  0.696 149.  1.54   63.6 0.845  1.20e-2
##  8         8 PROPHET W/ REGRESSORS  Test  0.256 110.  0.567  33.5 0.372  5.63e-3
##  9         9 PROPHET W/ REGRESSORS  Test  0.235 105.  0.520  31.5 0.357  8.43e-2
## 10        10 ETS(A,A,N)             Test  0.333 139.  0.737  39.3 0.444  5.63e-3
## 11        11 BATS(1, {0,0}, -, -)   Test  0.265 108.  0.585  34.5 0.374 NA
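To pull out the best performer across all eleven models instead of eyeballing the table, a sketch:

# Lowest test MAE in the combined table (Model 5, per the output above)
calibration_tbl %>% 
    modeltime_accuracy() %>% 
    slice_min(mae, n = 1)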

Visualize the Model Forecast

calibration_tbl %>% 
    modeltime_forecast(new_data = testing(splits), 
                       actual_data = data_prepared_tbl) %>% 
    plot_modeltime_forecast()

Refit

Refit the models on the full dataset.

refit_tbl <- calibration_tbl %>% 
    modeltime_refit(data_prepared_tbl)
## frequency = 13 observations per 1 quarter
## frequency = 13 observations per 1 quarter
## frequency = 13 observations per 1 quarter
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## frequency = 13 observations per 1 quarter

Forecast Future

Forecast over the future horizon contained in forecast_tbl.

refit_tbl %>% 
    modeltime_forecast(new_data = forecast_tbl, 
                       actual_data = data_prepared_tbl) %>% 
    plot_modeltime_forecast()

Invert Transformation

Apply the inversion to the forecast plot:

  • Invert the standardization
  • Invert the log transformation
refit_tbl %>% 
    modeltime_forecast(new_data = forecast_tbl, 
                       actual_data = data_prepared_tbl) %>% 
    
    # Invert transformation
    mutate(across(.value:.conf_hi, .fns = ~ standardize_inv_vec(
        x = ., 
        mean = std_mean, 
        sd = std_sd
    ))) %>% 
    mutate(across(.value:.conf_hi, .fns = exp)) %>%

    # Plot forecast
    plot_modeltime_forecast()
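As a sanity check on the inversion order (un-standardize first, then exp to undo the log), a toy round trip (a sketch; standardize_inv_vec() is timetk’s inverse of standardize_vec()):

# Forward: log then standardize; inverse: un-standardize then exp
x  <- c(100, 200, 300)
z  <- standardize_vec(log(x), silent = TRUE)
x2 <- exp(standardize_inv_vec(z, mean = mean(log(x)), sd = sd(log(x))))
all.equal(x, x2)  # TRUE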

Forecast Review