1. Exogenous variables
Exogenous variables are external factors that provide additional information about the behavior of the target variable in time series forecasting. These variables, which are correlated with the target, can significantly improve predictions. Examples of exogenous variables include weather data, economic indicators, holiday markers, and promotional sales.
TimeGPT allows you to include exogenous variables when generating a forecast. This vignette will show you how to include them. It assumes you have already set up your API key. If you haven’t done this, please read the Get Started vignette first.
2. Load data
For this vignette, we will use the electricity price dataset with exogenous variables included in nixtlar. This dataset contains hourly prices from five different electricity markets, along with two exogenous variables related to the prices and binary variables indicating the day of the week.
library(nixtlar)

df_exo_vars <- nixtlar::electricity_exo_vars
head(df_exo_vars)
#> unique_id ds y Exogenous1 Exogenous2 day_0 day_1 day_2
#> 1 BE 2016-10-22 00:00:00 70.00 49593 57253 0 0 0
#> 2 BE 2016-10-22 01:00:00 37.10 46073 51887 0 0 0
#> 3 BE 2016-10-22 02:00:00 37.10 44927 51896 0 0 0
#> 4 BE 2016-10-22 03:00:00 44.75 44483 48428 0 0 0
#> 5 BE 2016-10-22 04:00:00 37.10 44338 46721 0 0 0
#> 6 BE 2016-10-22 05:00:00 35.61 44504 46303 0 0 0
#> day_3 day_4 day_5 day_6
#> 1 0 0 1 0
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 0 0 1 0
#> 5 0 0 1 0
#> 6 0 0 1 0
There are two types of exogenous variables: historic and future.
- Historic exogenous variables: They should be included directly in the input dataset df.
- Future exogenous variables: They must be included in the X_df parameter.
To specify which variables should be treated as historic, use the hist_exog_list parameter. This parameter is available in both the nixtla_client_forecast and nixtla_client_cross_validation functions.
- If df contains exogenous variables that are neither found in X_df nor declared in hist_exog_list, they will be ignored.
- If exogenous variables are declared as historic but are also found in X_df, they will be treated as historic.
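The rules above can be summarised in a small sketch. The helper classify_exog below is hypothetical and not part of nixtlar; it only mimics, in base R, how each exogenous column of df would be treated given X_df and hist_exog_list. It assumes the default column names unique_id, ds and y, and the toy data is made up for illustration.

```r
# Hypothetical helper, not part of nixtlar: mirrors the rules described above.
classify_exog <- function(df, X_df = NULL, hist_exog_list = NULL,
                          base_cols = c("unique_id", "ds", "y")) {
  # Every column that is not an identifier, timestamp or target is exogenous.
  exog <- setdiff(names(df), base_cols)
  # Declared historic variables stay historic, even if they also appear in X_df.
  historic <- intersect(exog, hist_exog_list)
  # Remaining variables with known future values are treated as future.
  future <- setdiff(intersect(exog, names(X_df)), historic)
  # Anything else is neither declared nor present in X_df, so it is ignored.
  ignored <- setdiff(exog, c(historic, future))
  list(future = future, historic = historic, ignored = ignored)
}

# Toy example with made-up values.
df <- data.frame(
  unique_id = "A", ds = 1:3, y = c(10, 12, 11),
  temp = c(20, 21, 19), promo = c(0, 1, 0), price = c(5, 5, 6)
)
X_df <- data.frame(unique_id = "A", ds = 4:5, temp = c(22, 23), promo = c(1, 0))

res <- classify_exog(df, X_df, hist_exog_list = "promo")
str(res)
```

Here temp is used as a future variable, promo is treated as historic because it was declared in hist_exog_list even though it also appears in X_df, and price is ignored.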
In the next section, we will explore different cases for forecasting with exogenous variables.
3a. Forecasting electricity prices using historic and future exogenous variables
If both historic and future values of all exogenous variables are available, include the historic exogenous variables in df and the future exogenous variables in X_df.
future_exo_vars <- nixtlar::electricity_future_exo_vars
head(future_exo_vars)
#> unique_id ds Exogenous1 Exogenous2 day_0 day_1 day_2 day_3
#> 1 BE 2016-12-31 00:00:00 64108 70318 0 0 0 0
#> 2 BE 2016-12-31 01:00:00 62492 67898 0 0 0 0
#> 3 BE 2016-12-31 02:00:00 61571 68379 0 0 0 0
#> 4 BE 2016-12-31 03:00:00 60381 64972 0 0 0 0
#> 5 BE 2016-12-31 04:00:00 60298 62900 0 0 0 0
#> 6 BE 2016-12-31 05:00:00 60339 62364 0 0 0 0
#> day_4 day_5 day_6
#> 1 0 1 0
#> 2 0 1 0
#> 3 0 1 0
#> 4 0 1 0
#> 5 0 1 0
#> 6 0 1 0
fcst_exo_vars <- nixtla_client_forecast(
  df_exo_vars,
  h = 24,
  X_df = future_exo_vars
)
#> Frequency chosen: h
#> Using future exogenous features: [Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6]
head(fcst_exo_vars)
#> unique_id ds TimeGPT
#> 1 BE 2016-12-31 00:00:00 74.54077
#> 2 BE 2016-12-31 01:00:00 43.34429
#> 3 BE 2016-12-31 02:00:00 44.42921
#> 4 BE 2016-12-31 03:00:00 38.09440
#> 5 BE 2016-12-31 04:00:00 37.38914
#> 6 BE 2016-12-31 05:00:00 39.08574
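Before sending future exogenous variables to the API, it can help to run a quick consistency check on X_df. The sketch below uses a hypothetical helper, check_future_exog, which is not part of nixtlar. It assumes that every column of X_df should also exist in df and that each series needs exactly h future rows; treat both as assumptions rather than a statement of what the API enforces. The toy data is made up.

```r
# Hypothetical helper, not part of nixtlar: basic consistency checks for X_df.
check_future_exog <- function(df, X_df, h) {
  problems <- character(0)
  # Columns present in X_df but missing from df have no history to learn from.
  extra <- setdiff(names(X_df), names(df))
  if (length(extra) > 0) {
    problems <- c(problems, paste("Not in df:", paste(extra, collapse = ", ")))
  }
  # Assumption: each series in df needs exactly h rows of future values.
  rows <- table(factor(X_df$unique_id, levels = unique(df$unique_id)))
  wrong <- names(rows)[as.vector(rows) != h]
  if (length(wrong) > 0) {
    problems <- c(problems,
                  paste("Wrong number of future rows:", paste(wrong, collapse = ", ")))
  }
  problems
}

# Toy example: series B has only one future row although h = 2.
toy_df <- data.frame(
  unique_id = rep(c("A", "B"), each = 3), ds = rep(1:3, times = 2),
  y = c(10, 12, 11, 30, 31, 29), temp = c(20, 21, 19, 15, 16, 14)
)
toy_X_df <- data.frame(unique_id = c("A", "A", "B"), ds = c(4, 5, 4),
                       temp = c(22, 23, 17))

issues <- check_future_exog(toy_df, toy_X_df, h = 2)
print(issues)
```

An empty result means neither of the two assumed conditions was violated.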
3b. Forecasting electricity prices using only historic exogenous variables
If future values of the exogenous variables are not available, you can still generate forecasts using only their historical values. In this case, simply include them in df and declare them in hist_exog_list.
fcst_exo_vars <- nixtla_client_forecast(
  df_exo_vars,
  h = 24,
  hist_exog_list = c(
    "Exogenous1", "Exogenous2",
    "day_0", "day_1", "day_2", "day_3", "day_4", "day_5", "day_6"
  )
)
#> Frequency chosen: h
#> Using historical exogenous features: [Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6]
head(fcst_exo_vars)
#> unique_id ds TimeGPT
#> 1 BE 2016-12-31 00:00:00 45.76938
#> 2 BE 2016-12-31 01:00:00 47.99101
#> 3 BE 2016-12-31 02:00:00 49.49613
#> 4 BE 2016-12-31 03:00:00 49.51081
#> 5 BE 2016-12-31 04:00:00 48.51056
#> 6 BE 2016-12-31 05:00:00 50.20716
Note that if you don’t declare the exogenous variables in hist_exog_list, they will be ignored. If we hadn’t declared them above, the output would be the same as the TimeGPT forecast using only the target variable y.
Important: If you include historical exogenous variables without explicitly defining their future values, you are implicitly assuming that their historical patterns will continue into the future. Whenever possible, it is recommended to use future exogenous variables to make these assumptions explicit.
3c. Forecasting future exogenous variables
When future exogenous variables are not available, an alternative approach is to forecast them separately using TimeGPT. First, generate forecasts for the exogenous variables and then pass the predicted values in X_df for the main forecast.
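A minimal sketch of this two-step approach is shown below. The helper forecast_exogenous is hypothetical and not part of nixtlar: it treats each exogenous column as its own target y, forecasts it, and collects the predictions in the wide layout used for X_df. It assumes the forecast function returns the columns unique_id, ds and TimeGPT, as in the outputs shown earlier. So that the sketch runs without an API key, the example passes a naive stand-in forecaster (it repeats the last observed value, which is not what TimeGPT does) and made-up toy data.

```r
# Hypothetical helper, not part of nixtlar: forecast each exogenous column as
# its own target and return the predictions in the wide layout used for X_df.
forecast_exogenous <- function(df, exo_cols, h,
                               forecast_fn = nixtlar::nixtla_client_forecast) {
  x_df <- NULL
  for (col in exo_cols) {
    # Treat the exogenous column as the target variable y.
    target_df <- df[, c("unique_id", "ds", col)]
    names(target_df)[3] <- "y"
    fcst <- forecast_fn(target_df, h = h)
    # Assumption: the point forecast is returned in a column named TimeGPT.
    fcst <- fcst[, c("unique_id", "ds", "TimeGPT")]
    names(fcst)[3] <- col
    x_df <- if (is.null(x_df)) fcst else merge(x_df, fcst, by = c("unique_id", "ds"))
  }
  x_df
}

# Offline stand-in for the API call: repeats the last observed value of each
# series. It assumes rows are sorted by ds and that ds is numeric.
naive_forecast <- function(df, h) {
  do.call(rbind, lapply(split(df, df$unique_id), function(s) {
    data.frame(
      unique_id = s$unique_id[1],
      ds = max(s$ds) + seq_len(h),
      TimeGPT = s$y[nrow(s)]
    )
  }))
}

# Toy data with made-up values.
toy <- data.frame(
  unique_id = rep(c("A", "B"), each = 3),
  ds = rep(1:3, times = 2),
  y = c(10, 12, 11, 30, 31, 29),
  Exogenous1 = c(1, 2, 3, 4, 5, 6),
  Exogenous2 = c(9, 8, 7, 6, 5, 4)
)

X_df_pred <- forecast_exogenous(toy, c("Exogenous1", "Exogenous2"), h = 2,
                                forecast_fn = naive_forecast)
print(X_df_pred)
```

With the real data, leaving forecast_fn at its default would call nixtla_client_forecast once per exogenous column. Calendar variables such as day_0 to day_6 do not need to be forecast, since their future values are known; they would be merged into the result by unique_id and ds before passing it as X_df, provided ds is stored in the same format in both data frames.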
3d. Forecasting electricity prices using both future and historic exogenous variables
In some cases, only a subset of future exogenous variables is available. For example, if future values of Exogenous1 and Exogenous2 are unknown, declare them in hist_exog_list and pass the remaining future exogenous variables in X_df.
future_exo_vars <- future_exo_vars |>
  dplyr::select(-dplyr::all_of(c("Exogenous1", "Exogenous2")))
fcst_exo_vars <- nixtla_client_forecast(
  df_exo_vars,
  h = 24,
  X_df = future_exo_vars,
  hist_exog_list = c("Exogenous1", "Exogenous2")
)
#> Frequency chosen: h
#> The following features were declared as historic but found in X_df:: [Exogenous1, Exogenous2]. They will be considered historic.
#> Using future exogenous features: [day_0, day_1, day_2, day_3, day_4, day_5, day_6]
#> Using historical exogenous features: [Exogenous1, Exogenous2]
head(fcst_exo_vars)
#> unique_id ds TimeGPT
#> 1 BE 2016-12-31 00:00:00 47.05948
#> 2 BE 2016-12-31 01:00:00 49.28110
#> 3 BE 2016-12-31 02:00:00 50.78623
#> 4 BE 2016-12-31 03:00:00 50.80090
#> 5 BE 2016-12-31 04:00:00 49.80066
#> 6 BE 2016-12-31 05:00:00 51.49725
4. Plot TimeGPT forecast
nixtlar includes a function to plot the historical data and any output from nixtla_client_forecast, nixtla_client_historic, nixtla_client_anomaly_detection and nixtla_client_cross_validation. If you have long series, you can use max_insample_length to only plot the last N historical values (the forecast will always be plotted in full).
nixtla_client_plot(df_exo_vars, fcst_exo_vars, max_insample_length = 500)