Prediction Intervals
prediction-intervals.Rmd
library(nixtlar)
#> Registered S3 method overwritten by 'tsibble':
#> method from
#> as_tibble.grouped_df dplyr
1. Uncertainty quantification via prediction intervals
For uncertainty quantification, TimeGPT can generate both prediction intervals and quantiles, which describe a range of potential outcomes rather than a single point forecast. Real-world forecasting often requires weighing several plausible outcomes, not just one prediction. This vignette explains how to use prediction intervals with TimeGPT via the nixtlar package.
A prediction interval is a range of values expected to contain the actual future value with a given probability, often referred to as the confidence level. For example, a 95% prediction interval should contain the actual future value with a probability of 95%. Prediction intervals are part of probabilistic forecasting, which, unlike point forecasting, aims to generate the full forecast distribution instead of just the mean or the median of that distribution.
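To make the coverage idea concrete, here is a minimal sketch in base R that does not call TimeGPT. It assumes a hypothetical forecast distribution (normal, with mean 50 and standard deviation 10, both chosen only for illustration), builds a central 95% interval from it, and checks how often simulated actual values fall inside.

```r
set.seed(42)

# Hypothetical forecast distribution, used only for illustration
mu <- 50
sigma <- 10

# A central 95% interval leaves alpha / 2 probability in each tail
level <- 95
alpha <- 1 - level / 100
lo <- qnorm(alpha / 2, mean = mu, sd = sigma)
hi <- qnorm(1 - alpha / 2, mean = mu, sd = sigma)

# Simulate "actual" values and measure the share that lands inside the interval
actuals <- rnorm(10000, mean = mu, sd = sigma)
coverage <- mean(actuals >= lo & actuals <= hi)
coverage
```

If the interval is well calibrated, the empirical coverage should be close to 0.95. With real forecasts, the same check can be run on a holdout set.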
This vignette assumes you have already set up your API key. If you haven’t done this, please read the Get Started vignette first.
2. Load data
For this vignette, we’ll use the electricity dataset included in nixtlar, which contains the hourly prices of five different electricity markets.
df <- nixtlar::electricity
head(df)
#> unique_id ds y
#> 1 BE 2016-10-22 00:00:00 70.00
#> 2 BE 2016-10-22 01:00:00 37.10
#> 3 BE 2016-10-22 02:00:00 37.10
#> 4 BE 2016-10-22 03:00:00 44.75
#> 5 BE 2016-10-22 04:00:00 37.10
#> 6 BE 2016-10-22 05:00:00 35.61
3. Forecast with prediction intervals
TimeGPT can generate prediction intervals when using the following functions:
- nixtlar::nixtla_client_forecast()
- nixtlar::nixtla_client_historic()
- nixtlar::nixtla_client_detect_anomalies()
- nixtlar::nixtla_client_cross_validation()
For any of these functions, set the level argument to the desired confidence levels for the prediction intervals. Keep in mind that level should be a numeric vector with values between 0 and 100. You can use either quantiles or level for uncertainty quantification, but not both.
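The relationship between the two arguments can be sketched as follows, assuming the intervals are central (equal probability in each tail). The helper level_to_quantiles() is hypothetical and is not part of nixtlar; it only illustrates which quantiles would bound an interval of a given level.

```r
# Hypothetical helper, not part of nixtlar: maps confidence levels (0 to 100)
# to the pair of quantiles that bound a central prediction interval.
level_to_quantiles <- function(level) {
  stopifnot(is.numeric(level), all(level >= 0), all(level <= 100))
  alpha <- 1 - level / 100
  sort(unique(c(alpha / 2, 1 - alpha / 2)))
}

q <- level_to_quantiles(c(80, 95))
q
```

Under this assumption, level = c(80, 95) corresponds to the quantiles 0.025, 0.1, 0.9 and 0.975, so the 80% interval is bounded by the 10th and 90th percentiles.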
fcst <- nixtla_client_forecast(df, h = 8, id_col = "unique_id", level=c(80,95))
#> Frequency chosen: H
head(fcst)
#> unique_id ds TimeGPT TimeGPT-lo-95 TimeGPT-lo-80
#> 1 BE 2016-12-31 00:00:00 45.19045 32.60115 40.42074
#> 2 BE 2016-12-31 01:00:00 43.24445 29.30454 36.91513
#> 3 BE 2016-12-31 02:00:00 41.95839 28.17721 35.55863
#> 4 BE 2016-12-31 03:00:00 39.79649 25.42790 33.45859
#> 5 BE 2016-12-31 04:00:00 39.20454 23.53869 30.35095
#> 6 BE 2016-12-31 05:00:00 40.10878 26.90472 31.60236
#> TimeGPT-hi-80 TimeGPT-hi-95
#> 1 49.96017 57.77975
#> 2 49.57376 57.18435
#> 3 48.35815 55.73957
#> 4 46.13438 54.16507
#> 5 48.05812 54.87038
#> 6 48.61520 53.31284
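The interval columns follow the pattern TimeGPT-lo-<level> and TimeGPT-hi-<level>. Because these names contain hyphens, they need backticks or [[ ]] when accessed in R. The sketch below rebuilds the first three rows of the output above by hand (so it runs without an API call) and computes the width of each interval; fcst_sample is a name introduced here for illustration.

```r
# First three rows of the forecast shown above, typed in by hand.
# check.names = FALSE keeps the hyphenated column names intact.
fcst_sample <- data.frame(
  unique_id = "BE",
  ds = c("2016-12-31 00:00:00", "2016-12-31 01:00:00", "2016-12-31 02:00:00"),
  TimeGPT = c(45.19045, 43.24445, 41.95839),
  `TimeGPT-lo-95` = c(32.60115, 29.30454, 28.17721),
  `TimeGPT-lo-80` = c(40.42074, 36.91513, 35.55863),
  `TimeGPT-hi-80` = c(49.96017, 49.57376, 48.35815),
  `TimeGPT-hi-95` = c(57.77975, 57.18435, 55.73957),
  check.names = FALSE
)

# Width of each interval: upper bound minus lower bound
width_80 <- fcst_sample[["TimeGPT-hi-80"]] - fcst_sample[["TimeGPT-lo-80"]]
width_95 <- fcst_sample[["TimeGPT-hi-95"]] - fcst_sample[["TimeGPT-lo-95"]]

# A higher confidence level should give a wider interval
all(width_95 > width_80)
```

In these rows the 95% interval is wider than the 80% interval and both contain the point forecast, which is the expected behavior for nested intervals.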
Note that the level argument of nixtlar::nixtla_client_detect_anomalies() uses only the maximum value when several values are given. Hence, setting level=c(90,95,99), for example, is equivalent to setting level=c(99), which is the default value.
anomalies <- nixtla_client_detect_anomalies(df, id_col = "unique_id") # level=c(90,95,99)
#> Frequency chosen: H
head(anomalies) # only the 99% confidence level is used
#> unique_id ds y anomaly TimeGPT-lo-99 TimeGPT
#> 1 BE 2016-10-27 00:00:00 52.58 0 -28.58336 56.07623
#> 2 BE 2016-10-27 01:00:00 44.86 0 -32.23986 52.41973
#> 3 BE 2016-10-27 02:00:00 42.31 0 -31.84485 52.81474
#> 4 BE 2016-10-27 03:00:00 39.66 0 -32.06933 52.59026
#> 5 BE 2016-10-27 04:00:00 38.98 0 -31.98661 52.67297
#> 6 BE 2016-10-27 05:00:00 42.31 0 -30.55300 54.10659
#> TimeGPT-hi-99
#> 1 140.7358
#> 2 137.0793
#> 3 137.4743
#> 4 137.2498
#> 5 137.3326
#> 6 138.7662
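The anomaly column can be read against the interval bounds. The sketch below assumes that an observation is flagged when it falls outside the 99% prediction interval; this is consistent with the rows shown above but is stated here as an assumption, not as the exact rule used by the service. The first three rows are copied from the output; the fourth row (y = 200) is hypothetical and added only to show a flagged value.

```r
# Rows 1 to 3 come from the output above; row 4 is a hypothetical spike
anomalies_sample <- data.frame(
  y = c(52.58, 44.86, 42.31, 200),
  `TimeGPT-lo-99` = c(-28.58336, -32.23986, -31.84485, -31.84485),
  `TimeGPT-hi-99` = c(140.7358, 137.0793, 137.4743, 137.4743),
  check.names = FALSE
)

# Assumed rule: flag the observation when it lies outside the interval
outside <- anomalies_sample$y < anomalies_sample[["TimeGPT-lo-99"]] |
  anomalies_sample$y > anomalies_sample[["TimeGPT-hi-99"]]
flag <- as.integer(outside)
flag
```

Under this rule the three observed values are not flagged, matching the anomaly column above, while the hypothetical spike is.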
4. Plot prediction intervals
nixtlar includes a function to plot the historical data together with any output from nixtlar::nixtla_client_forecast, nixtlar::nixtla_client_historic, nixtlar::nixtla_client_detect_anomalies and nixtlar::nixtla_client_cross_validation. If you have long series, you can use max_insample_length to plot only the last N historical values (the forecast will always be plotted in full).
When available, nixtlar::nixtla_client_plot will automatically plot the prediction intervals.
nixtla_client_plot(df, fcst, id_col = "unique_id", max_insample_length = 100)
#> Frequency chosen: H
nixtlar::nixtla_client_plot(df, anomalies, id_col = "unique_id", plot_anomalies = TRUE)
#> Frequency chosen: H
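To illustrate what limiting the history to the last N values means, the following base R sketch trims a toy data frame to the last N rows of each series. The helper keep_last_n() is hypothetical and is not how nixtlar implements max_insample_length; it assumes the rows of each series are already sorted by time.

```r
# Hypothetical helper, not part of nixtlar: keep the last n rows of each series.
# Assumes rows within each series are already ordered by time.
keep_last_n <- function(df, n, id_col = "unique_id") {
  parts <- split(df, df[[id_col]])
  trimmed <- lapply(parts, function(g) tail(g, n))
  do.call(rbind, trimmed)
}

# Toy data: two series with five observations each
toy <- data.frame(
  unique_id = rep(c("A", "B"), each = 5),
  ds = rep(1:5, times = 2),
  y = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

last_two <- keep_last_n(toy, n = 2)
last_two
```

Each series keeps only its two most recent observations, which is the kind of trimming that keeps a plot readable for long series.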