Special topics

This vignette explains some special topics regarding the use of TimeGPT via nixtlar.

1. Handling missing values

Before using TimeGPT, you need to ensure that:

  1. The target column contains no missing values (NA).
  2. Given the frequency of the data, the dates are continuous, with no missing dates between the start and the end dates.

Regarding the second point, you can have multiple time series that start and end on different dates, but each series must contain uninterrupted data over its own time frame.

There are several ways to check for missing values in R. One option is to combine the any and is.na functions from base R.


df <- nixtlar::electricity # load data 

# create some missing values at random 
index <- sample(nrow(df), 10)
df$y[index] <- NA

# check for missing values 
any(is.na(df)) # will return TRUE if there are missing values 
#> [1] TRUE

If you find missing values in your data, you need to decide how to fill them, which is very context-dependent. For example, if you are dealing with daily retail data, a missing value most likely indicates that there were no sales on that day, and you can probably fill it with zero. However, if you are working with hourly temperature data, a missing value likely means that the sensor was not functioning correctly, and you might prefer to use interpolation to fill the missing values. Whatever you decide to do, always keep in mind the nature of your data.
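The two strategies mentioned above can be sketched in base R as follows. This is a minimal illustration on a made-up hourly series, not a recommendation for your data. The data frame, its values, and the names df_zero and df_interp are assumptions introduced for the example, and approx comes from the stats package that ships with R.

```r
# Toy hourly series with two missing values (illustrative data, not from nixtlar)
df <- data.frame(
  unique_id = "series_1",
  ds = seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 6),
  y = c(10, NA, 14, 16, NA, 20)
)

# Option 1: fill with zero (e.g. retail data where a gap means no sales)
df_zero <- df
df_zero$y[is.na(df_zero$y)] <- 0

# Option 2: linear interpolation between the nearest observed values
df_interp <- df
observed <- !is.na(df$y)
df_interp$y <- approx(
  x = as.numeric(df$ds[observed]),
  y = df$y[observed],
  xout = as.numeric(df$ds),
  rule = 2 # reuse the nearest observed value for leading or trailing gaps
)$y

# Neither version should contain missing values now
any(is.na(df_zero$y))
any(is.na(df_interp$y))
```

If your data frame holds several series, interpolation like this should be applied to each unique id separately so that values from one series do not leak into another.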

Checking if there are missing dates is more complicated since it depends on the frequency of the data. Sometimes plotting can help spot large gaps. nixtlar has a plotting function called nixtla_client_plot that can be used for this.

However, plotting is ineffective when the missing dates are isolated rather than consecutive, since small gaps are hard to see. One possible solution is to compare the dates of every unique id with a vector of dates generated from the start date, the end date, and the frequency of your data. This requires knowing the start date, end date, and frequency of each series, which can become tricky when working with hundreds or thousands of time series.
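The comparison described above could look roughly like the following in base R. It is a sketch under stated assumptions: the data are made up, the columns are assumed to be named unique_id, ds, and y, the frequency is assumed to be known (hourly here), and find_missing_dates and make_series are hypothetical helpers that are not part of nixtlar.

```r
# Toy data: two hourly series with different time spans (illustrative only)
make_series <- function(id, start, n) {
  data.frame(
    unique_id = id,
    ds = seq(as.POSIXct(start, tz = "UTC"), by = "hour", length.out = n),
    y = seq_len(n)
  )
}
df <- rbind(
  make_series("A", "2024-01-01 00:00:00", 6),
  make_series("B", "2024-01-02 00:00:00", 5)
)
df <- df[-c(3, 5), ] # drop two non-consecutive timestamps from series A

# For each series, build the full grid of expected timestamps between its
# own first and last date, and keep the ones that are absent from the data
find_missing_dates <- function(df, by = "hour") {
  gaps <- lapply(split(df, df$unique_id), function(s) {
    expected <- seq(min(s$ds), max(s$ds), by = by)
    missing <- expected[!(expected %in% s$ds)]
    data.frame(unique_id = rep(s$unique_id[1], length(missing)), ds = missing)
  })
  out <- do.call(rbind, gaps)
  rownames(out) <- NULL
  out
}

missing_dates <- find_missing_dates(df)
missing_dates # one row per missing timestamp, with the series it belongs to
```

The by argument is passed to seq, so it accepts the same step strings, such as "day", "week", "month", or "6 hours". Because the grid is built from each series' own first and last timestamp, series that start and end on different dates are handled separately.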

2. Specifying the frequency of your data

The frequency parameter is crucial when working with time series data because it tells the model the expected interval between consecutive data points. The core functions of nixtlar that interface with TimeGPT, such as nixtla_client_forecast, nixtla_client_historic, nixtla_client_detect_anomalies, and nixtla_client_cross_validation, take this value through the freq parameter. You can specify it yourself or, in some cases, let nixtlar deduce it from your data.

TimeGPT supports the following aliases:

Frequency Alias
Yearly Y
Quarterly Q, QS, or QE
Monthly M, MS, or ME
Weekly (starting Sundays) W
Daily D
Hourly h
Minute-level min
Second-level s
Business day B

In this table, QS and MS stand for quarter and month start, while QE and ME stand for quarter and month end. Hourly and subhourly frequencies can be preceded by an integer, such as “6h”, “10min” or “30s”. Only the aliases “min” and “s” are allowed for minute and second-level frequencies.

The default value of the frequency parameter is NULL. When this parameter is not specified, nixtlar will attempt to determine the frequency of your data.
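If you want to double-check what frequency your data implies before relying on an inferred value, one rough approach is to look at the most common gap between consecutive timestamps in each series. The following is only an illustrative sketch in base R with made-up data. It is not how nixtlar itself determines the frequency, and modal_gap_seconds is a hypothetical helper introduced for this example.

```r
# Toy data: two hourly series (illustrative only, not from nixtlar)
df <- data.frame(
  unique_id = rep(c("A", "B"), each = 5),
  ds = rep(
    seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 5),
    times = 2
  ),
  y = rnorm(10)
)

# Most common gap, in seconds, between consecutive timestamps of one series
modal_gap_seconds <- function(ds) {
  gaps <- as.numeric(diff(sort(ds)), units = "secs")
  counts <- table(gaps)
  as.numeric(names(counts)[which.max(counts)])
}

# Apply the helper to every series; 3600 seconds means hourly data
gap_by_series <- sapply(split(df$ds, df$unique_id), modal_gap_seconds)
gap_by_series
```

This heuristic only suits fixed-width frequencies such as seconds, minutes, hours, or days. Months, quarters, and years vary in length, so they need a calendar-aware check instead.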


df <- nixtlar::electricity
fcst <- nixtlar::nixtla_client_forecast(df, h = 8, level = c(80, 95)) # `freq` is omitted; the data are hourly ("h")
#> Frequency chosen: h
# infer the frequency when `freq` is not specified