Utils

The core.NeuralForecast class lets you efficiently fit multiple NeuralForecast models to large sets of time series. It operates on a pandas DataFrame df in which the unique_id and ds columns identify individual series and datestamps, and the y column holds the target time series variable. To assist development, this module declares datasets that are used throughout NeuralForecast’s unit tests.
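As a rough illustration of the long format described above, the sketch below builds a tiny two-series panel by hand. The series identifiers, dates, and values are made up for the example and do not come from NeuralForecast.

```python
import pandas as pd

# Two hypothetical series of different lengths, stacked in long format.
panel = pd.DataFrame({
    'unique_id': ['a', 'a', 'a', 'b', 'b'],
    'ds': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                          '2000-01-01', '2000-01-02']),
    'y': [1.0, 2.0, 3.0, 10.0, 11.0],
})

# Each (unique_id, ds) pair identifies exactly one observation of y.
lengths = panel.groupby('unique_id').size()
print(lengths.to_dict())
```

Stacking all series in one frame keeps series of unequal length in a single object, with unique_id acting as the grouping key.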

1. Synthetic Panel Data

generate_series

 generate_series (n_series:int, freq:str='D', min_length:int=50,
                  max_length:int=500, n_temporal_features:int=0,
                  n_static_features:int=0, equal_ends:bool=False,
                  seed:int=0)

Generate Synthetic Panel Series.

Generates n_series series of frequency freq with different lengths in the interval [min_length, max_length]. If n_temporal_features > 0, each series gets temporal features with random values. If n_static_features > 0, a static dataframe is returned alongside the temporal dataframe. If equal_ends == True, all series end on the same date.

Parameters:
n_series: int, number of series for synthetic panel.
min_length: int, minimal length of synthetic panel’s series.
max_length: int, maximal length of synthetic panel’s series.
n_temporal_features: int, default=0, number of temporal exogenous variables for synthetic panel’s series.
n_static_features: int, default=0, number of static exogenous variables for synthetic panel’s series.
equal_ends: bool, if True, all series end on the same datestamp ds.
freq: str, frequency of the data, any of the frequencies available in pandas.

Returns:
pandas.DataFrame, synthetic panel with columns [unique_id, ds, y] and exogenous features. If n_static_features > 0, a static dataframe is returned as well.
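To make the equal_ends option concrete, here is a minimal sketch of the idea, not the library's implementation: series of different lengths are aligned either on a common start date or on a common end date. The helper name make_dates and the dates used are assumptions for illustration only.

```python
import pandas as pd

def make_dates(length, freq='D', equal_ends=False):
    # Hypothetical helper: anchor the date range at a shared end date
    # when equal_ends is True, otherwise at a shared start date.
    if equal_ends:
        return pd.date_range(end='2000-01-10', periods=length, freq=freq)
    return pd.date_range(start='2000-01-01', periods=length, freq=freq)

lengths = [3, 5]  # two series of different lengths

starts_aligned = [make_dates(n) for n in lengths]
ends_aligned = [make_dates(n, equal_ends=True) for n in lengths]

# Shared start date, different end dates.
print([d[-1].date().isoformat() for d in starts_aligned])
# Shared end date, different start dates.
print([d[0].date().isoformat() for d in ends_aligned])
```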

from neuralforecast.utils import generate_series

synthetic_panel = generate_series(n_series=2)
synthetic_panel.groupby('unique_id').head(4)

	ds	y
unique_id
0	2000-01-01	0.357595
0	2000-01-02	1.301382
0	2000-01-03	2.272442
0	2000-01-04	3.211827
1	2000-01-01	5.399023
1	2000-01-02	6.092818
1	2000-01-03	0.476396
1	2000-01-04	1.343744

temporal_df, static_df = generate_series(n_series=1000, n_static_features=2,
                                         n_temporal_features=4, equal_ends=False)
static_df.head(2)

2. AirPassengers Data

The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.

It has been used as a reference in several forecasting libraries. Because the series shows a clear trend and seasonality, it offers a convenient way to quickly showcase a model’s predictive performance.

from neuralforecast.utils import AirPassengersDF

AirPassengersDF.head(12)

	unique_id	ds	y
0	1.0	1949-01-31	112.0
1	1.0	1949-02-28	118.0
2	1.0	1949-03-31	132.0
3	1.0	1949-04-30	129.0
4	1.0	1949-05-31	121.0
5	1.0	1949-06-30	135.0
6	1.0	1949-07-31	148.0
7	1.0	1949-08-31	148.0
8	1.0	1949-09-30	136.0
9	1.0	1949-10-31	119.0
10	1.0	1949-11-30	104.0
11	1.0	1949-12-31	118.0

import matplotlib.pyplot as plt

# Plot the monthly AirPassengers series.
fig, ax = plt.subplots(1, 1, figsize=(20, 7))
plot_df = AirPassengersDF.set_index('ds')

plot_df[['y']].plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()

import numpy as np
import pandas as pd

n_static_features = 3
n_series = 5

# Draw one row of uniform random static features per series.
static_features = np.random.uniform(low=0.0, high=1.0,
                                    size=(n_series, n_static_features))
static_df = pd.DataFrame.from_records(
    static_features,
    columns=[f'static_{i}' for i in range(n_static_features)])
static_df['unique_id'] = np.arange(n_series)

static_df
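Static features hold one row per series, so they can be attached to a temporal panel with a join on unique_id. The sketch below uses made-up values and plain pandas; the frames and column values are assumptions for illustration and are not taken from NeuralForecast.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible

# Hypothetical temporal panel: 2 series with 3 daily observations each.
dates = pd.date_range('2000-01-01', periods=3, freq='D')
temporal_df = pd.DataFrame({
    'unique_id': np.repeat([0, 1], 3),
    'ds': list(dates) * 2,
    'y': rng.normal(size=6),
})

# Hypothetical static features: one row per series.
static_df = pd.DataFrame({'unique_id': [0, 1], 'static_0': [0.2, 0.7]})

# A left join repeats each series' static value on every one of its rows.
merged = temporal_df.merge(static_df, on='unique_id', how='left')
print(merged)
```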