XGBoost forecaster

Wrapper around xgboost.dask.DaskXGBRegressor that adds a model_ property containing the fitted model, which is sent to the workers in the forecasting step.


 XGBForecast (max_depth:Optional[int]=None, max_leaves:Optional[int]=None,
              max_bin:Optional[int]=None, grow_policy:Optional[str]=None,
              learning_rate:Optional[float]=None, n_estimators:int=100,
              verbosity:Optional[int]=None,
              objective:Union[str,Callable,NoneType]=None,
              booster:Optional[str]=None, tree_method:Optional[str]=None,
              n_jobs:Optional[int]=None, gamma:Optional[float]=None,
              min_child_weight:Optional[float]=None,
              max_delta_step:Optional[float]=None,
              subsample:Optional[float]=None, sampling_method:Optional[str]=None,
              colsample_bytree:Optional[float]=None,
              colsample_bylevel:Optional[float]=None,
              colsample_bynode:Optional[float]=None,
              reg_alpha:Optional[float]=None, reg_lambda:Optional[float]=None,
              scale_pos_weight:Optional[float]=None,
              base_score:Optional[float]=None,
              random_state:Union[numpy.random.RandomState,int,NoneType]=None,
              missing:float=nan, num_parallel_tree:Optional[int]=None,
              monotone_constraints:Union[Dict[str,int],str,NoneType]=None,
              interaction_constraints:Union[str,Sequence,NoneType]=None,
              importance_type:Optional[str]=None, gpu_id:Optional[int]=None,
              validate_parameters:Optional[bool]=None,
              predictor:Optional[str]=None, enable_categorical:bool=False,
              max_cat_to_onehot:Optional[int]=None,
              eval_metric:Union[str,List[str],Callable,NoneType]=None,
              early_stopping_rounds:Optional[int]=None,
              callbacks:Optional[List[TrainingCallback]]=None, **kwargs)

Implementation of the Scikit-Learn API for XGBoost.

max_depth : Optional[int], default None
    Maximum tree depth for base learners.
max_leaves : Optional[int], default None
    Maximum number of leaves; 0 indicates no limit.
max_bin : Optional[int], default None
    If using histogram-based algorithm, maximum number of bins per feature.
grow_policy : Optional[str], default None
    Tree growing policy. 0: favor splitting at nodes closest to the node, i.e.
    grow depth-wise. 1: favor splitting at nodes with highest loss change.
learning_rate : Optional[float], default None
    Boosting learning rate (xgb's "eta").
n_estimators : int, default 100
    Number of gradient boosted trees. Equivalent to number of boosting rounds.
verbosity : Optional[int], default None
    The degree of verbosity. Valid values are 0 (silent) - 3 (debug).
objective : Union[str, Callable, NoneType], default None
    Specify the learning task and the corresponding learning objective or
    a custom objective function to be used (see note below).
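For the regression case, a custom objective is a callable that returns the gradient and hessian of the loss with respect to the predictions. The sketch below illustrates that shape with plain lists instead of numpy arrays; it is illustrative only and not the exact type XGBoost uses internally:

```python
# Sketch of a custom squared-error objective: the callable maps
# (y_true, y_pred) to (gradient, hessian) of the loss per instance.
def squared_error_objective(y_true, y_pred):
    grad = [p - t for p, t in zip(y_pred, y_true)]  # d/dpred of 0.5*(pred - true)^2
    hess = [1.0] * len(y_pred)                      # second derivative is constant
    return grad, hess

grad, hess = squared_error_objective([1.0, 2.0], [1.5, 1.0])
print(grad)  # [0.5, -1.0]
print(hess)  # [1.0, 1.0]
```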
booster : Optional[str], default None
tree_method : Optional[str], default None
n_jobs : Optional[int], default None
    Number of parallel threads used to run xgboost. When used with other
    Scikit-Learn algorithms like grid search, you may choose which algorithm to
    parallelize and balance the threads. Creating thread contention will
    significantly slow down both algorithms.
gamma : Optional[float], default None
    (min_split_loss) Minimum loss reduction required to make a further
    partition on a leaf node of the tree.
min_child_weight : Optional[float], default None
    Minimum sum of instance weight (hessian) needed in a child.
max_delta_step : Optional[float], default None
    Maximum delta step we allow each tree's weight estimation to be.
subsample : Optional[float], default None
    Subsample ratio of the training instance.
sampling_method : Optional[str], default None
    Sampling method. Used only by gpu_hist tree method.
    - uniform: select random training instances uniformly.
    - gradient_based: select random training instances with higher probability
      when the gradient and hessian are larger. (cf. CatBoost)
colsample_bytree : Optional[float], default None
    Subsample ratio of columns when constructing each tree.
colsample_bylevel : Optional[float], default None
    Subsample ratio of columns for each level.
colsample_bynode : Optional[float], default None
    Subsample ratio of columns for each split.
reg_alpha : Optional[float], default None
    L1 regularization term on weights (xgb's alpha).
reg_lambda : Optional[float], default None
    L2 regularization term on weights (xgb's lambda).
scale_pos_weight : Optional[float], default None
    Balancing of positive and negative weights.
base_score : Optional[float], default None
    The initial prediction score of all instances, global bias.
random_state : Union[numpy.random.RandomState, int, NoneType], default None
    Random number seed.

.. note::

    Using gblinear booster with shotgun updater is nondeterministic as
    it uses Hogwild algorithm.

missing : float, default nan
    Value in the data which needs to be present as a missing value.
num_parallel_tree : Optional[int], default None
monotone_constraints : Union, default None
    Constraint of variable monotonicity. See :doc:`tutorial </tutorials/monotonic>`
    for more information.
interaction_constraints : Union, default None
    Constraints for interaction representing permitted interactions. The
    constraints must be specified in the form of a nested list, e.g.
    [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features
    that are allowed to interact with each other. See
    :doc:`tutorial </tutorials/feature_interaction_constraint>` for more information.
importance_type : Optional[str], default None
gpu_id : Optional[int], default None
    Device ordinal.
validate_parameters : Optional[bool], default None
    Give warnings for unknown parameter.
predictor : Optional[str], default None
    Force XGBoost to use specific predictor, available choices are
    [cpu_predictor, gpu_predictor].
enable_categorical : bool, default False
    .. versionadded:: 1.5.0

    .. note:: This parameter is experimental

    Experimental support for categorical data. When enabled, cudf/pandas.DataFrame
    should be used to specify categorical data type. Also, JSON/UBJSON
    serialization format is required.
max_cat_to_onehot : Optional[int], default None
    .. versionadded:: 1.6.0

    .. note:: This parameter is experimental

    A threshold for deciding whether XGBoost should use one-hot encoding based
    split for categorical data. When number of categories is lesser than the
    threshold then one-hot encoding is chosen, otherwise the categories will be
    partitioned into children nodes. Only relevant for regression and binary
    classification. See :doc:`Categorical Data </tutorials/categorical>` for details.
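The threshold rule for max_cat_to_onehot can be sketched in plain Python. This is a simplified illustration of the decision described above, not XGBoost's internal code:

```python
def categorical_split_strategy(n_categories, max_cat_to_onehot):
    # Fewer categories than the threshold -> one-hot encoding based splits;
    # otherwise the categories are partitioned into children nodes.
    return "one-hot" if n_categories < max_cat_to_onehot else "partition"

print(categorical_split_strategy(3, 4))   # one-hot
print(categorical_split_strategy(10, 4))  # partition
```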
eval_metric : Union, default None
    .. versionadded:: 1.6.0

    Metric used for monitoring the training result and early stopping. It can be a
    string or list of strings as names of predefined metric in XGBoost (See
    doc/parameter.rst), one of the metrics in :py:mod:`sklearn.metrics`, or any
    other user defined metric that looks like sklearn.metrics.

    If custom objective is also provided, then custom metric should implement the
    corresponding reverse link function.

    Unlike the scoring parameter commonly used in scikit-learn, when a callable
    object is provided, it's assumed to be a cost function and by default XGBoost
    will minimize the result during early stopping.

    For advanced usage on early stopping, like directly choosing to maximize
    instead of minimize, see :py:obj:`xgboost.callback.EarlyStopping`.

    See :doc:`Custom Objective and Evaluation Metric </tutorials/custom_metric_obj>`
    for more.

    .. note::

        This parameter replaces eval_metric in the :py:meth:`fit` method. The old
        one receives un-transformed prediction regardless of whether custom
        objective is being used.

    .. code-block:: python

        import xgboost as xgb
        from sklearn.datasets import load_diabetes
        from sklearn.metrics import mean_absolute_error

        X, y = load_diabetes(return_X_y=True)
        reg = xgb.XGBRegressor(
            tree_method="hist",
            eval_metric=mean_absolute_error,
        )
        reg.fit(X, y, eval_set=[(X, y)])
early_stopping_rounds : Optional[int], default None
    .. versionadded:: 1.6.0

    Activates early stopping. Validation metric needs to improve at least once in
    every early_stopping_rounds round(s) to continue training. Requires at least
    one item in eval_set in :py:meth:`fit`.

    The method returns the model from the last iteration (not the best one). If
    there's more than one item in eval_set, the last entry will be used for early
    stopping. If there's more than one metric in eval_metric, the last metric
    will be used for early stopping.

    If early stopping occurs, the model will have three additional fields:
    :py:attr:`best_score`, :py:attr:`best_iteration` and :py:attr:`best_ntree_limit`.

    .. note::

        This parameter replaces early_stopping_rounds in the :py:meth:`fit` method.
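The stopping rule (the validation metric must improve at least once in every early_stopping_rounds rounds) can be sketched in plain Python. This is a simplified illustration assuming lower is better, not XGBoost's actual implementation:

```python
def should_stop(metric_history, early_stopping_rounds):
    # Stop when the last `early_stopping_rounds` evaluations failed to
    # improve on the best score seen before them (lower is better).
    if len(metric_history) <= early_stopping_rounds:
        return False
    best_before = min(metric_history[:-early_stopping_rounds])
    return min(metric_history[-early_stopping_rounds:]) >= best_before

print(should_stop([0.9, 0.8, 0.7, 0.71, 0.72], 2))  # True: no recent improvement
print(should_stop([0.9, 0.8, 0.7, 0.65, 0.72], 2))  # False: improved recently
```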
callbacks : Optional[List[TrainingCallback]], default None
    List of callback functions that are applied at end of each iteration.
    It is possible to use predefined callbacks by using
    :ref:`Callback API <callback_api>`.

    .. note::

        States in callback are not preserved during training, which means callback
        objects can not be reused for multiple training sessions without
        reinitialization or deepcopy.

    .. code-block:: python

        for params in parameters_grid:
            # be sure to (re)initialize the callbacks before each run
            callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
            xgboost.train(params, Xy, callbacks=callbacks)
kwargs : Any
    Keyword arguments for XGBoost Booster object. Full documentation of
    parameters can be found :doc:`here </parameter>`.
    Attempting to set a parameter via the constructor args and **kwargs
    dict simultaneously will result in a TypeError.

    .. note:: **kwargs unsupported by scikit-learn

        **kwargs is unsupported by scikit-learn. We do not guarantee
        that parameters passed via this argument will interact properly
        with scikit-learn.

Returns: None
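The TypeError mentioned for duplicated parameters follows from ordinary Python keyword-argument rules. A minimal illustration, where make_estimator is a stand-in for the real constructor:

```python
def make_estimator(n_estimators=100, **kwargs):
    # Stand-in for an estimator constructor that accepts extra kwargs.
    return {"n_estimators": n_estimators, **kwargs}

# Passing the same parameter both as a named argument and inside **kwargs
# raises TypeError at the call site, before the function body even runs.
try:
    make_estimator(n_estimators=200, **{"n_estimators": 300})
except TypeError as exc:
    print("TypeError:", exc)
```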
import dask
import dask.dataframe as dd
import numpy as np
import pandas as pd

from dask.distributed import Client
from mlforecast.distributed import DistributedForecast
from mlforecast.utils import generate_daily_series

client = Client(n_workers=2, threads_per_worker=1)
series = generate_daily_series(100)
distr_series = dd.from_pandas(series, npartitions=2)

fcst = DistributedForecast(
    ...  # constructor arguments are truncated in the source
)
fcst.fit(distr_series)
actual = fcst.predict(1).compute()

# Compute the expected predictions directly with the fitted model to verify
# that predict() sends the updated features to the workers correctly.
def get_updates(ts):
    upd = ts._update_features()
    return upd.drop(columns='ds')

upd_futures = client.map(get_updates, fcst.dts.ts)
upd_ddf = dd.from_delayed(upd_futures)
expected = fcst.models_[0].predict(upd_ddf).compute()

np.testing.assert_equal(actual['XGBRegressor'].values, expected)
[21:06:28] task [xgboost.dask]:tcp:// got new rank 0
[21:06:28] task [xgboost.dask]:tcp:// got new rank 1