XGBoost forecaster

A wrapper around xgboost.dask.DaskXGBRegressor that adds a model_ property, which holds the fitted model and is sent to the workers in the forecasting step.
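The idea behind the wrapper is simple enough to sketch. Below is a minimal, self-contained illustration of the pattern (the `BaseRegressor` stand-in and its toy fit logic are assumptions made for the sketch, not mlforecast or xgboost code): a `model_` property exposes the fitted estimator so it can be pickled and shipped to the workers.

```python
import pickle


class BaseRegressor:
    """Toy stand-in for xgboost.dask.DaskXGBRegressor (assumption for this sketch)."""

    def fit(self, X, y):
        # Toy "model": just remember the mean of the target.
        self._fitted = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._fitted for _ in X]


class Forecast(BaseRegressor):
    @property
    def model_(self):
        # Expose the fitted estimator so a scheduler can pickle it and
        # send it to the workers during the forecasting step.
        return self


model = Forecast().fit([[1], [2]], [10.0, 20.0])
# The property returns a picklable object, so it survives a serialization
# round trip like the one dask performs when shipping it to workers.
roundtrip = pickle.loads(pickle.dumps(model.model_))
print(roundtrip.predict([[3]]))  # -> [15.0]
```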


 XGBForecast (max_depth:Optional[int]=None, max_leaves:Optional[int]=None,
              max_bin:Optional[int]=None, grow_policy:Optional[str]=None,
              learning_rate:Optional[float]=None, n_estimators:int=100,
              verbosity:Optional[int]=None,
              objective:Union[str,Callable[[numpy.ndarray,numpy.ndarray],
              Tuple[numpy.ndarray,numpy.ndarray]],NoneType]=None,
              booster:Optional[str]=None, tree_method:Optional[str]=None,
              n_jobs:Optional[int]=None, gamma:Optional[float]=None,
              min_child_weight:Optional[float]=None,
              max_delta_step:Optional[float]=None,
              subsample:Optional[float]=None,
              sampling_method:Optional[str]=None,
              colsample_bytree:Optional[float]=None,
              colsample_bylevel:Optional[float]=None,
              colsample_bynode:Optional[float]=None,
              reg_alpha:Optional[float]=None, reg_lambda:Optional[float]=None,
              scale_pos_weight:Optional[float]=None,
              base_score:Optional[float]=None,
              random_state:Union[int,numpy.random.RandomState,NoneType]=None,
              missing:float=nan, num_parallel_tree:Optional[int]=None,
              monotone_constraints:Union[Dict[str,int],str,NoneType]=None,
              interaction_constraints:Union[str,Sequence[Sequence[str]],
              NoneType]=None, importance_type:Optional[str]=None,
              gpu_id:Optional[int]=None,
              validate_parameters:Optional[bool]=None,
              predictor:Optional[str]=None, enable_categorical:bool=False,
              feature_types:Optional[Sequence[str]]=None,
              max_cat_to_onehot:Optional[int]=None,
              max_cat_threshold:Optional[int]=None,
              eval_metric:Union[str,List[str],Callable,NoneType]=None,
              early_stopping_rounds:Optional[int]=None,
              callbacks:Optional[List[TrainingCallback]]=None, **kwargs)

Implementation of the Scikit-Learn API for XGBoost.

Type Default Details
max_depth typing.Optional[int] None Maximum tree depth for base learners.
max_leaves typing.Optional[int] None Maximum number of leaves; 0 indicates no limit.
max_bin typing.Optional[int] None If using histogram-based algorithm, maximum number of bins per feature
grow_policy typing.Optional[str] None Tree growing policy. 0: favor splitting at nodes closest to the node, i.e. grow
depth-wise. 1: favor splitting at nodes with highest loss change.
learning_rate typing.Optional[float] None Boosting learning rate (xgb’s “eta”)
n_estimators int 100 Number of gradient boosted trees. Equivalent to number of boosting rounds.
verbosity typing.Optional[int] None The degree of verbosity. Valid values are 0 (silent) - 3 (debug).
objective typing.Union[str, typing.Callable[[numpy.ndarray, numpy.ndarray], typing.Tuple[numpy.ndarray, numpy.ndarray]], NoneType] None Specify the learning task and the corresponding learning objective or
a custom objective function to be used (see note below).
booster typing.Optional[str] None
tree_method typing.Optional[str] None
n_jobs typing.Optional[int] None Number of parallel threads used to run xgboost. When used with other
Scikit-Learn algorithms like grid search, you may choose which algorithm to
parallelize and balance the threads. Creating thread contention will
significantly slow down both algorithms.
gamma typing.Optional[float] None (min_split_loss) Minimum loss reduction required to make a further partition on a
leaf node of the tree.
min_child_weight typing.Optional[float] None Minimum sum of instance weight(hessian) needed in a child.
max_delta_step typing.Optional[float] None Maximum delta step we allow each tree’s weight estimation to be.
subsample typing.Optional[float] None Subsample ratio of the training instance.
sampling_method typing.Optional[str] None Sampling method. Used only by gpu_hist tree method.
- uniform: select random training instances uniformly.
- gradient_based: select random training instances with higher probability when
the gradient and hessian are larger. (cf. CatBoost)
colsample_bytree typing.Optional[float] None Subsample ratio of columns when constructing each tree.
colsample_bylevel typing.Optional[float] None Subsample ratio of columns for each level.
colsample_bynode typing.Optional[float] None Subsample ratio of columns for each split.
reg_alpha typing.Optional[float] None L1 regularization term on weights (xgb’s alpha).
reg_lambda typing.Optional[float] None L2 regularization term on weights (xgb’s lambda).
scale_pos_weight typing.Optional[float] None Balancing of positive and negative weights.
base_score typing.Optional[float] None The initial prediction score of all instances, global bias.
random_state typing.Union[int, numpy.random.mtrand.RandomState, NoneType] None Random number seed.

.. note::

Using gblinear booster with shotgun updater is nondeterministic as
it uses Hogwild algorithm.
missing float nan Value in the data which needs to be present as a missing value.
num_parallel_tree typing.Optional[int] None
monotone_constraints typing.Union[typing.Dict[str, int], str, NoneType] None Constraint of variable monotonicity. See :doc:tutorial </tutorials/monotonic>
for more information.
interaction_constraints typing.Union[str, typing.Sequence[typing.Sequence[str]], NoneType] None Constraints for interaction representing permitted interactions. The
constraints must be specified in the form of a nested list, e.g. [[0, 1], [2, 3, 4]],
where each inner list is a group of indices of features that are allowed to
interact with each other. See :doc:tutorial </tutorials/feature_interaction_constraint> for more information.
importance_type typing.Optional[str] None
gpu_id typing.Optional[int] None Device ordinal.
validate_parameters typing.Optional[bool] None Give warnings for unknown parameter.
predictor typing.Optional[str] None Force XGBoost to use specific predictor, available choices are [cpu_predictor, gpu_predictor].
enable_categorical bool False .. versionadded:: 1.5.0

.. note:: This parameter is experimental

Experimental support for categorical data. When enabled, cudf/pandas.DataFrame
should be used to specify categorical data type. Also, JSON/UBJSON
serialization format is required.
feature_types typing.Sequence[str] None .. versionadded:: 1.7.0

Used for specifying feature types without constructing a dataframe. See
:py:class:DMatrix for details.
max_cat_to_onehot typing.Optional[int] None .. versionadded:: 1.6.0

.. note:: This parameter is experimental

A threshold for deciding whether XGBoost should use one-hot encoding based split
for categorical data. When number of categories is lesser than the threshold
then one-hot encoding is chosen, otherwise the categories will be partitioned
into children nodes. Also, enable_categorical needs to be set to have
categorical feature support. See :doc:Categorical Data </tutorials/categorical> and :ref:cat-param for details.
max_cat_threshold typing.Optional[int] None .. versionadded:: 1.7.0

.. note:: This parameter is experimental

Maximum number of categories considered for each split. Used only by
partition-based splits for preventing over-fitting. Also, enable_categorical
needs to be set to have categorical feature support. See :doc:Categorical Data </tutorials/categorical> and :ref:cat-param for details.
eval_metric typing.Union[str, typing.List[str], typing.Callable, NoneType] None .. versionadded:: 1.6.0

Metric used for monitoring the training result and early stopping. It can be a
string or list of strings as names of predefined metric in XGBoost (See
doc/parameter.rst), one of the metrics in :py:mod:sklearn.metrics, or any other
user defined metric that looks like sklearn.metrics.

If custom objective is also provided, then custom metric should implement the
corresponding reverse link function.

Unlike the scoring parameter commonly used in scikit-learn, when a callable
object is provided, it’s assumed to be a cost function and by default XGBoost will
minimize the result during early stopping.

For advanced usage on Early stopping like directly choosing to maximize instead of
minimize, see :py:obj:xgboost.callback.EarlyStopping.

See :doc:Custom Objective and Evaluation Metric </tutorials/custom_metric_obj>
for more.

.. note::

This parameter replaces eval_metric in :py:meth:fit method. The old one
receives un-transformed prediction regardless of whether custom objective is
being used.

.. code-block:: python

    from sklearn.datasets import load_diabetes
    from sklearn.metrics import mean_absolute_error
    X, y = load_diabetes(return_X_y=True)
    reg = xgb.XGBRegressor(
        tree_method="hist",
        eval_metric=mean_absolute_error,
    )
    reg.fit(X, y, eval_set=[(X, y)])
early_stopping_rounds typing.Optional[int] None .. versionadded:: 1.6.0

Activates early stopping. Validation metric needs to improve at least once in
every early_stopping_rounds round(s) to continue training. Requires at least
one item in eval_set in :py:meth:fit.

The method returns the model from the last iteration (not the best one). If
there’s more than one item in eval_set, the last entry will be used for early
stopping. If there’s more than one metric in eval_metric, the last metric
will be used for early stopping.

If early stopping occurs, the model will have three additional fields:
:py:attr:best_score, :py:attr:best_iteration and :py:attr:best_ntree_limit.

.. note::

This parameter replaces early_stopping_rounds in :py:meth:fit method.
callbacks typing.Optional[typing.List[xgboost.callback.TrainingCallback]] None List of callback functions that are applied at end of each iteration.
It is possible to use predefined callbacks by using
:ref:Callback API <callback_api>.

.. note::

States in callback are not preserved during training, which means callback
objects can not be reused for multiple training sessions without
reinitialization or deepcopy.

.. code-block:: python

    for params in parameters_grid:
        # be sure to (re)initialize the callbacks before each run
        callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
        xgboost.train(params, Xy, callbacks=callbacks)
kwargs typing.Any Keyword arguments for XGBoost Booster object. Full documentation of parameters
can be found :doc:here </parameter>.
Attempting to set a parameter via the constructor args and **kwargs
dict simultaneously will result in a TypeError.

.. note:: **kwargs unsupported by scikit-learn

**kwargs is unsupported by scikit-learn. We do not guarantee
that parameters passed via this argument will interact properly
with scikit-learn.
Returns None
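The stopping rule described above for early_stopping_rounds ("the validation metric needs to improve at least once in every N rounds") can be sketched with a small hand-rolled loop. This is only an illustration of the rule, not XGBoost's actual implementation, which lives in its callback machinery (xgboost.callback.EarlyStopping):

```python
def early_stop(val_scores, patience):
    """Return the 1-based round at which training would stop, or None.

    Mimics the rule: the validation metric (lower is better) must improve
    at least once in every `patience` consecutive rounds.
    """
    best = float("inf")
    rounds_since_improvement = 0
    for i, score in enumerate(val_scores, start=1):
        if score < best:
            best = score
            rounds_since_improvement = 0
        else:
            rounds_since_improvement += 1
        if rounds_since_improvement >= patience:
            return i
    return None


# Improves for three rounds, then stalls for three -> stops at round 6.
print(early_stop([0.9, 0.8, 0.7, 0.7, 0.75, 0.72], patience=3))  # -> 6
# Keeps improving -> training runs to completion.
print(early_stop([0.9, 0.8, 0.7, 0.6], patience=3))  # -> None
```

Note that, as the table above states, the real implementation returns the model from the last iteration, not the best one; :py:attr:best_iteration records where the best score occurred.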
import dask
import dask.dataframe as dd
import numpy as np
import pandas as pd

from dask.distributed import Client
from mlforecast.distributed import DistributedForecast
from mlforecast.utils import generate_daily_series
client = Client(n_workers=2, threads_per_worker=1)
series = generate_daily_series(100)
distr_series = dd.from_pandas(series, npartitions=2)
# NOTE: the original constructor call was truncated in the source; the model
# and arguments below are a plausible reconstruction, not the originals.
from mlforecast.distributed.models.xgb import XGBForecast

fcst = DistributedForecast(models=XGBForecast(), freq='D', lags=[7])
fcst.fit(distr_series, id_col='index', time_col='ds', target_col='y')
actual = fcst.predict(1).compute()

def get_updates(ts):
    upd = ts._update_features()
    return upd.drop(columns='ds')

upd_futures = client.map(get_updates, fcst.dts.ts)
upd_ddf = dd.from_delayed(upd_futures)
expected = fcst.models_['XGBForecast'].predict(upd_ddf.compute())

np.testing.assert_equal(actual['XGBForecast'].values, expected)
