Ranger Forest Regressor

class skranger.ensemble.RangerForestRegressor(n_estimators=100, *, verbose=False, mtry=0, importance='none', min_node_size=0, max_depth=0, replace=True, sample_fraction=None, keep_inbag=False, inbag=None, split_rule='variance', num_random_splits=1, alpha=0.5, minprop=0.1, split_select_weights=None, always_split_features=None, categorical_features=None, respect_categorical_features=None, scale_permutation_importance=False, local_importance=False, regularization_factor=None, regularization_usedepth=False, holdout=False, quantiles=False, oob_error=False, n_jobs=-1, save_memory=False, seed=42, enable_tree_details=False)

Ranger Random Forest Regression implementation for scikit-learn.

Provides a sklearn regressor interface to the Ranger C++ library using Cython.

  • n_estimators (int) – The number of tree regressors to train

  • verbose (bool) – Enable ranger’s verbose logging

  • mtry (int/callable) – The number of features to consider for splitting at each node. When a callable is passed, it must accept a single argument, the number of features in the input, and return a value between 1 and that number.

  • importance (str) – One of none, impurity, impurity_corrected, permutation.

  • min_node_size (int) – The minimal node size.

  • max_depth (int) – The maximal tree depth; 0 means unlimited.

  • replace (bool) – Sample with replacement.

  • sample_fraction (float/list) – The fraction of observations to sample. The default is 1 when sampling with replacement, and 0.632 otherwise. This can be a list of class specific values.

  • keep_inbag (bool) – If true, save how often observations are in-bag in each tree. These will be stored in the ranger_forest_ attribute under the key "inbag_counts".

  • inbag (list) – A list of size n_estimators, containing inbag counts for each observation. Can be used for stratified sampling.

  • split_rule (str) – One of variance, extratrees, maxstat, beta; default variance.

  • num_random_splits (int) – The number of random splits to consider for the extratrees split rule.

  • alpha (float) – Significance threshold to allow splitting for the maxstat split rule.

  • minprop (float) – Lower quantile of covariate distribution to be considered for splitting for maxstat split rule.

  • respect_categorical_features (str) – One of ignore, order, partition. The default is partition for the extratrees split rule, otherwise the default is ignore.

  • scale_permutation_importance (bool) – For permutation importance, scale permutation importance by standard error as in (Breiman 2001).

  • local_importance (bool) – For permutation importance, calculate and return local importance values as in (Breiman 2001).

  • regularization_factor (list) – A vector of regularization factors for the features.

  • regularization_usedepth (bool) – Whether to consider depth in regularization.

  • holdout (bool) – Hold-out all samples with case weight 0 and use these for feature importance and prediction error.

  • quantiles (bool) – Enable quantile regression after fitting. This must be set to True in order to call predict_quantiles after fitting.

  • oob_error (bool) – Whether to calculate out-of-bag prediction error.

  • n_jobs (int) – The number of threads. Default is number of CPU cores.

  • save_memory (bool) – Save memory at the cost of slower tree growing.

  • seed (int) – Random seed value.

  • enable_tree_details (bool) – When True, perform additional calculations for detailing the underlying decision trees. Must be enabled for estimators_ and get_estimator to work. Very slow.

  • n_features_in_ (int) – The number of features (columns) from the fit input X.

  • feature_names_ (list) – Names for the features of the fit input X.

  • ranger_forest_ (dict) – The returned result object from calling C++ ranger.

  • mtry_ (int) – The mtry value as determined if mtry is callable, otherwise it is the same as mtry.

  • sample_fraction_ (float) – The sample fraction determined by input validation.

  • regularization_factor_ (list) – The regularization factors determined by input validation.

  • unordered_features_ (list) – The unordered feature names determined by input validation.

  • split_rule_ (int) – The split rule integer corresponding to ranger enum SplitRule.

  • use_regularization_factor_ (bool) – Input validation determined bool for using regularization factor input parameter.

  • respect_categorical_features_ (str) – Input validation determined string respecting categorical features.

  • importance_mode_ (int) – The importance mode integer corresponding to ranger enum ImportanceMode.

  • random_node_values_ (2darray) – Random training target values based on trained forest terminal nodes for the purpose of quantile regression.

  • feature_importances_ (ndarray) – The variable importances from ranger.
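The callable form of mtry described above can be sketched as follows. This is a minimal illustration, not part of the library: the helper names sqrt_mtry and third_mtry are hypothetical, and the only contract assumed is the one stated in the parameter description (one argument, the number of features, and a return value between 1 and that number).

```python
import math


def sqrt_mtry(n_features: int) -> int:
    """Hypothetical helper: floor(sqrt(n_features)), never below 1."""
    return max(1, math.isqrt(n_features))


def third_mtry(n_features: int) -> int:
    """Hypothetical helper: one third of the features, never below 1."""
    return max(1, n_features // 3)


# Both helpers stay inside the documented range [1, n_features].
for p in (1, 2, 10, 100):
    assert 1 <= sqrt_mtry(p) <= p
    assert 1 <= third_mtry(p) <= p
```

Either function could then be passed as RangerForestRegressor(mtry=sqrt_mtry); after fitting, the resolved value is exposed as mtry_.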

property criterion

Compatibility alias for split rule.

fit(X, y, sample_weight=None, split_select_weights=None, always_split_features=None, categorical_features=None)

Fit the ranger random forest using training data.

  • X (array2d) – training input features

  • y (array1d) – training input targets

  • sample_weight (array1d) – optional weights for input samples

  • split_select_weights (list) – Vector of weights between 0 and 1 of probabilities to select features for splitting. Can be a single vector or a vector of vectors with one vector per tree.

  • always_split_features (list) – Features which should always be selected for splitting. A list of column index values.

  • categorical_features (list) – A list of column index values which should be considered categorical, or unordered.
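The two accepted shapes of split_select_weights (a single vector, or one vector per tree) can be illustrated with plain lists. The sketch below is only an assumption about how one might build such inputs; the helper per_tree_split_weights, the favored-column scheme, and the weight range 0.2 to 0.8 are invented for illustration.

```python
import random


def per_tree_split_weights(n_estimators, n_features, favored, seed=42):
    """Hypothetical helper: build one weight vector per tree.

    Columns listed in `favored` get weight 1.0; every other column gets a
    random weight in [0.2, 0.8], so all entries stay between 0 and 1.
    """
    rng = random.Random(seed)
    return [
        [1.0 if j in favored else rng.uniform(0.2, 0.8) for j in range(n_features)]
        for _ in range(n_estimators)
    ]


# Single vector: one weight per feature, shared by all trees.
single_vector = [1.0, 0.5, 0.5, 0.1]

# Vector of vectors: n_estimators rows, each with n_features weights.
per_tree = per_tree_split_weights(n_estimators=5, n_features=4, favored={0})
```

Either object would be passed at fit time, for example fit(X, y, split_select_weights=per_tree), with the number of rows matching n_estimators.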


get_estimator(idx)

Extract a single estimator tree from the forest.

  • idx (int) – The index of the tree to extract.


get_importance_pvalues()

Calculate p-values for variable importances.

Uses the fast method from Janitza et al. (2016).


get_params(deep=True)

Get parameters for this estimator.

  • deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns params (dict): parameter names mapped to their values.

predict(X, quantiles=None)

Predict regression target for X.

If quantiles are passed, predict quantiles instead.

  • X (array2d) – prediction input features

  • quantiles (list(float)) – a list of quantiles on which to predict. If the list contains a single quantile, the result will be a 1darray. If there are multiple quantiles, the result will be a 2darray with columns corresponding to respective quantiles. If quantiles are not provided the result is the regression target estimate.

predict_quantiles(X, quantiles)

Predict quantile regression target for X.

  • X (array2d) – prediction input features

  • quantiles (list(float)) – a list of quantiles on which to predict. If the list contains a single quantile, the result will be a 1darray. If there are multiple quantiles, the result will be a 2darray with columns corresponding to respective quantiles.
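A possible way to use quantile prediction for a simple prediction interval is sketched below. It relies only on the constructor flag quantiles=True and the predict_quantiles call documented here; the wrapper predict_interval and its default quantile levels are hypothetical, and skranger is imported inside the function so the snippet can be loaded even where the package is not installed.

```python
def predict_interval(X_train, y_train, X_test, lower=0.1, upper=0.9):
    """Hypothetical wrapper: return (median, lower, upper) predictions."""
    # Imported lazily; requires the skranger package to be installed.
    from skranger.ensemble import RangerForestRegressor

    # quantiles=True must be set at construction for predict_quantiles to work.
    forest = RangerForestRegressor(n_estimators=100, quantiles=True, seed=42)
    forest.fit(X_train, y_train)

    # Several quantiles yield a 2-D array with one column per quantile,
    # in the order requested.
    bands = forest.predict_quantiles(X_test, quantiles=[lower, 0.5, upper])
    return bands[:, 1], bands[:, 0], bands[:, 2]
```

Requesting a single quantile instead, such as quantiles=[0.5], would return a 1-D array, so the column indexing above applies only to the multi-quantile case.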

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse than a constant predictor. A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
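The definition above can be checked by hand with a few lines of NumPy. This is a sketch of the formula only, not the library's implementation: the function name r2_manual is hypothetical, it ignores sample_weight, and it does not handle the degenerate case where y_true is constant (v = 0).

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Compute 1 - u / v as in the definition of R^2 (unweighted)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - u / v


y = [1.0, 2.0, 3.0, 4.0]
perfect = r2_manual(y, y)                       # predictions equal the targets
constant = r2_manual(y, [2.5, 2.5, 2.5, 2.5])   # always predict the mean of y
reversed_ = r2_manual(y, [4.0, 3.0, 2.0, 1.0])  # worse than the constant model
```

The three cases correspond to the statements in the text: a perfect fit, a constant mean predictor, and a model that scores below zero.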

  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.


Returns score (float): \(R^2\) of self.predict(X) with respect to y.

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

  • **params (dict) – Estimator parameters.

Returns self: the estimator instance.

