Ranger Tree Classifier

class skranger.tree.RangerTreeClassifier(*, verbose=False, mtry=0, importance='none', min_node_size=0, max_depth=0, replace=True, sample_fraction=None, keep_inbag=False, inbag=None, split_rule='gini', num_random_splits=1, respect_categorical_features=None, scale_permutation_importance=False, local_importance=False, regularization_factor=None, regularization_usedepth=False, holdout=False, oob_error=False, save_memory=False, seed=42)

Ranger Tree Probability/Classification implementation for scikit-learn.

Provides a sklearn classifier interface to the Ranger C++ library using Cython.

  • verbose (bool) – Enable ranger’s verbose logging

  • mtry (int/callable) – The number of features to consider for splitting at each node. When a callable is passed, it must accept a single argument, the number of input features, and return a value between 1 and that number.

  • importance (str) – The variable importance mode; one of none, impurity, impurity_corrected, permutation.

  • min_node_size (int) – The minimal node size.

  • max_depth (int) – The maximal tree depth; 0 means unlimited.

  • replace (bool) – Sample with replacement.

  • sample_fraction (float/list) – The fraction of observations to sample. The default is 1 when sampling with replacement, and 0.632 otherwise. This can be a list of class specific values.

  • keep_inbag (bool) – If true, save how often observations are in-bag in each tree. These will be stored in the ranger_forest_ attribute under the key "inbag_counts".

  • inbag (list) – A list of size n_estimators, containing inbag counts for each observation. Can be used for stratified sampling.

  • split_rule (str) – One of gini, extratrees, hellinger; default gini.

  • num_random_splits (int) – The number of random splits to consider for the extratrees split rule.

  • respect_categorical_features (str) – One of ignore, order, partition. The default is partition for the extratrees split rule; otherwise the default is ignore.

  • scale_permutation_importance (bool) – For permutation importance, scale permutation importance by standard error as in (Breiman 2001).

  • local_importance (bool) – For permutation importance, calculate and return local importance values as in (Breiman 2001).

  • regularization_factor (list) – A vector of regularization factors for the features.

  • regularization_usedepth (bool) – Whether to consider depth in regularization.

  • holdout (bool) – Hold-out all samples with case weight 0 and use these for feature importance and prediction error.

  • oob_error (bool) – Whether to calculate out-of-bag prediction error.

  • save_memory (bool) – Save memory at the cost of slower tree growth.

  • seed (int) – Random seed value.

Fitted attributes:

  • classes_ (ndarray) – The class labels determined from the fit input y.

  • n_classes_ (int) – The number of unique class labels from the fit input y.

  • n_features_in_ (int) – The number of features (columns) from the fit input X.

  • feature_names_ (list) – Names for the features of the fit input X.

  • ranger_forest_ (dict) – The returned result object from calling C++ ranger.

  • mtry_ (int) – The resolved mtry value: the result of the call if mtry is callable, otherwise the same as mtry.

  • sample_fraction_ (float/list) – The sample fraction determined by input validation.

  • regularization_factor_ (list) – The regularization factors determined by input validation.

  • unordered_variable_names_ (list) – The unordered variable names determined by input validation.

  • split_rule_ (int) – The split rule integer corresponding to ranger enum SplitRule.

  • use_regularization_factor_ (bool) – Whether the regularization_factor input is used, as determined by input validation.

  • respect_categorical_features_ (str) – The categorical feature handling mode, as determined by input validation.

  • importance_mode_ (int) – The importance mode integer corresponding to ranger enum ImportanceMode.

  • ranger_class_order_ (list) – The class reference ordering derived from ranger.

  • feature_importances_ (ndarray) – The variable importances from ranger.
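The mtry parameter above accepts a callable that maps the number of input features to a value between 1 and that number. As a minimal sketch, the hypothetical helper sqrt_mtry below uses the common square-root heuristic; the name and the heuristic are illustrative choices, not part of skranger.

```python
import math


def sqrt_mtry(n_features: int) -> int:
    """Return floor(sqrt(n_features)), clamped to the range [1, n_features]."""
    return max(1, min(n_features, math.isqrt(n_features)))


# With 16 input features, 4 candidate features are considered at each node.
print(sqrt_mtry(16))

# Assumed usage (requires skranger to be installed):
# from skranger.tree import RangerTreeClassifier
# clf = RangerTreeClassifier(mtry=sqrt_mtry)
```

The clamp keeps the result valid for very small feature counts, where the square root would otherwise round down to 0.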


apply(X)

Calculate the index of the leaf for each sample.

X (array2d) – input features

property criterion

Compatibility alias for split rule.


decision_path(X)

Calculate the decision path through the tree for each sample.

X (array2d) – input features

fit(X, y, sample_weight=None, class_weights=None, split_select_weights=None, always_split_features=None, categorical_features=None)

Fit the ranger tree using training data.

  • X (array2d) – training input features

  • y (array1d) – training input target classes

  • sample_weight (array1d) – optional weights for input samples

  • class_weights (dict) – A dictionary of outcome classes to weights.

  • split_select_weights (list) – A vector of weights between 0 and 1 giving the probability of selecting each feature for splitting. Can be a single vector or a vector of vectors with one vector per tree.

  • always_split_features (list) – Features which should always be selected for splitting. A list of column index values.

  • categorical_features (list) – A list of column index values which should be considered categorical, or unordered.
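The class_weights argument above is a dictionary from outcome classes to weights. One way such a dictionary might be built is sketched below; the helper balanced_class_weights is hypothetical, and the inverse-frequency formula is a common heuristic rather than something skranger prescribes.

```python
from collections import Counter


def balanced_class_weights(y):
    """Map each class label to n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three samples of class 0 and one of class 1: the rare class gets the larger weight.
y = [0, 0, 0, 1]
class_weights = balanced_class_weights(y)
print(class_weights)

# Assumed usage, given a fitted-ready estimator clf and features X:
# clf.fit(X, y, class_weights=class_weights)
```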

classmethod from_forest(forest: RangerForestClassifier, idx: int)

Extract a tree from a forest.

  • forest (RangerForestClassifier) – A trained RangerForestClassifier instance

  • idx (int) – The tree index from the forest to extract.
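A rough sketch of extracting a tree from a forest follows. It assumes that RangerForestClassifier is importable from skranger.ensemble and accepts n_estimators; the wrapper extract_tree is a hypothetical helper introduced only for illustration.

```python
def extract_tree(X, y, idx=0, n_estimators=10):
    """Fit a small forest and return the tree at position idx (hypothetical helper)."""
    # Imports are deferred so this module still loads where skranger is absent.
    from skranger.ensemble import RangerForestClassifier  # assumed module path
    from skranger.tree import RangerTreeClassifier

    forest = RangerForestClassifier(n_estimators=n_estimators)
    forest.fit(X, y)
    # from_forest takes the trained forest and the index of the tree to extract.
    return RangerTreeClassifier.from_forest(forest, idx=idx)
```

The forest must be trained before extraction, since from_forest expects a trained RangerForestClassifier instance.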


get_depth()

Calculate the maximum depth of the tree.


get_n_leaves()

Calculate the number of leaves of the tree.


get_params(deep=True)

Get parameters for this estimator.

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

params – Parameter names mapped to their values.

Return type

dict

predict(X)

Predict classes from X.

X (array2d) – prediction input features


predict_log_proba(X)

Predict log probabilities for classes from X.

X (array2d) – prediction input features


predict_proba(X)

Predict probabilities for classes from X.

X (array2d) – prediction input features
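In scikit-learn style classifiers, the log-probability prediction is conventionally the elementwise natural logarithm of the probability prediction. The sketch below illustrates that relationship on plain lists; it assumes the convention holds here and does not call skranger. The helper log_proba is hypothetical.

```python
import math


def log_proba(proba_rows):
    """Elementwise natural log of class probabilities; a probability of 0 maps to -inf."""
    return [
        [math.log(p) if p > 0 else float("-inf") for p in row]
        for row in proba_rows
    ]


# Two samples, two classes; each row of probabilities sums to 1.
proba = [[0.5, 0.5], [1.0, 0.0]]
print(log_proba(proba))
```

A single tree can assign probability 0 to a class, so the -inf case is worth handling explicitly when consuming log probabilities.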

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric because it requires that the entire label set of each sample be predicted correctly.

  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.


score – Mean accuracy of self.predict(X) with respect to y.
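For single-label targets, mean accuracy with optional sample weights can be sketched as follows. This is an illustrative reimplementation on plain lists, not the library's code; the helper mean_accuracy is hypothetical.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label equals the true label."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# Three of four predictions are correct.
print(mean_accuracy(y_true, y_pred))

# Doubling the weight of the misclassified third sample lowers the score.
print(mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1]))
```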

Return type

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.


**params (dict) – Estimator parameters.


self – Estimator instance.

Return type

estimator instance
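The <component>__<parameter> form described for nested objects can be illustrated with a scikit-learn Pipeline. The sketch below uses scikit-learn's own DecisionTreeClassifier as a stand-in estimator so that it runs without skranger; the step names "scale" and "tree" are arbitrary choices.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("tree", DecisionTreeClassifier()),
])

# "tree" is the step name and "max_depth" is the parameter on that step.
pipe.set_params(tree__max_depth=3)
print(pipe.get_params()["tree__max_depth"])
```

Because the estimator follows the same get_params and set_params interface, a RangerTreeClassifier placed in the "tree" step should be addressable the same way, for example tree__mtry.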