Low-level Tree Interface¶
The tree interface
mimics that of sklearn.tree._tree.Tree
. In order to get this
level of detail, the argument enable_tree_details
must be set to True
on
ensemble estimators prior to fitting. The single Tree estimators will always perform
these calculations.
Since ranger does not track this level of detail on its trees, we perform extra operations in Python to provide them. These operations are currently quite slow and not well-optimized.
These operations include
determining which leaf node corresponds to each training sample for every tree
determining the sum of the weights of the training samples at each leaf node
determining the weighted average prediction value of each leaf node
These extra calculations deliver n_node_values
, weighted_n_node_values
,
and value
attributes of the Tree
class.
Note
Since we don’t have direct access to the sub-sampled training sets used in building each of the trees, we determine the above values using the full training set.
- class skranger.tree._tree.Tree(*, ranger_forest)[source]¶
The low-level tree interface.
Tree objects can be accessed using the
tree_
attribute on fitted decision tree estimators. Instances ofTree
provide methods and properties describing the underlying structure and attributes of the tree.- apply(X)[source]¶
Calculate the leaf index for each sample.
- Parameters
X (array2d) – training input features
- property capacity¶
The capacity of the node array.
- property children_left¶
Left children nodes.
- property children_right¶
Right children nodes.
- decision_path(X)[source]¶
Calculate the decision path through the tree for each sample.
- Parameters
X (array2d) – training input features
- property feature¶
Variables on which nodes are split.
- property max_depth¶
Max depth of the tree.
- property n_classes¶
The quantity of classes.
- property n_node_samples¶
The number of samples reaching each node.
- property n_outputs¶
The quantity of outputs of the tree.
- property node_count¶
The quantity of nodes in the tree.
- property threshold¶
Threshold values on which nodes are split.
- property value¶
The constant prediction value of each node.
- property weighted_n_node_samples¶
The sum of the weights of the samples reaching each node.
SHAP¶
RangerForestRegressor
and RangerForestClassifier
can be used with shap. A
context manager is provided which patches skranger
objects so that they work
with shap.
from shap import TreeExplainer
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from skranger.ensemble import RangerForestClassifier
from skranger.utils.shap import shap_patch
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
forest = RangerForestClassifier(enable_tree_details=True).fit(X_train, y_train)
with shap_patch():
explainer = TreeExplainer(model=forest)
values = explainer.shap_values(X_test)