The tree is binary, each internal node represent a segmentation choice. For exemple a DistanceNode would discriminate data according to a distance threshold. At the moment the internal node types used are
The leaves of the tree contains the observations that fit in the segmentation obtained by following the path from the root to the leaf. When a leaf contains enough observations it is split. The factors used for splitting a leaf are
- keeping approximately the same amount of data in each branch. It picks the median value for each segmentation axis, and corrects it to the closest standardised value. The standardised value allow the tree to be saved while maintaining a small file size (less than 0.6ko for a 35 rounds game even with long rounds).
- the information gain of all possible internal node type. See Segmentation/Prioritizing
The tree is rebuild at the beginning of each round. When there is too much data to rebuild the tree completely it is just partially rebuild.
Two threads are used for the tree, one for inserting new observations and splitting leaves on the fly, another one is used for rebuilding the tree.
The tree can be saved at the end of a game, in fact just a symbolic representation of the internal nodes, at the moment this is disabled I haven't tested if it is valuable.