A survey of a new paper that aims to explain the behavior of DNNs

Recently, Caglar Aytekin, a researcher at AAC Technologies, published an article titled “Neural Networks are Decision Trees.” I read it carefully and tried to understand what exactly the big discovery in this article is. As many data scientists will likely agree, there are plenty of known transformations that recast one kind of model as another. However, (deep) neural networks (DNNs) are notoriously difficult to interpret. So has Aytekin discovered something new that brings us one step closer to the explainable era of AI?

No spoilers just yet.

In this post, let’s examine the article and try to find out whether this is actually a new discovery, or whether it is just an important spotlight that every data scientist should know and keep in mind when tackling the interpretability of DNNs.

Aytekin demonstrated that any classical fully connected DNN with piecewise-linear activation functions (such as ReLU) can be represented by a decision tree model. Let’s review the main difference between the two:

DNNs fit parameters that transform the input and only indirectly direct the activations of their neurons.

Decision trees explicitly fit parameters that route the flow of the data.

The motivation for this paper is indeed to address the black-box nature of DNN models and to offer another way to explain their behavior. The work handles fully connected and convolutional networks and presents a directly equivalent decision-tree representation. So, essentially, it explores the transformation from a DNN to a decision tree: a sequence of weight matrices with non-linearities between them is taken and rewritten as a new weight structure. One additional result Aytekin discusses concerns computational complexity: the corresponding DNN keeps an advantage in memory, since it requires far less storage than the equivalent tree.

Frosst and Hinton presented in their work [4], “Distilling a Neural Network Into a Soft Decision Tree,” a great approach for explaining DNNs using decision trees. However, their work differs from Aytekin’s paper in that they combined the advantages of DNNs and decision trees rather than establishing an exact equivalence.

Building the equivalent tree by calculating new effective weights: the proposed algorithm takes the signals coming into each layer and checks for which of them the ReLUs are activated and for which they are not. Ultimately, the algorithm (transformation) replaces the non-linearity with a vector of ones and zeros (or, for other piecewise-linear activations, the corresponding slope values), which acts as the branching decision.
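To make this concrete, here is a minimal sketch of that categorization vector for a single fully connected ReLU layer. This is my own toy code, not the paper’s implementation, and all names and sizes below are hypothetical: the point is simply that the 0/1 pattern records which neurons fired for a given input.

```python
# Minimal sketch (not the paper's code): the "categorization vector" of one
# fully connected ReLU layer is the 0/1 pattern of which neurons fired.
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(4, 3))   # hypothetical layer weights (4 neurons, 3 inputs)
b1 = rng.normal(size=4)        # hypothetical biases

x = rng.normal(size=3)         # an example input

pre_activation = W1 @ x + b1
a1 = (pre_activation > 0).astype(float)  # vector of ones and zeros: the ReLU slopes

print(a1)  # e.g. [1. 0. 1. 1.] -- this pattern decides which branch the input falls into
```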

The algorithm works layer by layer. For each layer, it looks at the inputs arriving from the previous layer and calculates the activation decision for each of them. In effect, at each layer a new effective filter is selected (based on the decisions made so far) so that it can be applied directly to the network input. Thus, a fully connected DNN can be represented as a single decision tree, where the effective matrices found by the transformation act as the categorization rules, as the sketch below illustrates.
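The following toy sketch (again my own, assuming a tiny two-layer fully connected ReLU network with hypothetical sizes) shows how an effective filter acting directly on the raw input emerges once the first layer’s activation decision is known, and verifies that it reproduces the DNN’s output.

```python
# Minimal sketch, assuming a toy two-layer fully connected ReLU network
# (not the paper's implementation): once the activation pattern a1 is known,
# the second layer sees an *effective* filter applied directly to the input.
import numpy as np

rng = np.random.default_rng(1)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # hypothetical layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hypothetical layer 2

x = rng.normal(size=3)

# Standard forward pass through the DNN
h = np.maximum(W1 @ x + b1, 0.0)
y_dnn = W2 @ h + b2

# Decision-tree view: the ReLU pattern a1 is the routing decision,
# and the effective filter depends only on that decision.
a1 = (W1 @ x + b1 > 0).astype(float)
W_eff = W2 @ np.diag(a1) @ W1          # effective matrix acting on the raw input
b_eff = W2 @ (a1 * b1) + b2            # effective bias for this branch

y_tree = W_eff @ x + b_eff
print(np.allclose(y_dnn, y_tree))      # True: same output, expressed as a linear rule per branch
```

The 0/1 vector `a1` plays the role of the routing decision in the tree, while the pair `(W_eff, b_eff)` is the linear rule attached to that branch.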

The same construction can also be applied to convolutional layers. The main difference is that many decisions are made on partial input regions (the receptive fields of the filters) rather than on the entire input to the layer.
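A small toy example (hypothetical sizes, plain NumPy, not the paper’s code) illustrates why: the same filter produces a separate 0/1 routing decision for every local patch of the input, rather than one global decision.

```python
# Minimal sketch of the convolutional case: one ReLU decision per local patch.
import numpy as np

rng = np.random.default_rng(2)

image = rng.normal(size=(5, 5))        # hypothetical single-channel input
kernel = rng.normal(size=(3, 3))       # hypothetical 3x3 conv filter
bias = 0.1

decisions = np.zeros((3, 3))           # one decision per valid 3x3 patch (stride 1, no padding)
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]                    # partial input region
        decisions[i, j] = float((kernel * patch).sum() + bias > 0)

print(decisions)  # a grid of 0/1 routing decisions, one per patch, instead of a single global one
```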

Regarding dimensionality and computational complexity: the number of categories in the resulting decision tree seems overwhelming. In a fully balanced tree we need 2^d leaves for a tree of depth d, which quickly becomes intractable. However, we also need to remember the violated and redundant rules, which allow lossless pruning of the tree.
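As a back-of-the-envelope illustration (hypothetical layer widths, before any pruning of violated or redundant rules), the leaf count grows exponentially with the total number of hidden ReLUs:

```python
# Rough leaf-count sketch: roughly one binary decision per hidden ReLU,
# following the 2^d counting argument above (hypothetical widths, no pruning).
hidden_widths = [64, 64]                       # a small fully connected net
num_categories = 2 ** sum(hidden_widths)       # upper bound on tree leaves
print(num_categories)                          # 2**128 -- intractable without pruning
```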

  • This idea is valid for DNNs with piecewise-linear activation functions
  • The basic idea that neural networks are decision trees is not new
  • Personally, I found the explanation and mathematical description in [1] very simple, and it motivated me to use it and push the domain of Explainable AI forward
  • Someone should test this idea on ResNet 😊

The original article can be found at: https://arxiv.org/pdf/2210.05189.pdf

[1] Aytekin, Caglar. “Neural Networks are Decision Trees.” arXiv preprint arXiv:2210.05189 (2022).

If you want to watch a 30-minute interview about the paper, see here:

[2] The great Yannic Kilcher interviewed Alexander Mattick about the paper on YouTube: https://www.youtube.com/watch?v=_okxGdHM5b8&ab_channel=YannicKilcher

A great paper on applying approximation theory to deep learning to explore how a DNN model organizes signals in a hierarchical manner:

[3] Balestriero, Randall, and Richard Baraniuk. “A Spline Theory of Deep Learning.” International Conference on Machine Learning. PMLR, 2018.

Great work that combines the power of decision trees and DNNs:

[4] Frosst, Nicholas, and Geoffrey Hinton. “Distilling a Neural Network Into a Soft Decision Tree.” arXiv preprint arXiv:1711.09784 (2017).

You can read a Medium post summarizing this work [4]:

[5] Neural Network Distillation into a Soft Decision Tree by Razorthink Inc, Medium, 2019.

Barak Orr is an entrepreneur and expert in artificial intelligence and navigation, formerly at Qualcomm. Barak holds an M.Sc. and B.Sc. in Engineering and a BA in Economics from the Technion. Winner of the Gemunder Award. Barak is completing his Ph.D. in AI and sensor fusion. He is the author of a number of articles and patents, and the founder and CEO of ALMA Tech. LTD, an artificial intelligence and advanced navigation company.

https://towardsdatascience.com/pushing-towards-the-explainable-ai-era-neural-networks-are-decision-trees-1603ab97eb1b