Discovering Insights: What can be learned through a Prediction Tree?

20/05/2019

Understanding the data: what can we learn from a decision tree?

Let’s take as an example this prediction tree, generated on an e-commerce website:

How to interpret it?

Red represents a strong probability of purchase (the darker the red, the higher the probability).
Blue represents a strong probability of no purchase (the darker the blue, the lower the chance that a user will purchase).

The first point on the left represents 100% of the population. The first split occurs on the total_order_amount variable.
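
To illustrate how a tree might settle on total_order_amount for its first split, here is a minimal Python sketch. It assumes the split is chosen by minimising Gini impurity, a common criterion; the article does not say which criterion the tool actually uses. The visitor records, the recent_page_views field name and the candidate thresholds other than €25 are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = no purchase, 1 = purchase)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_impurity(rows, feature, threshold):
    """Weighted Gini impurity after splitting rows on `feature < threshold`."""
    left = [r["purchased"] for r in rows if r[feature] < threshold]
    right = [r["purchased"] for r in rows if r[feature] >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, candidates):
    """Return the (feature, threshold) pair with the lowest weighted impurity."""
    return min(candidates, key=lambda c: split_impurity(rows, c[0], c[1]))


# Toy visitors, invented for illustration only (not data from the article).
visitors = [
    {"total_order_amount": 0, "recent_page_views": 3, "purchased": 0},
    {"total_order_amount": 10, "recent_page_views": 12, "purchased": 0},
    {"total_order_amount": 20, "recent_page_views": 30, "purchased": 0},
    {"total_order_amount": 40, "recent_page_views": 5, "purchased": 1},
    {"total_order_amount": 60, "recent_page_views": 40, "purchased": 0},
    {"total_order_amount": 80, "recent_page_views": 25, "purchased": 1},
    {"total_order_amount": 150, "recent_page_views": 8, "purchased": 1},
]

# Hypothetical candidate splits; a real tree would scan many thresholds.
candidates = [("total_order_amount", 25), ("recent_page_views", 10)]
chosen = best_split(visitors, candidates)
print(chosen)
```

On this toy data, splitting on total_order_amount at 25 separates buyers from non-buyers more cleanly than the page-view split, which is why it would be chosen first.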

On one side, we have those whose total purchases (over their entire purchase history) amount to less than €25, and who are very unlikely to make a purchase (dark blue). On the other side, those whose total purchases amount to over €25 are slightly more likely to make a purchase (light blue).
Among the latter, those who were recently in the funnel are likely to make a purchase (light red circle), and out of these, those whose last visit was less than 28 days ago are even more likely to make a purchase (very dark red circle).

For those whose first visit was over 28 days ago, we can see that those who recently viewed fewer than 22 pages are unlikely to buy, except when their total number of page views is below 55 and they recently viewed more than 4 pages.

However, those who have viewed over 55 pages in total and who recently viewed fewer than 8 pages are unlikely to make a purchase.

And so on, following the nodes along each branch.
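
Reading a branch amounts to applying a chain of if/else rules. As a rough sketch, the first branches of the walkthrough could be written as the Python function below. The function name, the parameter names recently_in_funnel and days_since_last_visit, and the text labels are hypothetical. The sketch also assumes that a total of exactly €25 falls on the "over €25" side, which the walkthrough does not specify, and it returns the label of the deepest node reached rather than following the later page-view splits.

```python
def purchase_outlook(total_order_amount, recently_in_funnel, days_since_last_visit):
    """Label a visitor by following the first branches of the tree walkthrough."""
    if total_order_amount < 25:
        return "very unlikely"         # dark blue node
    if not recently_in_funnel:
        return "slightly more likely"  # light blue node (deeper splits not modelled)
    if days_since_last_visit < 28:
        return "very likely"           # very dark red node
    return "likely"                    # light red node (deeper splits not modelled)


print(purchase_outlook(10, True, 3))     # very unlikely
print(purchase_outlook(120, True, 5))    # very likely
print(purchase_outlook(120, False, 5))   # slightly more likely
```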

Lessons to take away

  • The most predictive variable is the total order amount, so it must be taken into account when you create your segments.
  • Recent presence in the purchase funnel drastically changes the probability (unsurprisingly, hence the importance of recovering abandoned baskets).
  • The number of page views is an indicator of purchase probability, and its effect depends on the date of the last visit (we can deduce that there is a sort of ratio that indicates whether someone has purchase intent, based on the date of their first visit, the total number of page views since then and the number of recent page views).
    You should therefore create a segment for each red node on the far right of the screen to capture all the visitors with purchase intent (or investigate this ratio further and turn it into a new score variable, which would make this easier later on).
  • The other predictive variables to bear in mind are recent_view_product and recent_view_category. The higher they are, the lower the chance of a purchase, no doubt because these are visitors who are just browsing without really knowing what they want, unlike those who look at few products and categories and who are more likely to buy quickly.
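
The idea of creating one segment per red node can be sketched in Python as follows. The nested-dictionary tree format, the function name and the 0.5 cut-off for "red" are assumptions. The conditions mirror the first branches of the walkthrough, but the probabilities are placeholders, not values read from the actual tree.

```python
def high_intent_segments(node, threshold=0.5, path=()):
    """Collect the condition paths leading to leaves with probability >= threshold."""
    if "probability" in node:  # leaf node
        return [list(path)] if node["probability"] >= threshold else []
    segments = []
    for condition, child in node["children"]:
        segments += high_intent_segments(child, threshold, path + (condition,))
    return segments


# Hypothetical tree: probabilities are placeholders chosen for illustration.
tree = {
    "children": [
        ("total_order_amount < 25", {"probability": 0.02}),
        ("total_order_amount >= 25", {
            "children": [
                ("not recently in funnel", {"probability": 0.10}),
                ("recently in funnel", {
                    "children": [
                        ("last visit < 28 days ago", {"probability": 0.80}),
                        ("last visit >= 28 days ago", {"probability": 0.40}),
                    ],
                }),
            ],
        }),
    ],
}

# Each printed line is the definition of one segment to create.
for conditions in high_intent_segments(tree):
    print(" AND ".join(conditions))
```

With these placeholder values, the only segment printed is "total_order_amount >= 25 AND recently in funnel AND last visit < 28 days ago". Lowering the threshold widens the set of segments.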