Transition matrix#

The transition matrix shows how frequently each pair of events occurs as a transition. It is closely related to the transition graph: each value of a transition matrix is essentially an edge weight of the corresponding transition graph. For example, the weight of the A → B transition is located at row A and column B of the transition matrix. See the corresponding sections of the transition graph user guide for details.
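To make the link between matrix cells and edge weights concrete, here is a minimal pure-Python sketch that counts transitions in a toy set of paths. The `paths` data and the `count_transitions` helper are hypothetical and serve only as an illustration; they are not part of the Retentioneering API.

```python
from collections import Counter

# Hypothetical toy data: each user's ordered list of events.
paths = {
    "user_1": ["A", "B", "A", "B"],
    "user_2": ["A", "C"],
}

def count_transitions(paths):
    """Count occurrences of each (source, target) pair of consecutive events."""
    counts = Counter()
    for events in paths.values():
        counts.update(zip(events, events[1:]))
    return counts

counts = count_transitions(paths)
events = sorted({e for seq in paths.values() for e in seq})

# Row = source event, column = target event.
matrix = {src: {dst: counts[(src, dst)] for dst in events} for src in events}
print(matrix["A"]["B"])  # weight of the A -> B edge
```

Reading `matrix["A"]["B"]` mirrors looking up row A and column B in the plotted matrix.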

Loading data#

Throughout this guide we use our demonstration simple_shop dataset. It has already been converted to an Eventstream and assigned to the stream variable. If you want to use your own dataset, upload it following this instruction.

from retentioneering import datasets

stream = datasets.load_simple_shop()

A basic example#

Similar to the transition graph's edges_norm_type and edges_weight_col parameters, the transition matrix has norm_type and weight_col arguments. The defaults are norm_type=None and weight_col='user_id'.

stream.transition_matrix(norm_type='node', weight_col='user_id')
../_images/transition_matrix_basic.png

For example, this matrix shows that under the norm_type='node' and weight_col='user_id' configuration, the weight of the cart → catalog edge is 25%, meaning that 25% of the users who had a cart event also had a cart → catalog transition.
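The user-based reading of a cell can be sketched in plain Python as follows. This is an illustration of the interpretation stated above, not the library's implementation; the `paths` data and the `user_share` helper are hypothetical, and the toy values are not taken from simple_shop.

```python
# Hypothetical toy paths: user id -> ordered list of events.
paths = {
    "u1": ["cart", "catalog", "cart", "catalog"],
    "u2": ["cart", "payment_done"],
    "u3": ["catalog", "cart", "payment_done"],
    "u4": ["cart", "payment_done"],
}

def user_share(paths, src, dst):
    """Share of users with event `src` who also made a src -> dst transition."""
    users_with_src = {u for u, ev in paths.items() if src in ev}
    users_with_edge = {
        u for u, ev in paths.items() if (src, dst) in zip(ev, ev[1:])
    }
    return len(users_with_edge) / len(users_with_src)

share = user_share(paths, "cart", "catalog")
print(f"{share:.0%}")
```

Note that a user who repeats the same transition several times is still counted once, which is what distinguishes user-based weights from event-based ones.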

There are some arguments that control the appearance of a transition matrix.

  • By default, plotting a matrix of dimension > 60 is not recommended. To override this, set show_large_matrix=True explicitly.

  • Matrix values are displayed if the matrix dimension is <= 30, and hidden otherwise. To show or hide them explicitly, use the show_values flag.

  • The precision argument sets the number of digits after the decimal point.

  • heatmap_axis allows you to color each row or each column with a separate heatmap.

stream.transition_matrix(norm_type='node', weight_col='event_id', heatmap_axis=0)
../_images/transition_matrix_heatmap_axis.png

This is a row-wise heatmap of the Markov matrix (norm_type='node', weight_col='event_id'). It highlights the basic property of a Markov transition matrix: each row sums to 1.
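The row-sum property can be checked with a small pure-Python sketch that normalizes event-level transition counts by their source event. The toy `paths` are hypothetical, and the code only illustrates the idea of row normalization, not how Retentioneering computes the matrix internally.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy paths (one list of events per user).
paths = [
    ["catalog", "cart", "catalog", "cart", "payment_done"],
    ["catalog", "catalog", "cart"],
]

# Count every transition between consecutive events.
counts = Counter()
for events in paths:
    counts.update(zip(events, events[1:]))

# Total number of outgoing transitions per source event (row totals).
row_totals = Counter()
for (src, _dst), n in counts.items():
    row_totals[src] += n

# Divide each cell by its row total to get transition probabilities.
markov = {(src, dst): n / row_totals[src] for (src, dst), n in counts.items()}

# Sum the probabilities in each row; events with no outgoing
# transitions (here payment_done) have no row in this sketch.
row_sums = defaultdict(float)
for (src, _dst), p in markov.items():
    row_sums[src] += p

print(dict(row_sums))
```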

The next example demonstrates how to hide the cell values and make the image smaller:

stream.transition_matrix(
    norm_type='node', weight_col='event_id',
    show_values=False, figsize=(4, 4)
)
../_images/transition_matrix_no_values.png

Differential transition matrix#

Similar to some other tools (e.g. step matrix, funnels, sequences), the transition matrix supports comparison of two groups of users. If M1 and M2 are the transition matrices for the first and the second group, then the differential matrix is defined as M1 - M2.
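As a minimal sketch of this definition, the cell-wise difference can be written in plain Python. The `diff_matrix` helper and the toy matrices are hypothetical, and treating a transition that is absent from one group as 0 is an assumption made for this illustration.

```python
def diff_matrix(m1, m2):
    """Cell-wise M1 - M2 over the union of (source, target) pairs.

    Assumption: a transition missing from one matrix counts as 0.
    """
    keys = set(m1) | set(m2)
    return {k: m1.get(k, 0.0) - m2.get(k, 0.0) for k in keys}

# Hypothetical normalized transition weights for two groups.
m1 = {("cart", "catalog"): 0.5, ("cart", "payment_done"): 0.5}
m2 = {("cart", "catalog"): 0.75, ("cart", "cart"): 0.25}

diff = diff_matrix(m1, m2)
for edge, value in sorted(diff.items()):
    print(edge, value)
```

Positive cells mark transitions that are more typical of the first group, negative cells those more typical of the second.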

The groups argument defines the groups of paths to compare. You can pass either a collection of two collections containing path ids, or a pre-defined segment name together with two of its values. The latter option is often preferable since pre-defined segments allow you to compare the same groups using other Retentioneering tools too. For example, if you have a segment country, you can compare the users from the US and the UK like this:

stream.transition_matrix(groups=['country', 'US', 'UK'])

See the segments user guide for more details on how to create and use segments.

If you do not want to create a segment explicitly, you can pass a pair of collections containing path ids on the fly like this:

group1 = [39690243, 56229892, 770891782, 189849617, 345530386]
group2 = [950233183, 681437279, 816957536, 913156199, 614680680]

stream.transition_matrix(
    weight_col='user_id', norm_type='full',
    groups=[group1, group2]
)
../_images/diff_transition_matrix1.png

Using a separate instance#

Eventstream.transition_matrix() returns an instance of the TransitionMatrix class, which has a values attribute, so you can get the transition matrix as a pandas.DataFrame. To suppress plotting the heatmap, use the show_plot=False flag.

tm = stream.transition_matrix(show_plot=False)
tm.values
                cart  catalog  ...  payment_done  payment_cash
cart             1.0    571.0  ...           0.0           0.0
catalog       1709.0   4857.0  ...           0.0           0.0
...              ...      ...  ...           ...           ...
payment_done     0.0      0.0  ...           0.0           0.0
payment_cash     0.0      0.0  ...         104.0           0.0