# Explore user behavior with a transition matrix

In the previous sections we discussed how to visualize user trajectories with the Rete plot_graph() function. Here we take an in-depth look at the dataframe that corresponds to the graph, and at the main clickstream exploration tools that give you a helicopter view of your data.

## Before you start

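Install retentioneering if you are running in Google Colab or using it for the first time (the leading "!" runs the command from a notebook cell; drop it when installing from a terminal):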

```
!pip3 install retentioneering
```

To start with these tools, you need to load your own .csv file with clickstream data (as described in Getting started), or you can use retentioneering.datasets.load_simple_shop() to load our sample dataset.
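Retentioneering works on an ordinary pandas dataframe with one row per event. Below is a minimal sketch of loading your own file. It assumes the columns are named user_id, event and timestamp; these names are illustrative, so use whatever your file contains and declare them in the config below. The in-memory CSV is made-up data standing in for a real file path.

```python
import io

import pandas as pd

# A tiny, made-up clickstream in CSV form. With a real file you would pass
# its path to pd.read_csv instead of this in-memory buffer.
csv_text = """user_id,event,timestamp
1,main,2020-01-01 10:00:00
1,catalog,2020-01-01 10:00:05
1,cart,2020-01-01 10:01:00
2,catalog,2020-01-01 11:00:00
2,lost,2020-01-01 11:00:30
"""

# parse_dates turns the timestamp column into real datetimes,
# so events can be ordered in time.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])
print(data.dtypes)
```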

So, as a first step, make sure that Retentioneering is imported and that a dataframe with your clickstream is created. Also make sure you have called retentioneering.config.update to tell the library which columns of your loaded dataframe hold the essential user_col, event_col and event_time_col:

```python
import retentioneering

# load sample data
data = retentioneering.datasets.load_simple_shop()

# tell the library which columns hold the user id, the event name and the event time
retentioneering.config.update({
    'user_col': 'user_id',
    'event_col': 'event',
    'event_time_col': 'timestamp',
})
```

We suggest taking a quick look at the dataframe data before moving on.
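A quick exploration can be as simple as a few pandas calls. The sketch below runs them on a hypothetical toy dataframe with the same column names as the sample dataset. The same calls should work on data.

```python
import pandas as pd

# Hypothetical toy clickstream with the same columns as the sample dataset.
data = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'event': ['main', 'catalog', 'cart', 'catalog', 'lost', 'main'],
    'timestamp': pd.to_datetime([
        '2020-01-01 10:00', '2020-01-01 10:01', '2020-01-01 10:02',
        '2020-01-02 09:00', '2020-01-02 09:05', '2020-01-03 12:00',
    ]),
})

n_users = data['user_id'].nunique()          # number of distinct users
event_counts = data['event'].value_counts()  # how often each event occurs
time_span = (data['timestamp'].min(), data['timestamp'].max())  # covered period

print(n_users)
print(event_counts)
print(time_span)
```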

## Explore user transitions between events with an adjacency matrix dataframe

### The get_adjacency() function and its options

We can explore transitions in the form of a dataframe using an approach similar to the one we used with plot_graph(). Every graph can be represented as a matrix (a table, or a dataframe). Your data contains the transitions of many users, so we can count exactly how many users made each transition and build a table. In this table, every row corresponds to the origin event of a transition and every column corresponds to the destination event. Every cell therefore corresponds to a particular graph edge.
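To make the idea concrete, here is a minimal pandas sketch of how such a table could be computed from a raw clickstream. It is not retentioneering's implementation. The toy data and the helper column next_event are hypothetical, and each cell counts distinct users per transition, as described above.

```python
import pandas as pd

# Hypothetical toy clickstream (not the sample dataset).
data = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 2, 3, 3],
    'event': ['main', 'catalog', 'cart',
              'catalog', 'catalog', 'cart',
              'main', 'catalog'],
    'timestamp': pd.date_range('2020-01-01', periods=8, freq='min'),
})

# Order each user's events in time, then pair every event with the next one
# made by the same user (the last event of each user gets NaN).
data = data.sort_values(['user_id', 'timestamp'])
data['next_event'] = data.groupby('user_id')['event'].shift(-1)
transitions = data.dropna(subset=['next_event'])

# Rows = origin event, columns = destination event,
# cell = number of distinct users who made that transition.
adjacency = (
    transitions.groupby(['event', 'next_event'])['user_id']
    .nunique()
    .unstack(fill_value=0)
)
print(adjacency)
```

Only events that actually occur as an origin or a destination appear, so the matrix need not be square. Here cart never appears as an origin.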

Note that the diagonal cells correspond to loops: transitions from a node to itself. A typical example is navigation in an online shop, where a user goes from one catalog page to another catalog page.
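If you want the loops on their own, you can read the diagonal of the matrix. The sketch below uses a small hypothetical matrix with made-up numbers. It first aligns the columns with the rows, so that each diagonal cell really pairs an event with itself.

```python
import pandas as pd

# Hypothetical adjacency matrix: rows = origin event, columns = destination event.
adjacency = pd.DataFrame(
    [[3, 5, 0],
     [0, 7, 4],
     [1, 2, 0]],
    index=['cart', 'catalog', 'main'],
    columns=['cart', 'catalog', 'main'],
)

# Reorder (and, if needed, add) columns so they match the row order;
# events missing as destinations get 0.
events = adjacency.index
square = adjacency.reindex(columns=events, fill_value=0)

# Loops are the cells where origin == destination.
loops = pd.Series([square.at[e, e] for e in events], index=events)
print(loops)
```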

This table is formally called an adjacency matrix, because it reveals how the graph nodes are connected by edges. It can be built with the Retentioneering get_adjacency() function. Its arguments weight_col and norm_type work the same way as in the plot_graph() function. Read more about these arguments in the visualization tool description: https://retentioneering.github.io/retentioneering-tools/_build/html/plot_graph.html
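The exact meaning of each norm_type value is documented on the plot_graph page linked above. As an illustration of what normalizing an adjacency matrix usually means, the sketch below shows two common variants in plain pandas. The first divides every cell by the grand total. The second divides each row by its row sum, so each row reads as the share of the origin's outgoing total. These are generic normalizations, not necessarily identical to retentioneering's norm_type options, and the counts are made up.

```python
import pandas as pd

# Hypothetical raw transition counts (rows = origin, columns = destination).
counts = pd.DataFrame(
    [[0, 6, 2],
     [4, 0, 4],
     [1, 1, 0]],
    index=['main', 'catalog', 'cart'],
    columns=['main', 'catalog', 'cart'],
)

# Variant 1: share of the grand total (all cells sum to 1).
full = counts / counts.values.sum()

# Variant 2: share of each origin's outgoing total (each row sums to 1).
# Rows with no outgoing transitions would divide by zero, so they become 0.
row_sums = counts.sum(axis=1)
by_row = counts.div(row_sums.where(row_sums > 0), axis=0).fillna(0)

print(full.round(3))
print(by_row.round(3))
```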

Since we want to explore how many users in our clickstream dataset made each transition, we run it with weight_col='user_id' and norm_type=None:
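The choice of weight matters when users repeat a transition. The sketch below counts the same transition in two ways on a hypothetical toy clickstream, using plain pandas. The first way counts every occurrence of the transition. The second counts distinct users. It only illustrates the difference and is not how retentioneering computes either quantity internally.

```python
import pandas as pd

# Hypothetical toy data: user 1 goes catalog -> catalog twice, user 2 once.
data = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'event': ['catalog', 'catalog', 'catalog', 'catalog', 'catalog'],
    'timestamp': pd.date_range('2020-01-01', periods=5, freq='min'),
})

# Pair each event with the same user's next event.
data = data.sort_values(['user_id', 'timestamp'])
data['next_event'] = data.groupby('user_id')['event'].shift(-1)
pairs = data.dropna(subset=['next_event']).groupby(['event', 'next_event'])

occurrences = pairs.size()                 # every transition counted
unique_users = pairs['user_id'].nunique()  # each user counted once per edge

print(occurrences.loc[('catalog', 'catalog')])   # 3
print(unique_users.loc[('catalog', 'catalog')])  # 2
```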


```python
data.rete.get_adjacency(weight_col='user_id', norm_type=None)
```

| | cart | catalog | delivery_choice | lost | main | product1 | product2 | delivery_courier | delivery_pickup | payment_choice | payment_card | payment_done | payment_cash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cart | 1.0 | 478.0 | 1356.0 | 330.0 | 204.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| catalog | 1324.0 | 2004.0 | 0.0 | 1605.0 | 1480.0 | 1122.0 | 1430.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| delivery_choice | 0.0 | 172.0 | 0.0 | 92.0 | 68.0 | 0.0 | 0.0 | 748.0 | 469.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| lost | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| main | 0.0 | 2015.0 | 0.0 | 488.0 | 603.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| product1 | 431.0 | 620.0 | 0.0 | 163.0 | 114.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| product2 | 582.0 | 934.0 | 0.0 | 116.0 | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| delivery_courier | 0.0 | 0.0 | 0.0 | 46.0 | 34.0 | 0.0 | 0.0 | 0.0 | 0.0 | 683.0 | 0.0 | 0.0 | 0.0 |
| delivery_pickup | 0.0 | 0.0 | 0.0 | 92.0 | 55.0 | 0.0 | 0.0 | 0.0 | 0.0 | 332.0 | 0.0 | 0.0 | 0.0 |
| payment_choice | 0.0 | 108.0 | 0.0 | 89.0 | 41.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 521.0 | 94.0 | 190.0 |


The beauty of this function is that it returns a dataframe you can work with further in a very convenient way:

```python
df = data.rete.get_adjacency(weight_col='user_id', norm_type=None)
```
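Because the result is a regular pandas dataframe, standard pandas operations apply. For example, the sketch below lists the strongest transitions by turning the matrix into a long series indexed by (origin, destination) pairs. The small matrix is a hypothetical stand-in for df, made of a few cells copied from the output above. The same calls should work on the full df.

```python
import pandas as pd

# A few cells copied from the get_adjacency() output above, standing in for df.
df = pd.DataFrame(
    [[1, 478, 330],
     [1324, 2004, 1605],
     [0, 2015, 488]],
    index=['cart', 'catalog', 'main'],
    columns=['cart', 'catalog', 'lost'],
)

# stack() turns the matrix into a Series indexed by (origin, destination).
edges = df.stack()

# Keep only edges that at least one user took, strongest first.
edges = edges[edges > 0].sort_values(ascending=False)
print(edges.head(3))
```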

Now we can select only the events (rows) from which at least one user transitioned into the cart:


```python
df[df['cart'] > 0]
```

| | cart | catalog | delivery_choice | lost | main | product1 | product2 | delivery_courier | delivery_pickup | payment_choice | payment_card | payment_done | payment_cash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cart | 1.0 | 478.0 | 1356.0 | 330.0 | 204.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| catalog | 1324.0 | 2004.0 | 0.0 | 1605.0 | 1480.0 | 1122.0 | 1430.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| product1 | 431.0 | 620.0 | 0.0 | 163.0 | 114.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| product2 | 582.0 | 934.0 | 0.0 | 116.0 | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
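The same idea works along rows: selecting a row shows where users went next from a given event. The small sketch below uses a few cells copied from the full adjacency table above as a hypothetical stand-in for df.

```python
import pandas as pd

# A few cells copied from the full adjacency table above, standing in for df.
df = pd.DataFrame(
    [[1, 478, 1356, 330, 204],
     [0, 172, 0, 92, 68]],
    index=['cart', 'delivery_choice'],
    columns=['cart', 'catalog', 'delivery_choice', 'lost', 'main'],
)

# Destinations that at least one user reached directly from the cart, largest first.
from_cart = df.loc['cart']
destinations = from_cart[from_cart > 0].sort_values(ascending=False)
print(destinations)
```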


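Summing the cart column gives the total of all incoming edges into the cart, where each edge counts its users. Keep in mind that a user who reached the cart from several different events is counted once for each of those edges. This sum can therefore be larger than the number of distinct users who reached the cart: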


```python
df['cart'].sum()
```

```
2338.0
```
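To count distinct users who reached the cart, you can go back to the raw clickstream instead of summing the column. The minimal pandas sketch below uses a hypothetical toy dataframe with the same column names as the config above. It contrasts the per-edge column sum with the distinct-user count.

```python
import pandas as pd

# Hypothetical toy data: user 1 reaches the cart from catalog and later from product1.
data = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 3],
    'event': ['catalog', 'cart', 'product1', 'cart',
              'catalog', 'cart', 'main'],
    'timestamp': pd.date_range('2020-01-01', periods=7, freq='min'),
})

# Pair each event with the same user's next event and keep transitions into the cart.
data = data.sort_values(['user_id', 'timestamp'])
data['next_event'] = data.groupby('user_id')['event'].shift(-1)
into_cart = data[data['next_event'] == 'cart']

# Column sum of a users-per-edge matrix: distinct users counted per origin event.
column_sum = into_cart.groupby('event')['user_id'].nunique().sum()

# Distinct users who made any transition into the cart.
distinct_users = into_cart['user_id'].nunique()

print(column_sum)      # 3
print(distinct_users)  # 2
```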