Transition matrix

Transition matrix class

class retentioneering.tooling.transition_matrix.transition_matrix.TransitionMatrix(eventstream)

The TransitionMatrix class represents a matrix where the element at position (i, j) displays the weight of the transition from event i to event j. This class provides methods for calculating and visualizing transition matrices, using the same logic as for calculating edge weights in a transition graph.

Parameters:
eventstream : EventstreamType

The eventstream for which the transition matrix is computed.

See also

Eventstream.transition_matrix

This method can be called on an Eventstream to obtain a TransitionMatrix.

TransitionGraph

An interactive tool for representing transitions as a graph.

Notes

For more detailed information, refer to the Transition matrix user guide.
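The (i, j) definition above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the toy data, the `count_transitions` helper, and the nested-dict layout are assumptions made for illustration, and the weight here is simply the number of transitions.

```python
from collections import defaultdict

# Hypothetical toy data: (user_id, event) rows, already ordered by time per user.
events = [
    ("u1", "main"), ("u1", "catalog"), ("u1", "cart"),
    ("u2", "main"), ("u2", "catalog"), ("u2", "main"),
]


def count_transitions(rows):
    """Return {source: {target: count}} for consecutive events within each user path."""
    paths = defaultdict(list)
    for user, event in rows:
        paths[user].append(event)

    matrix = defaultdict(lambda: defaultdict(int))
    for path in paths.values():
        # Pair every event with the event that follows it in the same path.
        for src, dst in zip(path, path[1:]):
            matrix[src][dst] += 1
    return {src: dict(row) for src, row in matrix.items()}


matrix = count_transitions(events)
```

In this sketch, `matrix["main"]["catalog"]` holds the weight of the transition from `main` to `catalog`, that is, the element at row `main` and column `catalog`.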

fit(weight_col=None, norm_type=None, groups=None)

Calculates transition weights as a matrix for each unique pair of events. The calculation logic is the same as that used for the edge weights of a transition graph. The fit method must be called before any visualization or descriptive TransitionMatrix method is used.

Parameters:
norm_type : {“full”, “node”, None}, default None

The type of normalization applied to the weights. The weight values themselves are calculated according to the weight_col parameter.

  • If None, no normalization is applied and the absolute values are used.

  • If full, the weights are normalized across the whole eventstream.

  • If node, the weights are normalized across each node (that is, across the outgoing transitions from each node).

See Transition graph user guide for the details.
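The three norm_type modes can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the weights are assumed to be plain transition counts, "full" is read as division by the grand total, and "node" as division by each row's total of outgoing transitions. The denominators the library uses for other weight_col values may differ, so the user guide remains the authority. The `normalize` helper and the `counts` data are hypothetical.

```python
# Hypothetical absolute transition counts: counts[source][target].
counts = {
    "main": {"catalog": 6, "cart": 2},
    "catalog": {"cart": 2},
}


def normalize(counts, norm_type=None):
    """Sketch of the three norm_type modes applied to raw transition counts."""
    if norm_type is None:
        # No normalization: keep the absolute values.
        return {src: dict(row) for src, row in counts.items()}
    if norm_type == "full":
        # Assumption: divide every cell by the total over the whole matrix.
        total = sum(sum(row.values()) for row in counts.values())
        return {
            src: {dst: w / total for dst, w in row.items()}
            for src, row in counts.items()
        }
    if norm_type == "node":
        # Assumption: divide every cell by the total of outgoing transitions of its source.
        return {
            src: {dst: w / sum(row.values()) for dst, w in row.items()}
            for src, row in counts.items()
        }
    raise ValueError(f"unknown norm_type: {norm_type!r}")


full = normalize(counts, "full")
node = normalize(counts, "node")
```

Under these assumptions, all cells of `full` sum to 1, while each row of `node` sums to 1.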

weight_col : str, optional

A column name from the EventstreamSchema whose values determine the final edge weights.

For each edge, the following is calculated:

  • If None or user_id - the number of unique users.

  • If event_id - the number of transitions.

  • If session_id - the number of unique sessions.

  • If custom_col - the number of unique values in the selected column.

See Transition graph user guide for the details.
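The difference between the weight_col options listed above can be sketched in plain Python. The row layout, the `edge_weights` helper, and the choice to attribute each transition to the row of its source event are assumptions made for illustration and are not taken from the library.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, session_id, event), ordered by time within each user.
rows = [
    ("u1", "s1", "main"), ("u1", "s1", "catalog"),
    ("u1", "s2", "main"), ("u1", "s2", "catalog"),
    ("u2", "s3", "main"), ("u2", "s3", "catalog"),
]


def edge_weights(rows, weight_col="user_id"):
    """Count transitions (event_id) or unique user/session ids per edge."""
    col_index = {"user_id": 0, "session_id": 1}

    # Group rows into per-user paths, preserving order.
    paths = defaultdict(list)
    for row in rows:
        paths[row[0]].append(row)

    # Collect, for every (source, target) edge, the source rows that produced it.
    transitions = defaultdict(list)
    for path in paths.values():
        for prev, nxt in zip(path, path[1:]):
            transitions[(prev[2], nxt[2])].append(prev)

    if weight_col == "event_id":
        # Every transition counts once.
        return {edge: len(src_rows) for edge, src_rows in transitions.items()}

    # Otherwise count the unique values of the chosen column.
    idx = col_index[weight_col]
    return {edge: len({r[idx] for r in src_rows}) for edge, src_rows in transitions.items()}


by_user = edge_weights(rows, "user_id")
by_transition = edge_weights(rows, "event_id")
by_session = edge_weights(rows, "session_id")
```

In this toy data, user `u1` makes the `main` to `catalog` transition twice, so the edge weighs 2 when counting unique users and 3 when counting transitions.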

groups : tuple[list, list], tuple[str, str, str], str, optional

Two groups of paths used to compute a differential transition matrix. Transition matrices M1 and M2 are calculated for these groups, and the resulting matrix is M = M1 - M2.

  • If tuple[list, list], each sub-list should contain valid path ids.

  • If tuple[str, str, str], the first str should refer to a segment name, the others should refer to the corresponding segment values.

  • If str, it should refer to a binary (i.e. containing two segment values only) segment name.
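The subtraction M = M1 - M2 described for groups can be sketched as below. The nested-dict matrices and the `subtract_matrices` helper are hypothetical, and treating a transition that is absent from one group as 0 is an assumption rather than documented library behavior.

```python
def subtract_matrices(m1, m2):
    """Return M = M1 - M2 over the union of events, treating missing cells as 0."""
    # Collect every event that appears as a source or a target in either matrix.
    events = sorted(
        set(m1) | set(m2) | {dst for m in (m1, m2) for row in m.values() for dst in row}
    )
    return {
        src: {
            dst: m1.get(src, {}).get(dst, 0) - m2.get(src, {}).get(dst, 0)
            for dst in events
        }
        for src in events
    }


# Hypothetical matrices for the two groups of paths.
m1 = {"main": {"catalog": 5, "cart": 1}}
m2 = {"main": {"catalog": 3}, "catalog": {"cart": 2}}

diff = subtract_matrices(m1, m2)
```

Positive cells mark transitions that weigh more in the first group, and negative cells mark transitions that weigh more in the second.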

plot(heatmap_axis='both', precision='auto', figsize=None, show_large_matrix=None, show_values=None)

Create a heatmap plot based on the calculated transition matrix values. This method should be used after calling fit().

Parameters:
heatmap_axis : {0 or ‘rows’, 1 or ‘columns’, ‘both’}, default ‘both’

The axis for which the heatmap is to be generated. If specified, the heatmap will be created separately for the selected axis. If heatmap_axis='both', the heatmap will be applied to the entire matrix.

figsize : tuple[float, float], default None

The size of the visualization. The default size is calculated automatically from the matrix dimension and the precision and show_values options.

precision : int or str, default ‘auto’

The number of digits displayed after the decimal point in the heatmap. If precision is auto, the value depends on norm_type: 0 for norm_type=None, and 2 otherwise.

show_large_matrix : bool, optional

If None, the matrix is displayed only if its dimension is <= 60. If True, the matrix is plotted regardless of its dimension.

show_values : bool, optional

If None, the matrix values are hidden only when the matrix dimension lies between 30 and 60. If True, the matrix values are always shown. If False, the values are hidden and the precision parameter is ignored.

Returns:
matplotlib.axes.Axes

The Axes object containing the heatmap plot.
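The precision='auto' rule described for plot can be sketched as a small helper. Both function names are hypothetical and the string formatting is only an illustration of how a resolved precision might be applied to a cell value. This is not the library's rendering code.

```python
def resolve_precision(precision="auto", norm_type=None):
    """Return the number of digits after the decimal point, following the documented rule."""
    if precision == "auto":
        # Absolute values (norm_type=None) are shown without a fractional part,
        # normalized values with two digits.
        return 0 if norm_type is None else 2
    return int(precision)


def format_cell(value, precision="auto", norm_type=None):
    """Format one matrix cell with the resolved precision (illustrative only)."""
    digits = resolve_precision(precision, norm_type)
    return f"{value:.{digits}f}"


absolute_label = format_cell(12, norm_type=None)
normalized_label = format_cell(0.3333, norm_type="node")
explicit_label = format_cell(0.5, precision=3, norm_type="full")
```

An explicit integer precision overrides the automatic choice, as in the last call.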

property values

Returns the calculated transition matrix as a pandas.DataFrame. Should be used after fit().

Eventstream

Eventstream.transition_matrix(norm_type=None, weight_col=None, groups=None, heatmap_axis='both', precision='auto', figsize=None, show_large_matrix=None, show_values=None, show_plot=True)

Retrieve a matrix of transition weights for each pair of unique events. This function calculates transition weights based on the same logic used for calculating edge weights in a transition graph.

Parameters:
norm_type : {“full”, “node”, None}, default None

The type of normalization applied to the weights. The weight values themselves are calculated according to the weight_col parameter.

  • If None, no normalization is applied and the absolute values are used.

  • If full, the weights are normalized across the whole eventstream.

  • If node, the weights are normalized across each node (that is, across the outgoing transitions from each node).

See Transition graph user guide for the details.

weight_col : str, default ‘user_id’

A column name from the EventstreamSchema whose values determine the final edge weights.

For each edge, the following is calculated:

  • If None or user_id - the number of unique users.

  • If event_id - the number of transitions.

  • If session_id - the number of unique sessions.

  • If custom_col - the number of unique values in the selected column.

See Transition graph user guide for the details.

groups : tuple[list, list], optional

Can be specified to calculate a differential transition matrix. Must be a tuple of two elements (g_1, g_2), where g_1 and g_2 are collections of user_id values. Two separate transition matrices M1 and M2 are calculated for the users from g_1 and g_2, respectively. The resulting matrix is M = M1 - M2.

heatmap_axis : {0 or ‘rows’, 1 or ‘columns’, ‘both’}, default ‘both’

The axis for which the heatmap is to be generated. If specified, the heatmap will be created separately for the selected axis. If heatmap_axis='both', the heatmap will be applied to the entire matrix.

precision : int or str, default ‘auto’

The number of digits displayed after the decimal point in the heatmap. If precision is auto, the value depends on norm_type: 0 for norm_type=None, and 2 otherwise.

figsize : tuple[float, float], default None

The size of the visualization. The default size is calculated automatically from the matrix dimension and the precision and show_values options.

show_large_matrix : bool, optional

If None, the matrix is displayed only if its dimension is <= 60. If True, the matrix is plotted regardless of its dimension.

show_values : bool, optional

If None, the matrix values are hidden only when the matrix dimension lies between 30 and 60. If True, the matrix values are always shown. If False, the values are hidden and the precision parameter is ignored.

show_plot : bool, default True

If True, a heatmap of the transition matrix will be displayed.

Returns:
TransitionMatrix

A TransitionMatrix instance fitted to the given parameters is returned.