Step Matrix

Step Matrix Class

class retentioneering.tooling.step_matrix.step_matrix.StepMatrix(eventstream)

A step matrix is a matrix whose (i, j) element shows the frequency of event i occurring as the j-th step in user trajectories. This class provides methods for calculating and visualizing the step matrix.
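To make the definition concrete, here is a minimal pure-Python sketch that builds such a matrix from toy paths. It is not the library's implementation: the helper `step_matrix` and the sample paths are hypothetical, and the sketch leaves out the artificial ENDED event that the library adds.

```python
from collections import Counter


def step_matrix(paths, max_steps):
    """Return {event: [share of paths having that event at step 1..max_steps]}."""
    n_paths = len(paths)
    events = sorted({event for path in paths for event in path})
    matrix = {event: [0.0] * max_steps for event in events}
    for step in range(max_steps):
        # Count which event each path has at this step (shorter paths are skipped).
        counts = Counter(path[step] for path in paths if len(path) > step)
        for event, count in counts.items():
            matrix[event][step] = count / n_paths
    return matrix


# Hypothetical toy data: one list of events per user.
paths = [
    ["main", "catalog", "cart"],
    ["main", "catalog"],
    ["catalog", "cart", "payment"],
    ["main", "cart"],
]

matrix = step_matrix(paths, max_steps=3)
for event, row in matrix.items():
    print(f"{event:8s}", row)
```

Each row is an event and each position in the row is a step, so `matrix["main"][0]` is the share of users whose first event is `main`.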


See also


Call StepMatrix tool as an eventstream method.


A class for the visualization of user paths in stepwise manner using Sankey diagram.


Find loops and create new synthetic events in the paths of all users having such sequences.


See StepMatrix user guide for the details.

fit(max_steps=20, weight_col=None, precision=2, targets=None, accumulated=None, sorting=None, threshold=0.01, centered=None, groups=None)

Calculates the internal step matrix values using the given parameters. fit() must be called before any visualization or descriptive StepMatrix method is used.

max_steps : int, default 20

Maximum number of steps in user path to include.

weight_col : str, optional

Aggregation column for edge weighting. If None, the user_id column from eventstream.schema is used. For example, it can be set to session_id if the eventstream has such a custom_col.

precision : int, default 2

Number of decimal places to show for the fractions in the heatmap.

targets : list of str or str, optional

List of event names to include at the bottom of the step matrix as individual rows. Each specified target gets its own color scale for clear visualization. Example: ['product_page', 'cart', 'payment']

If multiple targets need to be compared and plotted on the same color scale, they must be combined in a sub-list. Example: ['product_page', ['cart', 'payment']]

accumulated : {"both", "only"}, optional

Option to include accumulated values for targets.

  • If None, accumulated targets are not shown.

  • If "both", show step values and accumulated values.

  • If "only", show targets only as accumulated values.

sorting : list of str, optional

  • If a list of event names is specified, the rows of the heatmap are shown in the passed order.

  • If None, rows are ordered according to the i-th value (the first row is the one whose 1st element is the maximum, the second row is the one whose 2nd element is the maximum, etc.).

threshold : float, default 0.01

Used to remove rare events. Rows in which every value is less than the specified threshold are aggregated.
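The aggregation rule can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper `collapse_rare_rows` and the row label `"OTHER"` are hypothetical and do not reflect how the library names or orders the aggregated row.

```python
def collapse_rare_rows(matrix, threshold, label="OTHER"):
    """Merge every row whose values are all below `threshold` into one row.

    `label` is a hypothetical name for the merged row.
    """
    kept = {}
    merged = None
    for event, row in matrix.items():
        if all(value < threshold for value in row):
            # Rare event: add its values to the running merged row.
            merged = list(row) if merged is None else [a + b for a, b in zip(merged, row)]
        else:
            kept[event] = list(row)
    if merged is not None:
        kept[label] = merged
    return kept


# Hypothetical two-step matrix: {event: [share at step 1, share at step 2]}.
matrix = {
    "main": [0.80, 0.10],
    "cart": [0.188, 0.876],
    "help": [0.006, 0.004],
    "faq": [0.004, 0.008],
    "blog": [0.002, 0.012],  # kept: its second value is not below the threshold
}

result = collapse_rare_rows(matrix, threshold=0.01)
print(list(result))
```

A row is merged only when all of its values fall below the threshold, so `blog` survives even though its first value is tiny.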

centered : dict, optional

Parameter used to align user paths at a specific event at a specific step. Must contain three keys:

  • event: str, name of the event to align on.

  • left_gap: int, number of events to include before the specified event.

  • occurrence: int, which occurrence of the event to align on (typically 1).

If not None, only users who have the selected event with the specified occurrence in their paths are included. The fraction of such remaining users is shown in the title of the centered step matrix. Example: {'event': 'cart', 'left_gap': 8, 'occurrence': 1}
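The alignment described above can be sketched like this. The helper `center_paths` is hypothetical, and padding short prefixes with None is an assumption made for the illustration, not a statement about the library's internals.

```python
def center_paths(paths, event, left_gap, occurrence=1):
    """Align paths on the `occurrence`-th `event`, keeping `left_gap` steps before it.

    Returns the aligned paths and the share of paths that were kept.
    """
    aligned = []
    for path in paths:
        positions = [i for i, name in enumerate(path) if name == event]
        if len(positions) < occurrence:
            continue  # this user never reaches the requested occurrence
        pivot = positions[occurrence - 1]
        start = pivot - left_gap
        # Assumption: pad with None when fewer than `left_gap` events precede the pivot.
        window = [None] * max(0, -start) + path[max(0, start):]
        aligned.append(window)
    return aligned, len(aligned) / len(paths)


# Hypothetical toy paths.
paths = [
    ["main", "catalog", "cart", "payment"],
    ["cart", "main"],
    ["main", "catalog"],  # no "cart": dropped from the centered matrix
]

aligned, share = center_paths(paths, event="cart", left_gap=2, occurrence=1)
for window in aligned:
    print(window)
print(f"share of users kept: {share:.2f}")
```

After alignment the chosen event sits at index `left_gap` in every remaining path, and `share` corresponds to the fraction of remaining users.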

groups : tuple[list, list], optional

Can be specified to plot a differential step matrix. Must be a tuple of two elements (g_1, g_2), where g_1 and g_2 are collections of user_ids. Two separate step matrices M1 and M2 are calculated for the users from g_1 and g_2, respectively. The resulting matrix is M = M1 - M2.
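As a rough illustration of M = M1 - M2, the sketch below computes two toy step matrices and subtracts them element-wise. The helpers `step_matrix` and `differential`, the user ids, and the paths are all hypothetical and are not the library's code.

```python
from collections import Counter


def step_matrix(paths, max_steps):
    """Return {event: [share of paths having that event at each step]}."""
    events = sorted({event for path in paths for event in path})
    matrix = {event: [0.0] * max_steps for event in events}
    for step in range(max_steps):
        counts = Counter(path[step] for path in paths if len(path) > step)
        for event, count in counts.items():
            matrix[event][step] = count / len(paths)
    return matrix


def differential(paths_by_user, g1, g2, max_steps):
    """Element-wise difference of the step matrices of two user groups."""
    m1 = step_matrix([paths_by_user[user] for user in g1], max_steps)
    m2 = step_matrix([paths_by_user[user] for user in g2], max_steps)
    zero = [0.0] * max_steps  # used for events missing from one group
    events = sorted(set(m1) | set(m2))
    return {
        event: [a - b for a, b in zip(m1.get(event, zero), m2.get(event, zero))]
        for event in events
    }


# Hypothetical toy data keyed by user_id.
paths_by_user = {
    1: ["main", "cart"],
    2: ["main", "catalog"],
    3: ["catalog", "cart"],
    4: ["main", "cart"],
}

diff = differential(paths_by_user, g1=[1, 2], g2=[3, 4], max_steps=2)
for event, row in diff.items():
    print(f"{event:8s}", row)
```

Positive values mark events that are more frequent at that step in g_1, negative values those more frequent in g_2.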


During step matrix calculation an artificial ENDED event is created. If a path already contains a path_end event (see AddStartEndEvents), it is temporarily replaced with ENDED (within the step matrix only). Otherwise, an ENDED event is explicitly added to the end of each path.

The ENDED event is accumulated, so the values in its row are summed up from the first step to the last. The ENDED row is always placed last in the step matrix. This design guarantees that every column of a step matrix sums to 1 (0 for a differential step matrix).
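The column-sum guarantee can be checked with a small sketch. It assumes that accumulating ENDED is equivalent to padding every finished path with ENDED up to max_steps. The helper `step_matrix_with_ended` and the toy paths are hypothetical.

```python
from collections import Counter


def step_matrix_with_ended(paths, max_steps):
    """Step matrix in which finished paths are padded with ENDED."""
    # Assumption: a finished path keeps contributing ENDED at every later step.
    padded = [(path + ["ENDED"] * max_steps)[:max_steps] for path in paths]
    events = sorted({event for path in paths for event in path}) + ["ENDED"]
    matrix = {event: [0.0] * max_steps for event in events}  # ENDED row comes last
    for step in range(max_steps):
        counts = Counter(path[step] for path in padded)
        for event, count in counts.items():
            matrix[event][step] = count / len(paths)
    return matrix


# Hypothetical toy paths.
paths = [
    ["main", "catalog", "cart"],
    ["main", "catalog"],
    ["main"],
    ["catalog", "cart"],
]

matrix = step_matrix_with_ended(paths, max_steps=4)
for event, row in matrix.items():
    print(f"{event:8s}", row)

column_sums = [sum(row[step] for row in matrix.values()) for step in range(4)]
print("column sums:", column_sums)
```

The ENDED row is non-decreasing because each user who has finished stays counted at all later steps, which is what makes every column add up to 1.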


Create a heatmap plot based on the calculated step matrix values. Should be used after fit().

property params

Returns the parameters used for the last fitting. Should be used after fit().

property values

Returns the calculated step matrix as a pd.DataFrame. Should be used after fit().

tuple[pd.DataFrame, pd.DataFrame | None]
  1. The step matrix.

  2. A separate step matrix for the target events only.


Eventstream.step_matrix(max_steps=20, weight_col=None, precision=2, targets=None, accumulated=None, sorting=None, threshold=0.01, centered=None, groups=None, show_plot=True)

Show a heatmap visualization of the step matrix.

show_plot : bool, default True

If True, a step matrix heatmap is shown.

For the other parameters, see the descriptions under fit().



A StepMatrix class instance fitted to the given parameters.