Sequences

Sequences Class

class retentioneering.tooling.sequences.sequences.Sequences(eventstream)

A class that provides methods for exploring patterns in event sequences.

Parameters:
eventstream : EventstreamType

See also

Eventstream.sequences

Call Sequences tool as an eventstream method.

Notes

See Sequences user guide for the details.

fit(ngram_range=(1, 1), groups=None, group_names=None, path_id_col=None)

Calculate statistics on n-grams found in the eventstream. The following path_metrics are calculated:

  • paths: The number of unique paths that contain each particular event sequence (calculated within the specified path_id_col, or user_id by default).

  • paths_share: The ratio of paths to the sum of paths.

  • count: The number of occurrences of a particular sequence.

  • count_share: The ratio of count to the sum of counts.

  • avg_count: The average number of occurrences per path.
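As a rough illustration of how these metrics relate to one another, the sketch below computes them for bigrams over a small hand-made set of paths in plain Python. It is not the library's implementation: the toy data, the hypothetical ngram_metrics helper, the overlapping sliding-window counting, and the reading of avg_count as count divided by paths are all assumptions made for the example.

```python
from collections import Counter


def ngram_metrics(paths, n):
    """Compute toy n-gram metrics for a dict {path_id: [event, ...]}.

    Hypothetical helper for illustration only; not part of retentioneering.
    """
    count = Counter()    # total occurrences of each n-gram
    n_paths = Counter()  # number of distinct paths containing each n-gram
    for events in paths.values():
        # Overlapping sliding window of length n (an assumption).
        grams = [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]
        count.update(grams)
        n_paths.update(set(grams))  # each path contributes at most once

    total_count = sum(count.values())
    total_paths = sum(n_paths.values())
    return {
        gram: {
            "paths": n_paths[gram],
            "paths_share": n_paths[gram] / total_paths,
            "count": count[gram],
            "count_share": count[gram] / total_count,
            # Assumption: average over the paths that contain the n-gram.
            "avg_count": count[gram] / n_paths[gram],
        }
        for gram in count
    }


toy_paths = {
    "u1": ["main", "catalog", "main", "catalog"],
    "u2": ["main", "catalog", "cart"],
}
metrics = ngram_metrics(toy_paths, n=2)
```

Here the bigram ("main", "catalog") occurs three times in total but in only two paths, which is why count and paths differ.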

Defined sequences types:

  • loop - if sequence length >= 2 and all the events are equal.

  • cycle - if sequence length >= 3 and start and end events are equal.

  • other - all other sequences.
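The three rules above can be written as a small classifier. This is a minimal sketch rather than the library's code; the sequence_type function is a hypothetical name, and checking the loop rule before the cycle rule is an assumption (a sequence such as A, A, A satisfies both conditions).

```python
def sequence_type(events):
    """Classify an event sequence as 'loop', 'cycle', or 'other'.

    Hypothetical helper for illustration only.
    """
    # loop: at least two events, all of them equal
    if len(events) >= 2 and len(set(events)) == 1:
        return "loop"
    # cycle: at least three events, first and last equal
    if len(events) >= 3 and events[0] == events[-1]:
        return "cycle"
    return "other"


examples = [
    ("main", "main"),
    ("main", "catalog", "main"),
    ("main", "catalog"),
    ("main",),
]
labels = [sequence_type(e) for e in examples]
```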

Parameters:
ngram_range : Tuple(int, int)

The lower and upper boundaries of the range of n-values for the event n-grams to be extracted. For example, ngram_range=(1, 1) means only single events, and (1, 2) means single events and bigrams.
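To make the meaning of the range concrete, the following sketch enumerates the n-grams of a single path for a given ngram_range, treating both boundaries as inclusive. The extract_ngrams helper and the sample path are hypothetical and only illustrate the idea.

```python
def extract_ngrams(events, ngram_range):
    """Return all n-grams of `events` for every n in the inclusive range.

    Hypothetical helper for illustration only.
    """
    low, high = ngram_range
    result = []
    for n in range(low, high + 1):
        # Slide a window of length n over the path.
        for i in range(len(events) - n + 1):
            result.append(tuple(events[i:i + n]))
    return result


path = ["main", "catalog", "cart"]
unigrams = extract_ngrams(path, (1, 1))
uni_and_bigrams = extract_ngrams(path, (1, 2))
```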

groups : SplitExpr, optional

Can be specified to calculate statistics for n-grams group-wise and provide delta values. Must contain a tuple of two elements (g_1, g_2), where g_1 and g_2 are collections of path IDs (for the column specified in the path_id_col parameter).
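A group-wise comparison might look roughly like the sketch below, which counts the paths containing each bigram separately for g_1 and g_2 and then derives an absolute and a relative delta. The paths_by_group helper, the toy data, and the delta definitions (g_2 minus g_1, divided by the g_1 value for the relative delta) are assumptions for illustration; the library may define its deltas differently.

```python
def paths_by_group(paths, group, n):
    """Count, for each n-gram, how many paths of the group contain it.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for path_id in group:
        events = paths[path_id]
        grams = {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}
        for gram in grams:
            counts[gram] = counts.get(gram, 0) + 1
    return counts


toy_paths = {
    "u1": ["main", "catalog"],
    "u2": ["main", "cart"],
    "u3": ["main", "catalog"],
    "u4": ["main", "catalog"],
}
# Two collections of path IDs, as described for the groups parameter.
g_1, g_2 = ["u1", "u2"], ["u3", "u4"]

paths_1 = paths_by_group(toy_paths, g_1, n=2)
paths_2 = paths_by_group(toy_paths, g_2, n=2)

deltas = {}
for gram in set(paths_1) | set(paths_2):
    a, b = paths_1.get(gram, 0), paths_2.get(gram, 0)
    deltas[gram] = {
        "delta_abs": b - a,
        # Assumed definition; undefined when the n-gram is absent from g_1.
        "delta_rel": (b - a) / a if a else None,
    }
```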

group_names : UserGroupsNamesType, optional

Names for the selected groups g_1 and g_2, which will be shown in the final plot header.

path_id_col : str, optional

The column used for calculating ‘paths’ and ‘paths_share’ path_metrics. If not specified, the user_id from eventstream.schema will be used. For example, it can be specified as session_id if eventstream has such a custom_col.

Returns:
None

Notes

View the calculation results using the plot() method or the values attribute.

plot(metrics=None, threshold=None, sorting=None, heatmap_cols=None, sample_size=1, precision=2)
Parameters:
metrics : {‘paths’, ‘paths_share’, ‘count’, ‘count_share’}, optional

Specify the path_metrics to be displayed in the plot.

  • If groups are specified, by default, only the ‘paths’ metric will be plotted within each group, along with both deltas (relative and absolute).

  • If groups=None, all four path_metrics will be shown by default.

threshold : tuple[str, float | int], optional

Used to filter out infrequent sequences based on the specified metric.

  • Example without groups: (‘paths’, 1200)

  • Example with groups: ((‘user_id’, ‘group_1’), 1200)

Only rows with values greater than or equal to 1200 in the specified column will be displayed.
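The filtering rule for the case without groups can be sketched on a plain list of rows. The apply_threshold helper and the sample rows are hypothetical; the sketch only mirrors the "greater than or equal to" rule stated above and does not reproduce how the library stores its table.

```python
def apply_threshold(rows, threshold):
    """Keep rows whose value in the given column is >= the minimum value.

    Hypothetical helper for illustration only.
    """
    column, min_value = threshold
    return [row for row in rows if row[column] >= min_value]


rows = [
    {"sequence": "main", "paths": 3000, "count": 5200},
    {"sequence": "main -> catalog", "paths": 1200, "count": 1900},
    {"sequence": "catalog -> cart", "paths": 450, "count": 500},
]
kept = apply_threshold(rows, ("paths", 1200))
```

The row with exactly 1200 paths is retained because the comparison is inclusive.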

sorting : Tuple(str or list of str, bool or list of bool) or None, default None

  • The first element in the tuple: Column name or list of names for sorting.

  • The second element: The sorting order (ascending vs. descending). Specify a list for multiple sorting orders. If a list of bools is provided, it must match the length of the sorting columns.
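The shape of the sorting tuple can be illustrated with a plain-Python multi-key sort. The sort_rows helper and the sample rows are hypothetical, and treating True as ascending is an assumption borrowed from the usual pandas convention.

```python
def sort_rows(rows, sorting):
    """Sort rows by one or several columns with per-column order.

    Hypothetical helper for illustration only.
    """
    columns, ascending = sorting
    if isinstance(columns, str):
        columns = [columns]
    if isinstance(ascending, bool):
        ascending = [ascending] * len(columns)
    if len(columns) != len(ascending):
        raise ValueError("list of orders must match the list of columns")

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # column first yields a correct multi-column ordering.
    for column, asc in reversed(list(zip(columns, ascending))):
        result.sort(key=lambda row: row[column], reverse=not asc)
    return result


rows = [
    {"sequence": "main", "paths": 3, "count": 5},
    {"sequence": "cart", "paths": 3, "count": 9},
    {"sequence": "catalog", "paths": 7, "count": 8},
]
# paths descending, then count ascending for ties
ordered = sort_rows(rows, (["paths", "count"], [False, True]))
```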

heatmap_cols : str or list of str or list of tuples or None

Specifies columns to be represented in the heatmap as follows:

  • The heatmap range is calculated column-wise.

  • For columns containing negative values, the palette will be divergent (blue - orange with white as zero).

  • For columns with only positive values, the palette will be orange.

  • For columns with only negative values, the palette will be blue.
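The palette rules above amount to a per-column decision based on the signs of the values. The sketch below expresses that decision with a hypothetical choose_palette function; how zeros and all-zero columns are treated is an assumption, and the returned strings are labels for the example rather than real colormap names.

```python
def choose_palette(values):
    """Pick a palette label for one heatmap column.

    Hypothetical helper for illustration only.
    """
    has_negative = any(v < 0 for v in values)
    has_positive = any(v > 0 for v in values)
    if has_negative and has_positive:
        return "divergent"  # blue - orange, white at zero
    if has_negative:
        return "blue"
    # Assumption: columns without negative values (including
    # all-zero columns) fall back to the orange palette.
    return "orange"


palettes = {
    "delta_abs": choose_palette([-120, 0, 340]),
    "paths": choose_palette([10, 250, 3000]),
    "losses": choose_palette([-5, -40]),
}
```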

Default values

sample_size : int or None, default 1

Number of ID samples to display.

precision : int, default 2

Number of decimal places to display for fractional values in the heatmap.

Returns:
pd.io.formats.style.Styler

Styled pd.DataFrame object.

property params

Returns the parameters used for the last fitting. Should be used after fit().

property values

Returns a pd.DataFrame representing the fitted or plotted Sequences table. Should be used after fit() or plot().

Returns:
pd.DataFrame

Eventstream

Eventstream.sequences(ngram_range=(1, 1), groups=None, group_names=None, weight_col=None, metrics=None, threshold=None, sorting=None, heatmap_cols=None, sample_size=1, precision=2, show_plot=True)

Calculate statistics on n-grams found in the eventstream.

Parameters:
show_plot : bool, default True

If True, the styled sequences table is shown.

See other parameters’ description

Sequences

Returns:
Sequences

A Sequences class instance fitted to the given parameters.