Sequences
Sequences Class
- class retentioneering.tooling.sequences.sequences.Sequences(eventstream)
A class that provides methods for pattern exploration.
- Parameters:
- eventstream : EventstreamType
See also
Eventstream.sequences
Call Sequences tool as an eventstream method.
Notes
See Sequences user guide for the details.
- fit(ngram_range=(1, 1), groups=None, group_names=None, path_id_col=None)
Calculate statistics on n-grams found in eventstream. Calculated path_metrics:
- paths: The number of unique paths that contain each particular event sequence (calculated within the specified path_id_col, or user_id by default).
- paths_share: The ratio of paths to the sum of paths.
- count: The number of occurrences of a particular sequence.
- count_share: The ratio of count to the sum of counts.
- avg_count: The average number of occurrences per path.
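The metrics listed above can be illustrated with a minimal pure-Python sketch. It does not use retentioneering: the helper `ngram_metrics` and the toy data are hypothetical. It assumes that "the sum of paths" means the total number of distinct paths, that the sum of counts is taken over all extracted n-grams, and that avg_count is count divided by the number of paths containing the sequence. None of these readings is confirmed by this page.

```python
from collections import Counter


def ngram_metrics(paths, ngram_range=(1, 1)):
    """Compute per-sequence metrics for a dict {path_id: [event, ...]}.

    Hypothetical helper for illustration only, not the library implementation.
    """
    low, high = ngram_range
    count = Counter()       # total occurrences of each n-gram
    path_count = Counter()  # number of paths containing each n-gram

    for events in paths.values():
        seen = Counter()  # occurrences of each n-gram within this path
        for n in range(low, high + 1):
            for i in range(len(events) - n + 1):
                seen[tuple(events[i:i + n])] += 1
        count.update(seen)              # add per-path occurrence counts
        path_count.update(seen.keys())  # each n-gram counts once per path

    total_paths = len(paths)           # assumption: "the sum of paths"
    total_count = sum(count.values())  # assumption: sum over all n-grams
    return {
        seq: {
            "paths": path_count[seq],
            "paths_share": path_count[seq] / total_paths,
            "count": count[seq],
            "count_share": count[seq] / total_count,
            "avg_count": count[seq] / path_count[seq],  # assumption
        }
        for seq in count
    }


toy_paths = {
    "u1": ["main", "catalog", "catalog", "cart"],
    "u2": ["main", "catalog"],
    "u3": ["main"],
}
metrics = ngram_metrics(toy_paths, ngram_range=(1, 2))
print(metrics[("catalog",)])
```

In the toy data, "catalog" occurs three times but in only two of the three paths, which is the difference between count and paths.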
Defined sequence types:
- loop: if sequence length >= 2 and all the events are equal.
- cycle: if sequence length >= 3 and start and end events are equal.
- other: all other sequences.
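The three sequence types can be expressed as a small classifier. This is a sketch, not the library's code: the function name `sequence_type` is hypothetical, and it assumes the rules are checked in the listed order, so a sequence such as (A, A, A), which satisfies both the loop and the cycle conditions, is reported as a loop.

```python
def sequence_type(events):
    """Classify a sequence of event names as 'loop', 'cycle', or 'other'."""
    events = list(events)
    # loop: at least two events, all of them equal
    if len(events) >= 2 and len(set(events)) == 1:
        return "loop"
    # cycle: at least three events, first and last equal
    if len(events) >= 3 and events[0] == events[-1]:
        return "cycle"
    return "other"


print(sequence_type(["catalog", "catalog"]))          # loop
print(sequence_type(["catalog", "cart", "catalog"]))  # cycle
print(sequence_type(["main", "catalog"]))             # other
```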
- Parameters:
- ngram_range : Tuple(int, int)
The lower and upper boundary of the range of n-values for the event n-grams to be extracted. For example, ngram_range=(1, 1) means only single events, (1, 2) means single events and bigrams.
- groups : SplitExpr, optional
Can be specified to calculate statistics for n-grams group-wise and provide delta values. Must contain a tuple of two elements (g_1, g_2), where g_1 and g_2 are collections of path IDs (for the column specified in the path_id_col parameter).
- group_names : UserGroupsNamesType, optional
Names for the selected groups g_1 and g_2, which will be shown in the final plot header.
- path_id_col : str, optional
The column used for calculating the ‘paths’ and ‘paths_share’ path_metrics. If not specified, the user_id from eventstream.schema will be used. For example, it can be set to session_id if eventstream has such a custom_col.
- Returns:
- None
Notes
See the results of calculation using the plot() method and the values() attribute.
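As an illustration of the groups parameter described for fit(), the sketch below builds a tuple of two collections of path IDs from toy data. The data, the "payment_done" event name, and the group labels are hypothetical. Whether the library expects lists, sets, or another collection type is not stated on this page, so plain lists are used here as an assumption.

```python
# Hypothetical toy data: (path_id, event) pairs standing in for an eventstream.
events = [
    (1, "main"), (1, "cart"), (1, "payment_done"),
    (2, "main"), (2, "catalog"),
    (3, "catalog"), (3, "cart"), (3, "payment_done"),
    (4, "main"),
]

all_ids = {path_id for path_id, _ in events}
converted_ids = {path_id for path_id, event in events if event == "payment_done"}

# g_1: paths containing the target event; g_2: all remaining paths.
groups = (sorted(converted_ids), sorted(all_ids - converted_ids))
group_names = ("converted", "not_converted")

print(groups)  # ([1, 3], [2, 4])
```

A tuple built this way matches the (g_1, g_2) shape that the groups parameter describes, with group_names supplying the two labels for the plot header.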
- plot(metrics=None, threshold=None, sorting=None, heatmap_cols=None, sample_size=1, precision=2)
- Parameters:
- metrics : {‘paths’, ‘paths_share’, ‘count’, ‘count_share’}, optional
Specify the path_metrics to be displayed in the plot.
If groups are specified, by default, only the ‘paths’ metric will be plotted within each group, along with both deltas (relative and absolute).
If groups=None, all four path_metrics will be shown by default.
- threshold : tuple[str, float | int], optional
Used to filter out infrequent sequences based on the specified metric.
Example without groups: (‘paths’, 1200)
Example with groups: ((‘user_id’, ‘group_1’), 1200)
Only rows with values greater than or equal to 1200 in the specified column will be displayed.
- sorting : Tuple(str or list of str, bool or list of bool) or None, default None
The first element in the tuple: Column name or list of names for sorting.
The second element: The sorting order (ascending vs. descending). Specify a list for multiple sorting orders. If a list of bools is provided, it must match the length of the sorting columns.
- heatmap_cols : str or list of str or list of tuples or None
Specifies columns to be represented in the heatmap as follows:
The heatmap range is calculated column-wise.
For columns containing both positive and negative values, the palette will be divergent (blue - orange with white as zero).
For columns with only positive values, the palette will be orange.
For columns with only negative values, the palette will be blue.
Default values
- sample_size : int or None, default 1
Number of ID samples to display.
- precision : int, default 2
Number of decimal places shown for fractional values in the heatmap.
- Returns:
- pd.io.formats.style.Styler
Styled pd.DataFrame object.
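The threshold and sorting parameters of plot() can be pictured with a pure-Python sketch over a list of row dictionaries. The helper `filter_and_sort` and the sample rows are hypothetical and are not the library's implementation. The sketch assumes that a sorting flag of True means ascending order and covers only the threshold form without groups.

```python
def filter_and_sort(rows, threshold=None, sorting=None):
    """Apply a (metric, value) threshold and a (columns, ascending) sorting.

    Hypothetical helper operating on a list of dicts, for illustration only.
    """
    if threshold is not None:
        metric, min_value = threshold
        # Keep rows whose metric is greater than or equal to the threshold.
        rows = [row for row in rows if row[metric] >= min_value]

    if sorting is not None:
        columns, ascending = sorting
        if isinstance(columns, str):
            columns = [columns]
        if isinstance(ascending, bool):
            ascending = [ascending] * len(columns)
        if len(columns) != len(ascending):
            raise ValueError("sorting orders must match sorting columns")
        # Python's sort is stable, so sort by the least significant key first.
        for column, asc in reversed(list(zip(columns, ascending))):
            rows = sorted(rows, key=lambda row: row[column], reverse=not asc)
    return rows


rows = [
    {"sequence": "main -> catalog", "paths": 2000, "count": 5000},
    {"sequence": "catalog -> cart", "paths": 1200, "count": 1500},
    {"sequence": "cart -> payment", "paths": 800, "count": 900},
    {"sequence": "catalog -> catalog", "paths": 1200, "count": 3100},
]
result = filter_and_sort(
    rows,
    threshold=("paths", 1200),
    sorting=(["paths", "count"], [True, False]),
)
print([row["sequence"] for row in result])
```

Here the row with 800 paths is dropped by the threshold, and the two rows tied at 1200 paths are ordered by descending count.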
Eventstream
- Eventstream.sequences(ngram_range=(1, 1), groups=None, group_names=None, weight_col=None, metrics=None, threshold=None, sorting=None, heatmap_cols=None, sample_size=1, precision=2, show_plot=True)
Calculate statistics on n-grams found in eventstream.
- Parameters:
- show_plot : bool, default True
If True, the resulting plot is shown.
- See other parameters’ description
- Returns:
- Sequences
A Sequences class instance fitted to the given parameters.