Timedelta hist

Class

class retentioneering.tooling.timedelta_hist.timedelta_hist.TimedeltaHist(eventstream)

Plot the distribution of the time deltas between two events. Supports various distribution types, such as the distribution of time between adjacent consecutive events, between a pair of pre-defined events, or the median transition time from event to event per user/session.

Parameters:
eventstream : EventstreamType

See also

UserLifetimeHist

Plot the distribution of user lifetimes.

EventTimestampHist

Plot the distribution of events over time.

Eventstream.describe

Show general eventstream statistics.

Eventstream.describe_events

Show general eventstream events statistics.

AddStartEndEvents

Add synthetic path_start and path_end events to each user trajectory.

SplitSessions

Create new synthetic events that divide users’ paths into sessions.

LabelCroppedPaths

Create new synthetic event(s) for each user based on the timeout threshold.

DropPaths

Filter user paths based on the path length, removing the paths that are shorter than the specified number of events or cut_off.

Notes

See Eventstream user guide for the details.

fit(raw_events_only=False, event_pair=None, adjacent_events_only=True, weight_col=None, time_agg=None, timedelta_unit='s', log_scale=None, lower_cutoff_quantile=None, upper_cutoff_quantile=None, bins=20)

Calculate values and bins for the histplot.

Parameters:
raw_events_only : bool, default False

If True - statistics will be shown only for raw events. If False - statistics will be shown for all events presented in your data.

event_pair : tuple of str, optional

Specify an event pair to plot the time distance between. The first item is the chronologically first event, the second item is the event that follows it. If event_pair=None, plot the distribution of time deltas for all adjacent events.

Examples: ("login", "purchase"); ["start", "cabinet"]

Besides the generic eventstream events, event_pair accepts the special eventstream_start and eventstream_end events, which denote the first and the last event in the entire eventstream, respectively.

Note that the order of the events in the pair matters, as does the choice of weight_col.

adjacent_events_only : bool, default True

Used only when event_pair is not None. Specifies whether the two events must be adjacent to be counted as a pair.

For example, if event_pair=("login", "purchase") and adjacent_events_only=False, then the sequence ("login", "main", "trading", "purchase") will contain a valid pair (which is not the case with adjacent_events_only=True).
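The difference between the two modes can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper pair_deltas is hypothetical, and the non-adjacent mode is assumed to ignore every event outside the pair before looking at neighbours.

```python
from datetime import datetime


def pair_deltas(path, event_pair, adjacent_events_only=True):
    """Return time deltas (in seconds) for occurrences of event_pair in one path.

    path is a list of (event_name, timestamp) tuples sorted by time.
    """
    first, second = event_pair
    if not adjacent_events_only:
        # Assumption: drop all events outside the pair, then compare neighbours.
        path = [(name, ts) for name, ts in path if name in event_pair]
    deltas = []
    for (name_a, ts_a), (name_b, ts_b) in zip(path, path[1:]):
        if name_a == first and name_b == second:
            deltas.append((ts_b - ts_a).total_seconds())
    return deltas


# Hypothetical user path matching the sequence from the text above.
path = [
    ("login", datetime(2023, 1, 1, 10, 0, 0)),
    ("main", datetime(2023, 1, 1, 10, 0, 30)),
    ("trading", datetime(2023, 1, 1, 10, 2, 0)),
    ("purchase", datetime(2023, 1, 1, 10, 5, 0)),
]

adjacent = pair_deltas(path, ("login", "purchase"), adjacent_events_only=True)
non_adjacent = pair_deltas(path, ("login", "purchase"), adjacent_events_only=False)
print(adjacent)      # no pair: "main" and "trading" separate the two events
print(non_adjacent)  # one pair, 5 minutes apart
```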

weight_col : str, default None

Specify the unit of observation within which time differences are computed. By default, the values from the user_id column of EventstreamSchema are used.

For example:

  • If user_id - time deltas will be computed only for events inside each user path.

  • If session_id - the same, but inside each session.
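A minimal sketch of how the choice of unit changes the result. The helper adjacent_deltas and the row layout (dicts with user_id, session_id and timestamp keys) are assumptions made for illustration only.

```python
from datetime import datetime


def adjacent_deltas(rows, weight_col):
    """Time deltas (seconds) between consecutive events inside each weight_col unit.

    rows is a list of dicts sorted by timestamp.
    """
    last_seen = {}
    deltas = []
    for row in rows:
        unit = row[weight_col]
        if unit in last_seen:
            deltas.append((row["timestamp"] - last_seen[unit]).total_seconds())
        last_seen[unit] = row["timestamp"]
    return deltas


# One hypothetical user with two sessions two hours apart.
rows = [
    {"user_id": "u1", "session_id": "u1_1", "timestamp": datetime(2023, 1, 1, 10, 0, 0)},
    {"user_id": "u1", "session_id": "u1_1", "timestamp": datetime(2023, 1, 1, 10, 1, 0)},
    {"user_id": "u1", "session_id": "u1_2", "timestamp": datetime(2023, 1, 1, 12, 0, 0)},
    {"user_id": "u1", "session_id": "u1_2", "timestamp": datetime(2023, 1, 1, 12, 0, 30)},
]

by_user = adjacent_deltas(rows, "user_id")
by_session = adjacent_deltas(rows, "session_id")
print(by_user)     # includes the long gap between the two sessions
print(by_session)  # the gap between sessions is not counted
```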

time_agg : {None, “mean”, “median”}, default None

Specify the aggregation policy for the time distances. Aggregation is performed per weight_col unit.

  • If None - no aggregation;

  • If “mean” or “median” - plot the distribution of the mean or median time delta per weight_col unit.

For example, if session id is specified in weight_col, one observation per session (for example, session median) will be provided for the histogram.
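The aggregation step can be sketched as follows. The helper aggregate_deltas and the (unit_id, delta) record layout are hypothetical and serve only to show how each unit collapses to a single observation.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_deltas(records, time_agg=None):
    """Collapse (unit_id, delta) records to one value per unit, or keep them all."""
    if time_agg is None:
        return [delta for _, delta in records]
    agg = {"mean": mean, "median": median}[time_agg]
    grouped = defaultdict(list)
    for unit_id, delta in records:
        grouped[unit_id].append(delta)
    return [agg(deltas) for deltas in grouped.values()]


# Hypothetical time deltas (seconds) observed in two sessions.
records = [("s1", 10.0), ("s1", 20.0), ("s1", 60.0), ("s2", 5.0), ("s2", 15.0)]

raw = aggregate_deltas(records)
per_session_mean = aggregate_deltas(records, "mean")
per_session_median = aggregate_deltas(records, "median")
print(len(raw), per_session_mean, per_session_median)
```

With aggregation, the histogram receives two observations (one per session) instead of five.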

timedelta_unit : DATETIME_UNITS, default ‘s’

Specify units of time differences the histogram should use. Use “s” for seconds, “m” for minutes, “h” for hours and “D” for days.
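As an illustration of what the unit means for the plotted values, the sketch below converts one time delta into each of the four units listed above. The mapping SECONDS_PER_UNIT and the helper to_unit are hypothetical; DATETIME_UNITS may accept other units that are not covered here.

```python
from datetime import timedelta

# Only the four units named in the description are covered.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "D": 86400}


def to_unit(delta, timedelta_unit="s"):
    """Express a datetime.timedelta as a float in the requested unit."""
    return delta.total_seconds() / SECONDS_PER_UNIT[timedelta_unit]


delta = timedelta(hours=36)
converted = {unit: to_unit(delta, unit) for unit in SECONDS_PER_UNIT}
print(converted)
```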

log_scale : bool | tuple of bool | None, optional

  • If True - apply log scaling to the x axis.

  • If tuple of bool - apply log scaling to the (x, y) axes, respectively.

lower_cutoff_quantile : float, optional

Specify time distance quantile as the lower boundary. The values below the boundary are truncated.

upper_cutoff_quantile : float, optional

Specify time distance quantile as the upper boundary. The values above the boundary are truncated.
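A rough sketch of quantile-based truncation. The helpers quantile and apply_cutoffs are hypothetical; linear interpolation between order statistics and inclusive boundaries are assumptions, and the library may treat both differently.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def apply_cutoffs(values, lower_cutoff_quantile=None, upper_cutoff_quantile=None):
    """Keep only the values between the two quantile boundaries (inclusive)."""
    ordered = sorted(values)
    low = ordered[0] if lower_cutoff_quantile is None else quantile(ordered, lower_cutoff_quantile)
    high = ordered[-1] if upper_cutoff_quantile is None else quantile(ordered, upper_cutoff_quantile)
    return [v for v in values if low <= v <= high]


# Hypothetical time deltas with one extreme outlier.
deltas = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
trimmed = apply_cutoffs(deltas, lower_cutoff_quantile=0.1, upper_cutoff_quantile=0.9)
print(trimmed)
```

Dropping the extreme tail this way keeps a single outlier from stretching the x axis and squeezing the rest of the histogram into one bin.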

bins : int or {“auto”, “fd”, “doane”, “scott”, “stone”, “rice”, “sturges”, “sqrt”}, default 20

Generic bin parameter that can be the name of a reference rule or the number of bins. Passed to numpy.histogram_bin_edges.
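When bins is an integer, numpy.histogram_bin_edges produces equal-width bins over the data range. The pure-Python sketch below mimics that case to show how the edges and counts relate; the helpers equal_width_edges and count_per_bin are hypothetical, and the named rules ("auto", "fd" and so on) are not reproduced.

```python
def equal_width_edges(values, bins):
    """Return bins + 1 equally spaced edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def count_per_bin(values, edges):
    """Count values per bin; the last bin also includes its right edge."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(int((v - edges[0]) / width), len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical time deltas.
deltas = [0.0, 1.0, 2.0, 2.5, 4.0, 9.0, 10.0]
edges = equal_width_edges(deltas, bins=5)
counts = count_per_bin(deltas, edges)
print(edges)
print(counts)
```

There is always one more edge than there are bins.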

Returns:
None

plot(width=6.0, height=4.5)

Create a sns.histplot based on the calculated values.

Parameters:
width : float, default 6.0

Width in inches.

height : float, default 4.5

Height in inches.

Returns:
matplotlib.axes.Axes

The matplotlib axes containing the plot.

property values

Returns:
tuple(np.ndarray, np.ndarray)
  1. The first array contains the values for the histogram.

  2. The second array contains the bin edges.

Eventstream

Eventstream.timedelta_hist(raw_events_only=False, event_pair=None, adjacent_events_only=True, weight_col=None, time_agg=None, timedelta_unit='s', log_scale=None, lower_cutoff_quantile=None, upper_cutoff_quantile=None, bins=20, width=6.0, height=4.5, show_plot=True)

Plot the distribution of the time deltas between two events. Supports various distribution types, such as the distribution of time between adjacent consecutive events, between a pair of pre-defined events, or the median transition time from event to event per user/session.

Parameters:
show_plot : bool, default True

If True, histogram is shown.

See other parameters’ description

TimedeltaHist

Returns:
TimedeltaHist

A TimedeltaHist class instance fitted with given parameters.