Timedelta hist
Class
- class retentioneering.tooling.timedelta_hist.timedelta_hist.TimedeltaHist(eventstream)
Plot the distribution of the time deltas between two events. Supports various distribution types, such as the distribution of time between adjacent consecutive events, the time between a pair of predefined events, or the median transition time from event to event per user or session.
- Parameters:
- eventstream : EventstreamType
See also
UserLifetimeHist
Plot the distribution of user lifetimes.
EventTimestampHist
Plot the distribution of events over time.
Eventstream.describe
Show general eventstream statistics.
Eventstream.describe_events
Show general eventstream events statistics.
AddStartEndEvents
Create new synthetic events path_start and path_end and add them to each user trajectory.
SplitSessions
Create new synthetic events that divide users’ paths into sessions.
LabelCroppedPaths
Create new synthetic event(s) for each user based on the timeout threshold.
DropPaths
Filter user paths based on the path length, removing the paths that are shorter than the specified number of events or cut_off.
Notes
See the Eventstream user guide for details.
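As a rough illustration of the default computation (event_pair=None, time deltas between adjacent events inside each user path, optionally reduced to one mean or median per unit), the pure-Python sketch below works on a hypothetical toy eventstream of (user_id, event, timestamp) tuples. It is not the library's implementation: the helper names adjacent_timedeltas and aggregate_per_unit, and the unit_index argument standing in for weight_col, are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy eventstream: (user_id, event, timestamp in seconds).
EVENTS = [
    ("u1", "login", 0), ("u1", "main", 30), ("u1", "purchase", 90),
    ("u2", "login", 10), ("u2", "purchase", 70),
]


def adjacent_timedeltas(events, unit_index=0):
    """Return {unit: [deltas]} for adjacent events inside each unit of observation.

    unit_index is a hypothetical stand-in for weight_col: it selects the field
    that defines the unit (0 = user_id here; a session id field would work the same way).
    """
    by_unit = defaultdict(list)
    for record in events:
        by_unit[record[unit_index]].append(record[2])
    deltas = {}
    for unit, stamps in by_unit.items():
        stamps.sort()  # chronological order inside the unit
        deltas[unit] = [b - a for a, b in zip(stamps, stamps[1:])]
    return deltas


def aggregate_per_unit(deltas, time_agg=None):
    """Mimic time_agg: None keeps every delta, "mean"/"median" keep one value per unit."""
    if time_agg is None:
        return [d for unit_deltas in deltas.values() for d in unit_deltas]
    agg = {"mean": mean, "median": median}[time_agg]
    return [agg(unit_deltas) for unit_deltas in deltas.values() if unit_deltas]


per_user = adjacent_timedeltas(EVENTS)
print(aggregate_per_unit(per_user))            # [30, 60, 60]
print(aggregate_per_unit(per_user, "median"))  # [45.0, 60]
```

With time_agg=None every adjacent pair contributes one observation; with "median" each user contributes exactly one, which is the "one observation per unit" behaviour described for time_agg.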
- fit(raw_events_only=False, event_pair=None, adjacent_events_only=True, weight_col=None, time_agg=None, timedelta_unit='s', log_scale=None, lower_cutoff_quantile=None, upper_cutoff_quantile=None, bins=20)
Calculate values and bins for the histplot.
- Parameters:
- raw_events_only : bool, default False
If True, statistics are shown only for raw events. If False, statistics are shown for all events present in your data.
- event_pair : tuple of str, optional
Specify an event pair to plot the time distance between. The first item corresponds to the chronologically first event, and the second item to the second event. If event_pair=None, plot the distribution of time deltas for all adjacent events.
Examples: ("login", "purchase"); ["start", "cabinet"]
Besides the generic eventstream events, event_pair can accept the special eventstream_start and eventstream_end events, which denote the first and the last event in the entire eventstream, respectively.
Note that the order of the events in event_pair and the choice of weight_col matter.
- adjacent_events_only : bool, default True
Used only when event_pair is not None; specifies whether the events need to be adjacent to be included.
For example, if event_pair=("login", "purchase") and adjacent_events_only=False, then the sequence ("login", "main", "trading", "purchase") contains a valid pair (which is not the case with adjacent_events_only=True).
- weight_col : str, default None
Specify a unit of observation inside which time differences are computed. By default, the values from the user_id column in EventstreamSchema are taken.
For example:
If user_id, time deltas are computed only for events inside each user path.
If session_id, the same, but inside each session.
- time_agg : {None, “mean”, “median”}, default None
Specify the aggregation policy for the time distances. Aggregation is performed within each weight_col unit.
If None, no aggregation is applied; "mean" and "median" plot the distributions of the per-unit mean or per-unit median time deltas.
For example, if a session id is specified in weight_col, one observation per session (for example, the session median) is provided for the histogram.
- timedelta_unit : DATETIME_UNITS, default ‘s’
Specify units of time differences the histogram should use. Use “s” for seconds, “m” for minutes, “h” for hours and “D” for days.
- log_scale : bool | tuple of bool | None, optional
If True, apply log scaling to the x axis. If a tuple of bool, apply log scaling to the (x, y) axes, respectively.
- lower_cutoff_quantile : float, optional
Specify time distance quantile as the lower boundary. The values below the boundary are truncated.
- upper_cutoff_quantile : float, optional
Specify time distance quantile as the upper boundary. The values above the boundary are truncated.
- bins : int or {“auto”, “fd”, “doane”, “scott”, “stone”, “rice”, “sturges”, “sqrt”}, default 20
Generic bin parameter that can be the name of a reference rule or the number of bins. Passed to numpy.histogram_bin_edges.
- Returns:
- None
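The effect of event_pair and adjacent_events_only on a single user path can be sketched as follows. The function pair_timedeltas is hypothetical, and so is its rule for repeated events (each end event is paired with the most recent preceding start event, and a start event is used at most once); the library may pair repeated events differently.

```python
def pair_timedeltas(path, event_pair, adjacent_events_only=True):
    """Time deltas from event_pair[0] to event_pair[1] inside one path.

    path: list of (event, timestamp) tuples in chronological order.
    Hypothetical matching rule: each end event is paired with the most recent
    unmatched start event, and a start event is used at most once.
    """
    start, end = event_pair
    deltas = []
    if adjacent_events_only:
        # Only a start event immediately followed by an end event counts.
        for (ev_a, ts_a), (ev_b, ts_b) in zip(path, path[1:]):
            if ev_a == start and ev_b == end:
                deltas.append(ts_b - ts_a)
        return deltas
    last_start_ts = None
    for event, ts in path:
        # Check the end event first so that a pair like ("main", "main") also works.
        if event == end and last_start_ts is not None:
            deltas.append(ts - last_start_ts)
            last_start_ts = None  # the start event is consumed
        if event == start:
            last_start_ts = ts
    return deltas


path = [("login", 0), ("main", 5), ("trading", 20), ("purchase", 65)]
print(pair_timedeltas(path, ("login", "purchase")))  # []
print(pair_timedeltas(path, ("login", "purchase"), adjacent_events_only=False))  # [65]
```

Swapping the pair to ("purchase", "login") yields no deltas for this path, which is why the order of the events in event_pair matters.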
- plot(width=6.0, height=4.5)
Create a sns.histplot based on the calculated values.
- Parameters:
- width : float, default 6.0
Width in inches.
- height : float, default 4.5
Height in inches.
- Returns:
- matplotlib.axes.Axes
The matplotlib axes containing the plot.
- property values
- Returns:
- tuple(np.ndarray, np.ndarray)
The first array contains the values for the histogram.
The second array contains the bin edges.
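A minimal sketch of how a (values, bin edges) pair like the one above might be produced from raw time deltas. It assumes that the first array holds the filtered, unit-converted time deltas (rather than bin counts), that both cutoff quantiles are computed on the unfiltered data with inclusive boundaries, and that bins is forwarded unchanged to numpy.histogram_bin_edges. The helper histogram_values and the SECONDS_PER_UNIT table are hypothetical.

```python
import numpy as np

# Hypothetical conversion table for the units listed under timedelta_unit.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "D": 86400}


def histogram_values(deltas_seconds, timedelta_unit="s", lower_cutoff_quantile=None,
                     upper_cutoff_quantile=None, bins=20):
    """Return (values, bin_edges) for a histogram of time deltas (sketch)."""
    values = np.asarray(deltas_seconds, dtype=float) / SECONDS_PER_UNIT[timedelta_unit]
    # Both boundaries are computed on the unfiltered values (an assumption).
    lower = -np.inf if lower_cutoff_quantile is None else np.quantile(values, lower_cutoff_quantile)
    upper = np.inf if upper_cutoff_quantile is None else np.quantile(values, upper_cutoff_quantile)
    values = values[(values >= lower) & (values <= upper)]
    # bins may be an int or a rule name such as "auto"; numpy handles both.
    bin_edges = np.histogram_bin_edges(values, bins=bins)
    return values, bin_edges


deltas = [30, 60, 60, 120, 3600]  # toy time deltas in seconds
values, edges = histogram_values(deltas, timedelta_unit="m", upper_cutoff_quantile=0.8, bins=4)
print(values.tolist())  # [0.5, 1.0, 1.0, 2.0]
print(edges.tolist())   # [0.5, 0.875, 1.25, 1.625, 2.0]
```

Here the 60-minute outlier lies above the 0.8 quantile and is dropped before the bin edges are computed, so the four bins span only the remaining values.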
Eventstream
- Eventstream.timedelta_hist(raw_events_only=False, event_pair=None, adjacent_events_only=True, weight_col=None, time_agg=None, timedelta_unit='s', log_scale=None, lower_cutoff_quantile=None, upper_cutoff_quantile=None, bins=20, width=6.0, height=4.5, show_plot=True)
Plot the distribution of the time deltas between two events. Supports various distribution types, such as the distribution of time between adjacent consecutive events, the time between a pair of predefined events, or the median transition time from event to event per user or session.
- Parameters:
- show_plot : bool, default True
If True, the histogram is shown.
- See the descriptions of the other parameters in TimedeltaHist.fit and TimedeltaHist.plot.
- Returns:
- TimedeltaHist
A TimedeltaHist class instance fitted with the given parameters.