SplitSessions
Data processor
- class retentioneering.data_processors_lib.split_sessions.SplitSessions(params)
Create new synthetic events that divide users’ paths into sessions: session_start (or session_start_cropped) and session_end (or session_end_cropped). Also create a new column that contains the session number for each event in the input eventstream. The session number takes the form {user_id}_{session_number through one user path}.
- Parameters:
- timeout : Tuple(float, DATETIME_UNITS), optional
Threshold value and its unit of measure. session_start and session_end events are always placed before the first and after the last event in each user’s path. Because a user can have more than one session, the timedelta between every two consecutive events in each user’s path is calculated. If the calculated timedelta is greater than the selected timeout, new synthetic events, session_start and session_end, are created inside the user path, marking session starting and ending points.
- delimiter_events : list of str, optional
Delimiters define special events in the eventstream that indicate the start and the end of a session.
If a single delimiter is defined, it is associated with the session start and the end simultaneously. Delimiting events will be replaced with a “session_start” event.
If a list of two delimiters is defined, the first and second events are associated with the session start and end respectively. Delimiting events will be replaced with “session_start” and “session_end” events.
- delimiter_col : list, optional
Determines a column that already contains custom session identifiers.
- mark_truncated : bool, default False
Works with the timeout argument only. If True, calculates the timedelta between:
the first event in each user’s path and the first event in the whole eventstream;
the last event in each user’s path and the last event in the whole eventstream.
For users with a timedelta less than the selected timeout, a new synthetic event, session_start_cropped or session_end_cropped, will be added.
- session_col : str, default “session_id”
The name of the column that stores the session identifiers.
- Returns:
- Eventstream
Eventstream with new synthetic events and session_col.

event_name             event_type             timestamp
session_start          session_start          first_event
session_end            session_end            last_event
session_start_cropped  session_start_cropped  first_event
session_end_cropped    session_end_cropped    last_event
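
The mark_truncated rule can be illustrated with a small standalone sketch. It does not call retentioneering and is not the library's implementation; the function name boundary_marker_names and its arguments are hypothetical. It assumes that a cropped marker takes the place of the regular one at the same path boundary, which the "(or session_start_cropped)" wording suggests but the source does not state outright.

```python
from datetime import datetime, timedelta


def boundary_marker_names(user_first, user_last, stream_first, stream_last, timeout):
    """Pick the marker names for the start and end of one user's path.

    Hypothetical helper for illustration only. A boundary is treated as
    cropped when the user's first (last) event lies closer than `timeout`
    to the first (last) event of the whole eventstream.
    """
    start_cropped = (user_first - stream_first) < timeout
    end_cropped = (stream_last - user_last) < timeout
    start_name = "session_start_cropped" if start_cropped else "session_start"
    end_name = "session_end_cropped" if end_cropped else "session_end"
    return start_name, end_name


# The eventstream spans January 2023; this user starts 10 minutes after
# the stream begins and stops two weeks before it ends.
names = boundary_marker_names(
    user_first=datetime(2023, 1, 1, 0, 10),
    user_last=datetime(2023, 1, 15, 12, 0),
    stream_first=datetime(2023, 1, 1, 0, 0),
    stream_last=datetime(2023, 1, 31, 23, 59),
    timeout=timedelta(minutes=30),
)
print(names)
```

With a 30-minute timeout only the start of this path is close enough to the edge of the data to be marked as cropped.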
If the delta between the timestamps of two consecutive events (raw_event_n and raw_event_n+1) is greater than the selected timeout, the user will have more than one session:

user_id  event_name     event_type     timestamp      session_col
1        session_start  session_start  first_event    1_0
1        session_end    session_end    raw_event_n    1_0
1        session_start  session_start  raw_event_n+1  1_1
1        session_end    session_end    last_event     1_1
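
The timeout rule in the table above can be sketched in plain Python. This is an illustration of the described logic, not the retentioneering implementation: the function split_by_timeout, its arguments, and the "raw" event type label are hypothetical. The first session number is a parameter because the table above counts sessions from 0 while the output in the Examples section counts from 1.

```python
from datetime import datetime, timedelta


def split_by_timeout(user_id, events, timeout, first_number=1):
    """Label one user's time-ordered events with session ids.

    Hypothetical helper for illustration only.
    events: list of (event_name, datetime) tuples sorted by time.
    timeout: timedelta; a gap strictly greater than it opens a new session.
    Returns rows of (event_name, event_type, timestamp, session_id).
    """
    rows = []
    number = first_number
    for i, (name, ts) in enumerate(events):
        if i == 0:
            # The path always opens with a session_start marker.
            rows.append(("session_start", "session_start", ts, f"{user_id}_{number}"))
        elif ts - events[i - 1][1] > timeout:
            # Close the previous session at the previous event's timestamp,
            # then open the next one at the current event's timestamp.
            prev_ts = events[i - 1][1]
            rows.append(("session_end", "session_end", prev_ts, f"{user_id}_{number}"))
            number += 1
            rows.append(("session_start", "session_start", ts, f"{user_id}_{number}"))
        rows.append((name, "raw", ts, f"{user_id}_{number}"))
    if events:
        # The path always closes with a session_end marker.
        rows.append(("session_end", "session_end", events[-1][1], f"{user_id}_{number}"))
    return rows


t0 = datetime(2023, 1, 1)
path = [("A", t0), ("B", t0 + timedelta(minutes=5)), ("C", t0 + timedelta(minutes=50))]
for row in split_by_timeout(1, path, timedelta(minutes=30)):
    print(row)
```

Here the 45-minute gap between B and C exceeds the 30-minute timeout, so the path is split into two sessions.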
See also
TimedeltaHist
Plot the distribution of the time deltas between two events.
Eventstream.describe
Show general eventstream statistics.
Eventstream.describe_events
Show general eventstream events statistics.
Notes
See Data processors user guide for the details.
Examples
Splitting with a single delimiting event.
    # Imports assumed by all examples in this section.
    import pandas as pd
    from retentioneering.eventstream import Eventstream

    df = pd.DataFrame(
        [
            [111, "session_delimiter", "2023-01-01 00:00:00"],
            [111, "A", "2023-01-01 00:00:01"],
            [111, "B", "2023-01-01 00:00:02"],
            [111, "session_delimiter", "2023-01-01 00:00:04"],
            [111, "C", "2023-01-01 00:00:04"],
        ],
        columns=["user_id", "event", "timestamp"]
    )

    Eventstream(df)\
        .split_sessions(delimiter_events=["session_delimiter"])\
        .to_dataframe()\
        .sort_values(["user_id", "event_index"])\
        [["user_id", "event", "timestamp", "session_id"]]

       user_id          event            timestamp session_id
    0      111  session_start  2023-01-01 00:00:00      111_1
    1      111              A  2023-01-01 00:00:01      111_1
    2      111              B  2023-01-01 00:00:02      111_1
    3      111    session_end  2023-01-01 00:00:02      111_1
    4      111  session_start  2023-01-01 00:00:04      111_2
    5      111              C  2023-01-01 00:00:04      111_2
    6      111    session_end  2023-01-01 00:00:04      111_2
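
The single-delimiter behaviour in the output above can be mimicked with a plain-Python sketch. It is inferred from that output and is not the library's code: the function split_by_single_delimiter is hypothetical, and it assumes the user's path begins with a delimiter event, since the source does not say what happens to events that precede the first delimiter.

```python
def split_by_single_delimiter(user_id, events, delimiter):
    """Replace delimiter events with session_start and close each session.

    Hypothetical helper for illustration only.
    events: list of (event_name, timestamp) tuples for one user, in order.
    Returns rows of (event_name, timestamp, session_id).
    """
    rows = []
    number = 0      # no session is open yet
    last_ts = None  # timestamp of the most recent row in the open session
    for name, ts in events:
        if name == delimiter:
            if number > 0:
                # session_end takes the timestamp of the last event seen
                # before the next delimiter.
                rows.append(("session_end", last_ts, f"{user_id}_{number}"))
            number += 1
            rows.append(("session_start", ts, f"{user_id}_{number}"))
        elif number == 0:
            raise ValueError("this sketch assumes the path starts with a delimiter")
        else:
            rows.append((name, ts, f"{user_id}_{number}"))
        last_ts = ts
    if number > 0:
        rows.append(("session_end", last_ts, f"{user_id}_{number}"))
    return rows


path = [
    ("session_delimiter", "2023-01-01 00:00:00"),
    ("A", "2023-01-01 00:00:01"),
    ("B", "2023-01-01 00:00:02"),
    ("session_delimiter", "2023-01-01 00:00:04"),
    ("C", "2023-01-01 00:00:04"),
]
for row in split_by_single_delimiter(111, path, "session_delimiter"):
    print(row)
```

For this input the sketch yields the same sequence of events, timestamps and session identifiers as the table above.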
Splitting with a couple of delimiters indicating session start and session end.
    df = pd.DataFrame(
        [
            [111, "custom_start", "2023-01-01 00:00:00"],
            [111, "A", "2023-01-01 00:00:01"],
            [111, "B", "2023-01-01 00:00:02"],
            [111, "custom_end", "2023-01-01 00:00:02"],
            [111, "custom_start", "2023-01-01 00:00:04"],
            [111, "C", "2023-01-01 00:00:04"],
            [111, "custom_end", "2023-01-01 00:00:04"]
        ],
        columns=["user_id", "event", "timestamp"]
    )
    stream = Eventstream(df)
    stream.split_sessions(delimiter_events=["custom_start", "custom_end"])\
        .to_dataframe()\
        .sort_values(["user_id", "event_index"])\
        [["user_id", "event", "timestamp", "session_id"]]

       user_id          event            timestamp session_id
    0      111  session_start  2023-01-01 00:00:00      111_1
    1      111              A  2023-01-01 00:00:01      111_1
    2      111              B  2023-01-01 00:00:02      111_1
    3      111    session_end  2023-01-01 00:00:02      111_1
    4      111  session_start  2023-01-01 00:00:04      111_2
    5      111              C  2023-01-01 00:00:04      111_2
    6      111    session_end  2023-01-01 00:00:04      111_2
Splitting by a ‘delimiter_col’.
    df = pd.DataFrame(
        [
            [111, "A", "2023-01-01 00:00:01", "session_1"],
            [111, "B", "2023-01-01 00:00:02", "session_1"],
            [111, "C", "2023-01-01 00:00:03", "session_2"],
            [111, "D", "2023-01-01 00:00:04", "session_2"],
        ],
        columns=["user_id", "event", "timestamp", "custom_ses_id"]
    )
    raw_data_schema = {"custom_cols": [{"raw_data_col": "custom_ses_id", "custom_col": "custom_ses_id"}]}
    stream = Eventstream(df, raw_data_schema=raw_data_schema)
    stream.split_sessions(delimiter_col="custom_ses_id")\
        .to_dataframe()\
        .sort_values(["user_id", "event_index"])\
        [["user_id", "event", "timestamp", "session_id", "custom_ses_id"]]

       user_id          event            timestamp session_id custom_ses_id
    0      111  session_start  2023-01-01 00:00:01      111_1     session_1
    1      111              A  2023-01-01 00:00:01      111_1     session_1
    2      111              B  2023-01-01 00:00:02      111_1     session_1
    3      111    session_end  2023-01-01 00:00:02      111_1     session_1
    4      111  session_start  2023-01-01 00:00:03      111_2     session_2
    5      111              C  2023-01-01 00:00:03      111_2     session_2
    6      111              D  2023-01-01 00:00:04      111_2     session_2
    7      111    session_end  2023-01-01 00:00:04      111_2     session_2
- class retentioneering.data_processors_lib.split_sessions.SplitSessionsParams(*, timeout=None, delimiter_events=None, delimiter_col=None, mark_truncated=False, session_col='session_id')
A class with parameters for the SplitSessions class.
Eventstream
- SplitSessionsHelperMixin.split_sessions(timeout=None, delimiter_events=None, delimiter_col=None, session_col='session_id', mark_truncated=False)
A method of the Eventstream class that creates new synthetic events in each user’s path: session_start (or session_start_cropped) and session_end (or session_end_cropped). The created events divide users’ paths into sessions. Also creates a new column that contains the session number for each event in the input eventstream. The session number takes the form {user_id}_{session_number through one user path}. The created events and column are added to the input eventstream.
- Parameters:
- timeout : Tuple(float, DATETIME_UNITS), optional
Threshold value and its unit of measure. session_start and session_end events are always placed before the first and after the last event in each user’s path. Because a user can have more than one session, the timedelta between every two consecutive events in each user’s path is calculated. If the calculated timedelta is greater than the selected timeout, new synthetic events, session_start and session_end, are created inside the user path, marking session starting and ending points.
- delimiter_events : list of str, optional
Delimiters define special events in the eventstream that indicate the start and the end of a session.
If a single delimiter is defined, it is associated with the session start and the end simultaneously. Delimiting events will be replaced with a “session_start” event.
If a list of two delimiters is defined, the first and second events are associated with the session start and end respectively. Delimiting events will be replaced with “session_start” and “session_end” events.
- Returns:
- Eventstream
Input eventstream with new synthetic events and session_col.