Eventstream core
Eventstream
- class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, user_sample_size=None, user_sample_seed=None, events_order=None, custom_cols=None, add_start_end_events=True, convert_tz=None, segment_cols=None)
Collection of tools for storing and processing clickstream data.
- Parameters:
- raw_data : pd.DataFrame or pd.Series
Raw clickstream data.
- raw_data_schema : dict or RawDataSchema, optional
Represents mapping rules connecting important eventstream columns with the raw data columns. The keys are defined in RawDataSchema. The values are the corresponding column names in the raw data. The custom_cols key defines additional columns that can be used in the eventstream. See the Eventstream user guide for the details.
- schema : dict or EventstreamSchema, optional
Represents a schema of the created eventstream. The keys are defined in EventstreamSchema. The values are the names of the corresponding eventstream columns. See the Eventstream user guide for the details.
- custom_cols : list of str, optional
The list of additional columns from the raw data to be included in the eventstream. If not defined, all the columns from the raw data are included.
- prepare : bool, default True
If True, the input data is transformed as follows: the event_timestamp column is converted to pandas datetime format, and an event_type column is added and filled with the value raw (if the column already exists, it remains unchanged). If False, raw_data is left as is.
- index_order : list of str, default DEFAULT_INDEX_ORDER
Sorting order for the event_type column.
- user_sample_size : int or float, optional
Number (int) or share (float) of all users' trajectories that will be randomly chosen and kept in the final sample (all other trajectories are removed). See the numpy documentation.
- user_sample_seed : int, optional
A seed value used to generate user samples. See the numpy documentation.
- events_order : list of str, optional
Sorting order for the event_name column, applied to events with equal timestamps inside each user trajectory. The order of raw events is fixed once, during eventstream initialization.
- add_start_end_events : bool, default True
If True, path_start and path_end synthetic events are explicitly added to each path. See also the AddStartEndEvents documentation.
- convert_tz : ‘local’ or ‘UTC’, optional
A timestamp column with timezones is not supported in the eventstream and should be explicitly converted. If UTC, the timestamp column is converted to UTC time, and the timezone part is truncated. If local, the timezone part is truncated, keeping the local time values.
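To make the user_sample_size and user_sample_seed semantics concrete, here is a minimal sketch on a plain list of row dicts rather than a pd.DataFrame. The helper sample_users is hypothetical, not part of retentioneering. It uses Python's random module as a stand-in for numpy, so it will not pick the same users as the library for a given seed, and rounding a float share down to a whole number of users is an assumption.

```python
import math
import random


def sample_users(rows, user_sample_size, user_sample_seed=None, user_col="user_id"):
    """Hypothetical helper: keep the whole trajectories of a random user subset.

    An int is read as a number of users, a float as a share of all users.
    Python's random module stands in for numpy here.
    """
    users = sorted({row[user_col] for row in rows})
    if isinstance(user_sample_size, float):
        # Assumption: a share is rounded down to a whole number of users.
        n_users = math.floor(len(users) * user_sample_size)
    else:
        n_users = user_sample_size
    rng = random.Random(user_sample_seed)
    kept = set(rng.sample(users, n_users))
    # Users are kept or dropped as a whole; a trajectory is never split.
    return [row for row in rows if row[user_col] in kept]


rows = [
    {"user_id": 1, "event": "catalog"},
    {"user_id": 1, "event": "cart"},
    {"user_id": 2, "event": "catalog"},
    {"user_id": 3, "event": "main"},
    {"user_id": 4, "event": "main"},
]
half = sample_users(rows, 0.5, user_sample_seed=42)
print(sorted({row["user_id"] for row in half}))
```

Passing the same seed twice returns the same subset, which is the point of user_sample_seed: a sample can be reproduced across runs.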
Notes
See the Eventstream user guide for the details.
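The two timestamp-related options can be sketched with the standard datetime module. Both helpers are hypothetical, not library functions: strip_timezone mimics the documented ‘UTC’ and ‘local’ modes of convert_tz on a single timezone-aware datetime, and sort_trajectories orders rows by user and timestamp, using events_order only to break ties between equal timestamps. Placing events that are missing from events_order after the listed ones is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta, timezone


def strip_timezone(ts, convert_tz):
    """Hypothetical helper: turn a timezone-aware datetime into a naive one."""
    if convert_tz == "UTC":
        # Shift the moment to UTC first, then drop the timezone part.
        return ts.astimezone(timezone.utc).replace(tzinfo=None)
    if convert_tz == "local":
        # Keep the local wall-clock time and just drop the timezone part.
        return ts.replace(tzinfo=None)
    raise ValueError("convert_tz must be 'UTC' or 'local'")


def sort_trajectories(rows, events_order):
    """Hypothetical helper: sort by user and time, breaking ties with events_order."""
    rank = {name: i for i, name in enumerate(events_order)}
    # Assumption: events missing from events_order go after the listed ones;
    # sorted() is stable, so unlisted ties keep their original order.
    return sorted(
        rows,
        key=lambda r: (r["user_id"], r["timestamp"], rank.get(r["event"], len(rank))),
    )


ts = datetime(2023, 5, 1, 12, 0, tzinfo=timezone(timedelta(hours=3)))
print(strip_timezone(ts, "UTC"))    # 2023-05-01 09:00:00
print(strip_timezone(ts, "local"))  # 2023-05-01 12:00:00

same_time = datetime(2023, 5, 1, 9, 0)
rows = [
    {"user_id": 1, "timestamp": same_time, "event": "cart"},
    {"user_id": 1, "timestamp": same_time, "event": "catalog"},
]
ordered = sort_trajectories(rows, events_order=["catalog", "cart"])
print([r["event"] for r in ordered])  # ['catalog', 'cart']
```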
- add_custom_col(name, data)
Add a custom column to an existing eventstream.
- Parameters:
- name : str
New column name.
- data : pd.Series or None
If a pd.Series, a new column with the given values is added. If None, the new column is filled with np.nan.
- Returns:
- Eventstream
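A minimal sketch of the two add_custom_col cases on a list of row dicts. The function below is a hypothetical stand-in, not the library method: it aligns values by position (a pandas Series assigned to a DataFrame column is aligned by index instead), returns new rows rather than an Eventstream, and its length check is an assumption.

```python
import math


def add_custom_col(rows, name, data):
    """Hypothetical stand-in for Eventstream.add_custom_col on a list of row dicts.

    data is either a sequence with one value per event (the pd.Series case)
    or None, in which case the new column is filled with NaN.
    """
    if data is None:
        values = [math.nan] * len(rows)
    else:
        values = list(data)
        # Assumption: the values must line up one-to-one with the events.
        if len(values) != len(rows):
            raise ValueError("data must contain one value per event")
    # Build new rows so the input list is left untouched.
    return [{**row, name: value} for row, value in zip(rows, values)]


rows = [{"user_id": 1, "event": "catalog"}, {"user_id": 1, "event": "cart"}]
with_price = add_custom_col(rows, "price", [0.0, 9.99])
with_promo = add_custom_col(rows, "promo", None)
print(with_price[1])  # {'user_id': 1, 'event': 'cart', 'price': 9.99}
```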
- append_eventstream(eventstream)
Append another eventstream with the same schema to the current one.
- Parameters:
- eventstream : Eventstream
- Returns:
- eventstream
- Raises:
- ValueError
If the EventstreamSchemas of the two eventstreams are not equal.
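The schema check behind append_eventstream can be illustrated with a dataclass, whose generated __eq__ compares instances field by field. Schema and append_rows are hypothetical names; the sketch assumes that schema equality means every column name matches, and it leaves out any re-sorting or re-indexing the library may perform after appending.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical, trimmed-down stand-in for EventstreamSchema."""

    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    custom_cols: list = field(default_factory=list)


def append_rows(schema_a, rows_a, schema_b, rows_b):
    """Hypothetical helper: concatenate two event lists if their schemas match."""
    # The dataclass-generated __eq__ compares the schemas field by field.
    if schema_a != schema_b:
        raise ValueError("EventstreamSchemas of the two eventstreams are not equal")
    return rows_a + rows_b


base = [{"user_id": 1, "event": "catalog", "timestamp": 1}]
extra = [{"user_id": 2, "event": "main", "timestamp": 2}]
merged = append_rows(Schema(), base, Schema(), extra)
print(len(merged))  # 2

try:
    append_rows(Schema(), base, Schema(user_id="client_id"), extra)
except ValueError as err:
    print(err)  # EventstreamSchemas of the two eventstreams are not equal
```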
- to_dataframe(copy=False, drop_segment_events=True)
Convert the eventstream to a pd.DataFrame.
- Parameters:
- copy : bool, default False
If True, copy the data from the current eventstream. See the pandas documentation for details.
- drop_segment_events : bool, default True
If True, remove segment synthetic events.
- Returns:
- pd.DataFrame
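A sketch of what the two to_dataframe flags control, again on a list of row dicts. The helper to_rows is hypothetical, and so is the placeholder event_type value "segment" used here to recognize segment synthetic events; the library's actual marker may differ. copy=True is modeled with copy.deepcopy.

```python
import copy as copy_module

# Hypothetical placeholder: the real marker of segment events may differ.
SEGMENT_EVENT_TYPES = {"segment"}


def to_rows(rows, copy=False, drop_segment_events=True):
    """Hypothetical sketch of the to_dataframe flags on a list of row dicts."""
    if drop_segment_events:
        rows = [r for r in rows if r["event_type"] not in SEGMENT_EVENT_TYPES]
    # copy=True returns independent rows; otherwise the row dicts are shared
    # with the source, so editing them edits the source too.
    return copy_module.deepcopy(rows) if copy else rows


events = [
    {"user_id": 1, "event": "catalog", "event_type": "raw"},
    {"user_id": 1, "event": "cart", "event_type": "raw"},
    {"user_id": 1, "event": "segment_a", "event_type": "segment"},
]
print([r["event"] for r in to_rows(events)])  # ['catalog', 'cart']
print(len(to_rows(events, drop_segment_events=False)))  # 3
```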
Schema
- class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)
Define a schema for eventstream column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_id : str, default “event_id”
- event_type : str, default “event_type”
- event_index : str, default “event_index”
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- custom_cols : list of str, optional
Notes
See the Eventstream user guide for the details.
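The <factory> default in the EventstreamSchema signature is how Python renders a dataclass field declared with field(default_factory=...). This suggests that the schema is a dataclass and that custom_cols defaults to a fresh empty list for each instance. The hypothetical EventstreamSchemaSketch class reproduces that signature so the notation can be checked with inspect.signature.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class EventstreamSchemaSketch:
    """Hypothetical mirror of the EventstreamSchema signature."""

    event_id: str = "event_id"
    event_type: str = "event_type"
    event_index: str = "event_index"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    # A mutable default needs a factory; inspect renders it as <factory>.
    custom_cols: list = field(default_factory=list)


print(inspect.signature(EventstreamSchemaSketch))

# Only differing column names need to be passed; the rest keep their defaults.
schema = EventstreamSchemaSketch(user_id="client_id", custom_cols=["price"])
print(schema.event_name, schema.user_id)  # event client_id
```

The factory matters because a plain list default would be shared by every schema instance. With the factory, each schema gets its own list.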
- class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_index=None, event_type=None, event_id=None, custom_cols=<factory>)
Define a schema for raw_data column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- event_type : str, optional
- event_index : str, optional
- custom_cols : list, optional
Notes
See the Eventstream user guide for the details.
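Because RawDataSchema and EventstreamSchema share their keys, the two can be read as one renaming rule: for each key, the raw column name maps to the eventstream column name. The helper build_rename_map is a hypothetical sketch of that reading using plain dicts. It skips keys whose raw value is None, and the raw column names action, ts and client_id are made up for illustration.

```python
def build_rename_map(raw_schema, es_schema):
    """Hypothetical helper: map raw column names to eventstream column names.

    Both arguments are dicts keyed like RawDataSchema / EventstreamSchema
    fields. Keys whose raw value is None have no raw column and are skipped.
    """
    return {
        raw_col: es_schema[key]
        for key, raw_col in raw_schema.items()
        if raw_col is not None and key in es_schema
    }


# Made-up raw column names, for illustration only.
raw_schema = {
    "event_name": "action",
    "event_timestamp": "ts",
    "user_id": "client_id",
    "event_type": None,
}
es_schema = {
    "event_name": "event",
    "event_timestamp": "timestamp",
    "user_id": "user_id",
    "event_type": "event_type",
}
rename = build_rename_map(raw_schema, es_schema)
raw_row = {"action": "catalog", "ts": "2023-05-01 12:00", "client_id": 7}
print({rename.get(col, col): value for col, value in raw_row.items()})
# {'event': 'catalog', 'timestamp': '2023-05-01 12:00', 'user_id': 7}
```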