Eventstream Core
Eventstream
- class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, relations=None, user_sample_size=None, user_sample_seed=None, events_order=None)
Collection of tools for storing and processing clickstream data.
- Parameters:
- raw_data : pd.DataFrame or pd.Series
Raw clickstream data.
- raw_data_schema : RawDataSchema, optional
Should be specified as an instance of class RawDataSchema if the raw_data column names differ from the default RawDataSchema, or if raw_data contains at least one custom_col.
- schema : EventstreamSchema, optional
Schema of the created eventstream. See the default schema in EventstreamSchema.
- prepare : bool, default True
If True, the input data will be transformed as follows: the event_timestamp column is converted to pandas datetime format, and an event_type column is added and filled with the value raw (if the column already exists, it remains unchanged).
If False, raw_data is left as is.
- index_order : list of str, default DEFAULT_INDEX_ORDER
Sorting order for the event_type column.
- relations : list, optional
- user_sample_size : int or float, optional
Number (int) or share (float) of all users’ trajectories that will be randomly chosen and kept in the final sample; all other trajectories will be removed. See the numpy documentation.
- user_sample_seed : int, optional
A seed value used to generate user samples. See the numpy documentation.
- events_order : list of str, optional
Sorting order for the event_name column when events inside a user trajectory have equal timestamps. The order of raw events is fixed once, during eventstream initialization.
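The prepare=True step described above can be pictured with plain pandas. The sketch below is a hypothetical mirror written for illustration (prepare_sketch is not part of retentioneering); it assumes the default column names timestamp and event_type and only reproduces the two documented transformations.

```python
import pandas as pd


def prepare_sketch(raw: pd.DataFrame,
                   timestamp_col: str = "timestamp",
                   type_col: str = "event_type") -> pd.DataFrame:
    """Hypothetical pandas mirror of prepare=True; not retentioneering's code."""
    df = raw.copy()  # leave the caller's raw_data untouched
    # event_timestamp column -> pandas datetime
    df[timestamp_col] = pd.to_datetime(df[timestamp_col])
    # add event_type = "raw" only when the column is missing
    if type_col not in df.columns:
        df[type_col] = "raw"
    return df


raw = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event": ["main", "catalog", "main"],
    "timestamp": ["2023-01-01 10:00:00", "2023-01-01 10:05:00",
                  "2023-01-02 09:00:00"],
})
prepared = prepare_sketch(raw)
```

An existing event_type column passes through unchanged, matching the note in the prepare description.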
Notes
See Eventstream user guide for the details.
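The user_sample_size, user_sample_seed and events_order options can be illustrated with a short pandas/NumPy sketch. Both functions are hypothetical: the rounding of a float share, the use of numpy.random.default_rng, and placing events missing from events_order after the listed ones are all assumptions, and the library may sample or sort differently.

```python
import numpy as np
import pandas as pd


def sample_users(df, size, seed=None, user_col="user_id"):
    """Hypothetical sketch of user_sample_size / user_sample_seed."""
    users = df[user_col].unique()
    # int -> absolute number of users; float -> share of all users (assumed rounding)
    n = size if isinstance(size, int) else int(round(size * len(users)))
    rng = np.random.default_rng(seed)
    kept = rng.choice(users, size=n, replace=False)
    # whole trajectories of the chosen users are kept, all others dropped
    return df[df[user_col].isin(kept)]


def order_ties(df, events_order, user_col="user_id",
               ts_col="timestamp", name_col="event"):
    """Hypothetical sketch of events_order as a tie-breaker for equal timestamps."""
    rank = {name: i for i, name in enumerate(events_order)}
    # events missing from events_order are assumed to go after the listed ones
    key = df[name_col].map(rank).fillna(len(events_order))
    return (df.assign(_rank=key)
              .sort_values([user_col, ts_col, "_rank"])
              .drop(columns="_rank"))


events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 4],
    "timestamp": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:00", "2023-01-01 09:00",
        "2023-01-02 08:00", "2023-01-03 08:00", "2023-01-04 08:00",
    ]),
    "event": ["catalog", "main", "start", "main", "main", "main"],
})
half = sample_users(events, 0.5, seed=42)      # share of users
ordered = order_ties(events, ["main", "catalog"])
```

For user 1, main and catalog share a timestamp, so events_order decides that main comes first.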
- add_custom_col(name, data)
Add a custom column to an existing eventstream.
- Parameters:
- name : str
New column name.
- data : pd.Series or None
If pd.Series, a new column with the given values will be added. If None, the new column will be filled with np.nan.
- Returns:
- Eventstream
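The two documented cases of add_custom_col (a Series of values, or None giving np.nan) correspond to ordinary pandas column assignment. The function below is a hypothetical analogue on a bare DataFrame, not the method itself.

```python
import numpy as np
import pandas as pd


def add_custom_col_sketch(df, name, data=None):
    """Hypothetical pandas analogue of add_custom_col(name, data)."""
    out = df.copy()
    # pd.Series -> values aligned on the index; None -> column of np.nan
    out[name] = np.nan if data is None else data
    return out


events = pd.DataFrame({"user_id": [1, 1, 2], "event": ["main", "cart", "main"]})
with_price = add_custom_col_sketch(events, "price", pd.Series([0.0, 9.99, 0.0]))
with_empty = add_custom_col_sketch(events, "comment")
```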
- append_eventstream(eventstream)
Append an eventstream with the same schema.
- Parameters:
- eventstream : Eventstream
- Returns:
- eventstream
- Raises:
- ValueError
If the EventstreamSchemas of the two eventstreams are not equal.
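The schema guard documented for append_eventstream can be sketched as "compare schemas, raise ValueError on mismatch, otherwise stack the rows". In this hypothetical version, schemas are plain dicts and rows are combined with pd.concat; how the library actually merges and re-sorts events is not covered here.

```python
import pandas as pd


def append_sketch(left_schema, left_df, right_schema, right_df):
    """Hypothetical: mimic append_eventstream's schema check, then stack rows."""
    if left_schema != right_schema:
        raise ValueError("EventstreamSchemas of two eventstreams are not equal")
    return pd.concat([left_df, right_df], ignore_index=True)


schema = {"user_id": "user_id", "event_name": "event", "event_timestamp": "timestamp"}
a = pd.DataFrame({"user_id": [1], "event": ["main"], "timestamp": ["2023-01-01"]})
b = pd.DataFrame({"user_id": [2], "event": ["cart"], "timestamp": ["2023-01-02"]})
combined = append_sketch(schema, a, dict(schema), b)
```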
- to_dataframe(raw_cols=False, show_deleted=False, copy=False)
Convert eventstream to pd.DataFrame.
- Parameters:
- raw_cols : bool, default False
If True, the original columns of the input raw_data will be shown.
- show_deleted : bool, default False
If True, show all rows in eventstream, including deleted ones.
- copy : bool, default False
If True, copy the data from the current eventstream. See details in the pandas documentation.
- Returns:
- pd.DataFrame
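One way to picture show_deleted and copy is the sketch below. It assumes a hypothetical boolean _deleted marker column, since the internal storage of deleted rows is not documented on this page. Without copy=True, the full-table case can hand back the same object, so later edits would reach the source data.

```python
import pandas as pd


def to_dataframe_sketch(df, show_deleted=False, copy=False, deleted_col="_deleted"):
    """Hypothetical; deleted_col is an assumed marker, not a library column."""
    # hide rows flagged as deleted unless show_deleted is requested
    result = df if show_deleted else df[~df[deleted_col]]
    # copy=True guarantees an independent frame
    return result.copy() if copy else result


stored = pd.DataFrame({"event": ["a", "b", "c"], "_deleted": [False, True, False]})
visible = to_dataframe_sketch(stored)
everything = to_dataframe_sketch(stored, show_deleted=True, copy=True)
```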
Schema
- class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)
Define a schema for eventstream column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_id : str, default “event_id”
- event_type : str, default “event_type”
- event_index : str, default “event_index”
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- custom_cols : list of str, optional
Notes
See Eventstream user guide for the details.
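The <factory> default in the signature is how Python shows a dataclass field declared with default_factory. The class below is a hypothetical mirror of the documented defaults, not the library class. It shows that each instance gets its own custom_cols list and that instances compare field by field, the kind of equality append_eventstream relies on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventstreamSchemaSketch:
    """Hypothetical mirror of EventstreamSchema's documented defaults."""
    event_id: str = "event_id"
    event_type: str = "event_type"
    event_index: str = "event_index"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    # default_factory gives every instance a fresh list (shown as <factory>)
    custom_cols: List[str] = field(default_factory=list)


default = EventstreamSchemaSketch()
custom = EventstreamSchemaSketch(event_name="action", custom_cols=["price"])
```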
- class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_index=None, event_type=None, event_id=None, custom_cols=<factory>)
Define a schema for raw_data column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- event_type : str, optional
- event_index : str, optional
- event_id : str, optional
- custom_cols : list, optional
Notes
See Eventstream user guide for the details.
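A RawDataSchema tells Eventstream which raw_data columns hold the event name, timestamp and user id when they are not named event, timestamp and user_id. The effect resembles the pandas renaming below. normalize_raw is a hypothetical helper for illustration; the library does not necessarily rename columns this way internally.

```python
import pandas as pd


def normalize_raw(raw, event_name="event", event_timestamp="timestamp", user_id="user_id"):
    """Hypothetical: map custom raw_data column names onto the default names."""
    mapping = {event_name: "event", event_timestamp: "timestamp", user_id: "user_id"}
    return raw.rename(columns=mapping)


raw = pd.DataFrame({"client": [1], "action": ["main"], "ts": ["2023-01-01"]})
# analogous in spirit to RawDataSchema(event_name="action", event_timestamp="ts", user_id="client")
normalized = normalize_raw(raw, event_name="action", event_timestamp="ts", user_id="client")
```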