Eventstream Core
Eventstream
- class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, relations=None, user_sample_size=None, user_sample_seed=None)
Collection of tools for storing and processing clickstream data.
- Parameters:
- raw_data : pd.DataFrame or pd.Series
Raw clickstream data.
- raw_data_schema : RawDataSchema, optional
Should be specified as an instance of class RawDataSchema if the raw_data column names differ from the defaults in RawDataSchema, or if raw_data contains at least one custom_col.
- schema : EventstreamSchema, optional
Schema of the created eventstream. See the default schema in EventstreamSchema.
- prepare : bool, default True
If True, input data will be transformed as follows: the event_timestamp column is converted to pandas datetime format, and an event_type column is added and filled with the value raw. If the event_type column already exists, it remains unchanged.
If False, raw_data is left as is.
- index_order : list of str, default DEFAULT_INDEX_ORDER
Sorting order for the event_type column.
- relations : list, optional
- user_sample_size : int or float, optional
Number (int) or share (float) of all users’ trajectories that will be randomly chosen and kept in the final sample; all other trajectories are removed. See numpy documentation.
- user_sample_seed : int, optional
A seed value used to generate user samples. See numpy documentation.
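The sampling parameters work on whole user trajectories rather than on individual events. The sketch below illustrates that idea on a list of dict rows. It is not the library's implementation: sample_users is a hypothetical helper, Python's random module stands in for the numpy routine referenced above, and rounding a float share to the nearest whole number of users is an assumption.

```python
import random
from typing import Dict, List, Optional, Sequence, Union


def sample_users(
    events: Sequence[Dict],
    user_sample_size: Union[int, float],
    user_sample_seed: Optional[int] = None,
) -> List[Dict]:
    """Hypothetical helper: keep the full trajectories of randomly chosen users."""
    # Unique user ids in order of first appearance
    users = list(dict.fromkeys(e["user_id"] for e in events))
    if isinstance(user_sample_size, float):
        # A float is read as a share of all users (rounding is an assumption)
        n_users = round(len(users) * user_sample_size)
    else:
        # An int is read as an absolute number of users
        n_users = user_sample_size
    # The same seed always yields the same sample
    rng = random.Random(user_sample_seed)
    kept = set(rng.sample(users, n_users))
    # Every event of a kept user survives; other trajectories are dropped entirely
    return [e for e in events if e["user_id"] in kept]


events = [
    {"user_id": 1, "event": "main"},
    {"user_id": 1, "event": "catalog"},
    {"user_id": 2, "event": "main"},
    {"user_id": 3, "event": "main"},
    {"user_id": 4, "event": "cart"},
]
sampled = sample_users(events, user_sample_size=0.5, user_sample_seed=42)
```

Sampling users first and then filtering events keeps each selected path intact, which is the behaviour the parameter description implies.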
Notes
See the Eventstream user guide for details.
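To make the prepare and index_order descriptions concrete, here is a standard-library sketch of both steps on dict rows. These are assumptions, not the library's code: datetime.fromisoformat stands in for the pandas datetime conversion, EXAMPLE_INDEX_ORDER is a made-up order rather than the real DEFAULT_INDEX_ORDER, and sorting by timestamp first and using event_type position only to break ties is an assumed reading of "sorting order".

```python
from datetime import datetime
from typing import Dict, List

# Made-up order for illustration; not the library's DEFAULT_INDEX_ORDER
EXAMPLE_INDEX_ORDER = ["path_start", "raw", "path_end"]


def prepare_events(raw: List[Dict]) -> List[Dict]:
    """Sketch of prepare=True applied to a list of dict rows."""
    prepared = []
    for source_row in raw:
        row = dict(source_row)  # copy so the input rows stay untouched
        if isinstance(row["timestamp"], str):
            # Stand-in for the pandas datetime conversion
            row["timestamp"] = datetime.fromisoformat(row["timestamp"])
        # Add event_type="raw" only where it is missing; existing values are kept
        row.setdefault("event_type", "raw")
        prepared.append(row)
    return prepared


def sort_events(events: List[Dict], index_order: List[str]) -> List[Dict]:
    """Order by timestamp, breaking ties by event_type position in index_order."""
    rank = {name: position for position, name in enumerate(index_order)}
    return sorted(events, key=lambda e: (e["timestamp"], rank[e["event_type"]]))


raw = [
    {"user_id": 1, "event": "path_end", "timestamp": "2023-01-01 10:05:00", "event_type": "path_end"},
    {"user_id": 1, "event": "catalog", "timestamp": "2023-01-01 10:05:00"},
    {"user_id": 1, "event": "main", "timestamp": "2023-01-01 10:00:00"},
]
events = sort_events(prepare_events(raw), EXAMPLE_INDEX_ORDER)
```

The two rows at 10:05 share a timestamp, so their relative order comes from the position of raw and path_end in the index order.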
- add_custom_col(name, data)
Add a custom column to an existing eventstream.
- Parameters:
- name : str
New column name.
- data : pd.Series
If pd.Series, a new column with the given values will be added. If None, the new column will be filled with np.nan.
- Returns:
- Eventstream
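The two cases of the data argument can be mimicked on a column-oriented table stored as a dict of lists. add_column is a hypothetical stand-in, not the method itself. It uses float("nan") where the method uses np.nan, and the length check on data is an added assumption.

```python
from typing import Dict, List, Optional, Sequence


def add_column(
    table: Dict[str, List], name: str, data: Optional[Sequence] = None
) -> Dict[str, List]:
    """Hypothetical stand-in for the two cases of add_custom_col's data argument."""
    n_rows = len(next(iter(table.values()), []))
    if data is None:
        # data=None: every row gets NaN (float("nan") plays the role of np.nan)
        values = [float("nan")] * n_rows
    else:
        # Added assumption: values must line up one-to-one with existing rows
        if len(data) != n_rows:
            raise ValueError(f"expected {n_rows} values, got {len(data)}")
        values = list(data)
    new_table = dict(table)  # leave the input table unchanged
    new_table[name] = values
    return new_table


table = {"user_id": [1, 1, 2], "event": ["main", "catalog", "main"]}
with_session = add_column(table, "session_id", [10, 10, 20])
with_blank = add_column(table, "score")
```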
- append_eventstream(eventstream)
Append an eventstream with the same schema.
- Parameters:
- eventstream : Eventstream
- Returns:
- eventstream
- Raises:
- ValueError
If the EventstreamSchemas of the two eventstreams are not equal.
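The schema-equality rule for appending can be sketched with plain dicts standing in for schemas and row lists standing in for eventstreams. append_events is hypothetical. Only the "raise ValueError when schemas differ" behaviour comes from the description above. Returning a new combined list is a choice made for this sketch.

```python
from typing import Dict, List


def append_events(
    schema_a: Dict, rows_a: List[Dict], schema_b: Dict, rows_b: List[Dict]
) -> List[Dict]:
    """Hypothetical sketch: combine two row lists only when their schemas match."""
    # Same rule as append_eventstream: unequal schemas are rejected
    if schema_a != schema_b:
        raise ValueError("EventstreamSchemas of two eventstreams are not equal")
    return rows_a + rows_b


schema = {"event_name": "event", "event_timestamp": "timestamp", "user_id": "user_id", "custom_cols": []}
combined = append_events(
    schema, [{"user_id": 1, "event": "main"}],
    dict(schema), [{"user_id": 2, "event": "cart"}],
)

try:
    # A schema with an extra custom column is not equal, so appending fails
    append_events(schema, [], dict(schema, custom_cols=["session_id"]), [])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```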
- to_dataframe(raw_cols=False, show_deleted=False, copy=False)
Convert eventstream to pd.DataFrame.
- Parameters:
- raw_cols : bool, default False
If True, the original columns of the input raw_data will be shown.
- show_deleted : bool, default False
If True, show all rows in eventstream, including deleted ones.
- copy : bool, default False
If True, copy the data from the current eventstream. See details in the pandas documentation.
- Returns:
- pd.DataFrame
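The effect of show_deleted can be illustrated with a small filter over stored rows. This rests on a loud assumption: each stored row carries a hypothetical internal "deleted" flag. How eventstream actually tracks deleted rows is not described here, and to_records is not a library function.

```python
from typing import Dict, List


def to_records(rows: List[Dict], show_deleted: bool = False) -> List[Dict]:
    """Hypothetical sketch of show_deleted filtering on stored rows."""
    # Assumption: each stored row carries an internal "deleted" flag
    visible = rows if show_deleted else [r for r in rows if not r["deleted"]]
    # Return new dicts without the internal flag
    return [{k: v for k, v in r.items() if k != "deleted"} for r in visible]


stored = [
    {"user_id": 1, "event": "main", "deleted": False},
    {"user_id": 1, "event": "catalog", "deleted": True},
    {"user_id": 2, "event": "cart", "deleted": False},
]
default_view = to_records(stored)
full_view = to_records(stored, show_deleted=True)
```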
Schema
- class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)
Define a schema for eventstream column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_id : str, default “event_id”
- event_type : str, default “event_type”
- event_index : str, default “event_index”
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- custom_cols : list of str, optional
Notes
See the Eventstream user guide for details.
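The custom_cols=<factory> default in the signature is how Python renders a dataclass field declared with default_factory, which suggests the schema is a dataclass. The mirror below, EventstreamSchemaSketch, is a hypothetical stand-in built from the documented defaults. It shows how overriding names works, that each instance gets its own custom_cols list, and that field-by-field equality is the kind of comparison append_eventstream relies on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventstreamSchemaSketch:
    """Hypothetical mirror of EventstreamSchema's documented defaults."""

    event_id: str = "event_id"
    event_type: str = "event_type"
    event_index: str = "event_index"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    # default_factory is what a signature renders as custom_cols=<factory>
    custom_cols: List[str] = field(default_factory=list)


default = EventstreamSchemaSketch()
# Only the names that differ from the defaults need to be passed
custom = EventstreamSchemaSketch(user_id="client_id", custom_cols=["session_id"])
```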
- class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_type=None, custom_cols=<factory>)
Define a schema for raw_data column names. If the column names differ from the default names, they need to be specified.
- Parameters:
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- event_type : str, optional
- custom_cols : list, optional
Notes
See the Eventstream user guide for details.
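A raw data schema essentially tells the constructor which raw columns play which role. The sketch below shows that mapping on a single dict row. rename_raw_row is a hypothetical helper whose keyword arguments mirror RawDataSchema's parameters. The target names are the EventstreamSchema defaults, and dropping unmapped columns (including custom ones) is a simplification made only for this sketch.

```python
from typing import Dict, Optional


def rename_raw_row(
    row: Dict,
    event_name: str = "event",
    event_timestamp: str = "timestamp",
    user_id: str = "user_id",
    event_type: Optional[str] = None,
) -> Dict:
    """Hypothetical helper: map raw column names onto default eventstream names."""
    # Keys are raw_data column names, values are the default eventstream names
    mapping = {event_name: "event", event_timestamp: "timestamp", user_id: "user_id"}
    if event_type is not None:
        # event_type is optional in raw data
        mapping[event_type] = "event_type"
    # Columns not covered by the mapping are dropped in this simplified sketch
    return {mapping[col]: value for col, value in row.items() if col in mapping}


raw_row = {"client_id": 7, "action": "main", "datetime": "2023-01-01 10:00:00", "page": "/"}
row = rename_raw_row(raw_row, event_name="action", event_timestamp="datetime", user_id="client_id")
```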