Eventstream Core

Eventstream

class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, relations=None, user_sample_size=None, user_sample_seed=None)

Collection of tools for storing and processing clickstream data.

Parameters:
raw_data : pd.DataFrame or pd.Series

Raw clickstream data.

raw_data_schema : RawDataSchema, optional

An instance of class RawDataSchema. It should be specified in either of these cases:

  • The raw_data column names differ from the RawDataSchema defaults.

  • There is at least one custom_col in raw_data.

schema : EventstreamSchema, optional

Schema of the created eventstream. See default schema EventstreamSchema.

prepare : bool, default True

  • If True, input data will be transformed in the following way:

    • event_timestamp column is converted to pandas datetime format.

    • event_type column is added and filled with raw value.
      If the column exists, it remains unchanged.

  • If False, raw_data is left as is.
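The two steps performed when prepare=True can be sketched in plain Python. This is only an illustration of the described behaviour, not the library's implementation: rows are modelled as dictionaries instead of a pandas DataFrame, prepare_rows is a hypothetical helper, and "raw value" is read here as the literal string "raw", which is an assumption.

```python
from datetime import datetime


def prepare_rows(rows, timestamp_col="timestamp", event_type_col="event_type"):
    """Hypothetical helper: return new rows with parsed timestamps and an event type."""
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        value = new_row[timestamp_col]
        if isinstance(value, str):
            # Stand-in for the conversion to pandas datetime format.
            new_row[timestamp_col] = datetime.fromisoformat(value)
        # Add the event type only when the column is missing (assumed value: "raw").
        new_row.setdefault(event_type_col, "raw")
        prepared.append(new_row)
    return prepared


raw_rows = [
    {"user_id": 1, "event": "main", "timestamp": "2023-01-01 00:00:00"},
    {"user_id": 1, "event": "cart", "timestamp": "2023-01-01 00:05:00",
     "event_type": "custom"},
]
prepared_rows = prepare_rows(raw_rows)
```

The second row already carries an event type, so it is kept unchanged, which mirrors the "if the column exists, it remains unchanged" rule.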

index_order : list of str, default DEFAULT_INDEX_ORDER

Sorting order for event_type column.

relations : list, optional

user_sample_size : int or float, optional

Number (int) or share (float) of user trajectories that are randomly chosen and kept in the final sample; all other trajectories are removed. See numpy documentation.

user_sample_seed : int, optional

A seed value that is used to generate user samples. See numpy documentation.
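The difference between an int and a float user_sample_size, and the role of the seed, can be sketched as follows. This is a minimal illustration under stated assumptions: sample_users is a hypothetical helper, the standard-library random module is swapped in for numpy, and rounding a share down to a whole number of users is an assumption rather than documented behaviour.

```python
import random


def sample_users(user_ids, user_sample_size, seed=None):
    """Hypothetical helper: pick a reproducible random subset of user ids."""
    unique_users = sorted(set(user_ids))
    if isinstance(user_sample_size, float):
        if not 0.0 < user_sample_size <= 1.0:
            raise ValueError("a float sample size must be a share in (0, 1]")
        # Share of users; rounding down is an assumption of this sketch.
        n_users = int(len(unique_users) * user_sample_size)
    else:
        # Absolute number of users.
        n_users = user_sample_size
    rng = random.Random(seed)  # the same seed gives the same sample
    return set(rng.sample(unique_users, n_users))


by_count = sample_users(range(10), 3, seed=42)
by_share = sample_users(range(10), 0.5, seed=42)
```

Only events of the sampled users would then be kept; every other trajectory is dropped as a whole, never partially.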

Notes

See Eventstream user guide for the details.

add_custom_col(name, data)

Add custom column to an existing eventstream.

Parameters:
name : str

New column name.

data : pd.Series or None

  • If pd.Series, a new column with the given values will be added.

  • If None, the new column will be filled with np.nan.

Returns:
Eventstream

append_eventstream(eventstream)

Append eventstream with the same schema.

Parameters:
eventstream : Eventstream
Returns:
eventstream
Raises:
ValueError

If the EventstreamSchema instances of the two eventstreams are not equal.
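The schema check that guards appending can be sketched as below. This is not the library's code: MiniSchema and append_rows are hypothetical stand-ins, rows are plain dictionaries, and schema equality is assumed to mean field-by-field equality, as it does for a dataclass.

```python
from dataclasses import dataclass, field


@dataclass
class MiniSchema:
    """Hypothetical stand-in for a schema; dataclasses compare field by field."""
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    custom_cols: list = field(default_factory=list)


def append_rows(rows, schema, other_rows, other_schema):
    """Hypothetical helper: concatenate two row lists only if their schemas match."""
    if schema != other_schema:
        raise ValueError("schemas of the two eventstreams are not equal")
    return rows + other_rows


first = [{"user_id": 1, "event": "main", "timestamp": "2023-01-01 00:00:00"}]
second = [{"user_id": 2, "event": "cart", "timestamp": "2023-01-02 00:00:00"}]
combined = append_rows(first, MiniSchema(), second, MiniSchema())
```

Passing a schema with a different column name, for example MiniSchema(user_id="client"), makes append_rows raise ValueError.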

copy()

Make a copy of the current eventstream.

Returns:
Eventstream

index_events()

Sort and index eventstream using DEFAULT_INDEX_ORDER.

Returns:
None

to_dataframe(raw_cols=False, show_deleted=False, copy=False)

Convert eventstream to pd.DataFrame.

Parameters:
raw_cols : bool, default False

If True, the original columns of the input raw_data are shown.

show_deleted : bool, default False

If True, all rows in the eventstream are shown, including deleted ones.

copy : bool, default False

If True, copy data from the current eventstream. See the pandas documentation for details.

Returns:
pd.DataFrame

Schema

class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)

Define a schema for eventstream column names. If the column names differ from the default names, they need to be specified.

Parameters:
event_id : str, default “event_id”
event_type : str, default “event_type”
event_index : str, default “event_index”
event_name : str, default “event”
event_timestamp : str, default “timestamp”
user_id : str, default “user_id”
custom_cols : list of str, optional

Notes

See Eventstream user guide for the details.

class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_type=None, custom_cols=<factory>)

Define a schema for raw_data column names. If the column names differ from the default names, they need to be specified.

Parameters:
event_name : str, default “event”
event_timestamp : str, default “timestamp”
user_id : str, default “user_id”
event_type : str, optional
custom_cols : list, optional

Notes

See Eventstream user guide for the details.
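How a raw-data schema maps non-default column names onto the default ones can be sketched with a small dataclass. Everything here is illustrative: RawSchemaSketch and to_default_names are hypothetical names, and custom_cols is assumed to be a plain list of column names, which may not match the format the library expects. The default_factory field is one way a signature ends up showing custom_cols=<factory>: each instance gets its own fresh list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawSchemaSketch:
    """Hypothetical stand-in that mirrors the documented parameters."""
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    event_type: Optional[str] = None
    custom_cols: list = field(default_factory=list)  # fresh list per instance


def to_default_names(row, schema):
    """Hypothetical helper: rename the columns of one raw row to the default names."""
    renamed = {
        "event": row[schema.event_name],
        "timestamp": row[schema.event_timestamp],
        "user_id": row[schema.user_id],
    }
    if schema.event_type is not None:
        renamed["event_type"] = row[schema.event_type]
    for col in schema.custom_cols:  # assumed: plain column names
        renamed[col] = row[col]
    return renamed


schema = RawSchemaSketch(event_name="action", event_timestamp="ts",
                         user_id="client", custom_cols=["device"])
row = {"client": 7, "action": "main", "ts": "2023-01-01 00:00:00", "device": "ios"}
renamed_row = to_default_names(row, schema)
```

With the default schema no renaming is needed, which is why the schema only has to be given when the raw column names differ or custom columns are present.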