Eventstream Core

Eventstream

class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, relations=None, user_sample_size=None, user_sample_seed=None, events_order=None)

Collection of tools for storing and processing clickstream data.

Parameters:
raw_data : pd.DataFrame or pd.Series

Raw clickstream data.

raw_data_schema : RawDataSchema, optional

Should be specified as an instance of class RawDataSchema:

  • If raw_data column names are different from the default RawDataSchema.

  • If there is at least one custom_col in raw_data.

schema : EventstreamSchema, optional

Schema of the created eventstream. See the default schema, EventstreamSchema.

prepare : bool, default True
  • If True, input data will be transformed in the following way:

    • event_timestamp column is converted to pandas datetime format.

    • event_type column is added and filled with the value raw.
      If the column already exists, it remains unchanged.
  • If False, raw_data is left as is.

index_order : list of str, default DEFAULT_INDEX_ORDER

Sorting order for event_type column.

relations : list, optional
user_sample_size : int or float, optional

Number (int) or share (float) of all users’ trajectories that will be randomly chosen and kept in the final sample (all other trajectories are removed). See numpy documentation.

user_sample_seed : int, optional

A seed value that is used to generate user samples. See numpy documentation.

events_order : list of str, optional

Sorting order for the event_name column, applied when there are events with equal timestamps inside a user trajectory. The order of raw events is fixed once, during eventstream initialization.
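The preprocessing described by prepare=True and events_order can be approximated in plain pandas. The sketch below is not the library's implementation: prepare_raw_data is a hypothetical helper, it assumes the default column names user_id, event and timestamp, and using the original row position as the final tie-breaker is an assumption made here for determinism.

```python
import pandas as pd


def prepare_raw_data(raw_data, events_order=None):
    """Hypothetical sketch of prepare=True plus events_order tie-breaking.

    Assumes the default column names "user_id", "event" and "timestamp".
    """
    df = raw_data.copy()
    # prepare=True: convert the timestamp column to pandas datetime
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # prepare=True: add event_type filled with "raw" only if it is missing
    if "event_type" not in df.columns:
        df["event_type"] = "raw"
    # Assumed final tie-breaker: original row position keeps the result deterministic
    df["_pos"] = range(len(df))
    sort_cols = ["user_id", "timestamp"]
    if events_order is not None:
        # Rank events by position in events_order; unlisted events sort after listed ones
        rank = {name: i for i, name in enumerate(events_order)}
        df["_rank"] = df["event"].map(rank).fillna(len(events_order))
        sort_cols.append("_rank")
    sort_cols.append("_pos")
    df = df.sort_values(sort_cols)
    # Drop the helper columns and renumber rows 0..n-1
    return df.drop(columns=["_pos", "_rank"], errors="ignore").reset_index(drop=True)
```

Only events sharing a user and a timestamp are affected by events_order; everything else is ordered by time alone.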

Notes

See the Eventstream user guide for details.
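The user sampling controlled by user_sample_size and user_sample_seed keeps or drops whole trajectories, never individual events. A minimal sketch with numpy and pandas follows; sample_users is a hypothetical helper, and both the use of numpy.random.default_rng and rounding a float share down to an integer are assumptions, not confirmed library behaviour.

```python
import numpy as np
import pandas as pd


def sample_users(raw_data, user_sample_size, user_sample_seed=None):
    """Hypothetical sketch: keep the full trajectories of a random subset of users."""
    users = raw_data["user_id"].unique()
    if isinstance(user_sample_size, float):
        # A float is a share of all users (assumed rounding: floor)
        n = int(len(users) * user_sample_size)
    else:
        # An int is an absolute number of users
        n = user_sample_size
    rng = np.random.default_rng(user_sample_seed)
    chosen = rng.choice(users, size=n, replace=False)
    # Keep every event of the chosen users; drop all other trajectories
    return raw_data[raw_data["user_id"].isin(chosen)].reset_index(drop=True)
```

Passing the same seed twice yields the same sample, which is the point of user_sample_seed.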

add_custom_col(name, data)

Add a custom column to an existing eventstream.

Parameters:
name : str

New column name.

data : pd.Series or None
  • If pd.Series, a new column with the given values is added.

  • If None, the new column is filled with np.nan.

Returns:
Eventstream
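The two cases of the data argument map directly onto ordinary pandas column assignment. The sketch below operates on a plain DataFrame rather than an Eventstream; add_custom_col here is a hypothetical stand-in, and index alignment of the Series is standard pandas behaviour that the library is only assumed to share.

```python
import numpy as np
import pandas as pd


def add_custom_col(df, name, data=None):
    """Hypothetical pandas-only sketch of add_custom_col semantics."""
    out = df.copy()
    if data is None:
        # None: the new column is filled with np.nan
        out[name] = np.nan
    else:
        # pd.Series: values are aligned to the frame by index
        out[name] = data
    return out
```

Working on a copy mirrors the documented return value; the input frame is left unchanged.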
append_eventstream(eventstream)

Append an eventstream with the same schema.

Parameters:
eventstream : Eventstream
Returns:
eventstream
Raises:
ValueError

If the EventstreamSchemas of the two eventstreams are not equal.
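The contract of append_eventstream (check schema equality, raise ValueError otherwise, then combine rows) can be sketched on plain DataFrames. append_frames is hypothetical, schemas are modelled as dicts purely for illustration, and pd.concat stands in for whatever the library does internally.

```python
import pandas as pd


def append_frames(left, left_schema, right, right_schema):
    """Hypothetical sketch of append_eventstream's contract on plain frames.

    Schemas are modelled here as dicts mapping column roles to column names.
    """
    if left_schema != right_schema:
        raise ValueError("EventstreamSchemas of two eventstreams are not equal")
    # ignore_index renumbers rows so the combined frame has a clean 0..n-1 index
    return pd.concat([left, right], ignore_index=True)
```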

copy()

Make a copy of the current eventstream.

Returns:
Eventstream
index_events()

Sort and index eventstream using DEFAULT_INDEX_ORDER.

Returns:
None
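Sorting by an explicit event_type order (as index_order and index_events describe) is naturally expressed with an ordered pandas Categorical, so that ties on user and timestamp are broken by position in the order list rather than alphabetically. This is a sketch under assumptions: INDEX_ORDER is a hypothetical list (the real DEFAULT_INDEX_ORDER may contain different values), and numbering rows into event_index is assumed, not confirmed.

```python
import pandas as pd

# Hypothetical event_type order; the real DEFAULT_INDEX_ORDER may differ.
INDEX_ORDER = ["session_start", "raw", "session_end"]


def index_events(df, index_order=INDEX_ORDER):
    """Hypothetical sketch: sort by user, timestamp, then event_type rank."""
    out = df.copy()
    # Ordered categorical: sorting follows index_order, not alphabetical order.
    # Types missing from index_order become NaN and sort last.
    out["event_type"] = pd.Categorical(out["event_type"], categories=index_order, ordered=True)
    out = out.sort_values(["user_id", "timestamp", "event_type"]).reset_index(drop=True)
    # Assumed indexing step: number events 0..n-1 in their final order
    out["event_index"] = range(len(out))
    return out
```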
to_dataframe(raw_cols=False, show_deleted=False, copy=False)

Convert eventstream to pd.DataFrame.

Parameters:
raw_cols : bool, default False

If True, original columns of the input raw_data are shown.

show_deleted : bool, default False

If True, show all rows in the eventstream, including deleted ones.

copy : bool, default False

If True, copy the data from the current eventstream. See details in the pandas documentation.

Returns:
pd.DataFrame

Schema

class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)

Define a schema for eventstream column names. If the column names differ from the default names, they need to be specified.

Parameters:
event_id : str, default “event_id”
event_type : str, default “event_type”
event_index : str, default “event_index”
event_name : str, default “event”
event_timestamp : str, default “timestamp”
user_id : str, default “user_id”
custom_cols : list of str, optional

Notes

See the Eventstream user guide for details.
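The custom_cols=<factory> default in the signature is how Python renders a dataclass field declared with default_factory, which suggests EventstreamSchema is a dataclass whose custom_cols gets a fresh list per instance. EventstreamSchemaSketch below is a hypothetical mirror of that signature for illustration only, not the library class.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EventstreamSchemaSketch:
    """Hypothetical mirror of the EventstreamSchema signature, for illustration."""
    event_id: str = "event_id"
    event_type: str = "event_type"
    event_index: str = "event_index"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    # default_factory gives every instance its own empty list; inspect.signature
    # renders such a default as "<factory>"
    custom_cols: list = field(default_factory=list)
```

Only the names that differ from the defaults need to be passed, for example EventstreamSchemaSketch(user_id="client_id").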

class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_index=None, event_type=None, event_id=None, custom_cols=<factory>)

Define a schema for raw_data column names. If the column names differ from the default names, they need to be specified.

Parameters:
event_name : str, default “event”
event_timestamp : str, default “timestamp”
user_id : str, default “user_id”
event_type : str, optional
event_index : str, optional
event_id : str, optional
custom_cols : list, optional

Notes

See the Eventstream user guide for details.
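Conceptually, a RawDataSchema tells the eventstream which raw columns play which role, which amounts to renaming them to the eventstream defaults. With the real class this case would correspond to passing RawDataSchema(event_name="action", event_timestamp="ts", user_id="client") as raw_data_schema. The helper apply_raw_schema below is hypothetical, and the column check and plain rename are assumptions about the idea, not the library's code.

```python
import pandas as pd


def apply_raw_schema(raw_data, event_name="event", event_timestamp="timestamp",
                     user_id="user_id", custom_cols=()):
    """Hypothetical sketch: map raw column names onto default eventstream names."""
    mapping = {event_name: "event", event_timestamp: "timestamp", user_id: "user_id"}
    # Every declared column, including custom ones, must exist in raw_data
    missing = [c for c in [*mapping, *custom_cols] if c not in raw_data.columns]
    if missing:
        raise KeyError(f"columns not found in raw_data: {missing}")
    # rename keeps column order and leaves undeclared columns untouched
    return raw_data.rename(columns=mapping)
```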