Eventstream Core
Eventstream
- class retentioneering.eventstream.eventstream.Eventstream(raw_data, raw_data_schema=None, schema=None, prepare=True, index_order=None, relations=None, user_sample_size=None, user_sample_seed=None, events_order=None)
- Collection of tools for storing and processing clickstream data.
- Parameters:
- raw_data : pd.DataFrame or pd.Series
- Raw clickstream data.
- raw_data_schema : RawDataSchema, optional
- Should be specified as an instance of `RawDataSchema` if:
- the `raw_data` column names differ from the default `RawDataSchema`;
- `raw_data` contains at least one `custom_col`.
 
- schema : EventstreamSchema, optional
- Schema of the created `eventstream`. See the default schema `EventstreamSchema`.
- prepare : bool, default True
- If `True`, the input data is transformed as follows:
- The `event_timestamp` column is converted to the pandas datetime format.
- An `event_type` column is added and filled with the value `raw`. If the column already exists, it remains unchanged.
 
- If `False`, `raw_data` remains as is.
 
- index_order : list of str, default DEFAULT_INDEX_ORDER
- Sorting order for the `event_type` column.
- relations : list, optional
- user_sample_size : int or float, optional
- Number (`int`) or share (`float`) of all users' trajectories that are randomly chosen and kept in the final sample; all other trajectories are removed. See the numpy documentation.
- user_sample_seed : int, optional
- Seed value used to generate user samples. See the numpy documentation.
- events_order : list of str, optional
- Sorting order for the `event_name` column when events inside a user trajectory have equal timestamps. The order of raw events is fixed once, during eventstream initialization.
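The `prepare` and `user_sample_size` / `user_sample_seed` descriptions above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helpers `prepare_raw_data` and `sample_users` are hypothetical, the default column names (`user_id`, `event`, `timestamp`) are assumed, and sampling uses `numpy.random.default_rng`, so for a given seed it will not reproduce the sample that `Eventstream` draws.

```python
import numpy as np
import pandas as pd


def prepare_raw_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the documented prepare=True transformation."""
    df = raw_data.copy()
    # Convert the timestamp column to pandas datetime format.
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Add event_type filled with "raw"; an existing column is left unchanged.
    if "event_type" not in df.columns:
        df["event_type"] = "raw"
    return df


def sample_users(df: pd.DataFrame, size, seed=None) -> pd.DataFrame:
    """Keep a random subset of whole user trajectories (hypothetical helper)."""
    users = df["user_id"].unique()
    rng = np.random.default_rng(seed)
    # int: absolute number of users; float: share of all users.
    n = size if isinstance(size, int) else int(round(len(users) * size))
    kept = rng.choice(users, size=n, replace=False)
    # Every event of a chosen user is kept; all other trajectories are dropped.
    return df[df["user_id"].isin(kept)]


raw = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "event": ["A", "B", "A", "C"],
    "timestamp": ["2023-01-01 10:00", "2023-01-01 10:05",
                  "2023-01-02 09:00", "2023-01-03 12:00"],
})
prepared = prepare_raw_data(raw)
sampled = sample_users(prepared, 0.5, seed=42)  # 0.5 of 3 users -> 2 users
```

Sampling whole users rather than individual rows keeps each retained trajectory intact, which is what the parameter description requires.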
 
- Notes: See the Eventstream user guide for details.
- add_custom_col(name, data)
- Add a custom column to an existing `eventstream`.
- Parameters:
- name : str
- New column name.
- data : pd.Series or None
- If a `pd.Series`, a new column with the given values is added.
- If `None`, the new column is filled with `np.nan`.
 
 
- Returns:
- Eventstream
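A minimal pandas sketch of the two cases described for `data`, assuming the eventstream is backed by a DataFrame with default column names. The function `add_custom_col` below is a hypothetical stand-alone helper, not the `Eventstream` method; the index alignment it relies on is standard pandas behaviour, which the real method may or may not share.

```python
import numpy as np
import pandas as pd


def add_custom_col(df: pd.DataFrame, name: str, data=None) -> pd.DataFrame:
    """Hypothetical pandas analogue of Eventstream.add_custom_col."""
    out = df.copy()  # leave the input frame untouched
    if data is None:
        # No values given: the new column is filled with NaN.
        out[name] = np.nan
    else:
        # Values from a pd.Series are aligned to the rows by index label.
        out[name] = data
    return out


events = pd.DataFrame({"user_id": [1, 1, 2], "event": ["A", "B", "C"]})
with_price = add_custom_col(events, "price", pd.Series([9.5, 0.0, 3.0]))
with_empty = add_custom_col(events, "comment")
```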
 
 
- append_eventstream(eventstream)
- Append another `eventstream` with the same schema to the current one.
- Parameters:
- eventstream : Eventstream

- Returns:
- Eventstream

- Raises:
- ValueError
- If the `EventstreamSchema` instances of the two eventstreams are not equal.
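The schema check behind the `ValueError` can be sketched with a dataclass, since dataclasses compare field by field. This assumes, without confirmation from this page, that schema equality works that way; `MiniSchema` and `append_frames` are hypothetical names, and the concatenation is plain `pd.concat` rather than the library's append logic.

```python
from dataclasses import dataclass, field
from typing import List

import pandas as pd


@dataclass
class MiniSchema:
    # Hypothetical stand-in for EventstreamSchema; dataclasses compare field by field.
    user_id: str = "user_id"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    custom_cols: List[str] = field(default_factory=list)


def append_frames(left: pd.DataFrame, left_schema: MiniSchema,
                  right: pd.DataFrame, right_schema: MiniSchema) -> pd.DataFrame:
    """Concatenate two event tables only if their schemas match."""
    if left_schema != right_schema:
        raise ValueError("EventstreamSchemas of two eventstreams are not equal")
    return pd.concat([left, right], ignore_index=True)


a = pd.DataFrame({"user_id": [1], "event": ["A"], "timestamp": ["2023-01-01"]})
b = pd.DataFrame({"user_id": [2], "event": ["B"], "timestamp": ["2023-01-02"]})
combined = append_frames(a, MiniSchema(), b, MiniSchema())
```

A single extra custom column is enough to make the schemas unequal and trigger the error.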
 
 
- to_dataframe(raw_cols=False, show_deleted=False, copy=False)
- Convert the `eventstream` to a `pd.DataFrame`.
- Parameters:
- raw_cols : bool, default False
- If `True`, the original columns of the input `raw_data` are shown.
- show_deleted : bool, default False
- If `True`, all rows of the `eventstream` are shown, including deleted ones.
- copy : bool, default False
- If `True`, the data is copied from the current `eventstream`. See the pandas documentation for details.
 
- Returns:
- pd.DataFrame
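To make the `show_deleted` and `copy` flags concrete, here is a toy container. Everything in it is hypothetical: the class `MiniEventstream`, the `_deleted` marker column and `delete_rows` are illustrative assumptions about how deleted rows might be tracked, not the library's internals, and `raw_cols` is not modelled.

```python
import pandas as pd


class MiniEventstream:
    """Hypothetical container illustrating the show_deleted and copy flags."""

    def __init__(self, df: pd.DataFrame):
        # Assumption: deleted rows are kept but marked in a boolean column.
        self._df = df.assign(_deleted=False)

    def delete_rows(self, mask: pd.Series) -> None:
        self._df.loc[mask, "_deleted"] = True

    def to_dataframe(self, show_deleted: bool = False, copy: bool = False) -> pd.DataFrame:
        if show_deleted:
            frame = self._df
        else:
            # Boolean indexing already returns a new frame.
            frame = self._df[~self._df["_deleted"]]
        # copy=True guarantees the caller cannot modify the internal data.
        return frame.copy() if copy else frame


es = MiniEventstream(pd.DataFrame({"user_id": [1, 1, 2], "event": ["A", "B", "C"]}))
es.delete_rows(es._df["event"] == "B")
visible = es.to_dataframe()
everything = es.to_dataframe(show_deleted=True)
```

Copying costs memory, which is presumably why `copy` defaults to `False`; pass `True` when the returned frame will be modified.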
 
 
 
Schema
- class retentioneering.eventstream.schema.EventstreamSchema(event_id='event_id', event_type='event_type', event_index='event_index', event_name='event', event_timestamp='timestamp', user_id='user_id', custom_cols=<factory>)
- Define a schema for `eventstream` column names. Column names that differ from the defaults must be specified explicitly.
- Parameters:
- event_id : str, default “event_id”
- event_type : str, default “event_type”
- event_index : str, default “event_index”
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- custom_cols : list of str, optional

- Notes: See the Eventstream user guide for details.
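`custom_cols=<factory>` in the signature is how Python renders a dataclass field declared with `default_factory`. Assuming `EventstreamSchema` is such a dataclass (the signature suggests it, but this page does not say so), its documented defaults can be mirrored as below. `EventstreamSchemaSketch` is a hypothetical name and the field types are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventstreamSchemaSketch:
    """Hypothetical mirror of the documented EventstreamSchema defaults."""
    event_id: str = "event_id"
    event_type: str = "event_type"
    event_index: str = "event_index"
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    # Rendered as custom_cols=<factory>: each instance gets its own new list.
    custom_cols: List[str] = field(default_factory=list)


default_schema = EventstreamSchemaSketch()
client_schema = EventstreamSchemaSketch(user_id="client_id", custom_cols=["platform"])
```

The factory matters because a shared mutable default would leak custom columns between schema instances.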
- class retentioneering.eventstream.schema.RawDataSchema(event_name='event', event_timestamp='timestamp', user_id='user_id', event_index=None, event_type=None, event_id=None, custom_cols=<factory>)
- Define a schema for `raw_data` column names. Column names that differ from the defaults must be specified explicitly.
- Parameters:
- event_name : str, default “event”
- event_timestamp : str, default “timestamp”
- user_id : str, default “user_id”
- event_type : str, optional
- event_index : str, optional
- event_id : str, optional
- custom_cols : list, optional

- Notes: See the Eventstream user guide for details.
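Conceptually, a `RawDataSchema` tells the eventstream which raw column plays which role. The sketch below expresses that mapping as a pandas rename. `RawSchemaSketch` and `to_default_names` are hypothetical, and treating `custom_cols` as a plain list of column names kept under their original names is an assumption (this page documents its type only as `list`). With the library itself, such a schema is passed as `raw_data_schema` to `Eventstream`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import pandas as pd


@dataclass
class RawSchemaSketch:
    """Hypothetical mirror of RawDataSchema: maps roles to raw column names."""
    event_name: str = "event"
    event_timestamp: str = "timestamp"
    user_id: str = "user_id"
    event_type: Optional[str] = None
    custom_cols: List[str] = field(default_factory=list)


def to_default_names(raw: pd.DataFrame, schema: RawSchemaSketch) -> pd.DataFrame:
    """Select the columns named in the schema and rename them to the defaults."""
    mapping = {
        schema.event_name: "event",
        schema.event_timestamp: "timestamp",
        schema.user_id: "user_id",
    }
    if schema.event_type is not None:
        mapping[schema.event_type] = "event_type"
    # Custom columns are kept under their original names (assumption).
    keep = list(mapping) + list(schema.custom_cols)
    return raw[keep].rename(columns=mapping)


raw = pd.DataFrame({
    "client_id": [1, 2],
    "action": ["A", "B"],
    "ts": ["2023-01-01", "2023-01-02"],
    "platform": ["ios", "web"],
})
schema = RawSchemaSketch(event_name="action", event_timestamp="ts",
                         user_id="client_id", custom_cols=["platform"])
renamed = to_default_names(raw, schema)
```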