What’s new in 3.2.0 (2023-11-13)

New Features

Eventstream

  • Improved handling of RawDataSchema. All columns of a raw DataFrame except user_id, event, and timestamp are now treated as custom columns and added to the eventstream automatically. The new custom_cols argument defines a whitelist: when it is set, only the listed columns are added. See the eventstream user guide for details.

  • EventstreamSchema can now be defined as a dictionary. See the eventstream user guide.

  • The synthetic events path_start and path_end are now added to an eventstream automatically, as if AddStartEndEvents had been applied.
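
The column-selection rule for custom columns described above can be pictured with plain pandas. This is only an illustrative sketch, not the library's implementation: pick_custom_cols is a hypothetical helper, and it assumes the raw DataFrame uses the default column names user_id, event, and timestamp.

```python
import pandas as pd

# Columns assumed to be reserved by the raw data schema (default names).
RESERVED = ("user_id", "event", "timestamp")


def pick_custom_cols(df, custom_cols=None):
    """Hypothetical helper: return the columns that would count as custom.

    Without a whitelist, every non-reserved column is kept.
    With a whitelist, only the listed non-reserved columns are kept.
    """
    candidates = [c for c in df.columns if c not in RESERVED]
    if custom_cols is None:
        return candidates
    allowed = set(custom_cols)
    return [c for c in candidates if c in allowed]


raw = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event": ["main", "cart", "main"],
    "timestamp": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:05:00", "2023-01-02 10:00:00",
    ]),
    "session_id": ["s1", "s1", "s2"],
    "platform": ["ios", "ios", "web"],
})

all_custom = pick_custom_cols(raw)                               # no whitelist
only_platform = pick_custom_cols(raw, custom_cols=["platform"])  # whitelist
```

Here all_custom contains both session_id and platform, while only_platform contains just platform.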

Data processors

  • Added the GroupEventsBulk data processor. You can now apply multiple grouping operations simultaneously.

stream.group_events_bulk(
    {
        'product': lambda _df: _df['event'].str.startswith('product'),
        'delivery': lambda _df: _df['event'].str.startswith('delivery')
    }
)
  • Added the Pipe data processor. It lets you modify an eventstream as if you were working with a pandas DataFrame.

stream.pipe(lambda _df: _df.assign(new_column=100))
stream.filter_events(lambda _df: _df['user_id'] == 'user_12345')
  • The architecture of the data processors was improved and simplified. Some legacy features were removed.
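
To make the idea behind bulk grouping and piping more concrete, here is a rough pandas-only sketch. It does not call the library: group_events_bulk_sketch is a hypothetical function, and it assumes that every rule is evaluated against the original events and that a later rule wins if two rules match the same row. The actual data processor may resolve such overlaps differently.

```python
import pandas as pd


def group_events_bulk_sketch(df, rules):
    """Hypothetical pandas-only imitation of bulk event grouping.

    `rules` maps a new event name to a function that takes the DataFrame
    and returns a boolean mask of the rows to rename.
    """
    out = df.copy()
    for new_name, predicate in rules.items():
        mask = predicate(df)          # evaluated on the original events
        out.loc[mask, "event"] = new_name
    return out


events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "event": ["product_view", "product_add", "delivery_courier", "main"],
})

grouped = group_events_bulk_sketch(
    events,
    {
        "product": lambda _df: _df["event"].str.startswith("product"),
        "delivery": lambda _df: _df["event"].str.startswith("delivery"),
    },
)

# The same kind of function that Pipe accepts works with pandas' own pipe().
piped = events.pipe(lambda _df: _df.assign(new_column=100))
```

In this sketch the two product_* events collapse into product, delivery_courier becomes delivery, and main is left unchanged.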

Transition graph

  • The default values of edges_weight_col and nodes_weight_col are now set to user_id. This means that the default weights reflect the number of unique users who made a given transition or experienced a given event.
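
The difference between counting unique users and counting raw transitions can be shown with a small pandas sketch. It is an illustration of the weighting idea only, not the library's code; the sample data and the variable names (edge_users, edge_counts, node_users) are assumptions made for this example.

```python
import pandas as pd

events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u1", "u2", "u2"],
    "event": ["main", "cart", "main", "cart", "main", "cart"],
    "timestamp": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:01",
        "2023-01-01 10:02", "2023-01-01 10:03",
        "2023-01-01 11:00", "2023-01-01 11:01",
    ]),
})

# Pair every event with the next event of the same user.
events = events.sort_values(["user_id", "timestamp"])
events["next_event"] = events.groupby("user_id")["event"].shift(-1)
transitions = events.dropna(subset=["next_event"])

# Edge weight by unique users: how many distinct users made the transition.
edge_users = transitions.groupby(["event", "next_event"])["user_id"].nunique()

# For comparison: how many times the transition occurred in total.
edge_counts = transitions.groupby(["event", "next_event"]).size()

# Node weight by unique users: how many distinct users experienced the event.
node_users = events.groupby("event")["user_id"].nunique()
```

User u1 goes from main to cart twice, so that transition occurs three times overall but is made by only two unique users.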