class LabelLostUsersParams(ParamsModel):
    """
    A class with parameters for :py:class:`.LabelLostUsers` class.
    """

    timeout: Optional[Tuple[float, DATETIME_UNITS]]
    lost_users_list: Optional[Union[List[int], List[str]]]

    _widgets = {
        "timeout": ReteTimeWidget(),
        "lost_users_list": ListOfIds(),
    }


@docstrings.get_sections(base="LabelLostUsers")  # type: ignore
class LabelLostUsers(DataProcessor):
    """
    Create one synthetic event in each user's path: ``lost_user`` or ``absent_user``.

    Parameters
    ----------
    Only one of the parameters can be used at a time.
    timeout : Tuple(float, :numpy_link:`DATETIME_UNITS<>`), optional
        Threshold value and its unit of measure.
        The timedelta between the last event in each user's path and the last event
        in the whole eventstream is calculated.
        For users whose timedelta is greater than or equal to the selected ``timeout``,
        a new synthetic event ``lost_user`` is added.
        For all other users, a new synthetic event ``absent_user`` is added.
    lost_users_list : list of int or list of str, optional
        If a list of user ids is given, a new synthetic event ``lost_user`` is added
        to the path of each user from the list.
        For all other users, a new synthetic event ``absent_user`` is added.

    Returns
    -------
    Eventstream
        ``Eventstream`` with new synthetic events only - one for each user:

        +-----------------+-----------------+------------------+
        | **event_name**  | **event_type**  | **timestamp**    |
        +-----------------+-----------------+------------------+
        | lost_user       | lost_user       | last_event       |
        +-----------------+-----------------+------------------+
        | absent_user     | absent_user     | last_event       |
        +-----------------+-----------------+------------------+

    Raises
    ------
    ValueError
        Raised when ``timeout`` and ``lost_users_list`` are both empty or both given.

    Notes
    -----
    See :doc:`Data processors user guide</user_guides/dataprocessors>` for the details.

    """

    params: LabelLostUsersParams

    @time_performance(
        scope="label_lost_users",
        event_name="init",
    )
    def __init__(self, params: LabelLostUsersParams):
        super().__init__(params=params)

    @time_performance(  # type: ignore
        scope="label_lost_users",
        event_name="apply",
    )
    def apply(self, df: pd.DataFrame, schema: EventstreamSchemaType) -> pd.DataFrame:
        user_col = schema.user_id
        time_col = schema.event_timestamp
        type_col = schema.event_type
        event_col = schema.event_name

        timeout, timeout_unit = None, None
        lost_users_list = self.params.lost_users_list
        data_lost = pd.DataFrame()

        if self.params.timeout:
            timeout, timeout_unit = self.params.timeout

        # Exactly one of the two parameters must be provided.
        if timeout and lost_users_list:
            raise ValueError("timeout and lost_users_list parameters cannot be used simultaneously!")

        if not timeout and not lost_users_list:
            raise ValueError("Either timeout or lost_users_list must be specified!")

        if timeout and timeout_unit:
            # Take the last event of each user and measure its distance
            # from the latest event in the whole eventstream.
            data_lost = df.groupby(user_col, as_index=False).last()
            data_lost["diff_end_to_end"] = data_lost[time_col].max() - data_lost[time_col]
            # Convert the timedelta to a float number of ``timeout_unit`` units.
            data_lost["diff_end_to_end"] /= np.timedelta64(1, timeout_unit)  # type: ignore

            data_lost[type_col] = np.where(data_lost["diff_end_to_end"] < timeout, "absent_user", "lost_user")
            data_lost[event_col] = data_lost[type_col]
            del data_lost["diff_end_to_end"]

        if lost_users_list:
            data_lost = df.groupby(user_col, as_index=False).last()
            # Use the schema's user column rather than a hardcoded "user_id" name.
            data_lost[type_col] = np.where(data_lost[user_col].isin(lost_users_list), "lost_user", "absent_user")
            data_lost[event_col] = data_lost[type_col]

        result = pd.concat([df, data_lost])

        collect_data_performance(
            scope="label_lost_users",
            event_name="metadata",
            called_params=self.to_dict()["values"],
            not_hash_values=["timeout"],
            performance_data={
                "parent": {
                    "shape": df.shape,
                    "hash": hash_dataframe(df),
                },
                "child": {
                    "shape": result.shape,
                    "hash": hash_dataframe(result),
                },
            },
        )

        return result