{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# User behavior clustering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install retentioneering if you are running in Google Colab or for the first time:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip3 install retentioneering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use a sample user activity dataset to illustrate how behavior clustering works. Let’s first import retentioneering, load the sample dataset, and update the config to set the column names used:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import retentioneering\n", "\n", "# load sample data\n", "data = retentioneering.datasets.load_simple_shop()\n", "\n", "# set up column names:\n", "retentioneering.config.update({\n", "    'user_col': 'user_id',\n", "    'event_col': 'event',\n", "    'event_time_col': 'timestamp',\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trajectory vectorization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each user trajectory is represented as a sequence of events. Before we apply any ML algorithm to the user dataset, we need a way to convert each user trajectory from a sequence of events into a numerical vector. This problem has been studied extensively in text processing. Text analysis is in some sense similar to the analysis of discrete user trajectories in behavioural logs. In text processing, each text document (in our case, a user trajectory) consists of discrete words (in our case, event names), and we need to convert the text to numerical values. 
Let’s work through some examples.\n", "\n", "The rete.extract_features() function returns a dataframe of vectorized user trajectories:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | cart | catalog | delivery_choice | delivery_courier | delivery_pickup | lost | main | payment_card | payment_cash | payment_choice | payment_done | product1 | product2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 122915 | 1 | 18 | 0 | 0 | 0 | 1 | 7 | 0 | 0 | 0 | 0 | 4 | 2 |
| 463458 | 0 | 8 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1475907 | 1 | 5 | 1 | 1 | 0 | 1 | 2 | 0 | 0 | 1 | 0 | 1 | 2 |
| 1576626 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2112338 | 0 | 3 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 999275109 | 1 | 2 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 999642905 | 1 | 2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 999914554 | 1 | 10 | 0 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 0 | 1 | 0 |
| 999916163 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 999941967 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

3751 rows × 13 columns
\n", "