{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Compare groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install retentioneering if you are running this notebook in Google Colab or for the first time:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip3 install retentioneering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical comparison" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We often need to compare two groups of users on some metric: for example, when analyzing A/B test results, comparing user segments that come from different channels, or comparing user cohorts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we will use a simple dataset of user activity logs from an app or website during a hypothetical A/B test. It contains raw event-level behavior logs as well as additional information: a column indicating whether a user is in the test or control group, and some transaction details.\n", "\n", "We start by importing retentioneering and loading the sample dataset:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " user_id event timestamp user_backet \\\n", "0 219483890 catalog 2019-11-01 17:59:13.273932 test \n", "1 219483890 product 2019-11-01 17:59:28.459271 test \n", "2 219483890 cart 2019-11-01 17:59:29.502214 test \n", "3 219483890 catalog 2019-11-01 17:59:32.557029 test \n", "4 964964743 catalog 2019-11-01 21:38:19.283663 test \n", "\n", " transaction_value transaction_ID \n", "0 NaN None \n", "1 NaN None \n", "2 NaN None \n", "3 NaN None \n", "4 NaN None " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import retentioneering\n", "\n", "# load sample data\n", "data = retentioneering.datasets.load_simple_ab_test()\n", "\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see the regular columns with information about user actions (‘user_id’, ‘event’, ‘timestamp’), the A/B test column ‘user_backet’, and the columns with transaction information for ‘payment_done’ events: ‘transaction_value’ and ‘transaction_ID’.\n", "\n", "Next, as usual, we need to update retentioneering.config to specify the column names for user IDs, events and event timestamps:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# tell retentioneering which columns hold user IDs, event names and timestamps\n", "retentioneering.config.update({\n", "    'user_col': 'user_id',\n", "    'event_col': 'event',\n", "    'event_time_col': 'timestamp',\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s explore the ‘user_backet’ column:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "not_in_test 2624\n", "control 573\n", "test 554\n", "Name: user_backet, dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count each user once, then tally users per bucket\n", "(data\n", " .drop_duplicates(subset=['user_id'])['user_backet']\n", " .value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that our dataset has 554 and 573 unique users in the test and control 
groups, respectively. Let’s store these user IDs in separate variables, ‘test’ and ‘control’:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# arrays of unique user IDs in each A/B group\n", "test = data[data['user_backet'] == 'test']['user_id'].unique()\n", "control = data[data['user_backet'] == 'control']['user_id'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now everything is ready to compare the two groups using the rete.compare() function.\n", "\n", "Let’s say we would like to compare the conversion rate in the test vs. control groups.\n", "\n", "For this we need to define a function that takes one user trajectory (as a dataframe) and returns a numerical value: in our case, 1 (converted) or 0 (not converted). Importantly, the function must take a dataframe with a single user trajectory as its argument. It may perform any kind of calculation, but it must return a single numerical value.\n", "\n", "In our case a user is considered converted if they have a ‘payment_done’ event, so the function definition is very straightforward:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test (mean ± SD): 0.227 ± 0.419, n = 554\n", "control (mean ± SD): 0.148 ± 0.355, n = 573\n", "'test' is greater than 'control' with P-value: 0.00034\n", "power of the test: 96.15%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# 1 if the user trajectory contains a 'payment_done' event, 0 otherwise\n", "conversion = lambda x: int((x['event'] == 'payment_done').any())\n", "\n", "data.rete.compare(groups=(test, control),\n", "                  function=conversion,\n", "                  test='mannwhitneyu',\n", "                  group_names=('test', 'control'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parameters of the rete.compare() function:\n", "\n", "* groups: a tuple (g1, g2), where g1 and g2 are collections (list, tuple or set) of the user IDs of the two groups to compare.\n", "\n", "* function(x): a function that takes a single user's dataset as an argument and returns a single numerical value (see below for more examples).\n", "\n", "* test: {'mannwhitneyu', 'ttest', 'ks_2samp'}, the statistical test used to test the null hypothesis that the two independent samples are drawn from the same distribution. One-sided tests are used, meaning that the distributions are compared as ‘less’ or ‘greater’. As a rule of thumb, for discrete variables (like conversions or number of purchases) use the Mann-Whitney test ('mannwhitneyu') or the t-test ('ttest'). For continuous variables (like the average check) use the Kolmogorov-Smirnov test ('ks_2samp').\n", "\n", "* group_names: optional parameter that sets the group names used in the output.\n", "\n", "* alpha: the selected significance level. It is used to calculate the power of the test, i.e. the probability of correctly rejecting H0 when H1 is true. The default value is 0.05.\n", "\n", "In the example above, the test group has a statistically significantly higher conversion rate than the control group (at the 0.05 significance level), which supports rolling out the change. The histogram illustrates how the selected metric is distributed in each group (in the example above the metric can only be 0 or 1). \n", "\n", "To better illustrate how to define custom metrics and pass them to the rete.compare() function, let’s compare a couple more metrics. 
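Roughly speaking, each comparison in this tutorial comes down to two steps. First, compute the metric once per user. Second, run a one-sided test on the two resulting samples of per-user values. The sketch below illustrates that idea only and is not retentioneering's actual implementation. It assumes pandas and SciPy are installed. The toy event log, the per_user_metric helper and the user split are all hypothetical.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical toy event log that uses the tutorial's column names
log = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3, 4, 5, 6],
    'event': ['catalog', 'payment_done', 'catalog', 'cart', 'payment_done',
              'catalog', 'cart', 'catalog'],
})


def conversion(x):
    # 1 if this user's trajectory contains a 'payment_done' event, else 0
    return int((x['event'] == 'payment_done').any())


def per_user_metric(df, users, metric):
    # Hypothetical helper: apply the metric to each user's trajectory separately
    subset = df[df['user_id'].isin(users)]
    values = {uid: metric(traj) for uid, traj in subset.groupby('user_id')}
    return pd.Series(values, dtype=float)


g1 = per_user_metric(log, [1, 2, 3], conversion)  # plays the role of the test group
g2 = per_user_metric(log, [4, 5, 6], conversion)  # plays the role of the control group

# One-sided Mann-Whitney U test: do values in g1 tend to be greater than in g2?
stat, p_value = mannwhitneyu(g1, g2, alternative='greater')
print(f'g1 mean: {g1.mean():.3f}, g2 mean: {g2.mean():.3f}, P-value: {p_value:.3f}')
```

In this sketch, scipy.stats.mannwhitneyu with alternative='greater' tests whether values in the first group tend to be larger than those in the second. With only three users per group, the P-value is not expected to reach 0.05, whereas the samples in the tutorial dataset are far larger.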
Suppose we would like to compare the average check between the test and control groups. Again, it’s very easy:\n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test (mean ± SD): 736.026 ± 149.001, n = 126\n", "control (mean ± SD): 732.980 ± 139.960, n = 85\n", "'control' is greater than 'test' with P-value: 0.55199\n", "power of the test: 3.65%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mean transaction value (NaN for users without transactions)\n", "average_check = lambda x: x['transaction_value'].mean()\n", "\n", "data.rete.compare(groups=(test, control),\n", "                  function=average_check,\n", "                  test='ks_2samp',\n", "                  group_names=('test', 'control'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case there is no statistically significant difference in the average check between the two groups (the P-value is 0.55, while the selected threshold is 0.05). Note that for a continuous variable like the average check we used the Kolmogorov-Smirnov test. Also note that users without transactions get a NaN average check. Judging by the group sizes in the output (n = 126 and n = 85, which match the numbers of converted users), only paying users enter this comparison. So while we can conclude that users in the test group converted to a purchase more often than users in the control group, we found no significant effect on the average check. Given the low power of this test (3.65%), a small effect cannot be ruled out." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More complex metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate that a metric function can be arbitrarily complex, let’s consider another example. Suppose we have a separate file containing all transaction IDs and their statuses (for example, whether or not a transaction has already been confirmed by the bank). \n", "\n", "For demonstration purposes, let’s create such a dataframe with randomized data:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " transaction_ID confirmed\n", "0 7121884 True\n", "1 9641982 True\n", "2 9826287 True\n", "3 9647603 True\n", "4 8125650 True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import random\n", "\n", "# all unique transaction IDs in the log\n", "all_ids = data['transaction_ID'].dropna().unique()\n", "\n", "# randomly mark each transaction as confirmed (True with probability 0.8)\n", "status = pd.DataFrame({'transaction_ID': all_ids,\n", "                       'confirmed': [random.random() > 0.2\n", "                                     for _ in all_ids]})\n", "\n", "status.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let’s write a metric function, confirmed_purch, that returns 1 if the user has at least one confirmed transaction and 0 otherwise:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def confirmed_purch(x):\n", "\n", "    # get the list of transactions for user x\n", "    trans_list = x['transaction_ID'].unique()\n", "\n", "    # get all status records for the transactions of user x\n", "    trans_status = status[status['transaction_ID'].isin(trans_list)]\n", "\n", "    # True if the user has at least one confirmed transaction\n", "    has_conf_trans = trans_status['confirmed'].sum() > 0\n", "\n", "    # convert bool to int\n", "    return int(has_conf_trans)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, confirmed_purch() takes a single user trajectory (as a pandas dataframe) as an argument and returns a single numerical value. Let’s compare our groups using the confirmed_purch metric: " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test (mean ± SD): 0.190 ± 0.392, n = 554\n", "control (mean ± SD): 0.131 ± 0.337, n = 573\n", "'test' is greater than 'control' with P-value: 0.00363\n", "power of the test: 85.25%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data.rete.compare(groups=(test, control),\n", "                  function=confirmed_purch,\n", "                  test='mannwhitneyu',\n", "                  group_names=('test', 'control'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, the difference in conversion to confirmed purchases is still statistically significant at the selected significance level of 0.05. Because there are fewer confirmed transactions than transactions overall, both the conversion rates and the gap between them are smaller (0.190 vs. 0.131 instead of 0.227 vs. 0.148). As a result, the power of the test above is lower than the value obtained previously for all transactions (85% vs. 96%). Since the confirmation statuses were generated randomly, the exact figures will differ from run to run." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }