七周成为数据分析师(案例python脚本)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("CDNOW_MASTER.CSV")
df.head()
Out[1]:
  ID DATE NUM TOTAL
0 1 19970101 1 11.77
1 2 19970112 1 12.00
2 2 19970112 5 77.00
3 3 19970102 2 20.76
4 3 19970330 2 20.76
  • ID:购买用户id
  • DATE:用户购买日期
  • NUM:用户购买数量
  • TOTAL:用户购买总额
In [2]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69659 entries, 0 to 69658
Data columns (total 4 columns):
ID       69659 non-null int64
DATE     69659 non-null int64
NUM      69659 non-null int64
TOTAL    69659 non-null float64
dtypes: float64(1), int64(3)
memory usage: 2.1 MB
In [3]:
df.describe()
Out[3]:
  ID DATE NUM TOTAL
count 69659.000000 6.965900e+04 69659.000000 69659.000000
mean 11470.854592 1.997228e+07 2.410040 35.893648
std 6819.904848 3.837735e+03 2.333924 36.281942
min 1.000000 1.997010e+07 1.000000 0.000000
25% 5506.000000 1.997022e+07 1.000000 14.490000
50% 11410.000000 1.997042e+07 2.000000 25.980000
75% 17273.000000 1.997111e+07 3.000000 43.700000
max 23570.000000 1.998063e+07 99.000000 1286.010000
  • 大部分订单只有1-3的销量,平均值为2.4,且存在一定极值干扰
  • 用户消费金额比较稳定,平均数是36,中位数是25,存在一定极值干扰
In [4]:
df['DATE']=pd.to_datetime(df['DATE'],format='%Y%m%d')
df['month']=df.DATE.values.astype('datetime64[M]')
df
Out[4]:
  ID DATE NUM TOTAL month
0 1 1997-01-01 1 11.77 1997-01-01
1 2 1997-01-12 1 12.00 1997-01-01
2 2 1997-01-12 5 77.00 1997-01-01
3 3 1997-01-02 2 20.76 1997-01-01
4 3 1997-03-30 2 20.76 1997-03-01
5 3 1997-04-02 2 19.54 1997-04-01
6 3 1997-11-15 5 57.45 1997-11-01
7 3 1997-11-25 4 20.96 1997-11-01
8 3 1998-05-28 1 16.99 1998-05-01
9 4 1997-01-01 2 29.33 1997-01-01
10 4 1997-01-18 2 29.73 1997-01-01
11 4 1997-08-02 1 14.96 1997-08-01
12 4 1997-12-12 2 26.48 1997-12-01
13 5 1997-01-01 2 29.33 1997-01-01
14 5 1997-01-14 1 13.97 1997-01-01
15 5 1997-02-04 3 38.90 1997-02-01
16 5 1997-04-11 3 45.55 1997-04-01
17 5 1997-05-31 3 38.71 1997-05-01
18 5 1997-06-16 2 26.14 1997-06-01
19 5 1997-07-22 2 28.14 1997-07-01
20 5 1997-09-15 3 40.47 1997-09-01
21 5 1997-12-08 4 46.46 1997-12-01
22 5 1997-12-12 3 40.47 1997-12-01
23 5 1998-01-03 3 37.47 1998-01-01
24 6 1997-01-01 1 20.99 1997-01-01
25 7 1997-01-01 2 28.74 1997-01-01
26 7 1997-10-11 7 97.43 1997-10-01
27 7 1998-03-22 9 138.50 1998-03-01
28 8 1997-01-01 1 9.77 1997-01-01
29 8 1997-02-13 1 13.97 1997-02-01
... ... ... ... ... ...
69629 23556 1997-09-27 3 31.47 1997-09-01
69630 23556 1998-01-03 2 28.98 1998-01-01
69631 23556 1998-06-07 2 28.98 1998-06-01
69632 23557 1997-03-25 1 14.37 1997-03-01
69633 23558 1997-03-25 2 28.13 1997-03-01
69634 23558 1997-05-18 3 45.51 1997-05-01
69635 23558 1997-06-24 2 23.74 1997-06-01
69636 23558 1998-02-25 4 48.22 1998-02-01
69637 23559 1997-03-25 2 23.54 1997-03-01
69638 23559 1997-05-18 3 35.31 1997-05-01
69639 23559 1997-06-27 3 52.80 1997-06-01
69640 23560 1997-03-25 1 18.36 1997-03-01
69641 23561 1997-03-25 2 30.92 1997-03-01
69642 23561 1998-01-28 1 15.49 1998-01-01
69643 23561 1998-05-29 3 37.05 1998-05-01
69644 23562 1997-03-25 2 29.33 1997-03-01
69645 23563 1997-03-25 1 10.77 1997-03-01
69646 23563 1997-10-04 2 47.98 1997-10-01
69647 23564 1997-03-25 1 11.77 1997-03-01
69648 23564 1997-05-21 1 11.77 1997-05-01
69649 23564 1997-11-30 3 46.47 1997-11-01
69650 23565 1997-03-25 1 11.77 1997-03-01
69651 23566 1997-03-25 2 36.00 1997-03-01
69652 23567 1997-03-25 1 20.97 1997-03-01
69653 23568 1997-03-25 1 22.97 1997-03-01
69654 23568 1997-04-05 4 83.74 1997-04-01
69655 23568 1997-04-22 1 14.99 1997-04-01
69656 23569 1997-03-25 2 25.74 1997-03-01
69657 23570 1997-03-25 3 51.12 1997-03-01
69658 23570 1997-03-26 2 42.96 1997-03-01

69659 rows × 5 columns

1.进行用户消费趋势分析(按月)

  • 每月的消费总金额
  • 每月的消费次数
  • 每月的产品购买量
  • 每月的消费人数
In [5]:
grouped_month=df.groupby(by='month')
order_month_amount=grouped_month.TOTAL.sum()
order_month_amount.head()
Out[5]:
month
1997-01-01    299060.17
1997-02-01    379590.03
1997-03-01    393155.27
1997-04-01    142824.49
1997-05-01    107933.30
Name: TOTAL, dtype: float64
In [6]:
#可视化显示在页面
%matplotlib inline
#更改设计风格
plt.style.use('ggplot')
order_month_amount.plot()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a8a8208>
七周成为数据分析师(案例python脚本)

由上图可知,消费金额在前三个月达到最高峰,后续消费较为稳定,有轻微下降趋势

In [7]:
grouped_month.ID.count().plot()
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a449860>
七周成为数据分析师(案例python脚本)

由上图可知,消费次数在前三个月达到最高峰,后续消费次数较为稳定,有轻微下降趋势

In [8]:
grouped_month.NUM.sum().plot()
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a4ec5c0>
七周成为数据分析师(案例python脚本)

由上图可知,每月购买产品量在前三个月达到最高峰,后续购买量较为稳定,有轻微下降趋势

In [9]:
grouped_month.ID.apply(lambda x :len(set(x))).plot()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6a906438>
七周成为数据分析师(案例python脚本)

由上图可知,每月消费人数在前三个月达到最高峰,每月消费人数较为稳定,有轻微下降趋势

2.用户个体消费分析

  • 用户消费金额、消费次数的描述统计
  • 用户消费金额和消费次数的散点图
  • 用户消费金额的分布图
  • 用户消费次数的分布图
  • 用户累计消费金额占比
In [10]:
grouped_user=df.groupby('ID')
In [11]:
grouped_user.sum().describe()
Out[11]:
  NUM TOTAL
count 23570.000000 23570.000000
mean 7.122656 106.080426
std 16.983531 240.925195
min 1.000000 0.000000
25% 1.000000 19.970000
50% 3.000000 43.395000
75% 7.000000 106.475000
max 1033.000000 13990.930000
  • 用户平均购买了7张CD,但中位值只有3,说明小部分用户购买大多数的CD
  • 用户平均消费106元,中位值有43,判断同上,有极值干扰
In [12]:
grouped_user.sum().query('TOTAL<4000').plot.scatter(x='TOTAL',y='NUM')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6abad7b8>
七周成为数据分析师(案例python脚本)

用户消费总金额集中在1500元以下,购买总CD数量集中在100张以下

In [13]:
grouped_user.sum().query('NUM<100').TOTAL.plot.hist(bins=20)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6abcc128>
七周成为数据分析师(案例python脚本)

大多数用户消费金额在250元以下

In [14]:
grouped_user.sum().query('NUM<100').NUM.plot.hist(bins=20)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bbd75c0>
七周成为数据分析师(案例python脚本)

大多数用户购买CD数量在20张以下

In [15]:
user_cumsum=grouped_user.sum().sort_values('TOTAL').apply(lambda x :x.cumsum()/x.sum())
user_cumsum.reset_index().TOTAL.plot()
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bd3d198>
七周成为数据分析师(案例python脚本)

按照用户消费金额进行升序排列,50%的用户仅贡献了15%的消费额度,消费总额排名前5000的贡献了60%的消费额度

3.用户消费行为

  • 用户第一次消费(首购)
  • 用户最后一次消费
  • 新老客消费比
    • 多数用户仅消费一次?
    • 每月新老客占比?
  • 用户分层
    • RFM
    • 新、老、活跃、回流、流失
  • 用户购买周期(按订单)
    • 用户消费周期描述
    • 用户消费周期分布
  • 用户生命周期(按照第一次&最后一次消费)
    • 用户生命周期描述
    • 用户生命周期分布
In [16]:
grouped_user.min().DATE.value_counts().plot()
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bd0c128>
七周成为数据分析师(案例python脚本)

用户第一次购买分布,集中在前三个月。并且在2-11至2-25有一次剧烈波动,具体再分析

In [17]:
grouped_user.max().DATE.value_counts().plot()
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6bddde48>
七周成为数据分析师(案例python脚本)

用户最后一次购买比第一次购买分布广,大部分最后一次购买集中在前三个月,说明很多用户第一次购买后不再进行购买,随着时间,最后一次购买数在增加,消费呈流失上升的状况。

In [18]:
user_life=grouped_user.DATE.agg(['min','max']).reset_index()
user_life.head()
Out[18]:
  ID min max
0 1 1997-01-01 1997-01-01
1 2 1997-01-12 1997-01-12
2 3 1997-01-02 1998-05-28
3 4 1997-01-01 1997-12-12
4 5 1997-01-01 1998-01-03
In [19]:
(user_life['min']==user_life['max']).value_counts()
Out[19]:
True     12054
False    11516
dtype: int64

有一半用户, 就 消费了一次

In [20]:
user_life['regDate']=user_life['min'].values.astype('datetime64[M]')
user_life
Out[20]:
  ID min max regDate
0 1 1997-01-01 1997-01-01 1997-01-01
1 2 1997-01-12 1997-01-12 1997-01-01
2 3 1997-01-02 1998-05-28 1997-01-01
3 4 1997-01-01 1997-12-12 1997-01-01
4 5 1997-01-01 1998-01-03 1997-01-01
5 6 1997-01-01 1997-01-01 1997-01-01
6 7 1997-01-01 1998-03-22 1997-01-01
7 8 1997-01-01 1998-03-29 1997-01-01
8 9 1997-01-01 1998-06-08 1997-01-01
9 10 1997-01-21 1997-01-21 1997-01-01
10 11 1997-01-01 1998-02-20 1997-01-01
11 12 1997-01-01 1997-01-01 1997-01-01
12 13 1997-01-01 1997-01-01 1997-01-01
13 14 1997-01-01 1997-01-01 1997-01-01
14 15 1997-01-01 1997-01-01 1997-01-01
15 16 1997-01-01 1997-09-10 1997-01-01
16 17 1997-01-01 1997-01-01 1997-01-01
17 18 1997-01-04 1997-01-04 1997-01-01
18 19 1997-01-01 1997-06-10 1997-01-01
19 20 1997-01-01 1997-01-18 1997-01-01
20 21 1997-01-01 1997-01-13 1997-01-01
21 22 1997-01-01 1997-01-01 1997-01-01
22 23 1997-01-01 1997-01-01 1997-01-01
23 24 1997-01-01 1998-01-20 1997-01-01
24 25 1997-01-01 1998-06-08 1997-01-01
25 26 1997-01-01 1997-01-26 1997-01-01
26 27 1997-01-07 1997-01-12 1997-01-01
27 28 1997-01-01 1997-03-08 1997-01-01
28 29 1997-01-01 1998-04-26 1997-01-01
29 30 1997-01-01 1997-02-14 1997-01-01
... ... ... ... ...
23540 23541 1997-03-25 1997-04-02 1997-03-01
23541 23542 1997-03-25 1997-03-25 1997-03-01
23542 23543 1997-03-25 1997-03-25 1997-03-01
23543 23544 1997-03-25 1998-01-24 1997-03-01
23544 23545 1997-03-25 1997-03-25 1997-03-01
23545 23546 1997-03-25 1997-03-25 1997-03-01
23546 23547 1997-03-25 1997-04-07 1997-03-01
23547 23548 1997-03-25 1997-03-25 1997-03-01
23548 23549 1997-03-25 1997-03-25 1997-03-01
23549 23550 1997-03-25 1997-03-25 1997-03-01
23550 23551 1997-03-25 1997-09-11 1997-03-01
23551 23552 1997-03-25 1997-04-03 1997-03-01
23552 23553 1997-03-25 1997-03-28 1997-03-01
23553 23554 1997-03-25 1998-02-01 1997-03-01
23554 23555 1997-03-25 1998-06-10 1997-03-01
23555 23556 1997-03-25 1998-06-07 1997-03-01
23556 23557 1997-03-25 1997-03-25 1997-03-01
23557 23558 1997-03-25 1998-02-25 1997-03-01
23558 23559 1997-03-25 1997-06-27 1997-03-01
23559 23560 1997-03-25 1997-03-25 1997-03-01
23560 23561 1997-03-25 1998-05-29 1997-03-01
23561 23562 1997-03-25 1997-03-25 1997-03-01
23562 23563 1997-03-25 1997-10-04 1997-03-01
23563 23564 1997-03-25 1997-11-30 1997-03-01
23564 23565 1997-03-25 1997-03-25 1997-03-01
23565 23566 1997-03-25 1997-03-25 1997-03-01
23566 23567 1997-03-25 1997-03-25 1997-03-01
23567 23568 1997-03-25 1997-04-22 1997-03-01
23568 23569 1997-03-25 1997-03-25 1997-03-01
23569 23570 1997-03-25 1997-03-26 1997-03-01

23570 rows × 4 columns

In [21]:
user_life_month=df.groupby(['month','ID']).count().reset_index()
In [22]:
user_life_month=user_life_month.merge(user_life,on='ID')
In [23]:
user_life_month
Out[23]:
  month ID DATE NUM TOTAL min max regDate
0 1997-01-01 1 1 1 1 1997-01-01 1997-01-01 1997-01-01
1 1997-01-01 2 2 2 2 1997-01-12 1997-01-12 1997-01-01
2 1997-01-01 3 1 1 1 1997-01-02 1998-05-28 1997-01-01
3 1997-03-01 3 1 1 1 1997-01-02 1998-05-28 1997-01-01
4 1997-04-01 3 1 1 1 1997-01-02 1998-05-28 1997-01-01
5 1997-11-01 3 2 2 2 1997-01-02 1998-05-28 1997-01-01
6 1998-05-01 3 1 1 1 1997-01-02 1998-05-28 1997-01-01
7 1997-01-01 4 2 2 2 1997-01-01 1997-12-12 1997-01-01
8 1997-08-01 4 1 1 1 1997-01-01 1997-12-12 1997-01-01
9 1997-12-01 4 1 1 1 1997-01-01 1997-12-12 1997-01-01
10 1997-01-01 5 2 2 2 1997-01-01 1998-01-03 1997-01-01
11 1997-02-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
12 1997-04-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
13 1997-05-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
14 1997-06-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
15 1997-07-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
16 1997-09-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
17 1997-12-01 5 2 2 2 1997-01-01 1998-01-03 1997-01-01
18 1998-01-01 5 1 1 1 1997-01-01 1998-01-03 1997-01-01
19 1997-01-01 6 1 1 1 1997-01-01 1997-01-01 1997-01-01
20 1997-01-01 7 1 1 1 1997-01-01 1998-03-22 1997-01-01
21 1997-10-01 7 1 1 1 1997-01-01 1998-03-22 1997-01-01
22 1998-03-01 7 1 1 1 1997-01-01 1998-03-22 1997-01-01
23 1997-01-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
24 1997-02-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
25 1997-06-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
26 1997-07-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
27 1997-11-01 8 2 2 2 1997-01-01 1998-03-29 1997-01-01
28 1997-12-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
29 1998-03-01 8 1 1 1 1997-01-01 1998-03-29 1997-01-01
... ... ... ... ... ... ... ... ...
55349 1997-06-01 23556 1 1 1 1997-03-25 1998-06-07 1997-03-01
55350 1997-07-01 23556 2 2 2 1997-03-25 1998-06-07 1997-03-01
55351 1997-09-01 23556 1 1 1 1997-03-25 1998-06-07 1997-03-01
55352 1998-01-01 23556 1 1 1 1997-03-25 1998-06-07 1997-03-01
55353 1998-06-01 23556 1 1 1 1997-03-25 1998-06-07 1997-03-01
55354 1997-03-01 23557 1 1 1 1997-03-25 1997-03-25 1997-03-01
55355 1997-03-01 23558 1 1 1 1997-03-25 1998-02-25 1997-03-01
55356 1997-05-01 23558 1 1 1 1997-03-25 1998-02-25 1997-03-01
55357 1997-06-01 23558 1 1 1 1997-03-25 1998-02-25 1997-03-01
55358 1998-02-01 23558 1 1 1 1997-03-25 1998-02-25 1997-03-01
55359 1997-03-01 23559 1 1 1 1997-03-25 1997-06-27 1997-03-01
55360 1997-05-01 23559 1 1 1 1997-03-25 1997-06-27 1997-03-01
55361 1997-06-01 23559 1 1 1 1997-03-25 1997-06-27 1997-03-01
55362 1997-03-01 23560 1 1 1 1997-03-25 1997-03-25 1997-03-01
55363 1997-03-01 23561 1 1 1 1997-03-25 1998-05-29 1997-03-01
55364 1998-01-01 23561 1 1 1 1997-03-25 1998-05-29 1997-03-01
55365 1998-05-01 23561 1 1 1 1997-03-25 1998-05-29 1997-03-01
55366 1997-03-01 23562 1 1 1 1997-03-25 1997-03-25 1997-03-01
55367 1997-03-01 23563 1 1 1 1997-03-25 1997-10-04 1997-03-01
55368 1997-10-01 23563 1 1 1 1997-03-25 1997-10-04 1997-03-01
55369 1997-03-01 23564 1 1 1 1997-03-25 1997-11-30 1997-03-01
55370 1997-05-01 23564 1 1 1 1997-03-25 1997-11-30 1997-03-01
55371 1997-11-01 23564 1 1 1 1997-03-25 1997-11-30 1997-03-01
55372 1997-03-01 23565 1 1 1 1997-03-25 1997-03-25 1997-03-01
55373 1997-03-01 23566 1 1 1 1997-03-25 1997-03-25 1997-03-01
55374 1997-03-01 23567 1 1 1 1997-03-25 1997-03-25 1997-03-01
55375 1997-03-01 23568 1 1 1 1997-03-25 1997-04-22 1997-03-01
55376 1997-04-01 23568 2 2 2 1997-03-25 1997-04-22 1997-03-01
55377 1997-03-01 23569 1 1 1 1997-03-25 1997-03-25 1997-03-01
55378 1997-03-01 23570 2 2 2 1997-03-25 1997-03-26 1997-03-01

55379 rows × 8 columns

In [24]:
user_life_month['isNew']=user_life_month.apply(lambda x:True if x.month==x.regDate else np.NaN,axis=1)
In [25]:
user_life_month_count=user_life_month.groupby('month').count().reset_index()
In [26]:
user_life_month_count
Out[26]:
  month ID DATE NUM TOTAL min max regDate isNew
0 1997-01-01 7846 7846 7846 7846 7846 7846 7846 7846
1 1997-02-01 9633 9633 9633 9633 9633 9633 9633 8476
2 1997-03-01 9524 9524 9524 9524 9524 9524 9524 7248
3 1997-04-01 2822 2822 2822 2822 2822 2822 2822 0
4 1997-05-01 2214 2214 2214 2214 2214 2214 2214 0
5 1997-06-01 2339 2339 2339 2339 2339 2339 2339 0
6 1997-07-01 2180 2180 2180 2180 2180 2180 2180 0
7 1997-08-01 1772 1772 1772 1772 1772 1772 1772 0
8 1997-09-01 1739 1739 1739 1739 1739 1739 1739 0
9 1997-10-01 1839 1839 1839 1839 1839 1839 1839 0
10 1997-11-01 2028 2028 2028 2028 2028 2028 2028 0
11 1997-12-01 1864 1864 1864 1864 1864 1864 1864 0
12 1998-01-01 1537 1537 1537 1537 1537 1537 1537 0
13 1998-02-01 1551 1551 1551 1551 1551 1551 1551 0
14 1998-03-01 2060 2060 2060 2060 2060 2060 2060 0
15 1998-04-01 1437 1437 1437 1437 1437 1437 1437 0
16 1998-05-01 1488 1488 1488 1488 1488 1488 1488 0
17 1998-06-01 1506 1506 1506 1506 1506 1506 1506 0
In [27]:
user_life_month_count['isOld']=pd.DataFrame(user_life_month_count['ID']-user_life_month_count['isNew'])
user_life_month_count.drop(columns=['DATE','NUM','TOTAL','min','max','regDate'],inplace=True)
In [28]:
user_life_month_count.rename(columns={'ID':'all'},inplace=True)
In [29]:
user_life_month_count
Out[29]:
  month all isNew isOld
0 1997-01-01 7846 7846 0
1 1997-02-01 9633 8476 1157
2 1997-03-01 9524 7248 2276
3 1997-04-01 2822 0 2822
4 1997-05-01 2214 0 2214
5 1997-06-01 2339 0 2339
6 1997-07-01 2180 0 2180
7 1997-08-01 1772 0 1772
8 1997-09-01 1739 0 1739
9 1997-10-01 1839 0 1839
10 1997-11-01 2028 0 2028
11 1997-12-01 1864 0 1864
12 1998-01-01 1537 0 1537
13 1998-02-01 1551 0 1551
14 1998-03-01 2060 0 2060
15 1998-04-01 1437 0 1437
16 1998-05-01 1488 0 1488
17 1998-06-01 1506 0 1506

新用户购买集中在前三个月,三个月后无新用户购买,均为老用户购买

In [30]:
rfm=df.pivot_table(index='ID',values=['NUM','TOTAL','DATE'],aggfunc={'NUM':'sum','TOTAL':'sum','DATE':'max'})
rfm
Out[30]:
  DATE NUM TOTAL
ID      
1 1997-01-01 1 11.77
2 1997-01-12 6 89.00
3 1998-05-28 16 156.46
4 1997-12-12 7 100.50
5 1998-01-03 29 385.61
6 1997-01-01 1 20.99
7 1998-03-22 18 264.67
8 1998-03-29 18 197.66
9 1998-06-08 6 95.85
10 1997-01-21 3 39.31
11 1998-02-20 4 58.55
12 1997-01-01 4 57.06
13 1997-01-01 4 72.94
14 1997-01-01 2 29.92
15 1997-01-01 4 52.87
16 1997-09-10 8 79.87
17 1997-01-01 5 73.22
18 1997-01-04 1 14.96
19 1997-06-10 11 175.12
20 1997-01-18 46 653.01
21 1997-01-13 4 75.11
22 1997-01-01 1 14.37
23 1997-01-01 2 24.74
24 1998-01-20 4 57.77
25 1998-06-08 12 137.53
26 1997-01-26 6 102.69
27 1997-01-12 10 135.87
28 1997-03-08 7 90.99
29 1998-04-26 28 435.81
30 1997-02-14 2 28.34
... ... ... ...
23541 1997-04-02 2 57.34
23542 1997-03-25 5 77.43
23543 1997-03-25 2 50.76
23544 1998-01-24 12 134.63
23545 1997-03-25 1 24.99
23546 1997-03-25 1 13.97
23547 1997-04-07 2 23.54
23548 1997-03-25 2 23.54
23549 1997-03-25 2 27.13
23550 1997-03-25 2 25.28
23551 1997-09-11 12 264.63
23552 1997-04-03 4 49.38
23553 1997-03-28 8 98.58
23554 1998-02-01 3 36.37
23555 1998-06-10 14 189.18
23556 1998-06-07 15 203.00
23557 1997-03-25 1 14.37
23558 1998-02-25 11 145.60
23559 1997-06-27 8 111.65
23560 1997-03-25 1 18.36
23561 1998-05-29 6 83.46
23562 1997-03-25 2 29.33
23563 1997-10-04 3 58.75
23564 1997-11-30 5 70.01
23565 1997-03-25 1 11.77
23566 1997-03-25 2 36.00
23567 1997-03-25 1 20.97
23568 1997-04-22 6 121.70
23569 1997-03-25 2 25.74
23570 1997-03-26 5 94.08

23570 rows × 3 columns

In [31]:
rfm['R']=-(rfm['DATE']-rfm['DATE'].max())/np.timedelta64(1,'D')
rfm
Out[31]:
  DATE NUM TOTAL R
ID        
1 1997-01-01 1 11.77 545.0
2 1997-01-12 6 89.00 534.0
3 1998-05-28 16 156.46 33.0
4 1997-12-12 7 100.50 200.0
5 1998-01-03 29 385.61 178.0
6 1997-01-01 1 20.99 545.0
7 1998-03-22 18 264.67 100.0
8 1998-03-29 18 197.66 93.0
9 1998-06-08 6 95.85 22.0
10 1997-01-21 3 39.31 525.0
11 1998-02-20 4 58.55 130.0
12 1997-01-01 4 57.06 545.0
13 1997-01-01 4 72.94 545.0
14 1997-01-01 2 29.92 545.0
15 1997-01-01 4 52.87 545.0
16 1997-09-10 8 79.87 293.0
17 1997-01-01 5 73.22 545.0
18 1997-01-04 1 14.96 542.0
19 1997-06-10 11 175.12 385.0
20 1997-01-18 46 653.01 528.0
21 1997-01-13 4 75.11 533.0
22 1997-01-01 1 14.37 545.0
23 1997-01-01 2 24.74 545.0
24 1998-01-20 4 57.77 161.0
25 1998-06-08 12 137.53 22.0
26 1997-01-26 6 102.69 520.0
27 1997-01-12 10 135.87 534.0
28 1997-03-08 7 90.99 479.0
29 1998-04-26 28 435.81 65.0
30 1997-02-14 2 28.34 501.0
... ... ... ... ...
23541 1997-04-02 2 57.34 454.0
23542 1997-03-25 5 77.43 462.0
23543 1997-03-25 2 50.76 462.0
23544 1998-01-24 12 134.63 157.0
23545 1997-03-25 1 24.99 462.0
23546 1997-03-25 1 13.97 462.0
23547 1997-04-07 2 23.54 449.0
23548 1997-03-25 2 23.54 462.0
23549 1997-03-25 2 27.13 462.0
23550 1997-03-25 2 25.28 462.0
23551 1997-09-11 12 264.63 292.0
23552 1997-04-03 4 49.38 453.0
23553 1997-03-28 8 98.58 459.0
23554 1998-02-01 3 36.37 149.0
23555 1998-06-10 14 189.18 20.0
23556 1998-06-07 15 203.00 23.0
23557 1997-03-25 1 14.37 462.0
23558 1998-02-25 11 145.60 125.0
23559 1997-06-27 8 111.65 368.0
23560 1997-03-25 1 18.36 462.0
23561 1998-05-29 6 83.46 32.0
23562 1997-03-25 2 29.33 462.0
23563 1997-10-04 3 58.75 269.0
23564 1997-11-30 5 70.01 212.0
23565 1997-03-25 1 11.77 462.0
23566 1997-03-25 2 36.00 462.0
23567 1997-03-25 1 20.97 462.0
23568 1997-04-22 6 121.70 434.0
23569 1997-03-25 2 25.74 462.0
23570 1997-03-26 5 94.08 461.0

23570 rows × 4 columns

In [32]:
rfm.rename(columns={'NUM':'F','TOTAL':'M'},inplace=True)
In [33]:
rfm.drop(columns=['DATE'],inplace=True)
rfm
Out[33]:
  F M R
ID      
1 1 11.77 545.0
2 6 89.00 534.0
3 16 156.46 33.0
4 7 100.50 200.0
5 29 385.61 178.0
6 1 20.99 545.0
7 18 264.67 100.0
8 18 197.66 93.0
9 6 95.85 22.0
10 3 39.31 525.0
11 4 58.55 130.0
12 4 57.06 545.0
13 4 72.94 545.0
14 2 29.92 545.0
15 4 52.87 545.0
16 8 79.87 293.0
17 5 73.22 545.0
18 1 14.96 542.0
19 11 175.12 385.0
20 46 653.01 528.0
21 4 75.11 533.0
22 1 14.37 545.0
23 2 24.74 545.0
24 4 57.77 161.0
25 12 137.53 22.0
26 6 102.69 520.0
27 10 135.87 534.0
28 7 90.99 479.0
29 28 435.81 65.0
30 2 28.34 501.0
... ... ... ...
23541 2 57.34 454.0
23542 5 77.43 462.0
23543 2 50.76 462.0
23544 12 134.63 157.0
23545 1 24.99 462.0
23546 1 13.97 462.0
23547 2 23.54 449.0
23548 2 23.54 462.0
23549 2 27.13 462.0
23550 2 25.28 462.0
23551 12 264.63 292.0
23552 4 49.38 453.0
23553 8 98.58 459.0
23554 3 36.37 149.0
23555 14 189.18 20.0
23556 15 203.00 23.0
23557 1 14.37 462.0
23558 11 145.60 125.0
23559 8 111.65 368.0
23560 1 18.36 462.0
23561 6 83.46 32.0
23562 2 29.33 462.0
23563 3 58.75 269.0
23564 5 70.01 212.0
23565 1 11.77 462.0
23566 2 36.00 462.0
23567 1 20.97 462.0
23568 6 121.70 434.0
23569 2 25.74 462.0
23570 5 94.08 461.0

23570 rows × 3 columns

In [34]:
def rfm_fumc(x):
    level = x.apply(lambda x:'1' if x>=1 else '0')
    label=level.R+level.F+level.M
    d={
        '111':'重要价值客户',
        '011':'重要保持客户',
        '101':'重要发展客户',
        '001':'重要挽留客户',
        '110':'一般价值客户',
        '010':'一般保持客户',
        '100':'一般发展客户',
        '000':'一般挽留客户',
    }
    result=d[label]
    return result
In [35]:
rfm['label']=rfm[['R','F','M']].apply(lambda x: x-x.mean()).apply(rfm_fumc,axis=1)
In [36]:
rfm.loc[rfm.label=='重要价值客户','color']='g'
rfm.loc[~(rfm.label=='重要价值客户'),'color']='r'
rfm.plot.scatter('F','R',c=rfm.color)
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6c26a550>
七周成为数据分析师(案例python脚本)
In [37]:
rfm.groupby('label').sum()
Out[37]:
  F M R
label      
一般价值客户 182 1767.11 8512.0
一般保持客户 492 5100.77 7782.0
一般发展客户 29915 445233.28 6983699.0
一般挽留客户 15428 215075.77 621894.0
重要价值客户 9849 147180.09 286676.0
重要保持客户 105509 1555586.51 476502.0
重要发展客户 2322 49905.80 174340.0
重要挽留客户 4184 80466.30 96009.0

从RFM分层可知,大部分用户为重要保持客户,但是这是由于极值的影响,所有RFM的划分标准应该以业务为准

  • 尽量用小部分的用户覆盖大部分的额度
  • 不要为了数据好看划分等级
In [38]:
pivoted_counts=df.pivot_table(index='ID',columns='month',values='DATE',aggfunc='count').fillna(0)
pivoted_counts
Out[38]:
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
ID                                    
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
5 2.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
6 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
8 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
9 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
10 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
11 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
16 1.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
17 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
18 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
19 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
21 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
22 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
24 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
25 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 2.0
26 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
27 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
28 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
29 1.0 1.0 1.0 2.0 2.0 0.0 2.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
30 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23542 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23543 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23544 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
23545 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23546 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23547 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23548 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23549 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23550 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23551 0.0 0.0 1.0 0.0 0.0 1.0 0.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23552 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23553 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23554 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
23555 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0
23556 0.0 0.0 1.0 0.0 0.0 1.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
23557 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23558 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
23559 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23560 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23561 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
23562 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23563 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23564 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23565 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23566 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23567 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23568 0.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23569 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23570 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

23570 rows × 18 columns

In [39]:
df_purchase=pivoted_counts.applymap(lambda x:1 if x>0 else 0)
df_purchase
Out[39]:
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
ID                                    
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
4 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
5 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0
6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
8 1 1 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 0
9 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
10 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
12 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
17 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
20 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
25 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1
26 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 1 1 1 1 1 0 1 0 1 0 1 0 0 0 0 1 0 0
30 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23542 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23543 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23544 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
23545 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23546 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23547 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23548 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23549 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23550 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23551 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0
23552 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23553 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23554 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
23555 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1
23556 0 0 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1
23557 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23558 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
23559 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
23560 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23561 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
23562 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23563 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23564 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
23565 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23566 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23567 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23568 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23569 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23570 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23570 rows × 18 columns

In [40]:
def active_status(data):
    status=[]
    for i in range(18):
        if data[i]==0:
            if len(status)>0:
                if status[i-1]=='unreg':
                    status.append('unreg')
                else:
                    status.append('unactive')
            else:
                status.append('unreg')
        else:
            if len(status)==0:
                status.append('new')
            else:
                if status[i-1]=='unactive':
                    status.append('return')                             
                elif status[i-1]=='unreg':
                     status.append('new')
                else:
                     status.append('active')
    return status    
    
In [41]:
puchaser_status=df_purchase.apply(active_status,axis=1)
In [42]:
puchaser_status
Out[42]:
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
ID                                    
1 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
2 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
3 new unactive return active unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive return unactive
4 new unactive unactive unactive unactive unactive unactive return unactive unactive unactive return unactive unactive unactive unactive unactive unactive
5 new active unactive return active active active unactive return unactive unactive return active unactive unactive unactive unactive unactive
6 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
7 new unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive return unactive unactive unactive
8 new active unactive unactive unactive return active unactive unactive unactive return active unactive unactive return unactive unactive unactive
9 new unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return
10 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
11 new unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
12 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
13 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
14 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
15 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
16 new unactive unactive unactive unactive unactive return unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive
17 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
18 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
19 new unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
20 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
21 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
22 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
24 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive
25 new unactive unactive unactive unactive unactive return active unactive return unactive unactive unactive unactive unactive return active active
26 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
27 new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
28 new unactive return unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
29 new active active active active unactive return unactive return unactive return unactive unactive unactive unactive return unactive unactive
30 new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23542 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23543 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23544 unreg unreg new unactive return unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive
23545 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23546 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23547 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23548 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23549 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23550 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23551 unreg unreg new unactive unactive return unactive return active unactive unactive unactive unactive unactive unactive unactive unactive unactive
23552 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23553 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23554 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
23555 unreg unreg new unactive unactive unactive unactive unactive unactive return unactive return unactive unactive unactive unactive return active
23556 unreg unreg new unactive unactive return active unactive return unactive unactive unactive return unactive unactive unactive unactive return
23557 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23558 unreg unreg new unactive return active unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive
23559 unreg unreg new unactive return active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23560 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23561 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive return unactive unactive unactive return unactive
23562 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23563 unreg unreg new unactive unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive unactive
23564 unreg unreg new unactive return unactive unactive unactive unactive unactive return unactive unactive unactive unactive unactive unactive unactive
23565 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23566 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23567 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23568 unreg unreg new active unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23569 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive
23570 unreg unreg new unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive unactive

23570 rows × 18 columns

In [43]:
purchase_static_ct=puchaser_status.replace('unreg',np.nan).apply(lambda x:pd.value_counts(x))
purchase_static_ct
Out[43]:
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
active NaN 1157.0 1681 1773.0 852.0 747.0 746.0 604.0 528.0 532.0 624.0 632.0 512.0 472.0 571.0 518.0 459.0 446.0
new 7846.0 8476.0 7248 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
return NaN NaN 595 1049.0 1362.0 1592.0 1434.0 1168.0 1211.0 1307.0 1404.0 1232.0 1025.0 1079.0 1489.0 919.0 1029.0 1060.0
unactive NaN 6689.0 14046 20748.0 21356.0 21231.0 21390.0 21798.0 21831.0 21731.0 21542.0 21706.0 22033.0 22019.0 21510.0 22133.0 22082.0 22064.0
In [44]:
purchase_static_ct.fillna(0).T.apply(lambda x:x/x.sum(),axis=1)
Out[44]:
  active new return unactive
month        
1997-01-01 0.000000 1.000000 0.000000 0.000000
1997-02-01 0.070886 0.519299 0.000000 0.409815
1997-03-01 0.071319 0.307510 0.025244 0.595927
1997-04-01 0.075223 0.000000 0.044506 0.880272
1997-05-01 0.036148 0.000000 0.057785 0.906067
1997-06-01 0.031693 0.000000 0.067543 0.900764
1997-07-01 0.031650 0.000000 0.060840 0.907510
1997-08-01 0.025626 0.000000 0.049555 0.924820
1997-09-01 0.022401 0.000000 0.051379 0.926220
1997-10-01 0.022571 0.000000 0.055452 0.921977
1997-11-01 0.026474 0.000000 0.059567 0.913958
1997-12-01 0.026814 0.000000 0.052270 0.920916
1998-01-01 0.021723 0.000000 0.043487 0.934790
1998-02-01 0.020025 0.000000 0.045779 0.934196
1998-03-01 0.024226 0.000000 0.063174 0.912601
1998-04-01 0.021977 0.000000 0.038990 0.939033
1998-05-01 0.019474 0.000000 0.043657 0.936869
1998-06-01 0.018922 0.000000 0.044972 0.936105

由上表可知,每月的用户消费状态变化

  • 活跃用户,持续消费的用户,对应的是消费运营的质量
  • 回流用户,之前不消费本月才消费,对应的是换回运营
  • 不活跃用户,对应的是流失
In [45]:
diff=grouped_user.apply(lambda x :x.DATE-x.DATE.shift())
In [46]:
diff.reset_index().DATE.describe()
Out[46]:
count                      46089
mean     68 days 23:22:13.567662
std      91 days 00:47:33.924168
min              0 days 00:00:00
25%             10 days 00:00:00
50%             31 days 00:00:00
75%             89 days 00:00:00
max            533 days 00:00:00
Name: DATE, dtype: object
In [47]:
(diff/np.timedelta64(1,'D')).hist(bins=20)
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6e42ad30>
七周成为数据分析师(案例python脚本)
  • 订单周期呈指数分布
  • 用户的平均购买周期是68天
  • 绝大部分用户的购买周期都低于100天
In [48]:
(user_life['max']-user_life['min']).describe()
Out[48]:
count                       23570
mean     134 days 20:55:36.987696
std      180 days 13:46:43.039788
min               0 days 00:00:00
25%               0 days 00:00:00
50%               0 days 00:00:00
75%             294 days 00:00:00
max             544 days 00:00:00
dtype: object
In [49]:
((user_life['max']-user_life['min'])/np.timedelta64(1,'D')).hist(bins=40)
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6c2622e8>
七周成为数据分析师(案例python脚本)
  • 用户的生命周期受只购买一次的用户影响比较厉害(可以排除)
  • 用户均消费134天, 中位数仅0天
In [50]:
u_1=((user_life['max']-user_life['min'])/np.timedelta64(1,'D')).reset_index()[0]
u_1[u_1>0].hist(bins=40)
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c6ef7b518>
七周成为数据分析师(案例python脚本)

4.复购率与回购率分析

  • 复购率
    • 自然月内,购买多次的用户占总用户比
  • 回购率
    • 曾经购买过的用户在某一时期内的再次购买的占比
In [51]:
pivoted_counts
Out[51]:
month 1997-01-01 00:00:00 1997-02-01 00:00:00 1997-03-01 00:00:00 1997-04-01 00:00:00 1997-05-01 00:00:00 1997-06-01 00:00:00 1997-07-01 00:00:00 1997-08-01 00:00:00 1997-09-01 00:00:00 1997-10-01 00:00:00 1997-11-01 00:00:00 1997-12-01 00:00:00 1998-01-01 00:00:00 1998-02-01 00:00:00 1998-03-01 00:00:00 1998-04-01 00:00:00 1998-05-01 00:00:00 1998-06-01 00:00:00
ID                                    
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
5 2.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
6 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
8 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
9 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
10 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
11 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
16 1.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
17 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
18 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
19 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
21 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
22 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
24 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
25 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 2.0
26 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
27 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
28 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
29 1.0 1.0 1.0 2.0 2.0 0.0 2.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
30 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23541 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23542 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23543 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23544 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
23545 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23546 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23547 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23548 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23549 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23550 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23551 0.0 0.0 1.0 0.0 0.0 1.0 0.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23552 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23553 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23554 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
23555 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0
23556 0.0 0.0 1.0 0.0 0.0 1.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
23557 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23558 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
23559 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23560 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23561 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
23562 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23563 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23564 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23565 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23566 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23567 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23568 0.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23569 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23570 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

23570 rows × 18 columns

In [52]:
(((pivoted_counts[pivoted_counts>1]).count())/((pivoted_counts[pivoted_counts>0]).count())).plot()
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x21c70a66f28>
七周成为数据分析师(案例python脚本)