Machine Learning: The Upper Confidence Bound (UCB) Algorithm

0. How the Algorithm Works

[Figure: UCB algorithm principle]
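The selection rule implemented by the code in Section 4 can be written out as follows. The notation is introduced here for illustration only:

- $n = 1, \dots, N$ is the round (user) number.
- $d = 10$ is the number of ads.
- $N_i(n)$ is how many times ad $i$ was shown in rounds $1$ to $n-1$.
- $R_i(n)$ is how many clicks ad $i$ collected in those rounds.
- $a_n$ is the ad shown in round $n$.

Rounds are counted from 1 here. The code's loop index starts at 0, which is why it uses `math.log(n + 1)`.

```latex
\bar{r}_i(n) = \frac{R_i(n)}{N_i(n)}, \qquad
\Delta_i(n) = \sqrt{\frac{3}{2} \cdot \frac{\ln n}{N_i(n)}}, \qquad
a_n = \arg\max_{1 \le i \le d} \bigl( \bar{r}_i(n) + \Delta_i(n) \bigr)
```

If $N_i(n) = 0$, ad $i$ is given an effectively infinite upper bound, so every ad is shown once before the bounds are compared. The two terms play different roles:

- The first term favours ads that have done well so far (exploitation).
- The second term shrinks as an ad is shown more often and grows only slowly with $n$. Rarely shown ads therefore still get an occasional chance (exploration).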

1. Import the Libraries

In [2]:
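# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Interactive, resizable figures (classic Jupyter Notebook)
%matplotlib notebook
# Use the SimHei font so that Chinese characters in plots display correctly
plt.rc('font', family='SimHei', size=8)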

2. Import the Data

In [8]:
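# Simulated environment: each row is a user, each column an ad;
# a 1 means that user would click that ad if it were shown to them
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')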
dataset
Out[8]:
      Ad 1  Ad 2  Ad 3  Ad 4  Ad 5  Ad 6  Ad 7  Ad 8  Ad 9  Ad 10
0        1     0     0     0     1     0     0     0     1      0
1        0     0     0     0     0     0     0     0     1      0
2        0     0     0     0     0     0     0     0     0      0
3        0     1     0     0     0     0     0     1     0      0
4        0     0     0     0     0     0     0     0     0      0
5        1     1     0     0     0     0     0     0     0      0
6        0     0     0     1     0     0     0     0     0      0
7        1     1     0     0     1     0     0     0     0      0
8        0     0     0     0     0     0     0     0     0      0
9        0     0     1     0     0     0     0     0     0      0
10       0     0     0     0     0     0     0     0     0      0
11       0     0     0     0     0     0     0     0     0      0
12       0     0     0     1     0     0     0     0     0      0
13       0     0     0     0     0     0     0     0     1      0
14       0     0     0     0     0     0     0     1     0      0
15       0     0     0     0     1     0     0     1     0      0
16       0     0     0     0     0     0     0     0     0      0
17       0     0     0     0     0     0     0     0     0      0
18       0     0     0     0     0     0     0     1     0      0
19       0     0     0     0     0     0     0     0     1      0
20       0     1     0     0     0     0     0     1     0      0
21       0     0     0     0     1     0     0     0     0      1
22       0     0     0     0     0     0     0     0     0      0
23       0     0     0     0     0     0     0     1     1      0
24       0     0     0     0     1     0     1     1     0      0
25       0     0     0     0     0     0     0     0     0      0
26       0     1     0     0     1     0     0     1     0      0
27       0     1     0     1     0     0     0     0     0      0
28       0     0     0     0     0     0     0     0     0      0
29       0     0     0     0     1     0     0     1     1      0
...    ...   ...   ...   ...   ...   ...   ...   ...   ...    ...
9970     0     0     0     0     0     0     0     0     0      0
9971     0     0     0     0     0     0     0     1     0      0
9972     0     0     0     0     0     0     0     0     0      0
9973     0     0     0     0     1     0     0     0     0      0
9974     0     0     0     0     0     0     0     1     1      0
9975     0     0     0     0     1     0     1     0     1      0
9976     0     0     0     0     1     0     0     1     0      0
9977     0     1     0     0     1     0     1     0     0      0
9978     0     0     0     0     1     0     0     0     0      0
9979     0     0     1     0     0     0     1     0     0      0
9980     1     1     0     1     0     0     0     0     0      0
9981     0     0     0     0     0     0     0     0     0      0
9982     0     1     0     0     0     0     0     0     0      0
9983     0     0     0     0     1     0     0     1     1      0
9984     0     0     0     0     1     0     0     0     0      0
9985     0     0     0     0     0     0     0     1     0      0
9986     0     0     0     0     1     0     0     0     0      0
9987     0     0     0     0     1     0     0     0     0      0
9988     1     0     0     0     1     0     0     0     0      0
9989     0     0     0     0     0     0     0     0     0      0
9990     0     0     0     1     0     0     0     0     0      0
9991     0     1     0     1     1     0     1     0     0      0
9992     0     0     0     1     0     0     1     0     0      0
9993     0     0     0     0     1     0     0     0     1      0
9994     0     0     1     0     0     0     0     0     1      0
9995     0     0     1     0     0     0     0     1     0      0
9996     0     0     0     0     0     0     0     0     0      0
9997     0     0     0     0     0     0     0     0     0      0
9998     1     0     0     0     0     0     0     1     0      0
9999     0     1     0     0     0     0     0     0     0      0

10000 rows × 10 columns

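Problem statement

There are 10 ads and 10,000 users who arrive one at a time, and each user is shown exactly one ad. The reward is 1 if the user clicks the ad and 0 otherwise; a click corresponds to a 1 in that user's row and that ad's column of the dataset. Only the outcome of the ad actually shown is observed. The goal is to choose ads so that the total number of clicks is as large as possible.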
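If the CSV file is not available, the environment can be imitated with synthetic data. This is a hypothetical sketch. The function `simulate_click_matrix` and the click-through rates below are made up for illustration; they are not properties of the real dataset. The function returns a DataFrame with the same layout as `dataset`: one row per user, columns `Ad 1` to `Ad 10`, entries 0 or 1. It could therefore stand in for `dataset` in the cells that follow.

```python
import numpy as np
import pandas as pd


def simulate_click_matrix(ctrs, n_users, seed=0):
    """Hypothetical stand-in for Ads_CTR_Optimisation.csv.

    Entry (n, i) is 1 if user n would click ad i when shown it, else 0.
    Each entry is an independent Bernoulli draw with probability ctrs[i];
    the rates passed in are assumptions, not values from the real data.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(ctrs, dtype=float)
    # Uniform draws in [0, 1) compared against each column's click rate
    clicks = (rng.random((n_users, probs.size)) < probs).astype(int)
    columns = [f"Ad {i + 1}" for i in range(probs.size)]
    return pd.DataFrame(clicks, columns=columns)


# Example: 10 ads with made-up click-through rates, 10,000 users
fake_dataset = simulate_click_matrix(
    [0.05, 0.10, 0.03, 0.08, 0.25, 0.02, 0.09, 0.15, 0.07, 0.04], n_users=10000
)
```

Each entry is an independent draw, so results on `fake_dataset` will differ from those on the real CSV. It only serves to make the later cells runnable.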

3. Clicks from Showing Each User a Random Ad

In [27]:
import random
N = 10000  # 10,000 users
d = 10     # 10 ads
ads_selected = []  # ad shown to each user
total_reward = 0
for n in range(0, N):           # loop over users
    ad = random.randrange(d)    # pick an ad uniformly at random
    ads_selected.append(ad)     # record which ad was shown
    reward = dataset.values[n, ad]  # 1 if user n clicks the shown ad (row n, column ad), else 0
    total_reward = total_reward + reward  # accumulate clicks over all rounds

print(total_reward)
# Plot how often each ad was shown
plt.hist(ads_selected)
plt.title('Histogram of ad selections')
plt.xlabel('Ad index (0-based)')
plt.ylabel('Times selected')
plt.show()
1282
[Figure: Histogram of ad selections, random selection]

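The histogram shows that, because the ads are chosen at random, the 10,000 selections are spread almost evenly over the 10 ads, roughly 1,000 each. This random policy earned 1282 clicks in total. The exact figure varies from run to run because no random seed is set.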
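The random baseline can also be computed without simulation. With uniform random choice, the expected reward for user n is the average of row n. The expected total is therefore the number of 1s in the click matrix divided by the number of ads d. Below is a minimal NumPy sketch; the function names are hypothetical.

```python
import numpy as np


def expected_random_reward(click_matrix):
    """Expected total clicks when each user sees one of the d ads uniformly at random.

    User n contributes the mean of row n, so the total is
    (number of 1s in the matrix) / d.
    """
    clicks = np.asarray(click_matrix)
    return clicks.sum() / clicks.shape[1]


def random_policy_reward(click_matrix, seed=0):
    """One simulated run of the random policy, vectorized with NumPy."""
    clicks = np.asarray(click_matrix)
    n_users, d = clicks.shape
    rng = np.random.default_rng(seed)
    ads = rng.integers(0, d, size=n_users)  # one random ad index per user
    # Pick entry (n, ads[n]) for every user and add them up
    return int(clicks[np.arange(n_users), ads].sum())
```

`expected_random_reward(dataset.values)` gives the value that the 1282 clicks above fluctuate around. `random_policy_reward` replaces the Python loop with array indexing and accepts a `seed` so that runs are repeatable.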

4. The Upper Confidence Bound Algorithm

In [55]:
import math
N = 10000  # 10,000 users (rounds)
d = 10     # 10 ads
ads_selected = []                # ad shown to each user
numbers_of_selections = [0] * d  # how many times each ad has been shown so far
sums_of_rewards = [0] * d        # total clicks each ad has collected so far
total_reward = 0
for n in range(0, N):  # n-th user (0-based)
    ad = 0               # ad to show this round
    max_upper_bound = 0  # largest upper bound found so far in this round
    for i in range(0, d):  # i-th ad
        if numbers_of_selections[i] > 0:
            # Average reward of ad i; the float() calls only matter in Python 2,
            # where 3/2 below would also evaluate to 1 (write 1.5 there)
            average_reward = float(sums_of_rewards[i]) / float(numbers_of_selections[i])
            # Half-width of the confidence interval
            delta_i = math.sqrt(3/2 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i  # upper confidence bound
        else:
            # Ad not shown yet: use a bound larger than any real one,
            # so each ad is tried once before the bounds are compared
            upper_bound = 10000
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
    ads_selected.append(ad)
    numbers_of_selections[ad] = numbers_of_selections[ad] + 1
    reward = dataset.values[n, ad]  # 1 if user n clicks the chosen ad, else 0
    sums_of_rewards[ad] = sums_of_rewards[ad] + reward
    total_reward = total_reward + reward
print(total_reward)
# print(numbers_of_selections)
# print(sums_of_rewards)
2358
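The cell above can be packaged as a function. It can then be run on other click matrices, for example a synthetic one, or with a different exploration constant. This is a refactoring sketch of the same rule, not code from the original project; the name `ucb_select` and the parameter `c` are hypothetical. For ads not shown yet it uses `math.inf` as the bound. Any value larger than every achievable bound gives the same selections.

```python
import math

import numpy as np


def ucb_select(click_matrix, c=1.5):
    """Apply the UCB rule to a 0/1 matrix (rows = users, columns = ads).

    Returns (ads_selected, total_reward). `c` is the constant under the
    square root (3/2 in the notebook cell).
    """
    clicks = np.asarray(click_matrix)
    n_users, d = clicks.shape
    counts = [0] * d  # times each ad has been shown so far
    sums = [0] * d    # clicks collected by each ad so far
    ads_selected = []
    total_reward = 0
    for n in range(n_users):
        best_ad, best_bound = 0, -math.inf
        for i in range(d):
            if counts[i] == 0:
                bound = math.inf  # untried ad: show it before comparing bounds
            else:
                mean = sums[i] / counts[i]
                bound = mean + math.sqrt(c * math.log(n + 1) / counts[i])
            if bound > best_bound:  # strict '>' keeps the lowest index on ties
                best_ad, best_bound = i, bound
        reward = int(clicks[n, best_ad])  # only the shown ad's outcome is observed
        ads_selected.append(best_ad)
        counts[best_ad] += 1
        sums[best_ad] += reward
        total_reward += reward
    return ads_selected, total_reward
```

`ads_selected, total_reward = ucb_select(dataset.values)` should reproduce the selections and total of the cell above, because the rule is deterministic. Passing a different `c` shows how the amount of exploration affects the result.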
In [56]:
# Plot how often each ad was shown
plt.hist(ads_selected)
plt.title('Histogram of ad selections')
plt.xlabel('Ad index (0-based)')
plt.ylabel('Times selected')
plt.show()
[Figure: Histogram of ad selections, UCB]

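The histogram shows that ad 5 was selected far more often than any other ad. UCB identified it as the best performer and showed it to most users. The total reward rose from 1282 with random selection to 2358, about 1.8 times as much, so on this dataset UCB clearly outperforms random selection.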
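The comparison with random selection shows how much UCB gained, but not how much room was left. A common yardstick is the best single ad in hindsight, meaning the column with the most 1s. The gap between its clicks and the clicks actually obtained is called regret. The sketch below computes both from any click matrix; the helper names are hypothetical.

```python
import numpy as np


def best_fixed_ad(click_matrix):
    """Return (index, clicks) of the single ad that would have earned the most
    clicks if shown to every user; this is only knowable in hindsight."""
    clicks = np.asarray(click_matrix)
    per_ad = clicks.sum(axis=0)  # total clicks each ad would collect if always shown
    best = int(np.argmax(per_ad))
    return best, int(per_ad[best])


def regret(click_matrix, total_reward):
    """Clicks lost relative to always showing the best fixed ad."""
    return best_fixed_ad(click_matrix)[1] - total_reward
```

For example, after the UCB cell, `best_fixed_ad(dataset.values)` names the ad that would have been best in hindsight. `regret(dataset.values, total_reward)` then measures how many clicks the UCB run gave up while exploring.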

5. Project Repository