QQ plots: notes from a beginner
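import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Create random data: 1000 draws from a normal distribution with mean 10 and standard deviation 1
s = pd.DataFrame(np.random.randn(1000) + 10, columns=['value'])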
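# Sample mean and standard deviation
mean = s['value'].mean()
std = s['value'].std()
print('Mean:', mean)
print('Standard deviation:', std)

# Summary statistics; describe() also returns the quartiles
st = s['value'].describe()
print('Summary statistics of s:\n', st)

# First and third quartiles as (probability, value) pairs
x1, y1 = 0.25, st['25%']
x2, y2 = 0.75, st['75%']
print('The %.3f quantile is %.5f, the %.3f quantile is %.5f' % (x1, y1, x2, y2))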
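# Sort the values in ascending order
s.sort_values(by='value', inplace=True)

# Reset the index so that it runs 0..n-1 in sorted order (the old index is kept as a column)
s_r = s.reset_index(drop=False)

# Cumulative probability of each sorted value: p(i) = (i - 0.5) / n for i = 1..n.
# The rank i must start at 1; using the 0-based index directly would make the first p negative.
s_r['p'] = (np.arange(1, len(s_r) + 1) - 0.5) / len(s_r)

# Standardized value (z-score); not used in the plots below
s_r['q'] = (s_r['value'] - mean) / std
print('-----------')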
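# One tall figure with three stacked panels
fig = plt.figure(figsize=(10, 12))

# Panel 1: scatter plot of the values against their original index
ax1 = fig.add_subplot(3, 1, 1)
ax1.scatter(s.index, s['value'])

# Panel 2: histogram plus a kernel density estimate on a secondary y-axis (kind='kde' requires SciPy)
ax2 = fig.add_subplot(3, 1, 2)
s.hist(bins=30, ax=ax2)
s.plot(kind='kde', ax=ax2, secondary_y=True)

# Panel 3: QQ plot (sorted values against p), plus the line through the first and third quartiles
ax3 = fig.add_subplot(3, 1, 3)
ax3.plot(s_r['p'], s_r['value'], color='k')
ax3.plot([x1, x2], [y1, y2], 'r', alpha=0.5)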
Output:
Mean: 9.987506073796338
Standard deviation: 0.9879192032907618
Summary statistics of s:
 count    1000.000000
mean        9.987506
std         0.987919
min         6.804521
25%         9.317649
50%         9.974168
75%        10.677966
max        12.873057
Name: value, dtype: float64
The 0.250 quantile is 9.31765, the 0.750 quantile is 10.67797
-----------
Out[44]:
[<matplotlib.lines.Line2D at 0x28381bd9780>]
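
The third panel plots the sorted values against the cumulative probability p. A common variant of the normal QQ plot instead puts the theoretical standard-normal quantile of each p on the x-axis, so that normally distributed data falls close to a straight line. The following is a minimal sketch of that variant, not part of the original code. It uses only the standard library, and the names `values`, `theo`, and `sample_quantile`, as well as the fixed seed, are assumptions made for illustration.

```python
import random
from statistics import NormalDist

random.seed(0)  # arbitrary fixed seed, only for reproducibility
n = 1000

# Sorted sample from N(10, 1), mirroring the data used above
values = sorted(random.gauss(10, 1) for _ in range(n))

# Plotting positions p(i) = (i - 0.5) / n for ranks i = 1..n
p = [(i - 0.5) / n for i in range(1, n + 1)]

# Theoretical standard-normal quantile for each plotting position
std_normal = NormalDist()
theo = [std_normal.inv_cdf(pi) for pi in p]


def sample_quantile(sorted_vals, prob):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = prob * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


# Reference line through the first and third quartiles
q1, q3 = sample_quantile(values, 0.25), sample_quantile(values, 0.75)
z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
slope = (q3 - q1) / (z3 - z1)   # rough estimate of the standard deviation
intercept = q1 - slope * z1     # rough estimate of the mean

print('slope: %.3f, intercept: %.3f' % (slope, intercept))
```

Plotting `theo` on the x-axis against `values` on the y-axis, for example with `ax3.scatter(theo, values)`, would give the QQ plot itself. For normal data the points should lie near the line `intercept + slope * z`, with the slope close to the sample standard deviation and the intercept close to the sample mean.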