Python: handling nonlinear problems with polynomial regression in sklearn

from sklearn.preprocessing import PolynomialFeatures as PF
from sklearn.linear_model import LinearRegression
import numpy as np
rnd = np.random.RandomState(42) # fix the random seed
X = rnd.uniform(-3, 3, size=100)
y = np.sin(X) + rnd.normal(size=len(X)) / 3 # sine curve plus Gaussian noise
# Reshape X into a 2D column before passing it to sklearn, which does not accept 1D feature arrays
X = X.reshape(-1,1)
X.shape

Out[26]: (100, 1)

# Create the test data: 1000 points evenly spaced over the value range of the training set X
line = np.linspace(-3, 3, 1000, endpoint=False).reshape(-1, 1)
# Fit on the original feature matrix
LinearR = LinearRegression().fit(X, y)

# Score on the training data
LinearR.score(X,y)

Out[28]: 0.5361526059318595

# Score on the test data
LinearR.score(line,np.sin(line))

Out[29]: 0.6800102369793312
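
The `score` method of `LinearRegression` returns the coefficient of determination R^2. As a minimal sketch, the value can be reproduced by hand on the same synthetic data; the names `model`, `ss_res`, `ss_tot` and `r2_manual` are introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same synthetic data as in the walkthrough
rnd = np.random.RandomState(42)
X = rnd.uniform(-3, 3, size=100)
y = np.sin(X) + rnd.normal(size=len(X)) / 3
X = X.reshape(-1, 1)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model.score(X, y), r2_score(y, y_pred))
```

Note that the test score here is computed against the noise-free targets `np.sin(line)`, which is probably part of why it comes out higher than the training score.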

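# Polynomial fit: set the highest degree
d=5
# Transform the features into higher-degree terms
poly = PF(degree=d)
X_ = poly.fit_transform(X)
line_ = poly.transform(line) # reuse the transformer fitted on X
# Fit and score on the training data
LinearR_ = LinearRegression().fit(X_, y)
LinearR_.score(X_,y)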

Out[30]: 0.8561679370344799

# Model score on the test data
LinearR_.score(line_,np.sin(line))

Out[31]: 0.9868904451787978
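
To see why the score improves, it helps to look at what `PolynomialFeatures` actually produces. The sketch below uses a tiny made-up input (`X_demo`, not part of the data above) and degree 3: each row is expanded into a constant column plus the powers of the single feature, and the linear model is then fitted on those columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two illustrative samples with a single feature each
X_demo = np.array([[2.0], [3.0]])

poly_demo = PolynomialFeatures(degree=3)
X_demo_poly = poly_demo.fit_transform(X_demo)

# Columns are x^0 (bias), x^1, x^2, x^3
print(X_demo_poly)
# Exponent of the input feature in each output column
print(poly_demo.powers_)
```

With one input feature and degree `d`, the transformed matrix has `d + 1` columns, so the degree-5 fit above is an ordinary linear regression on 6 columns.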

## Visualize the process
import matplotlib.pyplot as plt
d=6
# Same modeling workflow as shown above
LinearR = LinearRegression().fit(X, y)
poly = PF(degree=d)
X_ = poly.fit_transform(X)
LinearR_ = LinearRegression().fit(X_, y)
line = np.linspace(-3, 3, 1000, endpoint=False).reshape(-1, 1)
line_ = poly.transform(line)
# Set up the canvas
fig, ax1 = plt.subplots(1)
# Pass the test data to predict to get each model's fitted curve, and plot it
ax1.plot(line, LinearR.predict(line), linewidth=2, color='green'
         ,label="linear regression")
ax1.plot(line, LinearR_.predict(line_), linewidth=2, color='red'
         ,label="Polynomial regression")
# Plot the original data points on the same axes
ax1.plot(X[:, 0], y, 'o', c='k')
# Other plot options
ax1.legend(loc="best")
ax1.set_ylabel("Regression output")
ax1.set_xlabel("Input feature")
ax1.set_title("Linear Regression ordinary vs poly")
plt.tight_layout()
plt.show()
# Afterwards, try lower and higher degrees and see what changes
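
One way to try lower and higher degrees without editing the plot each time is to loop over a few values and compare the scores. This is only a sketch: the list of degrees and the `scores` dictionary are arbitrary choices made for illustration, and the data is regenerated so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data and test grid as in the walkthrough
rnd = np.random.RandomState(42)
X = rnd.uniform(-3, 3, size=100)
y = np.sin(X) + rnd.normal(size=len(X)) / 3
X = X.reshape(-1, 1)
line = np.linspace(-3, 3, 1000, endpoint=False).reshape(-1, 1)

scores = {}
for d in (1, 3, 5, 10, 20):
    poly = PolynomialFeatures(degree=d)
    X_ = poly.fit_transform(X)
    line_ = poly.transform(line)
    model = LinearRegression().fit(X_, y)
    train_r2 = model.score(X_, y)
    # Test targets are the noise-free sine values on the grid
    test_r2 = model.score(line_, np.sin(line).ravel())
    scores[d] = (train_r2, test_r2)
    print(f"degree={d:2d}  train R^2={train_r2:.4f}  test R^2={test_r2:.4f}")
```

Comparing the two columns across degrees gives a rough picture of where extra flexibility stops helping on the test grid.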

Next, cross-validation can be used to see which polynomial degree gives the highest score.
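
A minimal sketch of that cross-validation step, under some assumptions not stated in the original: 5 shuffled folds, degrees 1 through 10, and R^2 as the scoring metric. Wrapping `PolynomialFeatures` and `LinearRegression` in a pipeline keeps the feature expansion inside each fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as in the walkthrough
rnd = np.random.RandomState(42)
X = rnd.uniform(-3, 3, size=100)
y = np.sin(X) + rnd.normal(size=len(X)) / 3
X = X.reshape(-1, 1)

# Assumed setup: 5 shuffled folds with a fixed seed for repeatability
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = {}
for d in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    # Mean R^2 over the folds for this degree
    cv_scores[d] = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    print(f"degree={d:2d}  mean CV R^2={cv_scores[d]:.4f}")

best_degree = max(cv_scores, key=cv_scores.get)
print("best degree by mean CV R^2:", best_degree)
```

The selected degree depends on the fold split and the noise in this particular sample, so it is best read as a guide rather than a fixed answer.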
