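ML with Sklearn: Using eight machine learning algorithms, trained on a large set of diabetes cases, to predict whether a new individual has diabetes
Eight machine learning algorithms are used to predict whether a person has diabetes from nine extracted features
1. Prediction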
2. k-NN
k-NN:Accuracy of K-NN classifier on training set: 0.79
k-NN:Accuracy of K-NN classifier on test set: 0.78
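The two numbers above are the kind of output a fit-then-score loop produces. Here is a minimal sketch of that workflow. Since the full code is not shown in this post, the data is a synthetic stand-in generated with make_classification (768 rows and 9 features are assumptions), and n_neighbors=9 is an assumed setting, not necessarily the one behind the results above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)

# Hold out 25% of the rows (the default) and keep the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# n_neighbors=9 is an assumed value chosen for illustration.
knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(X_train, y_train)

train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print(f"Accuracy of K-NN classifier on training set: {train_acc:.2f}")
print(f"Accuracy of K-NN classifier on test set: {test_acc:.2f}")
```

A small gap between training and test accuracy, as in the numbers above, suggests the chosen number of neighbors is neither overfitting nor underfitting badly.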
3. LoR (logistic regression)
LoR:C1 Training set accuracy: 0.781
LoR:C1 Test set accuracy: 0.771
LoR:C100 Training set accuracy: 0.785
LoR:C100 Test set accuracy: 0.766
LoR:C001 Training set accuracy: 0.700
LoR:C001 Test set accuracy: 0.703
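The three pairs above compare different regularization strengths. The sketch below loops over three values of C, assuming the labels C1, C100 and C001 stand for C=1, C=100 and C=0.01 (that mapping is a guess from the labels). The data is a synthetic stand-in, and max_iter=1000 is only there to help the solver converge.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

results = {}     # C -> (training accuracy, test accuracy)
coef_norms = {}  # C -> L2 norm of the learned coefficients
for C in (0.01, 1, 100):  # assumed meaning of the labels C001, C1, C100
    logreg = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    results[C] = (logreg.score(X_train, y_train), logreg.score(X_test, y_test))
    coef_norms[C] = float(np.linalg.norm(logreg.coef_))
    print(f"C={C} Training set accuracy: {results[C][0]:.3f}")
    print(f"C={C} Test set accuracy: {results[C][1]:.3f}")
```

In scikit-learn a smaller C means stronger regularization, so the coefficients shrink as C decreases. That is consistent with the lowest accuracy above appearing under the C001 label.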
4. DT (decision tree)
DT:Accuracy on training set: 1.000
DT:Accuracy on test set: 0.714
DT:Accuracy on training set: 0.773
DT:Accuracy on test set: 0.740
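A training accuracy of 1.000 next to a much lower test accuracy is the usual sign of a fully grown tree that has memorized the training set. The sketch below contrasts an unrestricted tree with a depth-limited one. The max_depth=3 setting is an assumption about what produced the second pair of numbers, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Unrestricted tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Pre-pruned tree: max_depth=3 is an assumed value.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X_train, y_train)

full_train = full_tree.score(X_train, y_train)
full_test = full_tree.score(X_test, y_test)
shallow_train = shallow_tree.score(X_train, y_train)
shallow_test = shallow_tree.score(X_test, y_test)

print(f"Full tree     train: {full_train:.3f}  test: {full_test:.3f}")
print(f"max_depth=3   train: {shallow_train:.3f}  test: {shallow_test:.3f}")
```

Limiting the depth lowers training accuracy by design. Whether it raises test accuracy, as it does in the numbers above, depends on the data.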
5. RF (random forest)
RF:Accuracy on training set: 1.000
RF:Accuracy on test set: 0.786
RF:max_depth=3 Accuracy on training set: 0.800
RF:max_depth=3 Accuracy on test set: 0.755
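The comparison above, an unrestricted forest against one with max_depth=3, can be sketched as follows. The max_depth=3 setting comes from the log labels. n_estimators=100 and the synthetic stand-in data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# n_estimators=100 is an assumed value; max_depth=None means unrestricted trees.
results = {}
forests = {}
for depth in (None, 3):
    rf = RandomForestClassifier(n_estimators=100, max_depth=depth,
                                random_state=0).fit(X_train, y_train)
    forests[depth] = rf
    results[depth] = (rf.score(X_train, y_train), rf.score(X_test, y_test))
    print(f"max_depth={depth} Accuracy on training set: {results[depth][0]:.3f}")
    print(f"max_depth={depth} Accuracy on test set: {results[depth][1]:.3f}")
```

In the numbers above the unrestricted forest scores higher on the test set than the depth-limited one, so restricting depth is not automatically an improvement for a forest.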
6. GB (gradient boosting)
GB:Accuracy on training set: 0.917
GB:Accuracy on test set: 0.792
GB:Accuracy on training set: 0.804
GB:Accuracy on test set: 0.781
GB:Accuracy on training set: 0.802
GB:Accuracy on test set: 0.776
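The log above does not say which settings produced each of the three pairs. A common way to rein in an overfitting gradient boosting model is to limit tree depth or lower the learning rate, so the sketch below tries the defaults, max_depth=1 and learning_rate=0.01. All three configurations are assumptions, and so is the synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Assumed configurations; the post does not state which ones were used.
configs = {
    "default": {},
    "max_depth=1": {"max_depth": 1},
    "learning_rate=0.01": {"learning_rate": 0.01},
}

results = {}
for name, params in configs.items():
    gb = GradientBoostingClassifier(random_state=0, **params)
    gb.fit(X_train, y_train)
    results[name] = (gb.score(X_train, y_train), gb.score(X_test, y_test))
    print(f"GB ({name}) Accuracy on training set: {results[name][0]:.3f}")
    print(f"GB ({name}) Accuracy on test set: {results[name][1]:.3f}")
```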
7. SVM (support vector machine)
SVM:Accuracy on training set: 1.00
SVM:Accuracy on test set: 0.65
SVM:MinMaxScaler Accuracy on training set: 0.77
SVM:MinMaxScaler Accuracy on test set: 0.77
SVM:C=500 Accuracy on training set: 0.790
SVM:C=500 Accuracy on test set: 0.792
SVM:C=1000 Accuracy on training set: 0.790
SVM:C=1000 Accuracy on test set: 0.797
SVM:C=2000 Accuracy on training set: 0.800
SVM:C=2000 Accuracy on test set: 0.797
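The jump from 0.65 to 0.77 on the test set after MinMaxScaler shows how sensitive an SVM is to feature scale. Here is a minimal sketch of scaling followed by a sweep over C, using the C values from the log. The data is a synthetic stand-in, and the kernel is left at scikit-learn's default (RBF), which is an assumption. Newer scikit-learn versions default to gamma='scale', so an unscaled run may not reproduce the 1.00 / 0.65 pair shown above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Fit the scaler on the training data only, then apply it to both parts.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

results = {}
for C in (1, 500, 1000, 2000):  # C=1 is the default; the rest appear in the log
    svc = SVC(C=C).fit(X_train_scaled, y_train)
    results[C] = (svc.score(X_train_scaled, y_train),
                  svc.score(X_test_scaled, y_test))
    print(f"SVM:C={C} Accuracy on training set: {results[C][0]:.3f}")
    print(f"SVM:C={C} Accuracy on test set: {results[C][1]:.3f}")
```

Fitting the scaler on the training split alone keeps information from the test set out of the model, which is why the test features can fall slightly outside the [0, 1] range.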
8. NN (neural network)
Using a multilayer neural network (multilayer perceptron)
NN:Data standardization—Accuracy on training set: 0.823
NN:Data standardization—Accuracy on test set: 0.802
NN:Data standardization(max_iter=1000)—Accuracy on training set: 0.877
NN:Data standardization(max_iter=1000)—Accuracy on test set: 0.755
NN:Data standardization(max_iter=1000,alpha=1)—Accuracy on training set: 0.795
NN:Data standardization(max_iter=1000,alpha=1)—Accuracy on test set: 0.792
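The three runs above differ in max_iter and alpha, all on standardized data. The sketch below mirrors those settings with MLPClassifier and StandardScaler. The choice of these two classes, the default hidden layer size and the synthetic stand-in data are assumptions. The max_iter=1000 and alpha=1 values come from the log labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 9 features).
X, y = make_classification(n_samples=768, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Standardize to zero mean and unit variance, using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

configs = {
    "defaults": {},  # may emit a ConvergenceWarning with the default max_iter
    "max_iter=1000": {"max_iter": 1000},
    "max_iter=1000,alpha=1": {"max_iter": 1000, "alpha": 1},
}

results = {}
for name, params in configs.items():
    mlp = MLPClassifier(random_state=0, **params).fit(X_train_scaled, y_train)
    results[name] = (mlp.score(X_train_scaled, y_train),
                     mlp.score(X_test_scaled, y_test))
    print(f"NN ({name}) Accuracy on training set: {results[name][0]:.3f}")
    print(f"NN ({name}) Accuracy on test set: {results[name][1]:.3f}")
```

A larger alpha means stronger L2 regularization on the weights. In the numbers above, raising it to 1 narrows the gap between training and test accuracy.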
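The full code will be published later. If you have questions about how any of these algorithms work, leave a comment and we can discuss them together!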