Machine learning portfolio in manufacturing intelligence practice 2

  • Attribute Interactions

You can change the size of the scatter plot graphics in Weka.

We can see good separation between the classes on the scatter plots. For example, petalwidth versus sepallength and petalwidth versus sepalwidth both separate the classes well.

This suggests that linear methods, and maybe decision trees and instance-based methods, may do well on this problem. It also suggests that we probably do not need to spend too much time tuning or using advanced modeling techniques and ensembles. It may be a straightforward modeling problem.


Spot Check Algorithms (Evaluate Algorithms) 

Weka has an Experimenter interface that helps compare accuracy across multiple algorithms.

In this practice, I will add the multiclass classification algorithms below:

-rules.ZeroR

-bayes.NaiveBayes

-functions.Logistic

-functions.SMO

-lazy.IBk (change the KNN parameter from 1 to 3)

-rules.PART

-trees.REPTree

-trees.J48

After the run, we can analyze the results with ZeroR, the default choice, as the test base.

We can see from the results that all models have skill. The scores of the other algorithms are better than that of ZeroR, and the differences are statistically significant.
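
To make the baseline concrete: ZeroR ignores the attributes and always predicts the most frequent class. The following is a minimal sketch in Python, not Weka code. It assumes the standard iris dataset with 50 instances per class, and the helper names `zero_r` and `accuracy` are made up for illustration. For simplicity it scores the baseline on the same labels rather than with cross-validation.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the most frequent class label (ties go to the first one seen)."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted_label, test_labels):
    """Fraction of labels matched when the same label is predicted every time."""
    hits = sum(1 for label in test_labels if label == predicted_label)
    return hits / len(test_labels)

# Assumption: the standard iris class distribution, 50 instances per class.
labels = (["Iris-setosa"] * 50
          + ["Iris-versicolor"] * 50
          + ["Iris-virginica"] * 50)

majority = zero_r(labels)
baseline = accuracy(majority, labels)
print(f"ZeroR predicts {majority} with accuracy {baseline:.2%}")
```

Under that assumption the baseline is right about one time in three, so any model with skill should score well above this.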

The results suggest that both Logistic Regression and SVM (SMO) achieved the highest accuracy. With no other reason to prefer one, we can choose the simpler Logistic algorithm.


We can further compare all of the results using the Logistic Regression result as the test base.

We now see a very different story. Although the results for Logistic look better, the analysis suggests that the differences between these results and the results from all of the other algorithms are not statistically significant. From here we could choose an algorithm based on other criteria, like understandability or complexity.

From this perspective Logistic Regression and Naive Bayes are good candidates. We could also seek to further improve the results of one or more of these algorithms and see if we can achieve a significant improvement.

"Significance" parameter: the default value is 0.05 (5%).

It tells us whether the differences between any of the pairwise algorithm performance comparisons we review are statistically significant at a confidence level of 95%.
Note: a '*' next to an algorithm indicates that its result is significantly different from the test base algorithm and its score is lower; a 'v' next to an algorithm indicates that its result is significantly different and its score is higher than that of the test base algorithm.
We can choose directly among the algorithms whose scores are significantly better, and we can also choose among those that are not significantly different from each other.
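
The idea behind such a pairwise significance check can be sketched as a paired t-test on per-fold accuracies. This is an illustration only, not Weka's implementation. It assumes the corrected resampled t-test (the Nadeau and Bengio variance correction), a single 10-fold cross-validation, and made-up accuracy values. The function name `corrected_paired_t` is hypothetical.

```python
import math
import statistics

# Two-sided critical value of Student's t at the 0.05 significance level
# for 9 degrees of freedom (10 folds minus 1).
T_CRITICAL_DF9 = 2.262

def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled t statistic for paired per-fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance (n - 1)
    if var_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    # The test/train ratio inflates the variance to account for the
    # overlap between training sets across folds.
    return mean_diff / math.sqrt((1.0 / k + test_train_ratio) * var_diff)

# Hypothetical per-fold accuracies, invented for illustration only.
base_scores = [1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9, 1.0, 0.9]
other_scores = [0.9] * 10

# In 10-fold cross-validation each test fold is 1/9 the size of its training set.
t_stat = corrected_paired_t(base_scores, other_scores, test_train_ratio=1 / 9)
significant = abs(t_stat) > T_CRITICAL_DF9
print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

With these invented numbers the base algorithm has the higher mean accuracy, yet the statistic stays below the critical value. That is the same situation as above: a better-looking score that is not significantly different.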

Finally, I choose the Logistic algorithm and display only this algorithm, which shows a standard deviation of 3.38%.

We can see that the estimated accuracy of the model on unseen data is 96.33% with a standard deviation of 3.38%.

Finalize the model and present results:

Use the Classify tab of the Weka Explorer interface, with the test option set to use the training set, to train the model on the entire dataset.
Present result:
This model can then be loaded at a later time and used to make predictions on new flower measurements. We can use the mean and standard deviation of the model accuracy collected in the last section to help quantify the expected variability in the estimated accuracy of the model on unseen data. For example, if the accuracies are roughly normally distributed, about 95% of model accuracies will fall within two standard deviations of the mean model accuracy. Or, restated in a way we can explain to other people, we can generally expect that the performance of the model on unseen data will be 96.33% plus or minus 2 × 3.38 = 6.76, or between 89.57% and 100% accurate.
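
The interval arithmetic can be checked with a few lines of Python. This is a small sketch using the mean and standard deviation reported above. The function name `accuracy_interval` is made up for illustration, and the bounds are clipped to the 0 to 100 range because an accuracy cannot exceed 100%.

```python
def accuracy_interval(mean_pct, std_pct, n_std=2.0):
    """Return (lower, upper) bounds n_std standard deviations around the mean,
    clipped to the valid accuracy range of 0 to 100 percent."""
    margin = n_std * std_pct
    lower = max(0.0, mean_pct - margin)
    upper = min(100.0, mean_pct + margin)
    return lower, upper

# Mean accuracy and standard deviation reported for Logistic in this practice.
low, high = accuracy_interval(96.33, 3.38)
print(f"Expected accuracy between {low:.2f}% and {high:.2f}%")
```

The lower bound works out to 96.33 − 6.76 = 89.57, and the unclipped upper bound of 103.09 is capped at 100.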