seeds <- read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt",
                    col.names = c("area", "perimeter", "compactness", "lengthkernel",
                                  "widthkernel", "asymcoef", "lengthgroove", "kernnumber"))
# Create a factor variable naming each kernel type
library(tidyverse)
library(caret)
seeds <- seeds %>%
  mutate(kernname = case_when(kernnumber == 1 ~ "Kama",
                              kernnumber == 2 ~ "Rosa",
                              kernnumber == 3 ~ "Canadian"))
seeds$kernname <- factor(seeds$kernname)
summary(seeds)
area perimeter compactness lengthkernel
Min. :10.59 Min. :12.41 Min. :0.8081 Min. :4.899
1st Qu.:12.27 1st Qu.:13.45 1st Qu.:0.8569 1st Qu.:5.262
Median :14.36 Median :14.32 Median :0.8734 Median :5.524
Mean :14.85 Mean :14.56 Mean :0.8710 Mean :5.629
3rd Qu.:17.30 3rd Qu.:15.71 3rd Qu.:0.8878 3rd Qu.:5.980
Max. :21.18 Max. :17.25 Max. :0.9183 Max. :6.675
widthkernel asymcoef lengthgroove kernnumber
Min. :2.630 Min. :0.7651 Min. :4.519 Min. :1
1st Qu.:2.944 1st Qu.:2.5615 1st Qu.:5.045 1st Qu.:1
Median :3.237 Median :3.5990 Median :5.223 Median :2
Mean :3.259 Mean :3.7002 Mean :5.408 Mean :2
3rd Qu.:3.562 3rd Qu.:4.7687 3rd Qu.:5.877 3rd Qu.:3
Max. :4.033 Max. :8.4560 Max. :6.550 Max. :3
kernname
Canadian:70
Kama :70
Rosa :70
set.seed(934)
in_train <- createDataPartition(y = seeds$kernname,
p = 0.70,
list = FALSE)
training <- seeds[in_train, ]
testing <- seeds[-in_train, ]
dim(training)
[1] 147 9
dim(testing)
[1] 63 9
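Because createDataPartition() samples within each class, the 70/30 split should keep the three kernel types balanced; a quick check (a sketch, using the training and testing objects created above):

```r
# Each class should contribute roughly 49 training and 21 testing rows
table(training$kernname)
table(testing$kernname)
```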
set.seed(21)
tree_model <- train(kernname ~ . -kernnumber,
data = training,
method = "rpart",
tuneLength = 10,
trControl = trainControl(method = "repeatedcv",
number = 10,
repeats = 5)
)
tree_model
CART
147 samples
8 predictor
3 classes: 'Canadian', 'Kama', 'Rosa'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 133, 132, 132, 132, 133, 132, ...
Resampling results across tuning parameters:
cp Accuracy Kappa
0.00000000 0.8884982 0.83268059
0.05328798 0.8952747 0.84277805
0.10657596 0.8994652 0.84901972
0.15986395 0.8994652 0.84901972
0.21315193 0.8994652 0.84901972
0.26643991 0.8994652 0.84901972
0.31972789 0.8994652 0.84901972
0.37301587 0.8994652 0.84901972
0.42630385 0.6501612 0.47596566
0.47959184 0.3819927 0.08716224
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.3730159.
tree_model$bestTune
cp
8 0.3730159
library(rpart)
# bestTune is a one-row data frame, so extract the numeric cp value
seeds_tree <- rpart(kernname ~ . - kernnumber, data = training,
                    method = "class", cp = tree_model$bestTune$cp)
rpart.plot::rpart.plot(seeds_tree, type = 1)
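Refitting with rpart() is not strictly necessary: caret stores the final pruned tree in `tree_model$finalModel`, so (assuming the tree_model object above) the same plot can be drawn directly from it:

```r
# Plot the rpart object that caret kept after tuning
rpart.plot::rpart.plot(tree_model$finalModel, type = 1)
```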
class_prediction <- predict(tree_model, newdata = testing, type = "raw")
confusionMatrix(class_prediction, testing$kernname)
Confusion Matrix and Statistics
Reference
Prediction Canadian Kama Rosa
Canadian 21 6 0
Kama 0 15 0
Rosa 0 0 21
Overall Statistics
Accuracy : 0.9048
95% CI : (0.8041, 0.9642)
No Information Rate : 0.3333
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.8571
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: Canadian Class: Kama Class: Rosa
Sensitivity 1.0000 0.7143 1.0000
Specificity 0.8571 1.0000 1.0000
Pos Pred Value 0.7778 1.0000 1.0000
Neg Pred Value 1.0000 0.8750 1.0000
Prevalence 0.3333 0.3333 0.3333
Detection Rate 0.3333 0.2381 0.3333
Detection Prevalence 0.4286 0.2381 0.3333
Balanced Accuracy 0.9286 0.8571 1.0000
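To see which measurements drive the splits, caret's varImp() works directly on the tuned model; a sketch assuming tree_model from above:

```r
# Scaled (0-100) variable importance for the CART model
varImp(tree_model)
```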
set.seed(21)
rf_model <- train(kernname ~ . -kernnumber,
data = training,
method = "ranger",
tuneLength = 10,
trControl = trainControl(method = "repeatedcv",
number = 10,
repeats = 5)
)
note: only 6 unique complexity parameters in default grid. Truncating the grid to 6 .
rf_model
Random Forest
147 samples
8 predictor
3 classes: 'Canadian', 'Kama', 'Rosa'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 133, 132, 132, 132, 133, 132, ...
Resampling results across tuning parameters:
mtry splitrule Accuracy Kappa
2 gini 0.9119414 0.8677411
2 extratrees 0.9172747 0.8757578
3 gini 0.9172747 0.8757411
3 extratrees 0.9199414 0.8797578
4 gini 0.9212747 0.8817411
4 extratrees 0.9240366 0.8859117
5 gini 0.9199414 0.8797411
5 extratrees 0.9280366 0.8919117
6 gini 0.9226081 0.8837411
6 extratrees 0.9311136 0.8965751
7 gini 0.9198462 0.8796207
7 extratrees 0.9322418 0.8982536
Tuning parameter 'min.node.size' was held constant at a value of 1
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were mtry = 7, splitrule =
extratrees and min.node.size = 1.
class_prediction <- predict(rf_model, newdata = testing, type = "raw")
confusionMatrix(class_prediction, testing$kernname)
Confusion Matrix and Statistics
Reference
Prediction Canadian Kama Rosa
Canadian 21 4 0
Kama 0 17 0
Rosa 0 0 21
Overall Statistics
Accuracy : 0.9365
95% CI : (0.8453, 0.9824)
No Information Rate : 0.3333
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9048
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: Canadian Class: Kama Class: Rosa
Sensitivity 1.0000 0.8095 1.0000
Specificity 0.9048 1.0000 1.0000
Pos Pred Value 0.8400 1.0000 1.0000
Neg Pred Value 1.0000 0.9130 1.0000
Prevalence 0.3333 0.3333 0.3333
Detection Rate 0.3333 0.2698 0.3333
Detection Prevalence 0.3968 0.2698 0.3333
Balanced Accuracy 0.9524 0.9048 1.0000
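tuneLength leaves the grid up to caret, which is why min.node.size was held constant at 1. An explicit tuneGrid gives direct control over all three ranger tuning parameters; a sketch assuming the same training data (rf_grid and rf_tuned are illustrative names):

```r
rf_grid <- expand.grid(mtry = 2:7,
                       splitrule = c("gini", "extratrees"),
                       min.node.size = c(1, 5, 10))
set.seed(21)
rf_tuned <- train(kernname ~ . - kernnumber,
                  data = training,
                  method = "ranger",
                  tuneGrid = rf_grid,
                  trControl = trainControl(method = "repeatedcv",
                                           number = 10,
                                           repeats = 5))
```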
set.seed(21)
en_model <- train(kernname ~ . -kernnumber,
data = training,
method = "glmnet",
tuneLength = 10,
trControl = trainControl(method = "repeatedcv",
number = 10,
repeats = 5)
)
en_model
glmnet
147 samples
8 predictor
3 classes: 'Canadian', 'Kama', 'Rosa'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 133, 132, 132, 132, 133, 132, ...
Resampling results across tuning parameters:
alpha lambda Accuracy Kappa
0.1 0.0001926872 0.9389231 0.9083614
0.1 0.0004451325 0.9389231 0.9083614
0.1 0.0010283139 0.9320513 0.8980191
0.1 0.0023755386 0.9226227 0.8838485
0.1 0.0054878024 0.9145128 0.8716369
0.1 0.0126775356 0.9075458 0.8611821
0.1 0.0292867523 0.9088791 0.8631821
0.1 0.0676561979 0.9130842 0.8695241
0.1 0.1562945961 0.9114505 0.8670609
0.1 0.3610607975 0.9114505 0.8670609
0.2 0.0001926872 0.9430183 0.9144826
0.2 0.0004451325 0.9389231 0.9083614
0.2 0.0010283139 0.9333846 0.9000191
0.2 0.0023755386 0.9239560 0.8858485
0.2 0.0054878024 0.9158462 0.8736369
0.2 0.0126775356 0.9115458 0.8671821
0.2 0.0292867523 0.9116410 0.8673360
0.2 0.0676561979 0.9158462 0.8736779
0.2 0.1562945961 0.9114505 0.8670609
0.2 0.3610607975 0.9087839 0.8630609
0.3 0.0001926872 0.9456850 0.9184826
0.3 0.0004451325 0.9389231 0.9083614
0.3 0.0010283139 0.9362564 0.9043614
0.3 0.0023755386 0.9266227 0.8898485
0.3 0.0054878024 0.9171795 0.8756369
0.3 0.0126775356 0.9129744 0.8693360
0.3 0.0292867523 0.9131795 0.8696779
0.3 0.0676561979 0.9158462 0.8736779
0.3 0.1562945961 0.9142125 0.8711819
0.3 0.3610607975 0.9004029 0.8504781
0.4 0.0001926872 0.9470183 0.9204826
0.4 0.0004451325 0.9389231 0.9083614
0.4 0.0010283139 0.9362564 0.9043614
0.4 0.0023755386 0.9266227 0.8898485
0.4 0.0054878024 0.9157509 0.8735480
0.4 0.0126775356 0.9158462 0.8736369
0.4 0.0292867523 0.9116410 0.8673360
0.4 0.0676561979 0.9158462 0.8736779
0.4 0.1562945961 0.9155458 0.8731819
0.4 0.3610607975 0.8896410 0.8343076
0.5 0.0001926872 0.9470183 0.9204826
0.5 0.0004451325 0.9416850 0.9124826
0.5 0.0010283139 0.9375897 0.9063614
0.5 0.0023755386 0.9279560 0.8918485
0.5 0.0054878024 0.9184176 0.8775480
0.5 0.0126775356 0.9173846 0.8759788
0.5 0.0292867523 0.9145128 0.8716779
0.5 0.0676561979 0.9158462 0.8736779
0.5 0.1562945961 0.9142125 0.8711819
0.5 0.3610607975 0.8896410 0.8343076
0.6 0.0001926872 0.9470183 0.9204826
0.6 0.0004451325 0.9416850 0.9124826
0.6 0.0010283139 0.9375897 0.9063614
0.6 0.0023755386 0.9266227 0.8898485
0.6 0.0054878024 0.9197509 0.8795480
0.6 0.0126775356 0.9173846 0.8759788
0.6 0.0292867523 0.9145128 0.8716779
0.6 0.0676561979 0.9171795 0.8756779
0.6 0.1562945961 0.9142125 0.8711819
0.6 0.3610607975 0.8488645 0.7731243
0.7 0.0001926872 0.9470183 0.9204826
0.7 0.0004451325 0.9416850 0.9124826
0.7 0.0010283139 0.9375897 0.9063614
0.7 0.0023755386 0.9293846 0.8940191
0.7 0.0054878024 0.9210842 0.8815480
0.7 0.0126775356 0.9187179 0.8779788
0.7 0.0292867523 0.9145128 0.8716779
0.7 0.0676561979 0.9171795 0.8756779
0.7 0.1562945961 0.9061172 0.8590445
0.7 0.3610607975 0.8183883 0.7274322
0.8 0.0001926872 0.9471136 0.9205877
0.8 0.0004451325 0.9430183 0.9144826
0.8 0.0010283139 0.9389231 0.9083614
0.8 0.0023755386 0.9322564 0.8983614
0.8 0.0054878024 0.9268278 0.8901909
0.8 0.0126775356 0.9200513 0.8799788
0.8 0.0292867523 0.9145128 0.8716779
0.8 0.0676561979 0.9214799 0.8821532
0.8 0.1562945961 0.9063223 0.8593864
0.8 0.3610607975 0.8086593 0.7129840
0.9 0.0001926872 0.9484469 0.9225877
0.9 0.0004451325 0.9456850 0.9184826
0.9 0.0010283139 0.9403516 0.9104826
0.9 0.0023755386 0.9375897 0.9063614
0.9 0.0054878024 0.9294945 0.8941909
0.9 0.0126775356 0.9187179 0.8779788
0.9 0.0292867523 0.9173846 0.8759994
0.9 0.0676561979 0.9202418 0.8802906
0.9 0.1562945961 0.9133846 0.8699827
0.9 0.3610607975 0.7859560 0.6807828
1.0 0.0001926872 0.9416850 0.9124338
1.0 0.0004451325 0.9470183 0.9204826
1.0 0.0010283139 0.9510183 0.9264826
1.0 0.0023755386 0.9429231 0.9143614
1.0 0.0054878024 0.9389231 0.9083614
1.0 0.0126775356 0.9173846 0.8759788
1.0 0.0292867523 0.9188132 0.8781368
1.0 0.0676561979 0.9133846 0.8699994
1.0 0.1562945961 0.9080513 0.8619994
1.0 0.3610607975 0.6373700 0.4590265
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were alpha = 1 and lambda
= 0.001028314.
en_model$bestTune
alpha lambda
93 1 0.001028314
# coef(en_model$finalModel)
plot(en_model)
Note: \(\alpha = 1\) corresponds to the lasso penalty, so the selected model is a pure lasso fit.
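The commented-out coef() call above returns coefficients along the whole regularization path unless a lambda is supplied; a sketch extracting them at the cross-validated lambda (assuming en_model from above):

```r
# Multinomial glmnet returns one coefficient vector per class;
# s picks the lambda selected by cross-validation
coef(en_model$finalModel, s = en_model$bestTune$lambda)
```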
class_prediction <- predict(en_model, newdata = testing, type = "raw")
confusionMatrix(class_prediction, testing$kernname)
Confusion Matrix and Statistics
Reference
Prediction Canadian Kama Rosa
Canadian 20 1 0
Kama 1 20 0
Rosa 0 0 21
Overall Statistics
Accuracy : 0.9683
95% CI : (0.89, 0.9961)
No Information Rate : 0.3333
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9524
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: Canadian Class: Kama Class: Rosa
Sensitivity 0.9524 0.9524 1.0000
Specificity 0.9762 0.9762 1.0000
Pos Pred Value 0.9524 0.9524 1.0000
Neg Pred Value 0.9762 0.9762 1.0000
Prevalence 0.3333 0.3333 0.3333
Detection Rate 0.3175 0.3175 0.3333
Detection Prevalence 0.3333 0.3333 0.3333
Balanced Accuracy 0.9643 0.9643 1.0000
# Compare the three models across their shared resamples with resamples()
ANS <- resamples(list(TR = tree_model, RF = rf_model, EN = en_model))
summary(ANS)
Call:
summary.resamples(object = ANS)
Models: TR, RF, EN
Number of resamples: 50
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
TR 0.7333333 0.8666667 0.9285714 0.8994652 0.9333333 1 0
RF 0.7333333 0.9297619 0.9333333 0.9322418 1.0000000 1 0
EN 0.8000000 0.9333333 0.9333333 0.9510183 1.0000000 1 0
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
TR 0.6 0.8000000 0.8918903 0.8490197 0.9 1 0
RF 0.6 0.8942308 0.9000000 0.8982536 1.0 1 0
EN 0.7 0.9000000 0.9000000 0.9264826 1.0 1 0
bwplot(ANS)
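Because the three models share the same 50 resamples (the same seed was set before each train() call), caret can test their pairwise differences directly; a sketch using the ANS object above:

```r
# Paired differences in Accuracy and Kappa between the three models,
# with Bonferroni-adjusted p-values
summary(diff(ANS))
```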