Iris dataset analysis

Author

Haky Im

Published

January 1, 1999

Classify iris data

Code
# Load the iris dataset
data(iris)

# Fit Fisher's linear discriminant analysis (LDA); lda() comes from MASS
library(MASS)
lda_model <- lda(Species ~ ., data = iris)

# Print the summary of the analysis
print(lda_model)
Call:
lda(Species ~ ., data = iris)

Prior probabilities of groups:
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Group means:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026

Coefficients of linear discriminants:
                    LD1         LD2
Sepal.Length  0.8293776  0.02410215
Sepal.Width   1.5344731  2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width  -2.8104603  2.83918785

Proportion of trace:
   LD1    LD2 
0.9912 0.0088 
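
The "Proportion of trace" row can be recovered from the fitted object itself. The following is a minimal sketch, assuming the `svd` component that `MASS::lda` stores on the model (the singular values for each discriminant); the names `prop_trace` and `ld_scores` are illustrative and not part of the original analysis.

```r
library(MASS)

lda_model <- lda(Species ~ ., data = iris)

# Squared singular values, normalized to sum to 1, give the share of
# between-group variance captured by each linear discriminant
prop_trace <- lda_model$svd^2 / sum(lda_model$svd^2)
names(prop_trace) <- colnames(lda_model$scaling)
print(round(prop_trace, 4))

# Discriminant scores: each flower projected onto LD1 and LD2
ld_scores <- predict(lda_model)$x
head(ld_scores)
```

Since LD1 carries about 99% of the trace, most of the separation between species is visible along the first column of `ld_scores` alone.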
Code
# Predict the species for the same rows used to fit the model (training data)
predicted_species <- predict(lda_model, iris)$class

# Accuracy = share of predictions that match the actual species
accuracy <- mean(predicted_species == iris$Species)
cat("Accuracy:", accuracy * 100, "%\n")
Accuracy: 98 %
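
The 98% figure is measured on the same 150 rows that were used to fit the model, so it may be optimistic. One way to get a less biased estimate is leave-one-out cross-validation, which `lda()` supports through its `CV = TRUE` argument. This is a sketch only; `lda_cv` and `cv_accuracy` are illustrative names not used elsewhere in this analysis.

```r
library(MASS)

# With CV = TRUE, lda() returns leave-one-out predictions
# (components `class` and `posterior`) instead of a model object
lda_cv <- lda(Species ~ ., data = iris, CV = TRUE)

# Share of held-out predictions that match the actual species
cv_accuracy <- mean(lda_cv$class == iris$Species)
cat("Leave-one-out accuracy:", cv_accuracy * 100, "%\n")
```

Because the object returned with `CV = TRUE` is not a fitted model, it cannot be passed to `predict()`; fit a second model without `CV` for that purpose.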

Show performance

Code
# Load the iris dataset
data(iris)

# Perform Fisher's discriminant analysis
library(MASS)
lda_model <- lda(Species ~ ., data = iris)

# Predict the species using the model
predicted_species <- predict(lda_model, iris)$class

# Create a confusion matrix
library(caret)
Loading required package: ggplot2
Loading required package: lattice
Code
confusion <- confusionMatrix(predicted_species, iris$Species)
print(confusion)
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         1
  virginica       0          2        49

Overall Statistics
                                          
               Accuracy : 0.98            
                 95% CI : (0.9427, 0.9959)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.97            
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9600           0.9800
Specificity                 1.0000            0.9900           0.9800
Pos Pred Value              1.0000            0.9796           0.9608
Neg Pred Value              1.0000            0.9802           0.9899
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3200           0.3267
Detection Prevalence        0.3333            0.3267           0.3400
Balanced Accuracy           1.0000            0.9750           0.9800
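
The per-class statistics follow directly from the counts in the confusion matrix. For example, versicolor sensitivity is 48 / 50 = 0.96 and its positive predictive value is 48 / 49 ≈ 0.9796. The sketch below recomputes a few of these figures in base R, without `caret`; the variable names `cm`, `overall_accuracy`, `sensitivity`, and `pos_pred_value` are illustrative.

```r
library(MASS)

lda_model <- lda(Species ~ ., data = iris)
predicted_species <- predict(lda_model, iris)$class

# Rows are predictions, columns are the reference (actual) labels
cm <- table(Prediction = predicted_species, Reference = iris$Species)

overall_accuracy <- sum(diag(cm)) / sum(cm)   # correct / total
sensitivity      <- diag(cm) / colSums(cm)    # correct / actual, per class
pos_pred_value   <- diag(cm) / rowSums(cm)    # correct / predicted, per class

print(round(rbind(sensitivity, pos_pred_value), 4))
```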
Code
# Stacked bar chart: for each actual species, the share of each predicted species
library(ggplot2)
iris_predicted <- data.frame(iris, Predicted_Species = predicted_species)
ggplot(iris_predicted, aes(x = Species, fill = Predicted_Species)) +
  geom_bar(position = "fill") +
  labs(title = "LDA Classification Plot") +
  scale_fill_manual(values = c("#E41A1C", "#377EB8", "#4DAF4A")) +
  theme_minimal()