The basic idea of a classification accuracy check is to measure the misclassification error rate of a model's predictions. Several metrics are available to evaluate a classifier's performance. In this article, we'll learn how to calculate the following accuracy metrics in R.

- Accuracy
- Precision
- Recall (sensitivity)
- Specificity
- Prevalence
- Kappa
- F1-score

```
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")

predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")
```

As you may have noticed, 3 of the 11 predictions are incorrect.

Next, we'll create a cross-table, called a confusion matrix, from the above data.

**The confusion matrix** is a table whose columns contain the actual classes and whose rows contain the predicted classes; it describes the classifier's performance against the known test data.

| | Target (positive) | Unknown (negative) |
|---|---|---|
| Predicted target | 5 (TP) | 1 (FP) |
| Predicted unknown | 2 (FN) | 3 (TN) |
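The same cross-table can be produced directly in base R with `table()`, using the `actual` and `predicted` vectors defined above (rows are the predicted labels, columns the actual labels):

```r
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")
predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")

# Cross-tabulate predicted labels (rows) against actual labels (columns)
cm_tab = table(Predicted = predicted, Actual = actual)
cm_tab
```

The counts on the diagonal are the correct predictions; the off-diagonal cells are the errors.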

**True positive** (TP) means that the label is 'target' and it is correctly predicted as 'target'.

**True negative** (TN) means that the label is not 'target' and it is correctly predicted as 'unknown'.

**False positive** (FP) means that the label is not 'target' and it is wrongly predicted as 'target'.

**False negative** (FN) means that the label is 'target' and it is wrongly predicted as 'unknown'.

We'll take the following values from the matrix:

```
tp = 5
tn = 3
fp = 1
fn = 2
```
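Instead of hard-coding these counts, they can also be computed directly from the `actual` and `predicted` vectors, which avoids transcription mistakes:

```r
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")
predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")

# "target" is the positive class; count each of the four outcomes
tp = sum(predicted == "target"  & actual == "target")   # true positives
tn = sum(predicted == "unknown" & actual == "unknown")  # true negatives
fp = sum(predicted == "target"  & actual == "unknown")  # false positives
fn = sum(predicted == "unknown" & actual == "target")   # false negatives
c(tp = tp, tn = tn, fp = fp, fn = fn)
```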

Now we can check the metrics and evaluate the model performance in R.

**Accuracy**

Accuracy represents the ratio of correct predictions. The sum of true positives and true negatives is divided by the total number of events.

```
accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}

accuracy(tp, tn, fp, fn)
[1] 0.7272727
```

**Precision**

Precision identifies how accurately the model predicted the positive class. The number of true positives is divided by the total number of predicted positives (true positives plus false positives).

```
precision = function(tp, fp)
{
  return(tp / (tp + fp))
}

precision(tp, fp)
[1] 0.8333333
```

**Recall or Sensitivity**

Recall (sensitivity) measures the proportion of actual positives the model identified. The number of true positives is divided by the sum of true positives and false negatives.

```
recall = function(tp, fn)
{
  return(tp / (tp + fn))
}

recall(tp, fn)
[1] 0.7142857
```

**F1-Score**

F1-score is the harmonic mean of precision and recall. A value of 1 indicates the best performance and 0 the worst.

```
f1_score = function(tp, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}

f1_score(tp, fp, fn)
[1] 0.7692308
```

**Specificity or True Negative Rate**

Specificity (true negative rate) measures the rate of actual negatives identified correctly.

```
specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}

specificity(tn, fp)
[1] 0.75
```

**Prevalence**

Prevalence represents how often positive events occur in the data. The sum of true positives and false negatives (all actual positives) is divided by the total number of events.

```
prevalence = function(tp, tn, fp, fn)
{
  t = tp + fn
  total = tp + tn + fp + fn
  return(t / total)
}

prevalence(tp, tn, fp, fn)
[1] 0.6363636
```

**Kappa**

Kappa (Cohen's Kappa) measures how much better the model's predictions are than chance agreement. The higher the Kappa value, the better the model. First, we'll count the results by category: the actual data contains 7 'target' and 4 'unknown' labels, and the predicted data contains 6 'target' and 5 'unknown' labels.

```
length(actual[actual == "target"])
[1] 7
length(predicted[predicted == "target"])
[1] 6
```

```
total = tp + tn + fp + fn
observed_acc = (tp + tn) / total
# chance agreement, from the per-class counts above
expected_acc = ((6 * 7 / total) + (5 * 4 / total)) / total
Kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(Kappa)
[1] 0.440678
```

**Using confusionMatrix()**

We can get all of these metrics with one command in R. We load the 'caret' package and run the confusionMatrix() command, passing the predicted values first and the actual (reference) values second.

```
library(caret)

confusionMatrix(as.factor(predicted), as.factor(actual))

Confusion Matrix and Statistics

          Reference
Prediction target unknown
   target       5       1
   unknown      2       3

               Accuracy : 0.7273
                 95% CI : (0.3903, 0.9398)
    No Information Rate : 0.6364
    P-Value [Acc > NIR] : 0.3883

                  Kappa : 0.4407

 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.7143
            Specificity : 0.7500
         Pos Pred Value : 0.8333
         Neg Pred Value : 0.6000
             Prevalence : 0.6364
         Detection Rate : 0.4545
   Detection Prevalence : 0.5455
      Balanced Accuracy : 0.7321

       'Positive' Class : target
```

The output reproduces the metrics we computed manually above.
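The object returned by `confusionMatrix()` can also be queried programmatically: its `overall` component holds accuracy and Kappa, and its `byClass` component holds the per-class statistics as named vectors.

```r
library(caret)

actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")
predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")

cm = confusionMatrix(as.factor(predicted), as.factor(actual))

# Individual statistics are stored as named vectors
cm$overall["Accuracy"]
cm$overall["Kappa"]
cm$byClass["Sensitivity"]
cm$byClass["Specificity"]
```

This is handy when the metrics feed into further code (e.g., model comparison loops) rather than being read off the printed summary.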

In this post, we have briefly learned some of the accuracy metrics to evaluate the classification model. Thank you for reading!

**Source code listing**

```
library(caret)

actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")
predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")

tp = 5
tn = 3
fp = 1
fn = 2

accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}

precision = function(tp, fp)
{
  return(tp / (tp + fp))
}

recall = function(tp, fn)
{
  return(tp / (tp + fn))
}

f1_score = function(tp, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}

specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}

prevalence = function(tp, tn, fp, fn)
{
  t = tp + fn
  total = tp + tn + fp + fn
  return(t / total)
}

length(actual[actual == "target"])
length(predicted[predicted == "target"])

total = tp + tn + fp + fn
observed_acc = (tp + tn) / total
expected_acc = ((6 * 7 / total) + (5 * 4 / total)) / total
Kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(Kappa)

cm = confusionMatrix(as.factor(predicted), as.factor(actual))
print(cm)
```
