
### Precision, Recall, Specificity, Prevalence, Kappa, F1-score check with R

Classification and regression models use different methods to assess accuracy. In the previous post, we learned how to check a regression model's accuracy and its related metrics. In this post, we'll learn how to check a classification model's accuracy and its related metrics in R.
The basic idea of a classification accuracy check is to measure the misclassification rate of the model's predictions. There are several metrics to evaluate a classifier's performance. In this article, we'll learn how to calculate the following accuracy metrics in R.
• Accuracy
• Precision
• Recall (sensitivity)
• Specificity
• Prevalence
• Kappa
• F1-score
First, we prepare the actual labels and the results predicted by the model so that we can check the model's performance. We'll use the example data below to compute the accuracy metrics.

```
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")

predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")
```

As you may have noticed, there are 3 incorrect answers in the prediction.
Next, we'll create a cross table, called a confusion matrix, from the above data.
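Before filling in the matrix by hand, we can verify the error count directly from the two vectors (a minimal sanity check in base R):

```
# count the predictions that differ from the actual labels
sum(actual != predicted)
 3
```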

The confusion matrix is a table whose columns contain the actual classes and whose rows contain the predicted classes; it describes the classifier's performance against the known test data.

|                   | Target (positive) | Unknown (negative) |
|-------------------|-------------------|--------------------|
| Predicted target  | 5 (TP)            | 2 (FP)             |
| Predicted unknown | 1 (FN)            | 3 (TN)             |

True positive (TP) means that the label is 'target' and it is correctly predicted as 'target'.
True negative (TN) means that the label is not 'target' and it is correctly predicted as 'unknown'.
False positive (FP) means that the label is not 'target' but it is wrongly predicted as 'target'.
False negative (FN) means that the label is 'target' but it is wrongly predicted as 'unknown'.

We'll take the following values from the matrix:

```
tp = 5
tn = 3
fp = 2
fn = 1
```

Now we can compute these metrics and evaluate the model's performance in R.

#### Accuracy

Accuracy represents the ratio of correct predictions: the sum of true positives and true negatives is divided by the total number of events.

```
accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}
```

```
accuracy(tp, tn, fp, fn)
 0.7272727
```

#### Precision

Precision measures how accurately the model predicts the positive class: the number of true positive events is divided by the sum of true positive and false positive events.

```
precision = function(tp, fp)
{
  return(tp / (tp + fp))
}
```

```
precision(tp, fp)
 0.7142857
```

#### Recall or Sensitivity

Recall (sensitivity) measures the proportion of positive classes that are correctly predicted: the number of true positive events is divided by the sum of true positive and false negative events.

```
recall = function(tp, fn)
{
  return(tp / (tp + fn))
}
```

```
recall(tp, fn)
 0.8333333
```

#### F1-score

F1-score is the harmonic mean of precision and recall. A value of 1 indicates the best performance and 0 the worst.

```
f1_score = function(tp, tn, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}
```

```
f1_score(tp, tn, fp, fn)
 0.7692308
```
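Equivalently, the F1-score can be computed directly from the counts; this one-liner (a quick cross-check, not part of the functions above) gives the same value:

```
# F1 = 2*TP / (2*TP + FP + FN)
2 * tp / (2 * tp + fp + fn)
 0.7692308
```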

#### Specificity or True Negative Rate

Specificity (the true negative rate) measures the proportion of actual negatives that are identified correctly: the number of true negative events is divided by the sum of true negative and false positive events.

```
specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}
```

```
specificity(tn, fp)
 0.6
```
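The caret output shown later also reports a balanced accuracy, which is simply the mean of recall and specificity; a small sketch using the functions defined above:

```
# balanced accuracy = (sensitivity + specificity) / 2
(recall(tp, fn) + specificity(tn, fp)) / 2
 0.7166667
```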

#### Prevalence

Prevalence represents how often the positive class occurs in the data: the sum of true positive and false negative events is divided by the total number of events.

```
prevalence = function(tp, tn, fp, fn)
{
  t = tp + fn
  total = tp + tn + fp + fn
  return(t / total)
}
```

```
prevalence(tp, tn, fp, fn)
 0.5454545
```

#### Kappa

Kappa (Cohen's Kappa) measures how well the classifier performs compared to the agreement expected by chance. The higher the Kappa value, the better the model performs. First, we'll count the results by category: the actual data contains 7 'target' and 4 'unknown' labels, and the predicted data contains 6 'target' and 5 'unknown' labels.

```
length(actual[actual == "target"])
 7
length(predicted[predicted == "target"])
 6
```

```
total = tp + tn + fp + fn
observed_acc = (tp + tn) / total
# expected accuracy by chance, based on the marginal counts:
# 6*7 pairs the predicted and actual 'target' counts, 4*5 the actual and predicted 'unknown' counts
expected_acc = ((6 * 7 / total) + (4 * 5 / total)) / total

Kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(Kappa)
 0.440678
```

#### Using confusionMatrix()

We can get all those metrics with one command in R. We load the 'caret' package and run the confusionMatrix() command with actual and predicted data.

```
library(caret)
```
```
confusionMatrix(as.factor(actual), as.factor(predicted))
Confusion Matrix and Statistics

          Reference
Prediction target unknown
   target       5       2
   unknown      1       3

               Accuracy : 0.7273
                 95% CI : (0.3903, 0.9398)
    No Information Rate : 0.5455
    P-Value [Acc > NIR] : 0.1829

                  Kappa : 0.4407
 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.8333
            Specificity : 0.6000
         Pos Pred Value : 0.7143
         Neg Pred Value : 0.7500
             Prevalence : 0.5455
         Detection Rate : 0.4545
   Detection Prevalence : 0.6364
      Balanced Accuracy : 0.7167

       'Positive' Class : target
```

The output reports all of the model's accuracy metrics in a single summary.
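If you need individual values programmatically, the object returned by confusionMatrix() can be stored and its components inspected; a minimal sketch using the same data:

```
cm = confusionMatrix(as.factor(actual), as.factor(predicted))
cm$table                    # the confusion matrix itself
cm$overall["Accuracy"]      # 0.7273
cm$overall["Kappa"]         # 0.4407
cm$byClass["Sensitivity"]   # 0.8333
cm$byClass["Specificity"]   # 0.6
```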

In this post, we have briefly covered some of the accuracy metrics used to evaluate a classification model. Thank you for reading!

#### Source code listing

```
library(caret)

actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")
predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")

tp = 5
tn = 3
fp = 2
fn = 1
```

```
accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}

precision = function(tp, fp)
{
  return(tp / (tp + fp))
}

recall = function(tp, fn)
{
  return(tp / (tp + fn))
}
```

```
f1_score = function(tp, tn, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}

specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}
```

```
prevalence = function(tp, tn, fp, fn)
{
  t = tp + fn
  total = tp + tn + fp + fn
  return(t / total)
}

length(actual[actual == "target"])
length(predicted[predicted == "target"])
```

```
total = tp + tn + fp + fn
observed_acc = (tp + tn) / total
expected_acc = ((6 * 7 / total) + (4 * 5 / total)) / total

Kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(Kappa)
```

```
cm = confusionMatrix(as.factor(actual), as.factor(predicted))
print(cm)
```