The basic idea of checking classification accuracy is to measure how often a model misclassifies its inputs. There are several metrics for evaluating a classifier's performance. In this article, we'll learn how to calculate the following accuracy metrics in R.

- Accuracy
- Precision
- Recall (sensitivity)
- Specificity
- Prevalence
- Kappa
- F1-score

First, we'll create sample vectors of actual and predicted class labels.

```
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")

predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")
```

As you may have noticed, three of the eleven predictions are incorrect.

Next, we'll build a cross-table, called a confusion matrix, from the above data.

**The confusion matrix** is a table whose columns contain the actual classes and whose rows contain the predicted classes. It describes the classifier's performance against the known test data.

|  | Actual target (positive) | Actual unknown (negative) |
|---|---|---|
| Predicted target | 5 (TP) | 1 (FP) |
| Predicted unknown | 2 (FN) | 3 (TN) |
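The same cross-table can be produced in R with the base table() function; a minimal sketch using the actual and predicted vectors defined above. The resulting counts match the table.

```
# cross-tabulate predicted vs. actual labels
# rows are predicted classes, columns are actual classes
table(Predicted = predicted, Actual = actual)
```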

**True positive** (TP) means that the label is 'target' and it is correctly predicted as 'target'.

**True negative** (TN) means that the label is not 'target' and it is correctly predicted as 'unknown'.

**False positive** (FP) means that the label is not 'target' and it is wrongly predicted as 'target'.

**False negative** (FN) means that the label is 'target' and it is wrongly predicted as 'unknown'.

We'll take the following values from the confusion matrix.

```
tp = 5
tn = 3
fp = 1
fn = 2
```
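These counts can also be computed directly from the label vectors rather than read off by hand; a minimal sketch using the vectors defined above:

```
tp = sum(actual == "target"  & predicted == "target")   # true positives:  5
tn = sum(actual == "unknown" & predicted == "unknown")  # true negatives:  3
fp = sum(actual == "unknown" & predicted == "target")   # false positives: 1
fn = sum(actual == "target"  & predicted == "unknown")  # false negatives: 2
```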

Now we can check the metrics and evaluate the model performance in R.

**Accuracy**

Accuracy represents the ratio of correct predictions: the sum of true positives and true negatives divided by the total number of events.

```
accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}
```


```
accuracy(tp, tn, fp, fn)
[1] 0.7272727
```

**Precision**

Precision identifies how accurately the model predicted the positive classes: the number of true positive events divided by the sum of true positive and false positive events.

```
precision = function(tp, fp)
{
  return(tp / (tp + fp))
}
```


```
precision(tp, fp)
[1] 0.8333333
```

**Recall or Sensitivity**

Recall (sensitivity) measures the proportion of actual positives that are correctly identified: the number of true positive events divided by the sum of true positive and false negative events.

```
recall = function(tp, fn)
{
  return(tp / (tp + fn))
}
```


```
recall(tp, fn)
[1] 0.7142857
```

**F1-Score**

F1-score is the harmonic mean of precision and recall. A value of 1 indicates the best performance and 0 the worst.

```
f1_score = function(tp, tn, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}
```

```
f1_score(tp, tn, fp, fn)
[1] 0.7692308
```

**Specificity or True Negative Rate**

Specificity (true negative rate) measures the proportion of actual negatives that are correctly identified: the number of true negative events divided by the sum of true negative and false positive events.

```
specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}
```

```
specificity(tn, fp)
[1] 0.75
```

**Prevalence**

Prevalence represents how often positive events occur in the data: the sum of true positive and false negative events (all actual positives) divided by the total number of events.

```
prevalence = function(tp, tn, fp, fn)
{
  positives = tp + fn
  total = tp + tn + fp + fn
  return(positives / total)
}
```

```
prevalence(tp, tn, fp, fn)
[1] 0.6363636
```

**Kappa**

Kappa (Cohen’s Kappa) measures how well the predictions agree with the actual labels after correcting for the agreement expected by chance. The closer Kappa is to 1, the better the model; values near zero mean the predictions are no better than chance. First, we'll count the labels by category: the actual data contains 7 'target' and 4 'unknown' labels, and the predicted data contains 6 'target' and 5 'unknown' labels.

```
length(actual[actual=="target"])
[1] 7
length(predicted[predicted=="target"])
[1] 6
```

```
total = tp + tn + fp + fn
# observed agreement: proportion of correct predictions
observed_acc = (tp + tn) / total
# agreement expected by chance, from the class counts above
expected_acc = ((6 * 7 / total) + (4 * 5 / total)) / total
Kappa = (observed_acc - expected_acc) / (1 - expected_acc)
print(Kappa)
[1] 0.440678
```
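For consistency with the helper functions above, the same calculation can be wrapped in a function that derives the marginal class counts from the confusion-matrix cells; a minimal sketch (the kappa_score name is ours, not part of any package):

```
kappa_score = function(tp, tn, fp, fn)
{
  total = tp + tn + fp + fn
  observed = (tp + tn) / total
  # expected chance agreement from the predicted and actual class totals
  expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total^2
  return((observed - expected) / (1 - expected))
}

kappa_score(tp, tn, fp, fn)   # 0.440678
```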

**Using confusionMatrix()**

We can get all of these metrics with a single command in R. We load the 'caret' package and run confusionMatrix(), passing the predicted labels as the first argument (the predictions) and the actual labels as the second (the reference).

`library(caret)`


```
confusionMatrix(as.factor(predicted), as.factor(actual))

Confusion Matrix and Statistics

          Reference
Prediction target unknown
   target       5       1
   unknown      2       3

               Accuracy : 0.7273
                 95% CI : (0.3903, 0.9398)
    No Information Rate : 0.6364
    P-Value [Acc > NIR] : 0.3883

                  Kappa : 0.4407

 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.7143
            Specificity : 0.7500
         Pos Pred Value : 0.8333
         Neg Pred Value : 0.6000
             Prevalence : 0.6364
         Detection Rate : 0.4545
   Detection Prevalence : 0.5455
      Balanced Accuracy : 0.7321

       'Positive' Class : target
```

The output reports the same metrics we computed above, together with confidence intervals and additional statistics.
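If you need individual values programmatically, the object returned by confusionMatrix() is a list that can be indexed; a minimal sketch, assigning the result to a variable first:

```
cm = confusionMatrix(as.factor(predicted), as.factor(actual))

cm$table                    # the confusion matrix itself
cm$overall["Accuracy"]      # overall accuracy
cm$overall["Kappa"]         # Cohen's Kappa
cm$byClass["Sensitivity"]   # recall for the positive class
cm$byClass["Specificity"]   # true negative rate
```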

In this post, we have briefly covered some of the accuracy metrics used to evaluate a classification model. Thank you for reading!

**Source code listing**

`library(caret)`

```
actual = c("unknown", "target", "unknown", "unknown", "target", "target", "unknown",
           "target", "target", "target", "target")

predicted = c("unknown", "target", "target", "unknown", "target",
              "unknown", "unknown", "target", "target", "target", "unknown")
```

```
tp = 5
tn = 3
fp = 1
fn = 2
```

```
accuracy = function(tp, tn, fp, fn)
{
  correct = tp + tn
  total = tp + tn + fp + fn
  return(correct / total)
}
```

```
precision = function(tp, fp)
{
  return(tp / (tp + fp))
}
```


```
recall = function(tp, fn)
{
  return(tp / (tp + fn))
}
```

```
f1_score = function(tp, tn, fp, fn)
{
  p = precision(tp, fp)
  r = recall(tp, fn)
  return(2 * p * r / (p + r))
}
```

```
specificity = function(tn, fp)
{
  return(tn / (tn + fp))
}
```

```
prevalence = function(tp, tn, fp, fn)
{
  positives = tp + fn
  total = tp + tn + fp + fn
  return(positives / total)
}
```

```
length(actual[actual=="target"])
length(predicted[predicted=="target"])
```

```
total=tp+tn+fp+fn
observed_acc=(tp+tn)/total
expected_acc=((6*7/total)+(4*5/total))/total
Kappa = (observed_acc-expected_acc)/(1-expected_acc)
print(Kappa)
```


```
cm = confusionMatrix(as.factor(predicted), as.factor(actual))
print(cm)
```
