In this article, we'll briefly learn how to calculate the regression model accuracy by using the above-mentioned metrics in Python. The post covers:

- Regression accuracy metrics
- Preparing data
- Metrics calculation by formula
- Metrics calculation by sklearn.metrics

**Regression accuracy metrics**

The MSE, MAE, RMSE, and R-Squared are mainly used metrics to evaluate the prediction error rates and model performance in regression analysis.

**MAE**(Mean absolute error) represents the difference between the original and predicted values extracted by averaged the absolute difference over the data set.**MSE**(Mean Squared Error) represents the difference between the original and predicted values extracted by squared the average difference over the data set.**RMSE**(Root Mean Squared Error) is the error rate by the square root of MSE.**R-squared**(Coefficient of determination) represents the coefficient of how well the values fit compared to the original values. The value from 0 to 1 interpreted as percentages. The higher the value is, the better the model is.

**Preparing data**

Original target data

*and predicted label the*

**y****are the main sources to evaluate the model. We'll start by loading the required modules for this tutorial.**

*yhat*import numpy as np import sklearn.metrics as metrics import matplotlib.pyplot as plt

Next, we'll create sample

**and**

*y***data to evaluate the model by the above metrics.**

*yhat*y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5]) yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5]) x = list(range(len(y)))

We can visualize them in a plot to check the difference visually.

plt.scatter(x, y, color="blue", label="original") plt.plot(x, yhat, color="red", label="predicted") plt.legend() plt.show()

**Metrics calculation by formula**

By using the above formulas, we can easily calculate them in Python.

# calculate manually d = y - yhat mse_f = np.mean(d**2) mae_f = np.mean(abs(d)) rmse_f = np.sqrt(mse_f) r2_f = 1-(sum(d**2)/sum((y-np.mean(y))**2)) print("Results by manual calculation:") print("MAE:",mae_f) print("MSE:", mse_f) print("RMSE:", rmse_f) print("R-Squared:", r2_f)

Results by manual calculation: MAE: 0.5833333333333334 MSE: 0.75 RMSE: 0.8660254037844386 R-Squared: 0.8655043586550436

**Metrics calculation by sklearn.metrics**

Sklearn provides the number of metrics to evaluate accuracy. The next method is to calculate metrics with sklearn functions.

mae = metrics.mean_absolute_error(y, yhat) mse = metrics.mean_squared_error(y, yhat) rmse = np.sqrt(mse) # or mse**(0.5) r2 = metrics.r2_score(y,yhat) print("Results of sklearn.metrics:") print("MAE:",mae) print("MSE:", mse) print("RMSE:", rmse) print("R-Squared:", r2)

Results of sklearn.metrics: MAE: 0.5833333333333334 MSE: 0.75 RMSE: 0.8660254037844386 R-Squared: 0.8655043586550436

The results are the same in both methods. You can use any method according to your convenience in your regression analysis.

In this post, we've briefly learned how to calculate MSE, MAE, RMSE, and R-Squared accuracy metrics in Python. The full source code is listed below.

**Source code listing**

import numpy as np import sklearn.metrics as metrics import matplotlib.pyplot as plt y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5]) yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5]) x = list(range(len(y))) plt.scatter(x, y, color="blue", label="original") plt.plot(x, yhat, color="red", label="predicted") plt.legend() plt.show() # calculate manually d = y - yhat mse_f = np.mean(d**2) mae_f = np.mean(abs(d)) rmse_f = np.sqrt(mse_f) r2_f = 1-(sum(d**2)/sum((y-np.mean(y))**2)) print("Results by manual calculation:") print("MAE:",mae_f) print("MSE:", mse_f) print("RMSE:", rmse_f) print("R-Squared:", r2_f) mae = metrics.mean_absolute_error(y, yhat) mse = metrics.mean_squared_error(y, yhat) rmse = np.sqrt(mse) #mse**(0.5) r2 = metrics.r2_score(y,yhat) print("Results of sklearn.metrics:") print("MAE:",mae) print("MSE:", mse) print("RMSE:", rmse) print("R-Squared:", r2)

Good summary!

ReplyDelete