In this article, we'll briefly learn how to calculate regression model accuracy using the MSE, MAE, RMSE, and R-squared metrics in Python. The post covers:
- Regression accuracy metrics
- Preparing data
- Metrics calculation by formula
- Metrics calculation by sklearn.metrics
Regression accuracy metrics
MSE, MAE, RMSE, and R-squared are the most commonly used metrics to evaluate prediction error and model performance in regression analysis.
- MAE (Mean Absolute Error) is the average of the absolute differences between the original and predicted values over the data set.
- MSE (Mean Squared Error) is the average of the squared differences between the original and predicted values over the data set.
- RMSE (Root Mean Squared Error) is the square root of MSE, which puts the error back on the same scale as the target values.
- R-squared (coefficient of determination) measures how well the predicted values fit the original values. It ranges from 0 to 1 and can be interpreted as a percentage: the higher the value, the better the model.
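The definitions above can be written as formulas, where y_i is an original value, yhat_i the corresponding prediction, y-bar the mean of the original values, and n the number of samples:

```
MAE  = (1/n) * sum( |y_i - yhat_i| )
MSE  = (1/n) * sum( (y_i - yhat_i)^2 )
RMSE = sqrt( MSE )
R^2  = 1 - sum( (y_i - yhat_i)^2 ) / sum( (y_i - y_bar)^2 )
```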
Preparing data
The original target data y and the predicted values yhat are the main inputs for evaluating the model. We'll start by loading the required modules for this tutorial.
import numpy as np
import sklearn.metrics as metrics
import matplotlib.pyplot as plt
Next, we'll create sample y and yhat data to evaluate the model by the above metrics.
y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5])
yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5])
x = list(range(len(y)))
We can plot both series to inspect the differences visually.
plt.scatter(x, y, color="blue", label="original")
plt.plot(x, yhat, color="red", label="predicted")
plt.legend()
plt.show()
Metrics calculation by formula
Using the above formulas, we can easily calculate these metrics in Python.
# calculate manually
d = y - yhat
mse_f = np.mean(d**2)
mae_f = np.mean(abs(d))
rmse_f = np.sqrt(mse_f)
r2_f = 1 - (sum(d**2) / sum((y - np.mean(y))**2))
print("Results by manual calculation:")
print("MAE:", mae_f)
print("MSE:", mse_f)
print("RMSE:", rmse_f)
print("R-Squared:", r2_f)
Results by manual calculation:
MAE: 0.5833333333333334
MSE: 0.75
RMSE: 0.8660254037844386
R-Squared: 0.8655043586550436
Metrics calculation by sklearn.metrics
Scikit-learn provides a number of metrics for evaluating accuracy. The second method is to calculate the metrics with sklearn functions.
mae = metrics.mean_absolute_error(y, yhat)
mse = metrics.mean_squared_error(y, yhat)
rmse = np.sqrt(mse)  # or mse**(0.5)
r2 = metrics.r2_score(y, yhat)
print("Results of sklearn.metrics:")
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-Squared:", r2)
Results of sklearn.metrics:
MAE: 0.5833333333333334
MSE: 0.75
RMSE: 0.8660254037844386
R-Squared: 0.8655043586550436
Both methods produce the same results. You can use whichever is more convenient in your regression analysis.
In this post, we've briefly learned how to calculate MSE, MAE, RMSE, and R-Squared accuracy metrics in Python. The full source code is listed below.
Source code listing
import numpy as np
import sklearn.metrics as metrics
import matplotlib.pyplot as plt

y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5])
yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5])
x = list(range(len(y)))

plt.scatter(x, y, color="blue", label="original")
plt.plot(x, yhat, color="red", label="predicted")
plt.legend()
plt.show()

# calculate manually
d = y - yhat
mse_f = np.mean(d**2)
mae_f = np.mean(abs(d))
rmse_f = np.sqrt(mse_f)
r2_f = 1 - (sum(d**2) / sum((y - np.mean(y))**2))
print("Results by manual calculation:")
print("MAE:", mae_f)
print("MSE:", mse_f)
print("RMSE:", rmse_f)
print("R-Squared:", r2_f)

mae = metrics.mean_absolute_error(y, yhat)
mse = metrics.mean_squared_error(y, yhat)
rmse = np.sqrt(mse)  # or mse**(0.5)
r2 = metrics.r2_score(y, yhat)
print("Results of sklearn.metrics:")
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-Squared:", r2)