DataTechNotes: Regression Example with K-Nearest Neighbors in Python

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that can be used for both classification and regression problems. KNN does not build an explicit model during training; it stores the entire training dataset and consults it at prediction time. For each query point, it finds the k nearest training samples according to a distance metric (Minkowski, Euclidean, etc.) and predicts from them. The KNN regressor predicts the target as the mean or median of the target values of those k neighbors.
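To make the prediction rule concrete, here is a minimal from-scratch sketch using only NumPy. The function name knn_predict and the tiny dataset are hypothetical and exist only for illustration; this is not how sklearn implements it internally, just the same idea with Euclidean distance and a plain mean.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Predict one target as the mean of the k nearest training targets."""
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean()

# Tiny hypothetical dataset: one feature, one target per sample
train_x = np.array([[1.0], [2.0], [3.0], [10.0]])
train_y = np.array([1.0, 2.0, 3.0, 10.0])

# The three nearest neighbors of 2.1 are the samples at 2.0, 3.0 and 1.0
pred = knn_predict(train_x, train_y, np.array([2.1]), k=3)
print(pred)
```

The sample at 10.0 is too far away to be among the three neighbors, so it has no influence on the prediction.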
In this post, we'll briefly learn how to use the sklearn KNN regressor model for a regression problem in Python. The tutorial covers:

Preparing sample data
Constructing KNeighborsRegressor model
Predicting and checking the accuracy

We'll start by importing the required libraries.

import random
import math
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error 
from sklearn.neighbors import KNeighborsRegressor

Preparing sample data

Here, I'll generate a simple regression dataset for this tutorial.

random.seed(123)

def getData(N):
    # Build N noisy samples of sin(a) + 3, with one feature per sample
    x, y = [], []
    for i in range(N):
        a = i/10 + random.uniform(-1, 1)
        yy = math.sin(a) + 3 + random.uniform(-1, 1)
        x.append([a])
        y.append([yy])
    return np.array(x), np.array(y)

x, y = getData(200)

Constructing KNeighborsRegressor model

We'll use the KNeighborsRegressor class from the sklearn library. I'll set the n_neighbors parameter to 8; you can change it. By default, the distance is calculated with the Minkowski metric with p=2, which is equivalent to the Euclidean distance.

model = KNeighborsRegressor(n_neighbors=8)
print(model)

KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
          metric_params=None, n_jobs=1, n_neighbors=8, p=2,
          weights='uniform')
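The p value in the printed parameters is the order of the Minkowski distance. As a rough illustration, the sketch below computes that distance by hand for p=1 (Manhattan) and p=2 (Euclidean) and then builds a regressor with p=1. The helper minkowski and the variable model_l1 are hypothetical names introduced for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def minkowski(u, v, p):
    """Minkowski distance of order p between two vectors."""
    return (np.abs(u - v) ** p).sum() ** (1.0 / p)

u = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])

d_manhattan = minkowski(u, v, 1)  # |3| + |4|
d_euclidean = minkowski(u, v, 2)  # sqrt(3^2 + 4^2)
print(d_manhattan, d_euclidean)

# Same neighbor count as in the tutorial, but with Manhattan distance
model_l1 = KNeighborsRegressor(n_neighbors=8, p=1)
print(model_l1.get_params()["p"])
```

With a single feature, as in this tutorial's data, every value of p gives the same distance |a - b|, so changing p only matters once the inputs have two or more features.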

Next, we'll fit the model on the x input data and the y target values.

model.fit(x,y)
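The value 8 for n_neighbors is only a starting point. One common way to pick it is cross-validation; the following is a minimal sketch, assuming a synthetic dataset (demo_x, demo_y) similar in shape to the tutorial's data and an arbitrary list of candidate values. None of these names or values come from the original code.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical noisy sine data, generated with NumPy rather than getData
rng = np.random.default_rng(0)
demo_x = rng.uniform(0, 20, size=(200, 1))
demo_y = np.sin(demo_x[:, 0]) + 3 + rng.uniform(-1, 1, size=200)

# Candidate neighbor counts to compare (an arbitrary choice for illustration)
candidate_k = [2, 4, 8, 16, 32]

# 5-fold cross-validation, scored by negative mean squared error
search = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": candidate_k},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(demo_x, demo_y)

best_k = search.best_params_["n_neighbors"]
print("Best n_neighbors:", best_k)
```

Small values of k follow the noise closely, while large values smooth the curve; cross-validation gives a data-driven compromise between the two.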

Predicting and checking the accuracy

We'll predict on the x input data with the fitted KNN model.

pred_y = model.predict(x)

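Next, we'll check the model's prediction accuracy. For a regressor, the score method returns the coefficient of determination (R-squared). Note that we evaluate here on the same data the model was fitted on.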

score=model.score(x,y)
print(score)

0.5777770279320101
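The number printed by score can be reproduced by hand as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the targets from their mean. The sketch below checks this on a small synthetic dataset; demo_x, demo_y and the other variable names are assumptions made for this example, not part of the tutorial code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical noisy sine data
rng = np.random.default_rng(0)
demo_x = rng.uniform(0, 10, size=(100, 1))
demo_y = np.sin(demo_x[:, 0]) + rng.uniform(-1, 1, size=100)

demo_model = KNeighborsRegressor(n_neighbors=8).fit(demo_x, demo_y)
demo_pred = demo_model.predict(demo_x)

# R-squared computed by hand
ss_res = ((demo_y - demo_pred) ** 2).sum()      # residual sum of squares
ss_tot = ((demo_y - demo_y.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# R-squared as reported by the model
r2_from_score = demo_model.score(demo_x, demo_y)
print(r2_manual, r2_from_score)
```

A value of 1 means the predictions match the targets exactly, and 0 means the model does no better than always predicting the mean of the targets.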

mse =mean_squared_error(y, pred_y)
print("Mean Squared Error:",mse)

Mean Squared Error: 0.3273700949466075

rmse = math.sqrt(mse)
print("Root Mean Squared Error:", rmse)

Root Mean Squared Error: 0.5721626472836264
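The metrics above are measured on the data the model was fitted on, which tends to be optimistic. A hedged sketch of a held-out evaluation follows, using train_test_split on a synthetic dataset; the variable names, the 25% test size, and the data itself are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical noisy sine data
rng = np.random.default_rng(0)
demo_x = rng.uniform(0, 20, size=(200, 1))
demo_y = np.sin(demo_x[:, 0]) + 3 + rng.uniform(-1, 1, size=200)

# Hold out 25% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=0
)

# Fit on the training part only, then measure error on unseen samples
held_out_model = KNeighborsRegressor(n_neighbors=8).fit(x_train, y_train)
test_mse = mean_squared_error(y_test, held_out_model.predict(x_test))
test_rmse = np.sqrt(test_mse)

print("Test MSE:", test_mse)
print("Test RMSE:", test_rmse)
```

Comparing the training and test errors gives a first indication of whether the chosen number of neighbors overfits the noise.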

Finally, we'll plot the predicted result.

x_ax=range(200)
plt.scatter(x_ax, y, s=5, color="blue", label="original")
plt.plot(x_ax, pred_y, lw=1.5, color="red", label="predicted")
plt.legend()
plt.show()

In this post, we've briefly learned how to use KNeighborsRegressor for a regression problem in Python. The full source code is listed below.

import random
import math
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error 
from sklearn.neighbors import KNeighborsRegressor

random.seed(123)

def getData(N):
    # Build N noisy samples of sin(a) + 3, with one feature per sample
    x, y = [], []
    for i in range(N):
        a = i/10 + random.uniform(-1, 1)
        yy = math.sin(a) + 3 + random.uniform(-1, 1)
        x.append([a])
        y.append([yy])
    return np.array(x), np.array(y)

x, y = getData(200)
model = KNeighborsRegressor(n_neighbors=8)
print(model)

model.fit(x,y)
pred_y = model.predict(x)

score=model.score(x,y)
print(score)

mse =mean_squared_error(y, pred_y)
print("Mean Squared Error:",mse)

rmse = math.sqrt(mse)
print("Root Mean Squared Error:", rmse)

x_ax=range(200)
plt.scatter(x_ax, y, s=5, color="blue", label="original")
plt.plot(x_ax, pred_y, lw=1.5, color="red", label="predicted")
plt.legend()
plt.show()

Reference:

sklearn.KNeighborsRegressor
