Multi-output Regression Example with MultiOutputRegressor in Python

In previous posts, we studied several methods of multioutput regression with Keras. In this tutorial, we'll learn how to fit a model to multioutput regression data and make predictions with scikit-learn's MultiOutputRegressor class. Multioutput data has more than one target value for each input sample. The tutorial covers:
  1. Preparing the data
  2. Defining the model
  3. Predicting and visualizing the result
  4. Source code listing
We'll start by loading the required libraries for this tutorial.

import math  # numpy.math was only an alias of this module and was removed in NumPy 2.0
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

Preparing the data

First, we'll create a multioutput dataset for this tutorial. The data is randomly generated according to the rules in create_data() below, with three inputs and two outputs. We'll plot the generated data to check it visually.

def create_data(n):
    # Three noisy input features
    x1 = array([math.sin(i)*(i/10)+uniform(-5,5) for i in range(n)]).reshape(n,1)
    x2 = array([math.cos(i)*(i/10)+uniform(-9,5) for i in range(n)]).reshape(n,1)
    x3 = array([(i/50)+uniform(-10,10) for i in range(n)]).reshape(n,1)

    # Two targets: linear combinations of the inputs plus uniform noise
    y1 = [x1[i]+x2[i]+x3[i]+uniform(-1,4)+15 for i in range(n)]
    y2 = [x1[i]-x2[i]-x3[i]-uniform(-4,2)-10 for i in range(n)]
    X = hstack((x1, x2, x3))  # shape (n, 3)
    Y = hstack((y1, y2))      # shape (n, 2)
    return X, Y

n = 300
X, Y = create_data(n)
 
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1,2,2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
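The loop-based create_data() above can also be written with vectorized NumPy operations. The sketch below is an alternative, not the tutorial's own code: the function name create_data_vectorized and its seed parameter are hypothetical, and the noise comes from numpy.random.default_rng, so the exact values differ from the original even though the formulas are the same.

```python
import numpy as np

def create_data_vectorized(n, seed=None):
    """Hypothetical vectorized variant of create_data: same formulas, no Python loops."""
    rng = np.random.default_rng(seed)
    i = np.arange(n)
    # Three noisy input features, following the rules used in create_data
    x1 = np.sin(i) * (i / 10) + rng.uniform(-5, 5, n)
    x2 = np.cos(i) * (i / 10) + rng.uniform(-9, 5, n)
    x3 = i / 50 + rng.uniform(-10, 10, n)
    # Two targets built from the inputs plus uniform noise
    y1 = x1 + x2 + x3 + rng.uniform(-1, 4, n) + 15
    y2 = x1 - x2 - x3 - rng.uniform(-4, 2, n) - 10
    X = np.column_stack((x1, x2, x3))  # shape (n, 3)
    Y = np.column_stack((y1, y2))      # shape (n, 2)
    return X, Y

X, Y = create_data_vectorized(300, seed=1)
print(X.shape, Y.shape)  # (300, 3) (300, 2)
```

Passing a seed makes the generated dataset reproducible, which is convenient when comparing model settings across runs.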

Next, we'll split the dataset into training and test sets and check the data shapes.

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
xtrain: (255, 3) ytrain: (255, 2) 
print("xtest:", xtest.shape, "ytest:", ytest.shape)
xtest: (45, 3) ytest: (45, 2) 
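Because train_test_split shuffles the rows randomly, the split (and every score computed afterwards) changes from run to run. A minimal sketch, assuming stand-in random data with the same shapes as the tutorial's X and Y, shows how the random_state parameter makes the split repeatable. With 300 rows and test_size=0.15, 45 rows go to the test set and the rows of X and Y stay aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same shapes as the tutorial's X and Y
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = rng.normal(size=(300, 2))

# random_state fixes the shuffle, so repeated calls give the same split
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15, random_state=42)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)  # xtrain: (255, 3) ytrain: (255, 2)
print("xtest:", xtest.shape, "ytest:", ytest.shape)      # xtest: (45, 3) ytest: (45, 2)
```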


Defining the model

We'll define the model with scikit-learn's MultiOutputRegressor class. As the base estimator, we'll use GradientBoostingRegressor with default parameters and wrap it in MultiOutputRegressor, which fits one regressor per target. You can check the model's parameters with the print command.

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)
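MultiOutputRegressor does not model the outputs jointly: it fits one clone of the base estimator per target column and stores the fitted clones in its estimators_ attribute. The sketch below uses hypothetical toy data and fixes random_state=0 (an assumption, so the two approaches are comparable) to check that the wrapped model predicts the same values as two GradientBoostingRegressor models fitted by hand.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Hypothetical toy data: 3 inputs, 2 targets
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 3))
Y = np.column_stack((X.sum(axis=1), X[:, 0] - X[:, 1] - X[:, 2]))

# random_state is fixed so the wrapped and manual fits are comparable
model = MultiOutputRegressor(estimator=GradientBoostingRegressor(random_state=0))
model.fit(X, Y)
print(len(model.estimators_))  # 2: one fitted regressor per target column

# The same idea written out by hand: one independent regressor per column of Y
manual = [GradientBoostingRegressor(random_state=0).fit(X, Y[:, j]) for j in range(Y.shape[1])]
manual_pred = np.column_stack([m.predict(X) for m in manual])
print(np.allclose(model.predict(X), manual_pred))
```

Because each target gets its own independent model, any dependence between y1 and y2 is not exploited by this wrapper.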
 
Now, we can fit the model on the training data and check the training score, which for a regressor is the coefficient of determination R², averaged over both outputs.

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)
Training score: 0.9952671502749106
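The value returned by score() is R² averaged uniformly over the outputs. The sketch below, on hypothetical toy data with a fixed random_state, recomputes it with r2_score (multioutput="raw_values" gives one R² per target) and also scores the held-out split, since a score measured on the training data alone tends to be optimistic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Hypothetical toy data with 3 inputs and 2 noisy targets
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(300, 3))
Y = np.column_stack((X.sum(axis=1), X[:, 0] - X[:, 1] - X[:, 2]))
Y = Y + rng.normal(0, 1, size=Y.shape)
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15, random_state=0)

model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(xtrain, ytrain)

# score() is the coefficient of determination R^2, averaged uniformly over the targets
train_score = model.score(xtrain, ytrain)
per_target = r2_score(ytrain, model.predict(xtrain), multioutput="raw_values")
print("Training R^2:", train_score, "per target:", per_target)

# Scoring held-out data gives a less optimistic estimate than the training score
print("Test R^2:", model.score(xtest, ytest))
```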


Predicting and visualizing the result 

We'll predict the test data with the trained model and compute the mean squared error (MSE) for each output, y1 and y2.

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
y1 MSE:10.9138 
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))
y2 MSE:10.8929 
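mean_squared_error can also compute both errors in a single call. In this minimal sketch the arrays are small hypothetical values, not the tutorial's predictions; multioutput="raw_values" returns one MSE per column, and the default setting averages them.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted values for two targets (columns y1, y2)
ytest = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ypred = np.array([[1.5, 2.0], [2.0, 4.5], [5.0, 8.0]])

# multioutput="raw_values" returns one MSE per column, matching the per-column calls
per_output = mean_squared_error(ytest, ypred, multioutput="raw_values")
print("y1 MSE:%.4f, y2 MSE:%.4f" % tuple(per_output))  # y1 MSE:0.4167, y2 MSE:1.4167

# The default (uniform_average) is the mean of the per-output errors
print("Average MSE:%.4f" % mean_squared_error(ytest, ypred))  # Average MSE:0.9167
```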
 
Finally, we'll plot the actual and predicted test values for both outputs to compare them visually.

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()

In this tutorial, we've briefly learned how to use the MultiOutputRegressor class in Python. We trained a model on a multioutput dataset and predicted the test data.


Source code listing 
 
import math  # numpy.math was only an alias of this module and was removed in NumPy 2.0
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
 
def create_data(n):
    # Three noisy input features
    x1 = array([math.sin(i)*(i/10)+uniform(-5,5) for i in range(n)]).reshape(n,1)
    x2 = array([math.cos(i)*(i/10)+uniform(-9,5) for i in range(n)]).reshape(n,1)
    x3 = array([(i/50)+uniform(-10,10) for i in range(n)]).reshape(n,1)

    # Two targets: linear combinations of the inputs plus uniform noise
    y1 = [x1[i]+x2[i]+x3[i]+uniform(-1,4)+15 for i in range(n)]
    y2 = [x1[i]-x2[i]-x3[i]-uniform(-4,2)-10 for i in range(n)]
    X = hstack((x1, x2, x3))  # shape (n, 3)
    Y = hstack((y1, y2))      # shape (n, 2)
    return X, Y

n = 300
X, Y = create_data(n)
 
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1,2,2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
 
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
print("xtest:", xtest.shape, "ytest:", ytest.shape)

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show() 

