Multi-output Regression Example with MultiOutputRegressor in Python

We covered several methods for multioutput regression with Keras in previous posts. In this tutorial, we'll learn how to fit and predict multioutput regression data with scikit-learn's MultiOutputRegressor class. Multioutput data has more than one target value for each input sample. The tutorial covers:
  1. Preparing the data
  2. Defining the model
  3. Predicting and visualizing the result
  4. Source code listing
We'll start by loading the required libraries for this tutorial.

import math  # standard-library math; numpy.math was only an alias and was removed in NumPy 2.0
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

Preparing the data

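First, we'll create a multi-output dataset for this tutorial. The data is randomly generated according to the rules in the create_data function below: there are three inputs (x1, x2, x3) and two outputs (y1, y2). We'll plot the generated data to check it visually. Because no random seed is set, your data, and therefore the scores and errors shown later, will differ from run to run.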
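For reference, here are the rules implemented by create_data, written out per sample index i = 0, ..., n-1. U(a, b) stands for an independent draw from numpy.random.uniform(a, b), i.e. a uniform value on [a, b):

$$
\begin{aligned}
x_{1,i} &= \sin(i)\,\frac{i}{10} + U(-5, 5), \\
x_{2,i} &= \cos(i)\,\frac{i}{10} + U(-9, 5), \\
x_{3,i} &= \frac{i}{50} + U(-10, 10), \\
y_{1,i} &= x_{1,i} + x_{2,i} + x_{3,i} + U(-1, 4) + 15, \\
y_{2,i} &= x_{1,i} - x_{2,i} - x_{3,i} - U(-4, 2) - 10.
\end{aligned}
$$

The noise terms in y1 and y2 are independent of the inputs, so no model built on x1, x2, x3 can have an expected MSE below their variance, (b - a)²/12: that is 25/12 ≈ 2.08 for y1 and 36/12 = 3 for y2.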

def create_data(n):
    x1 = array([math.sin(i)*(i/10)+uniform(-5,5) for i in range(n)]).reshape(n,1)
    x2 = array([math.cos(i)*(i/10)+uniform(-9,5) for i in range(n)]).reshape(n,1)
    x3 = array([(i/50)+uniform(-10,10) for i in range(n)]).reshape(n,1)

    y1 = [x1[i]+x2[i]+x3[i]+uniform(-1,4)+15 for i in range(n)]
    y2 = [x1[i]-x2[i]-x3[i]-uniform(-4,2)-10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
 
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1,2,2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
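The loop-based create_data works, but a vectorized version is shorter and faster for larger n. The sketch below is a hypothetical alternative: the name create_data_vectorized and its seed parameter are not part of the original code. It applies the same rules with NumPy array operations and uses numpy.random.default_rng so a run can be reproduced. Its random stream differs from numpy.random.uniform, so it will not reproduce the original's numbers.

```python
import numpy as np


def create_data_vectorized(n, seed=None):
    """Hypothetical vectorized variant of create_data: same rules, no Python loops."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible runs
    i = np.arange(n)
    x1 = np.sin(i) * (i / 10) + rng.uniform(-5, 5, n)
    x2 = np.cos(i) * (i / 10) + rng.uniform(-9, 5, n)
    x3 = i / 50 + rng.uniform(-10, 10, n)
    y1 = x1 + x2 + x3 + rng.uniform(-1, 4, n) + 15
    y2 = x1 - x2 - x3 - rng.uniform(-4, 2, n) - 10
    X = np.column_stack((x1, x2, x3))  # shape (n, 3)
    Y = np.column_stack((y1, y2))      # shape (n, 2)
    return X, Y


X, Y = create_data_vectorized(300, seed=1)
print(X.shape, Y.shape)  # (300, 3) (300, 2)
```

Passing a fixed seed here, together with random_state in train_test_split and GradientBoostingRegressor, would make the scores in the rest of the tutorial repeatable.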

Next, we'll split the dataset into the train and test parts and check the data shapes.

xtrain, xtest, ytrain, ytest=train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
xtrain: (255, 3) ytrain: (255, 2) 
print("xtest:", xtest.shape, "ytest:", ytest.shape)
xtest: (45, 3) ytest: (45, 2) 
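With a float test_size, train_test_split puts ceil(test_size × n) rows in the test set, here ceil(0.15 × 300) = 45, and the remaining 255 in the training set, which matches the shapes above. The sketch below uses index-valued placeholder arrays instead of the tutorial's data and adds random_state=42 (an assumption, not in the original call) so the shuffle is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: row r of X starts with 3*r, row r of Y starts with 2*r.
X = np.arange(900).reshape(300, 3)
Y = np.arange(600).reshape(300, 2)

# random_state is an illustrative assumption; it fixes the shuffle order.
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.15, random_state=42)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)  # xtrain: (255, 3) ytrain: (255, 2)
print("xtest:", xtest.shape, "ytest:", ytest.shape)      # xtest: (45, 3) ytest: (45, 2)

# Rows are shuffled together, so every X row is still paired with its Y row.
print((xtest[:, 0] // 3 == ytest[:, 0] // 2).all())  # True
```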


Defining the model

We'll define the model with scikit-learn's MultiOutputRegressor class. As the base estimator, we'll use GradientBoostingRegressor with its default parameters. GradientBoostingRegressor predicts only a single target, so we wrap it in MultiOutputRegressor, which fits one independent copy of the estimator for each target column (here, one for y1 and one for y2). You can check the model's parameters with the print command.

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)
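To see what the wrapper does, the self-contained sketch below fits MultiOutputRegressor on toy data and compares it with fitting one GradientBoostingRegressor per target column by hand. The toy data and random_state=0 (set so both paths build identical trees) are assumptions for illustration; the fitted per-target models are stored in the estimators_ attribute.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(200, 3))
# Two toy targets built from the same inputs (illustrative data, not the tutorial's).
Y = np.column_stack((X.sum(axis=1), X[:, 0] - X[:, 1] - X[:, 2]))

# Wrapped: one clone of the estimator is fitted per column of Y.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)
print(len(model.estimators_))  # 2

# Manual equivalent: fit an independent regressor on each target column.
manual = [GradientBoostingRegressor(random_state=0).fit(X, Y[:, j]) for j in range(Y.shape[1])]
manual_pred = np.column_stack([m.predict(X) for m in manual])

print(np.allclose(model.predict(X), manual_pred))  # True
```

In other words, the wrapper does not model any relationship between y1 and y2; it is a convenience for training and predicting all targets with one fit/predict call.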
 
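Now we can fit the model on the training data and check the training score. For a multioutput regressor, score returns the coefficient of determination (R²) averaged uniformly over the outputs.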

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)
Training score: 0.9952671502749106
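The training score is a single number even though there are two outputs. The sketch below, on toy data of its own (an assumption for illustration), checks that model.score equals the mean of the per-output R² values returned by sklearn.metrics.r2_score with multioutput="raw_values".

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, size=(200, 3))
Y = np.column_stack((X.sum(axis=1) + rng.uniform(-1, 4, 200),
                     X[:, 0] - X[:, 1] - X[:, 2] - rng.uniform(-4, 2, 200)))

model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)
pred = model.predict(X)

score = model.score(X, Y)                                 # single averaged R²
per_output = r2_score(Y, pred, multioutput="raw_values")  # one R² per target column
print(score, per_output, per_output.mean())
```

A training R² close to 1 mainly shows that the boosted trees fit the training data closely; errors on the held-out test set are the more honest measure.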


Predicting and visualizing the result 

We'll predict the test data with the trained model and compute the mean squared error (MSE) for each of the two outputs, y1 and y2.

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
y1 MSE:10.9138 
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))
y2 MSE:10.8929 
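Instead of slicing each column, mean_squared_error can return one MSE per output with multioutput="raw_values". The tiny arrays below are made up so the arithmetic can be checked by hand: column 0 has errors (0, 1, 0), giving 1/3, and column 1 has errors (-1, 0, 2), giving 5/3.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Small made-up arrays standing in for ytest and ypred (two target columns).
ytest = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ypred = np.array([[1.0, 3.0], [2.0, 4.0], [5.0, 4.0]])

# One MSE per target column: 1/3 for column 0, 5/3 for column 1.
per_target = mean_squared_error(ytest, ypred, multioutput="raw_values")
print(per_target)

# The default multioutput="uniform_average" returns the mean of the two: 1.0.
print(mean_squared_error(ytest, ypred))
```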
 
Finally, we'll plot the actual and predicted test values for both outputs to compare them visually.

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show()
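Because MultiOutputRegressor keeps a separate fitted estimator per target, you can also inspect feature importances for each output individually through estimators_. The sketch below uses toy targets (an assumption for illustration) where y1 depends mostly on the first feature and y2 mostly on the third. GradientBoostingRegressor's feature_importances_ are impurity-based, so treat them as a rough guide rather than a definitive ranking.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-10, 10, size=(300, 3))
# Toy targets: y1 is driven mostly by feature 0, y2 mostly by feature 2.
Y = np.column_stack((5 * X[:, 0] + X[:, 1], X[:, 1] + 5 * X[:, 2]))

model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)

# Each per-target estimator carries its own importances (they sum to 1).
for name, est in zip(["y1", "y2"], model.estimators_):
    print(name, np.round(est.feature_importances_, 3))
```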

In this tutorial, we've briefly learned how to use the MultiOutputRegressor class in Python. We've trained a model on a multioutput dataset and predicted the test data.


Source code listing 
 
import math  # standard-library math; numpy.math was only an alias and was removed in NumPy 2.0
from numpy import array, hstack
from numpy.random import uniform
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
 
def create_data(n):
    x1 = array([math.sin(i)*(i/10)+uniform(-5,5) for i in range(n)]).reshape(n,1)
    x2 = array([math.cos(i)*(i/10)+uniform(-9,5) for i in range(n)]).reshape(n,1)
    x3 = array([(i/50)+uniform(-10,10) for i in range(n)]).reshape(n,1)

    y1 = [x1[i]+x2[i]+x3[i]+uniform(-1,4)+15 for i in range(n)]
    y2 = [x1[i]-x2[i]-x3[i]-uniform(-4,2)-10 for i in range(n)]
    X = hstack((x1, x2, x3))
    Y = hstack((y1, y2))
    return X, Y

n = 300
X, Y = create_data(n)
 
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Xs input data")
plt.plot(X)
plt.xlabel("Samples")
f.add_subplot(1,2,2)
plt.title("Ys output data")
plt.plot(Y)
plt.xlabel("Samples")
plt.show()
 
xtrain, xtest, ytrain, ytest=train_test_split(X, Y, test_size=0.15)
print("xtrain:", xtrain.shape, "ytrain:", ytrain.shape)
print("xtest:", xtest.shape, "ytest:", ytest.shape)

gbr = GradientBoostingRegressor()
model = MultiOutputRegressor(estimator=gbr)
print(model)

model.fit(xtrain, ytrain)
score = model.score(xtrain, ytrain)
print("Training score:", score)

ypred = model.predict(xtest)
print("y1 MSE:%.4f" % mean_squared_error(ytest[:,0], ypred[:,0]))
print("y2 MSE:%.4f" % mean_squared_error(ytest[:,1], ypred[:,1]))

x_ax = range(len(xtest))
plt.plot(x_ax, ytest[:,0], label="y1-test", color='c')
plt.plot(x_ax, ypred[:,0], label="y1-pred", color='b')
plt.plot(x_ax, ytest[:,1], label="y2-test", color='m')
plt.plot(x_ax, ypred[:,1], label="y2-pred", color='r')
plt.legend()
plt.show() 


8 comments:

  1. Hey, it's a very good read. However, a more detailed explanation of the topic would have been great.

    The multi-target regression here takes all targets together while fitting the model and during evaluation. Do you think taking one target at a time would give better results? I wonder why this idea is not taken into account. I'd appreciate your comments. Thanks again.

    1. You are welcome! Yes, you can do it. But then it becomes a simple regression model that fits and predicts each target in separate steps. Here I wanted to show the multi-output case with a single training and prediction call.

    2. I think the sklearn MultiOutputRegressor works in the same way as Sudheer mentioned.

  2. Thank you! Is it possible to get feature importances as well? Multiple feature importances, or does it need to be done separately?

    1. You are welcome! Yes, you need to extract the important features in your data preparation section before training your model on it.

  3. Hi, thanks for sharing this interesting topic. I wonder what the math behind MultiOutputRegressor is. Essentially you can plug any regression model into it, right?

  4. I am also keen to know the math behind multioutput regressor. It is true you can plug it in to any model. It seems that it fits one model to a set of independent variables and one target variable at a time.
