SelectFromModel Feature Selection Example in Python

The scikit-learn API provides the SelectFromModel class for selecting the best features of a given dataset according to feature importance weights. SelectFromModel is a meta-estimator that selects the features whose importance, as computed by a fitted estimator, is above a given threshold value.

In this tutorial, we'll briefly learn how to select the best features of regression data by using SelectFromModel in Python. The tutorial covers:

  1. SelectFromModel for regression data
  2. Source code listing
   We'll start by loading the required libraries and functions.

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import load_boston
from numpy import array 
 


SelectFromModel for regression data

We use the Boston housing price dataset in this tutorial. We'll load the dataset and check the dimensions of the feature data. Note that load_boston is deprecated in recent scikit-learn releases, so this code requires an older version of the library.

boston = load_boston()
x = boston.data
y = boston.target

print("Feature data dimension: ", x.shape)
 
Feature data dimension:  (506, 13) 
 

SelectFromModel requires an estimator, and we can use the AdaBoostRegressor class for this purpose. After fitting, the estimator must expose a feature importance attribute, such as feature_importances_ or coef_. We'll create the selector with the default threshold, which is the mean of the feature importances, and fit it on the x and y data.

estimator = AdaBoostRegressor(random_state=0, n_estimators=50)
selector = SelectFromModel(estimator)
selector = selector.fit(x, y) 
 
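The threshold can also be set explicitly instead of relying on the default. Below is a minimal sketch of the threshold and max_features parameters; it uses synthetic data from make_regression as a stand-in for the Boston dataset, which is deprecated in recent scikit-learn versions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic regression data stands in for the Boston dataset here.
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, random_state=0)

# threshold="median" keeps features whose importance is at or above
# the median importance; a float would act as an absolute cutoff.
# max_features caps the number of selected features.
selector = SelectFromModel(AdaBoostRegressor(random_state=0),
                           threshold="median", max_features=5)
selector.fit(X, y)

print("Selected:", selector.get_support().sum(), "of", X.shape[1], "features")
```

With max_features=5, at most five features survive even if more pass the median cutoff.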

After training, we can check the selection status of each feature. The get_support() method returns a boolean mask identifying the selected features, which we can use to filter the feature names. Finally, we'll obtain the selected feature names and extract the corresponding data from x.

status = selector.get_support()
print("Selection status: ", status) 
 
Selection status:  [False False False False False  True False  True False False False False
  True]
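Besides the boolean mask, the fitted selector exposes the computed cutoff as threshold_ and the fitted estimator as estimator_. A quick sketch, again using make_regression as a stand-in for the Boston dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, random_state=0)

selector = SelectFromModel(AdaBoostRegressor(random_state=0, n_estimators=50))
selector.fit(X, y)

# threshold_ holds the computed cutoff (the mean importance by default);
# estimator_ is the fitted copy of the AdaBoost model.
print("Threshold:", selector.threshold_)
print("Importances:", selector.estimator_.feature_importances_)
```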
 

features = array(boston.feature_names)
print("All features:")
print(features) 
  
print("Selected features:")
print(features[status])
selector.transform(x) 
 

All features:
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
'B' 'LSTAT']
 
Selected features:
['RM' 'DIS' 'LSTAT']
array([[6.575 , 4.09  , 4.98  ],
       [6.421 , 4.9671, 9.14  ],
       [7.185 , 4.9671, 4.03  ],
       ...,
       [6.976 , 2.1675, 5.64  ],
       [6.794 , 2.3889, 6.48  ],
       [6.03  , 2.505 , 7.88  ]])
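In practice, the selector is often combined with a downstream model so that feature selection is refit as part of a single workflow. The sketch below assumes a simple Pipeline with a LinearRegression step after the selector, again on synthetic make_regression data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Synthetic data as a stand-in for the Boston dataset.
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, random_state=0)

# The selector reduces the feature set before the final regressor,
# so both steps are fit together by a single call to fit().
pipe = Pipeline([
    ("select", SelectFromModel(AdaBoostRegressor(random_state=0))),
    ("regress", LinearRegression()),
])
pipe.fit(X, y)
print("R^2 on training data:", round(pipe.score(X, y), 3))
```

Calling pipe.predict() on new data then applies the same feature selection before the regression step.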
   
 
In this tutorial, we've briefly learned how to select important features of a dataset by using the scikit-learn SelectFromModel class in Python. The full source code is listed below.


Source code listing
 
 
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import load_boston
from numpy import array

boston = load_boston()
x = boston.data
y = boston.target

print("Feature data dimension: ", x.shape)

estimator = AdaBoostRegressor(random_state=0, n_estimators=50)
selector = SelectFromModel(estimator)
selector = selector.fit(x, y)

status = selector.get_support()
print("Selection status: ", status)

features = array(boston.feature_names)
print("All features:")
print(features)

print("Selected features:")
print(features[status])
selector.transform(x) 
   
 

