LightGBM Classification Example in Python

     LightGBM is an open-source gradient boosting framework that is based on tree learning algorithms and is designed for fast training and high accuracy. It handles large datasets with low memory usage and supports distributed learning. The API is described in the official LightGBM documentation.

    LightGBM can be used for regression, classification, ranking, and other machine learning tasks. In this tutorial, you'll briefly learn how to fit a classification model and predict class labels with LightGBM in Python. The tutorial covers:

  1. Preparing the data
  2. Building the model
  3. Prediction and accuracy check
  4. Source code listing
   We'll start by loading the required libraries for this tutorial.

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from pandas import DataFrame
from numpy import argmax 
 

If you haven't installed LightGBM yet, you can install it with pip.

 
pip install lightgbm 
 

 

Preparing the data

   We use the Iris dataset as the classification data; it can be loaded from the sklearn.datasets module. To keep the feature column names, I'll store the feature data in a pandas DataFrame. Then, we'll split the data into train and test parts.

 
iris = load_iris()
x, y = iris.data, iris.target

x_df = DataFrame(x, columns= iris.feature_names)
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.15)
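
The split above is random, so the class counts in the test part change from run to run. As a minimal sketch of one way to make it repeatable, the version below adds two arguments that are not in the original code: random_state (the value 42 is arbitrary) and stratify=y, which keeps the three classes in roughly equal proportion in both parts.

```python
from collections import Counter

from pandas import DataFrame
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x_df = DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# stratify=y and random_state=42 are assumptions added for this sketch
x_train, x_test, y_train, y_test = train_test_split(
    x_df, y, test_size=0.15, stratify=y, random_state=42)

# 150 rows split into 127 training rows and 23 test rows
print(x_train.shape, x_test.shape)

# number of test rows per class
print(Counter(y_test.tolist()))
```

With 23 test rows and three equally sized classes, each class ends up with 7 or 8 test rows.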
 
 
 
 
Building the model 
 
First, we'll define the classification model parameters as shown below. The objective is 'multiclass' and num_class is 3 because Iris has three classes. You can change the other values according to your evaluation targets.
 
  
# defining parameters
params = {
    'boosting': 'gbdt',
    'objective': 'multiclass',
    'num_leaves': 10,
    'num_class': 3
}
 
 
Next, we'll load the train and test data into LightGBM Dataset objects. The code below creates the training set and an evaluation set that references it.


# loading data
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
 
 
Now we can train the model with the parameters and datasets defined above. Training stops early if the validation metric has not improved for 30 rounds.


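# fitting the model
# early stopping is set through a callback; the early_stopping_rounds
# argument of lgb.train() was removed in LightGBM 4.0
model = lgb.train(params,
                 train_set=lgb_train,
                 valid_sets=[lgb_eval],
                 callbacks=[lgb.early_stopping(stopping_rounds=30)])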
  
 

Prediction and accuracy check

   After training the model, we can predict the test data and check the prediction accuracy. We'll compute the classification report and the confusion matrix.
 

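# prediction
# for a multiclass objective, predict() returns one probability per class for each row
y_pred = model.predict(x_test)

# pick the most probable class as the predicted label
y_pred = argmax(y_pred, axis=1)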
cr = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
 
print(cr)
print(cm) 
 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         8
           1       1.00      0.88      0.93         8
           2       0.88      1.00      0.93         7

    accuracy                           0.96        23
   macro avg       0.96      0.96      0.96        23
weighted avg       0.96      0.96      0.96        23

[[8 0 0]
 [0 7 1]
 [0 0 7]]
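
The numbers in the report can be derived from the confusion matrix, where rows are the true classes and columns are the predicted classes. The sketch below recomputes recall, precision, and accuracy from the matrix printed above, using only NumPy.

```python
import numpy as np

# confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[8, 0, 0],
               [0, 7, 1],
               [0, 0, 7]])

correct = np.diag(cm)                  # correctly classified rows per class
recall = correct / cm.sum(axis=1)      # divide by the number of true rows per class
precision = correct / cm.sum(axis=0)   # divide by the number of predicted rows per class
accuracy = correct.sum() / cm.sum()    # 22 correct out of 23

print(recall)      # class 1 recall is 7/8 = 0.875
print(precision)   # class 2 precision is 7/8 = 0.875
print(accuracy)    # 22/23, about 0.96
```

One class 1 sample was predicted as class 2, which lowers the recall of class 1 and the precision of class 2, exactly as the report shows.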
 
   
     
 
LightGBM provides the plot_importance() function to plot feature importance (it requires matplotlib). The code below shows how to call it.
 
 
# plotting feature importance
lgb.plot_importance(model, height=.5)
  
 
 

   In this tutorial, we've briefly learned how to fit a multiclass classification model with LightGBM and predict test data in Python. The full source code is listed below.


Source code listing
 
 
import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from pandas import DataFrame
from numpy import argmax


iris = load_iris()
x, y = iris.data, iris.target

x_df = DataFrame(x, columns= iris.feature_names)
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.15)

# defining parameters 
params = {
    'boosting': 'gbdt',
    'objective': 'multiclass',
    'num_leaves': 10,
    'num_class': 3
}

# loading data
lgb_train = lgb.Dataset(x_train, y_train)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)

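# fitting the model
# early stopping is set through a callback; the early_stopping_rounds
# argument of lgb.train() was removed in LightGBM 4.0
model = lgb.train(params,
                 train_set=lgb_train,
                 valid_sets=[lgb_eval],
                 callbacks=[lgb.early_stopping(stopping_rounds=30)])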

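# prediction
# for a multiclass objective, predict() returns one probability per class for each row
y_pred = model.predict(x_test)

# pick the most probable class as the predicted label
y_pred = argmax(y_pred, axis=1)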
cr = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(cr)
print(cm)

lgb.plot_importance(model, height=.5)
  

 