PySpark Decision Tree Classification Example

    The PySpark MLlib library provides a DecisionTreeClassifier model for classification with the decision tree method. The decision tree is a well-known and powerful supervised machine learning algorithm that can be used for both classification and regression tasks. It learns a tree-like, top-down set of rules from the training data, where each branch corresponds to the outcome of a decision on a feature value.
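To make the idea of a branch concrete, here is a minimal pure-Python sketch of how a single split can be chosen. It is not the MLlib implementation: the helper names `gini` and `best_split` and the toy data are made up for illustration, and Gini impurity is assumed as the split criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Toy data (hypothetical): two numeric features, two classes.
rows = [[2.0, 1.0], [3.0, 1.5], [1.0, 4.0], [2.5, 5.0]]
labels = ["a", "a", "b", "b"]

feature, threshold, score = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold} (weighted Gini {score:.2f})")
```

A full tree repeats this search on each resulting branch until the nodes are pure or a depth limit is reached.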

    In this tutorial, we'll briefly learn how to fit a model and classify data using the PySpark DecisionTreeClassifier. The tutorial covers:

  1. Preparing the data
  2. Prediction and accuracy check
  3. Source code listing
   We'll start by loading the required libraries for this tutorial.

MLlib Gradient-boosted Tree Regression Example with PySpark

    The PySpark MLlib library provides a GBTRegressor model that implements the gradient-boosted tree regression method. Gradient tree boosting is an ensemble of decision trees that can solve both regression and classification tasks in machine learning. The main idea is to build the trees sequentially, fitting each new weak learner to the errors left by the ensemble built so far.
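The boosting idea can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions (one input feature, squared-error loss, one-split "stump" trees), not the GBTRegressor implementation. The names `fit_stump`, `fit_gbt`, and `mse`, the toy data, and the default settings are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def fit_gbt(xs, ys, n_trees=20, learning_rate=0.5):
    """Boost stumps: each new stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (hypothetical): a noisy, roughly linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

mean_y = sum(ys) / len(ys)
baseline = lambda x: mean_y
model = fit_gbt(xs, ys)
print("baseline MSE:", mse(baseline, xs, ys))
print("boosted MSE: ", mse(model, xs, ys))
```

Each round shrinks the training error a little, which is why a small learning rate combined with many trees is a common configuration.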

    In this tutorial, we'll briefly learn how to fit a model and predict regression data using the PySpark GBTRegressor in Python. The tutorial covers:

  1. Preparing the data
  2. Prediction and accuracy check
  3. Visualizing the results
  4. Source code listing
   We'll start by loading the required libraries for this tutorial.

MLlib Linear Regression Example with PySpark

    Apache Spark is an analytics engine for processing large-scale datasets with tools such as Spark SQL, MLlib, and others. PySpark is the Python API for writing and running Spark applications.

    In this tutorial, we'll briefly learn how to fit and predict regression data using PySpark and the MLlib linear regression model. The tutorial covers:

  1. Preparing the data
  2. Fitting and accuracy check
  3. Visualizing the results
  4. Source code listing
   We'll start by loading the required libraries for this tutorial.

SelectFromModel Feature Selection Example in Python

    The scikit-learn API provides the SelectFromModel class for extracting the best features of a given dataset according to their importance weights. SelectFromModel is a meta-estimator that takes the importance weights from a fitted estimator and keeps the features whose importance meets or exceeds a given threshold value.
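The thresholding step can be sketched without scikit-learn. The function `select_by_threshold` and the importance values below are hypothetical. Falling back to the mean importance when no threshold is given is an assumption of this sketch, chosen to resemble a common default.

```python
def select_by_threshold(importances, threshold=None):
    """Return the indices of features whose importance is >= threshold.

    When threshold is None, the mean importance is used (an assumption
    made for this sketch).
    """
    if threshold is None:
        threshold = sum(importances) / len(importances)
    return [i for i, weight in enumerate(importances) if weight >= threshold]


# Hypothetical importance weights, as a fitted estimator might report them.
importances = [0.05, 0.40, 0.10, 0.30, 0.15]

selected = select_by_threshold(importances)
print("kept with mean threshold:", selected)
print("kept with threshold 0.1: ", select_by_threshold(importances, 0.1))
```

Lowering the threshold keeps more features, so it acts as the main knob for how aggressive the selection is.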

    In this tutorial, we'll briefly learn how to select the best features of regression data using SelectFromModel in Python. The tutorial covers:

  1. SelectFromModel for regression data
  2. Source code listing
   We'll start by loading the required libraries and functions.

Recursive Feature Elimination (RFE) Example in Python

    Extracting the influential features of a dataset is an essential part of data preparation for training a model in machine learning. The scikit-learn API provides the RFE class, which ranks features by recursive feature elimination to select the best ones. The method recursively eliminates the least important features based on an importance attribute exposed by the estimator, such as its coefficients or feature importances.
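The elimination loop itself is short. The sketch below is pure Python and only illustrates the control flow. The functions `abs_correlation`, `score_features`, and `rfe` are hypothetical, and absolute correlation with the target stands in for refitting a real estimator, whose weights would normally change after each elimination round.

```python
def abs_correlation(column, target):
    """Absolute Pearson correlation between one feature column and the target."""
    n = len(target)
    mx, my = sum(column) / n, sum(target) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, target))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in target) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def score_features(rows, target, active):
    """Stand-in for refitting an estimator on the active features only."""
    return {j: abs_correlation([r[j] for r in rows], target) for j in active}


def rfe(rows, target, n_features_to_select, step=1):
    """Drop the `step` weakest features per round until enough are left."""
    active = list(range(len(rows[0])))
    eliminated = []
    while len(active) > n_features_to_select:
        scores = score_features(rows, target, active)
        n_drop = min(step, len(active) - n_features_to_select)
        for j in sorted(active, key=scores.get)[:n_drop]:
            active.remove(j)
            eliminated.append(j)
    return active, eliminated


# Toy data (hypothetical): the target is exactly twice feature 0.
rows = [[1, 5, 2], [2, 3, 1], [3, 6, 4], [4, 2, 3], [5, 4, 6]]
target = [2, 4, 6, 8, 10]

kept, dropped = rfe(rows, target, n_features_to_select=1)
print("kept features:", kept)
print("elimination order:", dropped)
```

The elimination order doubles as a ranking: features dropped later are considered more important than those dropped earlier.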

    In this tutorial, we'll briefly learn how to select the best features of a dataset using RFE in Python. The tutorial covers:

  1. RFE Example with Boston dataset
  2. Source code listing
   We'll start by loading the required libraries and functions.

Reading Texts on Image by Using Tesseract and PyOCR in Python

    Optical Character Recognition (OCR) is the conversion of typed or handwritten characters in an image into machine-encoded text. There are several methods and libraries that can be used to read text in an image.

    In this tutorial, we'll briefly learn how to read text in an image using Tesseract and PyOCR in Python. The tutorial covers:

  1. Installing Tesseract and PyOCR
  2. Reading texts on image
  3. Source code listing
   Let's get started.