### Classification Example with an Extra-Trees Method in Python

Extremely Randomized Trees (or Extra-Trees) is an ensemble learning method. It fits many randomized decision trees on the dataset (or on sub-samples of it) and combines their predictions, by majority voting or by averaging the predicted class probabilities, to improve the accuracy of the classifier. Combining many trees in this way reduces the variance of the model. At each split, the method draws a random threshold for each candidate feature and uses the best of these random thresholds as the splitting rule.
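To make the random-threshold idea concrete, here is a minimal pure-Python sketch of a single split decision. It is only an illustration, not scikit-learn's implementation: the helper names `gini` and `random_split` and the toy data are made up for this example, the sketch scores splits by Gini impurity, and it looks at every feature, whereas a real Extra-Trees model usually considers only a random subset of features at each split.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def random_split(rows, labels, rng):
    """Hypothetical helper: try one random threshold per feature, keep the best."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        values = [row[feature] for row in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(lo, hi)  # drawn at random, not optimized
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        # Weighted impurity of the two child nodes (lower is better)
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best

# Toy data: feature 0 separates the classes, feature 1 does not
rows = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.5], [9.0, 4.5]]
labels = [0, 0, 1, 1]
score, feature, threshold = random_split(rows, labels, random.Random(0))
print(feature, round(threshold, 2), round(score, 2))
```

A classical decision tree would search for the best possible threshold on each feature. Here the threshold is drawn at random and only the choice among the drawn candidates is optimized, which makes the individual trees less correlated with each other.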

In this tutorial, we'll briefly learn how to classify data by using Scikit-learn's ExtraTreesClassifier class in Python. The tutorial covers:
1. Preparing the data
2. Training the model
3. Predicting and accuracy check
4. Source code listing

We'll start by loading the required modules.

```
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
```

#### Preparing the data

In this tutorial, we'll classify the Iris dataset. We'll load it and separate the features (x) from the class labels (y).

```
iris = load_iris()
x, y = iris.data, iris.target
```

Then, we'll split them into train and test parts. Here, we'll extract 15 percent of the dataset as test data.

```
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
```
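Because the split is random, each run produces a different train and test set. As an optional variation, the sketch below fixes the shuffle with `random_state` and keeps the class proportions similar in both parts with `stratify`. Both are standard `train_test_split` parameters, but the value 42 is an arbitrary choice for this example and is not part of the original tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)

# random_state makes the shuffle repeatable (42 is arbitrary);
# stratify=y keeps the class ratios similar in train and test
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=42, stratify=y
)

print(xtrain.shape, xtest.shape)
```

With a fixed seed, the scores reported later in the tutorial stay the same between runs, which makes results easier to compare.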

#### Training the model

Next, we'll define the classifier by using the ExtraTreesClassifier class. The n_estimators parameter sets the number of trees in the ensemble; here, I'll set it to 100.

```
clf = ExtraTreesClassifier(n_estimators=100)
print(clf)

ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=100,
                     n_jobs=None, oob_score=False, random_state=None, verbose=0,
                     warm_start=False)
```

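Then, we'll fit the model on the training data and check its accuracy on that same data. A perfect score here only shows that the trees fit the training set; it says nothing yet about unseen data.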

```
clf.fit(xtrain, ytrain)
score = clf.score(xtrain, ytrain)
print("Score: ", score)

Score:  1.0
```

We can also apply cross-validation on the training data, which gives a more reliable accuracy estimate than the training score.

```
cv_scores = cross_val_score(clf, xtrain, ytrain, cv=5)
print("CV average score: %.2f" % cv_scores.mean())

CV average score: 0.96
```
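The average hides how much the score varies between folds. A small sketch, assuming we also want the spread, prints each fold's score and the standard deviation. To keep the example self-contained it runs on the full Iris data rather than on xtrain, and `random_state=0` is an arbitrary choice added for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

x, y = load_iris(return_X_y=True)

# random_state=0 is arbitrary; it only makes the run repeatable
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)

# cross_val_score fits a fresh copy of clf on each of the 5 folds
cv_scores = cross_val_score(clf, x, y, cv=5)

print("Per-fold scores:", cv_scores)
print("Mean: %.2f, std: %.2f" % (cv_scores.mean(), cv_scores.std()))
```

A large standard deviation relative to the mean would suggest that the estimate depends heavily on which samples end up in which fold.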

#### Predicting and accuracy check

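Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy by using the confusion_matrix function. Rows of the matrix are the true classes and columns are the predicted classes, so any off-diagonal entry is a misclassification.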

```
ypred = clf.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

[[5 0 0]
 [0 4 0]
 [0 0 6]]
```
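The confusion matrix can be condensed into a single test accuracy, and scikit-learn can also report precision and recall per class. The sketch below is one possible way to do this with `accuracy_score` and `classification_report` from `sklearn.metrics`. It rebuilds the split and the model so that it runs on its own; `random_state=0` and `stratify` are assumptions added for repeatability, not settings from the tutorial above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=0, stratify=iris.target
)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(xtrain, ytrain)
ypred = clf.predict(xtest)

# Accuracy is the share of test samples on the diagonal of the confusion matrix
cm = confusion_matrix(ytest, ypred)
accuracy = accuracy_score(ytest, ypred)
print("Test accuracy: %.2f" % accuracy)
print("From the matrix: %.2f" % (cm.trace() / cm.sum()))

# Precision, recall and F1 score for each Iris class
print(classification_report(ytest, ypred, target_names=iris.target_names))
```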

In this tutorial, we've briefly learned how to classify data by using Scikit-learn's ExtraTreesClassifier class in Python. The full source code is listed below.

#### Source code listing

```
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix

# Load the Iris data and separate features from labels
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

clf = ExtraTreesClassifier(n_estimators=100)
print(clf)

clf.fit(xtrain, ytrain)
score = clf.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(clf, xtrain, ytrain, cv=5 )
print("CV average score: %.2f" % cv_scores.mean())

ypred = clf.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)
```
