• Passio Consulting

Improving classification supported by ensemble technique

One of the hardest tasks while developing a predictive model is the selection of the right algorithm, in my predictive data mining lectures I came across with ensemble technique and it had a major role on the developed project.

Ensemble is based on the psychological phenomena from the wisdom of the crowd, where instead of having a single answer, we will have many different answers, combine them will, in the end, create a better solution.

The basic idea from ensemble classifiers is the ability to learn from a set of classifiers simultaneously, from several models, allowing them to vote simultaneously. The final goal is improving the model effectiveness and accuracy that performs better and on the other hand, reduce the general error. There are several techniques for voting:

1. Majority Vote

2. Bagging supported with bootstrap

3. Random forest

4. Boosting

5. Stacking

To have this diversity is important, the combination of good models and poor models for the final accuracy classification, that way we area enabling diversity of our final classification accuracy. This combination feature is an important characteristic of the ensemble technique.

Ensemble is defined as a machine learning technique, for a classification purpose, that from combining several classification models results produces the optimal classification model (ex. aggregate several DT output).

Of course, like everything, there are pros and cons:

Pros: we have generally an improvement on the accuracy of the predictive model

Cons: model that results from the combination of several models is harder to interpreter, due to the feeding by several classifiers. On the other hand, understand each model outcome individually it`s easier.

by Pedro Veiga

Data Analyst Consultant @ Passio Consulting