Machine Learning pt.6
- Decision Tree
- non-linear model
- Once trained, the model predicts result values for new data.
- Using step-by-step questions, the data is divided into similar groups of a similar size.
- Information Gain: how much information you gain by asking a question.
- Impurity metrics: Entropy & Gini coefficient (purity = 100% means impurity = 0, so stop splitting there).
- Treat entropy as a cost to lower: search for the questions that lower the entropy the most, while limiting the number of features considered.
- The depth of the tree determines the model's complexity.
- recursive partitioning = splitting the data repeatedly; splitting too deeply can cause the overfitting problem
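Below is a minimal sketch of these impurity metrics in NumPy (an assumption; the notes name no library, and the function names are illustrative):

```python
import numpy as np

def entropy(labels):
    """Entropy = -sum(p * log2(p)) over the class probabilities p."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity = 1 - sum(p^2); 0 means the node is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two child groups."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

print(entropy([1, 1, 1, 1]))  # 0.0 (NumPy may print -0.0): pure node, stop splitting
print(entropy([0, 0, 1, 1]))  # 1.0: entropy is largest when the labels are even
print(gini([0, 0, 1, 1]))     # 0.5: maximum Gini for two even classes
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))  # 1.0: a perfect question
```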
- summary:
- A decision tree is simple when the question depth is low and complex when it is deep. A deep tree fits the training data very well, but its fit on the test data is not as good.
- A decision tree is explainable: the reason for each prediction is obvious (you can see which criteria it used to classify the data).
- A decision tree can also be used as a regression model; the decision tree regressor uses variance as its error measure.
- The value of entropy is large when the labels are evenly distributed.
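A short sketch of these summary points, assuming scikit-learn (the notes do not name a library; the dataset and parameters are illustrative). It shows the depth controlling complexity and the deep tree overfitting the training data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 20):
    # criterion="entropy" selects the questions that lower entropy the most.
    tree = DecisionTreeClassifier(max_depth=depth, criterion="entropy",
                                  random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth:2d}  "
          f"train={tree.score(X_train, y_train):.2f}  "
          f"test={tree.score(X_test, y_test):.2f}")

# Typically the deep tree scores near 1.00 on the training data but lower on
# the test data than the shallow tree: the overfitting described above.
# For regression, DecisionTreeRegressor(criterion="squared_error") splits by
# variance reduction, matching the "variance as the error" point.
```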
- Ensemble
- Bagging (Bootstrap Aggregating)
- Bootstrap means extracting copies of the data from the original data by resampling.
- Aggregating means combining the results of the different models, e.g., by majority vote.
- Example: the results from three models are the following: A predicts X (probability of O = 0.4), B predicts O (probability of O = 0.9), C predicts X (probability of O = 0.3).
- The average probability of O is about 0.53, which favors O; counting votes, the ratio is 2:1 = X:O.
- Q: Which of the models' results is correct?
As the example above shows, you can choose either the soft voting method (average the predicted probabilities) or the hard voting method (majority of the predicted labels) in Bagging.
When extracting the copies of the data, sample with replacement (random sampling).
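A minimal sketch of the example above in plain Python/NumPy, reading 0.4/0.9/0.3 as each model's predicted probability of class O, plus the sampling-with-replacement step:

```python
import numpy as np

# Each model's predicted probability of class "O" (the example above).
probs = np.array([0.4, 0.9, 0.3])                    # models A, B, C

# Soft voting: average the probabilities, then threshold at 0.5.
soft = "O" if probs.mean() >= 0.5 else "X"           # mean ~ 0.53 -> "O"

# Hard voting: each model votes its own label, the majority wins.
votes = ["O" if p >= 0.5 else "X" for p in probs]    # ['X', 'O', 'X']
hard = max(set(votes), key=votes.count)              # 2:1 -> "X"

print(soft, hard)   # O X -- the two voting methods can disagree

# Bootstrap: each base model trains on a copy drawn with replacement.
rng = np.random.default_rng(0)
data = np.arange(10)
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
```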
- Random Forest: combines many decision tree models, reinforcing generalization by training the models independently and voting among their results; however, the underfitting problem can occur.
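A sketch of a Random Forest with scikit-learn (dataset and parameters are illustrative): many decision trees trained independently on bootstrap samples, with their predictions aggregated by voting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# n_estimators independent trees, each trained on a bootstrap sample;
# the forest predicts by aggregating the trees' votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))
```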
- Boosting
AdaBoost / GBM / XGBoost / LightGBM
- Like Bagging, Boosting combines multiple models, but the training process is different:
one model is trained first; after its results are examined, higher weights are imposed on the wrongly predicted samples and the next model is trained (this process repeats).
- Because each model is influenced by the previous model's training, the models are not independent.
- Boosting implementations differ in performance; LightGBM is the fastest of them in training.
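A rough sketch of the sequential re-weighting loop described above (an AdaBoost-style update, heavily simplified; the doubling factor and stump settings are illustrative, not any library's exact algorithm):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
weights = np.full(len(y), 1.0 / len(y))   # start from uniform sample weights

models = []
for round_no in range(5):
    # Train the next model on the current weights.
    stump = DecisionTreeClassifier(max_depth=1, random_state=round_no)
    stump.fit(X, y, sample_weight=weights)
    models.append(stump)

    # Impose higher weights on the wrong predictions, then renormalize,
    # so the next model focuses on this model's mistakes: the models are
    # trained sequentially and are not independent, unlike in Bagging.
    wrong = stump.predict(X) != y
    weights[wrong] *= 2.0
    weights /= weights.sum()
```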