How to split a data set

Data Science
Author

Benjamin DK Luong

Published

August 25, 2019

We should use Stratified split because we usually have unbalanced target classes. We want the model to see all target classes.

Stratified K-Fold method divides the data into k blocks, and it makes sure that each set contains approximately the same percentage of samples of each target class as the complete set.

Shuffle Split method randomly selects observations, and it makes sure that each set contains approximately the same percentage of samples of each target class as the complete set.