ML Data Sets

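To estimate how well a model will generalize to new cases, the data is split into a training set and a test set. The model is trained on the training set and evaluated on the test set only at the end. A validation set is often also held out from the training set to compare candidate models and tune hyperparameters, so that the test set is not used for any of those choices.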

It is common to use 80% of the data for training and hold out 20% for testing.
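A minimal sketch of an 80/20 split in plain Python follows. The function name `split_train_test` and its `test_ratio` and `seed` parameters are hypothetical, chosen for illustration. Shuffling before splitting avoids a test set that reflects any ordering in the data, and a fixed seed makes the split reproducible.

```python
import random

def split_train_test(data, test_ratio=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of `data` and return (train, test).

    `test_ratio` is the fraction held out for testing (0.2 gives an 80/20 split).
    A fixed `seed` makes the split reproducible across runs.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(data)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state alone
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```

In practice, scikit-learn provides `train_test_split` in `sklearn.model_selection`, whose `test_size` and `random_state` arguments play the same roles as `test_ratio` and `seed` here.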

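Holding out a single validation set from the training set is called holdout validation. In k-fold cross-validation, the training set is instead split into k folds: each candidate model is trained k times, each time on k-1 folds and evaluated on the remaining fold, and the k validation scores are averaged. This gives a more reliable estimate than a single validation split, at the cost of roughly k times the training time.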
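The k-fold procedure can be sketched in plain Python as below. The names `k_fold_indices` and `mean_cv_score` are hypothetical, and `fit`/`score` are placeholder callables standing in for whatever model and metric are used; this is an illustration of the technique, not a library API.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices 0..n_samples-1 are cut into k contiguous folds; each fold serves
    once as the validation set while the other k-1 folds form the training
    set. Shuffle the data beforehand if its order is not random.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def mean_cv_score(fit, score, X, y, k=5):
    """Average validation score over k folds.

    Placeholder callables (assumptions, not a real API):
      fit(X_train, y_train) -> model
      score(model, X_val, y_val) -> float
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)

# 10 samples in 3 folds: the remainder goes to the first fold.
print([len(val) for _, val in k_fold_indices(10, 3)])  # [4, 3, 3]
```

scikit-learn offers the same idea through `KFold` and `cross_val_score` in `sklearn.model_selection`.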

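The No Free Lunch Theorem states that, if you make no assumptions about the data, there is no reason to prefer one model over another: no model is guaranteed in advance to work best. The only way to know for sure which model is best is to evaluate them all. In practice, you make reasonable assumptions about the data and evaluate only a few reasonable models.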
