Nov 11, 2018

Great question! I’ve typically observed that if a feature shows very different trends across one train/validation split, it will show different trends across other splits as well. The noisiness is inherent to the feature and will manifest in whichever split you look at. I personally recommend time-based splits, since they guard against the model deteriorating in the future (due to the changing nature of features over time). For very large datasets, splitting randomly might give you datasets so similar that you’ll never see any difference in train/validation trends.
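A minimal sketch of a time-based split, assuming a hypothetical dataframe with a `date` column: instead of sampling rows at random, hold out the most recent slice of the timeline as validation.

```python
import pandas as pd

# Hypothetical data: one row per day with a feature column "x".
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=100, freq="D"),
    "x": range(100),
})

# Sort chronologically and reserve the last ~20% of the timeline as validation,
# so validation always lies strictly after training in time.
df = df.sort_values("date")
cutoff = df["date"].quantile(0.8)
train = df[df["date"] <= cutoff]
val = df[df["date"] > cutoff]
```

The cutoff fraction (0.8 here) is an arbitrary illustration; in practice you would pick it based on how far into the future the model needs to generalize.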

You can also try calculating the trend correlation across different folds and using the average. At the end of the day, feature selection should be based on whichever approach gives the best results on a set-aside validation set or local test data.
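One way to sketch this fold-averaged trend correlation (my own illustration, not the article's exact implementation): bin the feature, compute the mean target per bin on the train and validation halves of each fold, correlate the two binned trends, and average the correlation over folds.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def trend_correlation(train, val, feature, target, bins=10):
    """Correlation between the binned feature-vs-target trend on train and validation."""
    # Bin edges come from the train split; the same edges are applied to validation.
    edges = np.unique(np.quantile(train[feature], np.linspace(0, 1, bins + 1)))
    train_bins = pd.cut(train[feature], edges, include_lowest=True)
    val_bins = pd.cut(val[feature], edges, include_lowest=True)
    # Mean target per bin is the feature's "trend" on each split.
    t = train.groupby(train_bins, observed=True)[target].mean()
    v = val.groupby(val_bins, observed=True)[target].mean()
    t, v = t.align(v, join="inner")  # keep only bins observed in both splits
    return np.corrcoef(t, v)[0, 1]

# Synthetic example: a noisy but genuinely monotonic feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=2000)})
df["y"] = (df["x"] + rng.normal(scale=2.0, size=2000) > 0).astype(int)

# Average the trend correlation over several folds.
scores = []
for tr_idx, va_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(df):
    scores.append(trend_correlation(df.iloc[tr_idx], df.iloc[va_idx], "x", "y"))
avg_trend_corr = float(np.mean(scores))
```

A feature whose trend is pure noise would average near zero here, while a stable trend stays close to 1 across folds; the averaging just smooths out the luck of any single split.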


Written by Abhay Pawar

ML @ StitchFix, Instacart. Columbia and IIT Madras alum
