Nov 30, 2018

Hi Jin,

Very good points raised. I’ll address your second point first, as it’s the more important one. I completely agree with what you said. The fact that local test AUC doesn’t help with selecting the best LB AUC model is a problem here. That’s why I say in the post:

It’s also interesting and concerning that test AUC doesn’t change as much as LB AUC. Getting your validation strategy right such that local test AUC follows LB AUC is also important.

I’ve seen cases where dropping noisy features does lead to improvement in local test AUC as well. This dataset is probably not the right example to demonstrate that, or perhaps I didn’t put in enough effort to get the validation strategy right (this is not a competition I competed in).

Also, in real life I always go for out-of-time validation, as it helps gauge whether the model will deteriorate in the future. You would then be checking for noisiness in terms of changes over time, which is much more effective than a random split for validation. Unfortunately, on Kaggle this is not always possible.
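For concreteness, here’s a minimal sketch of what an out-of-time split looks like. The dataframe, column names, and model are all placeholders of my own (not anything from the original post); the only real difference from a random split is that the validation rows are strictly later in time than the training rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Toy data standing in for a real dataset: a date column, two features,
# and a binary target (all names here are illustrative).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=n, freq="H"),
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
})
df["target"] = (df["f1"] + rng.normal(scale=2, size=n) > 0).astype(int)

# Out-of-time split: train on the older 80% of rows, validate on the most
# recent 20%, instead of splitting at random.
df = df.sort_values("date")
split = int(len(df) * 0.8)
train, valid = df.iloc[:split], df.iloc[split:]

features = ["f1", "f2"]
model = GradientBoostingClassifier().fit(train[features], train["target"])
auc = roc_auc_score(valid["target"], model.predict_proba(valid[features])[:, 1])
print(f"out-of-time validation AUC: {auc:.3f}")
```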

Also, I wouldn’t ignore a model just because its local test AUC is not the best. In real life, I would be more inclined to choose a model built on less noisy features, and features I trust, even if that means a slight degradation in validation AUC. On Kaggle as well, I would pick one model with such feature selection for the final submission.

Regarding your first point, I don’t think this is a particularly specific case. In the majority of cases, especially when you are dealing with customer data, the distributions and trends don’t change much over time or across validation sets. In fact, for finance companies like Home Credit, feature distributions are so stable that one of the things monitored for a production model is each feature’s distribution. So I would say a feature distribution changing too much is the specific case, not the other way around.
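As a rough illustration of that kind of monitoring, here’s a minimal sketch of the Population Stability Index (PSI), a metric commonly used to track how much a feature’s distribution shifts between two time periods. This is my own sketch of the general idea, not Home Credit’s actual setup.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature.
    Common rule of thumb: PSI < 0.1 -> stable, > 0.25 -> major shift."""
    # Bin edges come from the baseline (e.g. training-period) sample.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log of zero in empty bins.
    e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# A stable feature vs. one whose distribution has shifted over time.
rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)
print(psi(baseline, rng.normal(0, 1, 10_000)))    # close to 0 -> stable
print(psi(baseline, rng.normal(0.5, 1, 10_000)))  # noticeably larger -> shifted
```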
