The battle for Greece’s next top model may have been over for this year…😂
However, the battle’s still on in almost every machine learning task a data scientist comes across in their daily (and nightly) lives.
The long-standing questions in these cases are:
- Which model is the best?
- Can I know that in advance?
- If not, how am I supposed to try every possible different model and decide afterwards, without creating the messiest 🍝 spaghetti code in the whole of Italy?
- And most importantly, should I eat more cake?…err sorry, that’s always a yes!
Fortunately, the machine learning community offers some real engineering gems for deploying your machine learning models easily and with the least possible pain, my favourites being keras/tensorflow and scikit-learn (sklearn). Today, I’ll focus on scikit-learn only, which offers great implementations of a large set of popular “traditional” machine learning classifiers (i.e., not deep-learning based).
Yet, if I had to try every possible classifier integrated within scikit-learn, would I be able to do that in a coherent/non-redundant/straightforward way?…🤔 i.e. write my code once and seamlessly run it for every possible classifier? (unlike Java…)
Today’s our lucky day as the answer to that is: (you guessed it) yes!
The very first step for doing that is defining a class (let’s call it SklearnWrapper) that exposes the standard estimator methods, such as fit, predict, etc., seamlessly across multiple sklearn classifiers.
This class would look like this:
class SklearnWrapper(object):
    def __init__(self, clf, params=None):
        # Instantiate the underlying sklearn classifier with its parameters
        # (avoid a mutable default argument for params)
        self.clf = clf(**(params or {}))

    def fit(self, x, y):
        return self.clf.fit(x, y)

    def predict(self, x):
        return self.clf.predict(x)

    def predict_proba(self, x):
        return self.clf.predict_proba(x)

    def evaluate(self, x, y):
        # sklearn estimators have no `evaluate` (that's a Keras method);
        # `score` returns the mean accuracy for classifiers
        return self.clf.score(x, y)

    def feature_importances(self, x, y):
        return self.clf.fit(x, y).feature_importances_

    def get_coef_(self):
        return self.clf.coef_
The critical part here is passing the clf and params arguments, which are then combined into a specific sklearn object by calling the clf(**params) constructor.
For example, you can create a new Random Forest classifier like this:
from sklearn.ensemble import RandomForestClassifier
# Random Forest Classifier parameters
rf_params = {
    'n_jobs': -1,
    'n_estimators': 100,
    'max_features': 'sqrt',  # 'auto' in older sklearn versions; removed in 1.3
    'max_depth': 15,
    'min_samples_leaf': 2,
    'min_samples_split': 4,
    'warm_start': False,
    'verbose': 0
}
rf_model = SklearnWrapper(clf=eval("RandomForestClassifier"), params=rf_params)
Note that what clf receives is not the string itself: the string "RandomForestClassifier" is eval‘ed, and thus interpreted into the respective sklearn classifier class, which SklearnWrapper then instantiates. This means that the string passed to eval has to name a valid sklearn classifier class that has been explicitly imported into the current scope! The clf argument is merely a placeholder for that class, while params is a dictionary of classifier-specific input parameters.
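Because the classifier is resolved from a string, switching models becomes a matter of editing a dictionary entry rather than rewriting code. Here’s a minimal sketch of that idea (the model_registry below is a hypothetical example of mine, not part of the wrapper itself):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# The wrapper class from above (constructor only, for brevity)
class SklearnWrapper(object):
    def __init__(self, clf, params=None):
        self.clf = clf(**(params or {}))

# Hypothetical registry: classifier names mapped to their parameters.
# Each name must match a class imported into the current scope.
model_registry = {
    'RandomForestClassifier': {'n_estimators': 50, 'n_jobs': -1},
    'LogisticRegression': {'C': 1.0, 'max_iter': 200},
}

# One comprehension instantiates every model in the registry
models = {name: SklearnWrapper(clf=eval(name), params=params)
          for name, params in model_registry.items()}
```

Adding a new candidate model is then just one more registry entry, with no changes to the surrounding code.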
Similarly, we can create a Support Vector Classifier model as such:
from sklearn.svm import SVC
# Support Vector Classifier parameters
svc_params = {
    'C': 0.01,
    'kernel': 'linear',
    'gamma': 'auto',
    'probability': True,
    'shrinking': True
}
svc_model = SklearnWrapper(clf=eval("SVC"), params=svc_params)
So far so good… All that’s missing now is hooking our model definitions up with data (train/test) to actually train our models and make predictions. In an upcoming post, I’ll talk about creating a higher-level class that implements further functionality on top of SklearnWrapper, providing a standard interface for building, training, evaluating and extracting predictions from a model. 🙂
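To give a small taste of that, here’s a minimal sketch of running both wrappers over the same train/test split. The iris dataset and the plain loop below are illustrative choices of mine, not the interface of the upcoming post:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# The wrapper class from above (trimmed to the methods used here)
class SklearnWrapper(object):
    def __init__(self, clf, params=None):
        self.clf = clf(**(params or {}))
    def fit(self, x, y):
        return self.clf.fit(x, y)
    def predict(self, x):
        return self.clf.predict(x)
    def evaluate(self, x, y):
        return self.clf.score(x, y)  # mean accuracy for classifiers

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

# The exact same three lines run for every classifier in the list
scores = {}
for name, params in [('RandomForestClassifier', {'n_estimators': 100}),
                     ('SVC', {'C': 1.0, 'kernel': 'linear'})]:
    model = SklearnWrapper(clf=eval(name), params=params)
    model.fit(x_train, y_train)
    scores[name] = model.evaluate(x_test, y_test)

print(scores)  # accuracy per model name
```

The loop body never mentions a concrete classifier, which is exactly the write-once-run-everywhere property we were after.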