Tuesday, 23 December 2014

Bringing the Bootstrap Aggregating (Bagging) Method to Spark MLlib: Design

In case you're not familiar with bootstrap aggregating: http://ift.tt/1mJLrsY


In one sentence: a classification model that classifies inputs by combining other models as sub-classifiers.


I'm writing a bagging classification model in the style of Spark's classification models, so I designed my classes the same way as every other model in MLlib:



import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

class ModifiedBaggingsModel extends ClassificationModel {
  // uses an Array[ClassificationModel] to perform predict()
}

class ModifiedBaggings {
  def run(trainingData: RDD[LabeledPoint]): ModifiedBaggingsModel = {
    // For every kind of ClassificationModel chosen, this function should be
    // the same except for the following line:
    // PROBLEM OCCURS HERE: use SubClassifiers.train(trainingData) to get an Array[ClassificationModel]
  }
}

object ModifiedBaggings {
  def train(data: RDD[LabeledPoint]): ModifiedBaggingsModel =
    new ModifiedBaggings().run(data)
}


The problem is in run(): every classifier in Spark MLlib is created through its own entry point, either Model.train() or new Model().run(). With SVM as the sub-classifier I have to write SVMWithSGD.train(); for Naive Bayes I have to write NaiveBayes.train(). If I want to support 100 kinds of sub-classifier, I need to write 100 different run() methods that differ in just ONE line, which is horrible and stupid. OOP must offer a better way.


Is there a better way to design or abstract run()?
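
One possible direction, as a rough sketch rather than a definitive design: pass the one line that differs into the class as a function of type RDD[LabeledPoint] => ClassificationModel, so run() is written only once. The names trainer and numModels, the majority vote in predict(), and resampling with fraction = 1.0 are all assumptions made for illustration; none of them comes from the design above.

```scala
import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Assumption: the bagged prediction is a simple majority vote over the sub-models.
class ModifiedBaggingsModel(val models: Array[ClassificationModel])
  extends ClassificationModel {

  // Each sub-model votes; the most frequent label wins (ties broken arbitrarily).
  override def predict(testData: Vector): Double =
    models.map(_.predict(testData)).groupBy(identity).maxBy(_._2.length)._1

  override def predict(testData: RDD[Vector]): RDD[Double] =
    testData.map(v => predict(v))
}

// `trainer` (hypothetical name) is the single line that used to differ per classifier.
class ModifiedBaggings(
    trainer: RDD[LabeledPoint] => ClassificationModel,
    numModels: Int) {

  def run(trainingData: RDD[LabeledPoint]): ModifiedBaggingsModel = {
    val models = Array.tabulate[ClassificationModel](numModels) { i =>
      // Bootstrap resample: draw with replacement, using a different seed per sub-model.
      val resampled = trainingData.sample(withReplacement = true, fraction = 1.0, seed = i.toLong)
      trainer(resampled)
    }
    new ModifiedBaggingsModel(models)
  }
}

object ModifiedBaggings {
  def train(
      data: RDD[LabeledPoint],
      trainer: RDD[LabeledPoint] => ClassificationModel,
      numModels: Int): ModifiedBaggingsModel =
    new ModifiedBaggings(trainer, numModels).run(data)
}

// Usage, assuming `data: RDD[LabeledPoint]` already exists and SVMWithSGD / NaiveBayes are imported:
//   val svmBag = ModifiedBaggings.train(data, d => SVMWithSGD.train(d, 100), 10)
//   val nbBag  = ModifiedBaggings.train(data, d => NaiveBayes.train(d), 10)
```

With this shape, supporting another sub-classifier means writing one more lambda at the call site instead of another run(). A trait with a single train method would work just as well as a function type if named strategies are preferred.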

