design-patterns: Import Bootstrap aggregating (bagging) Method to Spark MLlib: Design

Tuesday, 23 December 2014

In case you are not familiar with bootstrap aggregating: http://ift.tt/1mJLrsY

In short: it is a classification model that classifies things by using other models as sub-classifiers.

I'm writing a bagging classification model in the style of Spark's classification models, so I designed my classes like every other model in MLlib:


import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

class ModifiedBaggingsModel extends ClassificationModel {
    // uses an Array[ClassificationModel] to perform predict()
}


class ModifiedBaggings {
    def run(trainingData: RDD[LabeledPoint]): ModifiedBaggingsModel = {
        // For every kind of ClassificationModel chosen, this function is the same except for the following line:
        // PROBLEM OCCURS HERE: use SubClassifiers.train(trainingData) to get the Array[ClassificationModel]
    }
}

object ModifiedBaggings {
    def train(data: RDD[LabeledPoint]): ModifiedBaggingsModel = new ModifiedBaggings().run(data)
}

The problem is in run(): every classifier in Spark MLlib is created through its own entry point, either Model.train() or new Model().run(). With SVM as the sub-classifier I have to write SVMWithSGD.train(); with Naive Bayes I have to write NaiveBayes.train(). If I want to support 100 kinds of sub-classifier, I need to write 100 different versions of run() that differ in only ONE line. That is horrible and stupid; OOP should offer a better way.

Is there a better way to design or abstract run()?
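
One possible direction, offered only as a sketch: pass the one line that differs into the bagging class as a function, so that run() is written once. To keep the example self-contained and runnable without Spark, the code below uses stand-ins that are assumptions of this sketch, not MLlib's real types: a local LabeledPoint and ClassificationModel, and an IndexedSeq in place of RDD. The names Bagging, BaggingModel, trainer, numModels and ToyTrainers are hypothetical too.

```scala
import scala.util.Random

// Stand-ins for MLlib's LabeledPoint and ClassificationModel (assumptions of this sketch).
final case class LabeledPoint(label: Double, features: Vector[Double])

trait ClassificationModel {
  def predict(features: Vector[Double]): Double
}

// Bagged model: predicts by majority vote over its sub-models (ties are broken arbitrarily).
class BaggingModel(val subModels: Seq[ClassificationModel]) extends ClassificationModel {
  require(subModels.nonEmpty, "at least one sub-model is required")

  def predict(features: Vector[Double]): Double =
    subModels
      .map(_.predict(features))
      .groupBy(identity)
      .maxBy { case (_, votes) => votes.size }
      ._1
}

// `trainer` is the only classifier-specific part, so it is injected as a function.
class Bagging(trainer: IndexedSeq[LabeledPoint] => ClassificationModel,
              numModels: Int,
              seed: Long = 42L) {

  def run(data: IndexedSeq[LabeledPoint]): BaggingModel = {
    require(data.nonEmpty, "training data must not be empty")
    val rng = new Random(seed)
    val models = Seq.fill(numModels) {
      // Bootstrap sample: draw data.size points with replacement.
      val sample = IndexedSeq.fill(data.size)(data(rng.nextInt(data.size)))
      trainer(sample)
    }
    new BaggingModel(models)
  }
}

object Bagging {
  def train(data: IndexedSeq[LabeledPoint],
            trainer: IndexedSeq[LabeledPoint] => ClassificationModel,
            numModels: Int): BaggingModel =
    new Bagging(trainer, numModels).run(data)
}

// Toy trainer for demonstration only: always predicts the most frequent label in its sample.
object ToyTrainers {
  val majority: IndexedSeq[LabeledPoint] => ClassificationModel = { sample =>
    val label = sample.groupBy(_.label).maxBy(_._2.size)._1
    new ClassificationModel {
      def predict(features: Vector[Double]): Double = label
    }
  }
}
```

With real MLlib types, the trainer would presumably have type RDD[LabeledPoint] => ClassificationModel, and each sub-classifier would become a one-line function such as d => SVMWithSGD.train(d, 100) or d => NaiveBayes.train(d). The bootstrap sample could be drawn with the RDD's sample(withReplacement = true, fraction = 1.0, seed) method. None of this has been tested against a Spark cluster here.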
