In case you are not familiar with bootstrap aggregating (bagging): http://ift.tt/1mJLrsY
In short: it is a classification model that classifies by combining other models as sub-classifiers.
I'm writing a bagging classification model in the style of Spark's classification models, so I designed my classes the same way as every other model in MLlib:
import org.apache.spark.mllib.classification.ClassificationModel
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

class ModifiedBaggingsModel extends ClassificationModel {
  // uses an Array[ClassificationModel] to perform predict()
}

class ModifiedBaggings {
  def run(trainingData: RDD[LabeledPoint]): ModifiedBaggingsModel = {
    // For every kind of ClassificationModel chosen, this function should be
    // the same except for the following line:
    // PROBLEM OCCURS HERE: use SubClassifiers.train(trainingData) to get the Array[ClassificationModel]
  }
}

object ModifiedBaggings {
  def train(data: RDD[LabeledPoint]): ModifiedBaggingsModel = new ModifiedBaggings().run(data)
}
The problem is in run(): for every classifier in Spark MLlib, the way to train one is through a call such as Algorithm.train() or new Algorithm().run(). With SVM as the sub-classifier I have to write SVMWithSGD.train(); for Naive Bayes I have to write NaiveBayes.train(). If I want to support 100 kinds of sub-classifier, I need to write 100 different run() methods that differ in only ONE line, which is horrible. OOP should offer a better way.
Is there a better way to design/abstract the run()?
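One possible direction, offered only as a rough sketch: make the one line that differs a parameter, by passing the training step into the bagging class as a function. The version below runs without Spark, so Point, Classifier, Bagging, BaggingModel and Demo are all hypothetical stand-ins, not MLlib types. With Spark the function type would presumably be RDD[LabeledPoint] => ClassificationModel, and a caller could pass something like data => SVMWithSGD.train(data, 100) or data => NaiveBayes.train(data), where 100 is just an illustrative iteration count.

```scala
import scala.util.Random

// Hypothetical stand-ins for LabeledPoint and ClassificationModel,
// so that the sketch runs without Spark on the classpath.
final case class Point(label: Double, features: Vector[Double])

trait Classifier {
  def predict(features: Vector[Double]): Double
}

// The bagged model predicts by majority vote over its sub-classifiers.
final class BaggingModel(val models: Seq[Classifier]) extends Classifier {
  def predict(features: Vector[Double]): Double =
    models.map(_.predict(features)).groupBy(identity).maxBy(_._2.size)._1
}

// The training step is a constructor parameter, so run() is written once
// no matter which sub-classifier is used.
final class Bagging(trainer: IndexedSeq[Point] => Classifier,
                    numModels: Int,
                    seed: Long = 42L) {
  require(numModels > 0, "numModels must be positive")

  def run(data: IndexedSeq[Point]): BaggingModel = {
    require(data.nonEmpty, "training data must not be empty")
    val rng = new Random(seed)
    val models = Seq.fill(numModels) {
      // Bootstrap sample: draw data.size points with replacement.
      val sample = IndexedSeq.fill(data.size)(data(rng.nextInt(data.size)))
      trainer(sample)
    }
    new BaggingModel(models)
  }
}

object Bagging {
  def train(data: IndexedSeq[Point],
            trainer: IndexedSeq[Point] => Classifier,
            numModels: Int): BaggingModel =
    new Bagging(trainer, numModels).run(data)
}

object Demo {
  // Toy sub-classifier used only for illustration: it always predicts
  // the most frequent label seen in its training sample.
  val majorityTrainer: IndexedSeq[Point] => Classifier = sample => {
    val label = sample.groupBy(_.label).maxBy(_._2.size)._1
    new Classifier {
      def predict(features: Vector[Double]): Double = label
    }
  }
}
```

Used as Bagging.train(data, Demo.majorityTrainer, 5), the same run() serves any sub-classifier. Supporting another one means passing a different function instead of writing another run(). A trait with a single train method would work as well, but a plain function keeps the existing train() helpers usable without wrapper classes.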