design-patterns: Design Pattern For Feature Extraction [Python]

Sunday, March 1, 2015

I am writing a machine learning classifier with Python's scikit-learn library (Python 2.7.9).

I am looking for a "design pattern" for extracting a feature vector from an object, with these traits:

1. Features are easy to add (only by adding code, without changing anything already written).

2. A subset of features is easy to choose (for example, from a list of feature names), so that I can test the system's performance with each subset.

3. (Optional) The number of features is known before any feature vector is created.

Proposed use:


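import numpy as np

def get_feature_vectors(objects):
    extractor = FeatureExtraction()
    # Collect the vectors in a list and build the 2-D array once at the end;
    # np.vstack cannot stack an empty array with a non-empty feature vector.
    feature_vectors = []
    for obj in objects:  # "obj" avoids shadowing the built-in "object"
        feature_vectors.append(extractor.extract(obj))
    return np.array(feature_vectors)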

I've come up with this naive implementation:


import numpy as np

class FeatureExtraction(object):
    def __init__(self):
        self.__features = []

    def extract(self, obj):
        # Reset first, so features from earlier calls do not accumulate.
        self.__features = []
        self.__analyze_object(obj)
        return np.array(self.__features)

    def __analyze_object(self, obj):
        # Every new feature needs one more call here.
        self.__extract_feature1(obj)
        self.__extract_feature2(obj)
        self.__extract_feature3(obj)
        self.__extract_feature4(obj)
        # ...

    # __extract_feature2, __extract_feature3, ... follow the same shape.
    def __extract_feature1(self, obj):
        feature = ... # extract the first feature from obj
        self.__features.append(feature)

This implementation only has the first trait I'm looking for, and is a bit clumsy. I'm guessing there's a more elegant solution using design-pattern-like OOP tricks.
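
One direction that seems to cover all three traits is a registry of feature functions filled in by a decorator. Below is a minimal sketch, not a definitive implementation. It assumes the objects are strings purely for illustration, and every name in it (FEATURES, feature, FeatureExtractor, n_features, length, n_vowels) is hypothetical; none of them come from scikit-learn.

```python
import numpy as np

# Hypothetical registry: feature name -> function computing that feature.
FEATURES = {}

def feature(func):
    """Register func as a feature under its own function name."""
    FEATURES[func.__name__] = func
    return func

# Adding a feature means adding one decorated function; nothing else changes.
# These two features (for string objects) are made up for illustration.
@feature
def length(obj):
    return len(obj)

@feature
def n_vowels(obj):
    return sum(ch in "aeiou" for ch in obj.lower())

class FeatureExtractor(object):
    def __init__(self, names=None):
        # Default to every registered feature, in a stable (sorted) order.
        self.names = sorted(FEATURES) if names is None else list(names)
        self.funcs = [FEATURES[name] for name in self.names]

    @property
    def n_features(self):
        # Known before any object has been seen.
        return len(self.funcs)

    def extract(self, obj):
        return np.array([f(obj) for f in self.funcs], dtype=float)

def get_feature_vectors(objects, names=None):
    # "objects" is assumed to be a sequence (it must support len()).
    extractor = FeatureExtractor(names)
    # Preallocate using the feature count, which is known up front.
    vectors = np.empty((len(objects), extractor.n_features))
    for i, obj in enumerate(objects):
        vectors[i] = extractor.extract(obj)
    return vectors
```

With this sketch, get_feature_vectors(objects, names=["n_vowels"]) would use only that one feature, while omitting names uses every registered feature. The trade-off is a module-level registry, so features are plain functions rather than methods that share state.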
