Sunday, 1 March 2015

Design Pattern For Feature Extraction [Python]

I am writing a machine learning classifier with Python's scikit-learn library (Python 2.7.9).


I am looking for a "design pattern" for extracting a feature vector from an object. The pattern should have these traits:



  • New features can be added easily, only by adding code and without changing anything already written.

  • A subset of features can be chosen easily (for example, from a list of feature names), so that the system's performance can be compared across different subsets.

  • (Optional) The number of features is known before any feature vector is created.
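One way to meet all three traits is a registry of feature functions. A decorator records each function under its name. Adding a feature then means adding one decorated function, and choosing a subset means passing a list of names. The number of features is also known before any extraction happens. This is a minimal sketch and not the only way to do it. FEATURES, feature, extract, n_features and the three example features are hypothetical names chosen for illustration, and the example features assume the objects are strings.

```python
from collections import OrderedDict

import numpy as np

# Hypothetical module-level registry: feature name -> function(obj) -> number.
# OrderedDict keeps registration order on Python 2.7 as well as Python 3.
FEATURES = OrderedDict()


def feature(func):
    """Register func as a feature, using the function's name as its name."""
    FEATURES[func.__name__] = func
    return func


# Adding a feature = adding one decorated function; nothing else changes.
# These example features assume obj is a string (illustration only).
@feature
def length(obj):
    return len(obj)


@feature
def n_digits(obj):
    return sum(ch.isdigit() for ch in obj)


@feature
def n_upper(obj):
    return sum(ch.isupper() for ch in obj)


def n_features(names=None):
    """Length of the feature vector for the chosen subset, known up front."""
    return len(FEATURES) if names is None else len(names)


def extract(obj, names=None):
    """Return a 1-D float vector for obj, restricted to `names` if given."""
    if names is None:
        names = list(FEATURES)
    return np.array([FEATURES[name](obj) for name in names], dtype=float)
```

For example, extract("Hello 2015") gives the vector [10, 4, 1], and extract("Hello 2015", names=["n_digits"]) gives [4]. A misspelled name raises KeyError. When comparing subsets, that is usually better than silently dropping a feature.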


Proposed use:



import numpy as np

def get_feature_vectors(objects):
    extractor = FeatureExtraction()
    # Build a list of 1-D vectors and stack them once at the end:
    # np.vstack([np.array([]), v]) fails because np.array([]) has shape (0,).
    feature_vectors = [extractor.extract(obj) for obj in objects]
    return np.vstack(feature_vectors)
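Calling np.vstack inside a loop copies the whole array on every iteration. If the number of features is known up front (the optional third trait), the output matrix can instead be allocated once with np.empty and filled row by row. This is a minimal sketch. It assumes that extract and n_features are supplied by whatever extractor is used, and both parameter names are hypothetical.

```python
import numpy as np


def get_feature_vectors(objects, extract, n_features):
    """Return an (n_objects, n_features) array, one row per object.

    `extract` is any callable obj -> sequence of n_features numbers
    (a hypothetical stand-in for FeatureExtraction().extract).
    """
    objects = list(objects)  # accept generators too; len() is needed below
    feature_vectors = np.empty((len(objects), n_features))
    for i, obj in enumerate(objects):
        feature_vectors[i] = extract(obj)  # fill row i in place, no copying
    return feature_vectors
```

For example, get_feature_vectors(["ab", "abc"], lambda s: [len(s), s.count("a")], 2) returns a 2 x 2 array with rows [2, 1] and [3, 1]. If an extractor returns a vector of the wrong length, the row assignment raises an error instead of producing a ragged result.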


I've come up with this naive implementation:



import numpy as np

class FeatureExtraction(object):
    def __init__(self):
        self.__features = []

    def extract(self, obj):
        # Reset for each object; otherwise features from earlier calls
        # keep accumulating in self.__features.
        self.__features = []
        self.__analyze_object(obj)
        return np.array(self.__features)

    def __analyze_object(self, obj):
        # Every feature has to be listed here by hand.
        self.__extract_feature1(obj)
        self.__extract_feature2(obj)
        self.__extract_feature3(obj)
        self.__extract_feature4(obj)
        # ...

    def __extract_feature1(self, obj):
        feature = ...  # extract the first feature from obj
        self.__features.append(feature)


This implementation has, at best, only the first trait I'm looking for, because each new feature still needs its own call in __analyze_object. It is also a bit clumsy. I'm guessing there's a more elegant solution using design-pattern-like OOP tricks.
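Staying closer to the class above, the hand-written list in __analyze_object can be removed by discovering feature methods through a naming convention. This sketch assumes that every method named feature_<name> computes one feature. The prefix, the names argument and the example features are hypothetical, and the example features again assume string objects. The sketch uses public method names because Python mangles double-underscore names: __extract_feature1 becomes _FeatureExtraction__extract_feature1, which makes looking it up by string awkward.

```python
import numpy as np


class FeatureExtraction(object):
    """Extractor that discovers its features by method name.

    Convention assumed by this sketch: every method called feature_<name>
    computes one feature, so adding a feature means adding one method.
    """

    PREFIX = "feature_"

    def __init__(self, names=None):
        available = self.available_features()
        # Use every discovered feature unless a subset of names is given.
        self.names = available if names is None else list(names)
        unknown = set(self.names) - set(available)
        if unknown:
            raise ValueError("unknown features: %s" % sorted(unknown))

    @classmethod
    def available_features(cls):
        # dir() returns attribute names sorted alphabetically,
        # so the feature order is stable between runs.
        return [attr[len(cls.PREFIX):] for attr in dir(cls)
                if attr.startswith(cls.PREFIX)]

    @property
    def n_features(self):
        # Known before any object is processed.
        return len(self.names)

    def extract(self, obj):
        return np.array([getattr(self, self.PREFIX + name)(obj)
                         for name in self.names], dtype=float)

    # Example features; they assume obj is a string (illustration only).
    def feature_length(self, obj):
        return len(obj)

    def feature_n_vowels(self, obj):
        return sum(ch in "aeiou" for ch in obj.lower())
```

A subclass that defines more feature_* methods extends the feature set without editing the base class. FeatureExtraction(names=["n_vowels"]) restricts extraction to that subset, and its n_features property gives the vector length before anything is extracted.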

