I am writing a machine learning classifier using Python's scikit-learn library (using Python 2.7.9).
I am looking for a "design pattern" to extract a feature vector from an object, with these traits:
- Can easily add features (by adding new code only, without changing anything already written).
- Can easily choose a subset of features to use (for example, by passing a list of feature names), so I can compare the system's performance across subsets.
- (Optional) Knows the number of features before creating a feature vector.
Proposed use:
    import numpy as np

    def get_feature_vectors(objects):
        extractor = FeatureExtraction()
        # Collect the rows first and build the matrix once: np.vstack cannot
        # stack an empty 1-D array on top of a non-empty feature vector.
        rows = [extractor.extract(obj) for obj in objects]
        return np.array(rows)
I've come up with this naive implementation:
    class FeatureExtraction(object):
        def __init__(self):
            self.__features = []

        def extract(self, obj):
            # Start from an empty list so that features from earlier
            # objects do not leak into this vector.
            self.__features = []
            self.__analyze_object(obj)
            return np.array(self.__features)

        def __analyze_object(self, obj):
            self.__extract_feature1(obj)
            self.__extract_feature2(obj)
            self.__extract_feature3(obj)
            self.__extract_feature4(obj)
            # ...

        def __extract_feature1(self, obj):
            feature = ... # extract the first feature from object
            self.__features.append(feature)
This implementation only has the first trait I'm looking for, and is a bit clumsy. I'm guessing there's a more elegant solution using design-pattern-like OOP tricks.
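One direction that might cover all three traits is a registry of named feature functions, filled in by a decorator. The following is only a minimal sketch under assumptions, not a definitive design: the `FEATURES` registry, the `feature` decorator, the `names` argument and the `n_features` property are hypothetical names introduced here, and the three string-based features are placeholders for real ones. It returns plain lists to stay dependency-free; wrapping the result in `np.array(...)` would give the matrix scikit-learn expects.

```python
from collections import OrderedDict

# Hypothetical registry: feature name -> function(obj) -> number.
# OrderedDict keeps registration order, which fixes the column order.
FEATURES = OrderedDict()


def feature(func):
    """Register func as a feature under its own function name."""
    FEATURES[func.__name__] = func
    return func


# Placeholder features that assume the objects are strings.
@feature
def length(obj):
    return len(obj)


@feature
def n_upper(obj):
    return sum(1 for ch in obj if ch.isupper())


@feature
def n_digits(obj):
    return sum(1 for ch in obj if ch.isdigit())


class FeatureExtraction(object):
    def __init__(self, names=None):
        # Default to every registered feature, in registration order.
        self.names = list(FEATURES) if names is None else list(names)
        unknown = [n for n in self.names if n not in FEATURES]
        if unknown:
            raise KeyError("unknown features: %s" % ", ".join(unknown))

    @property
    def n_features(self):
        # Known before any object has been processed.
        return len(self.names)

    def extract(self, obj):
        # One value per selected feature, in the order given by names.
        return [FEATURES[name](obj) for name in self.names]


def get_feature_vectors(objects, names=None):
    extractor = FeatureExtraction(names)
    return [extractor.extract(obj) for obj in objects]
```

With this layout, adding a feature means writing one more decorated function, and testing a subset means passing something like `names=["n_digits", "length"]`. The trade-off is that the registry is module-level state, so every feature module has to be imported before the extractor is constructed.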