Thursday, 31 March 2016

Non-linear pipelining of generators in Python

I would like to write a data processing toolbox following the pipeline pattern.

Because two elements of the pipeline, Process1 and Process2, may both use the same data from DataGenerator (a non-linear pipeline), I created a DataProcessor element that consumes the generator created by DataGenerator and runs the different processes on each frame.

I am not really happy with this, because a subsequent element that only needs the data processed by one of Process1 or Process2 has to wait for both to be executed.

I am not an expert in design patterns, so maybe I am using the wrong pattern for my task.

Any suggestions?

import numpy


class DataGenerator:
    """
    Dummy data generator
    """

    def __init__(self, size):
        self.size = size

    def execute(self):
        # Yield 1000 random frames, each a 1-D array of length self.size
        for _ in range(1000):
            yield numpy.random.random(self.size)


class Process1:
    """
    Some process
    """

    def __init__(self, config):
        self.config = config

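    # Not used at the moment: DataProcessor calls process() directly
    def execute(self, frames):
        for frame in frames:
            yield self.process(frame)

    def process(self, frame):
        # Placeholder: the real processing of the frame goes here
        processed_frame = frame
        return processed_frame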


class Process2:
    """
    Some process
    """

    def __init__(self, config):
        self.config = config

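    # Not used at the moment: DataProcessor calls process() directly
    def execute(self, frames):
        for frame in frames:
            yield self.process(frame)

    def process(self, frame):
        # Placeholder: the real processing of the frame goes here
        processed_frame = frame
        return processed_frame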


class DataProcessor:
    """
    Execute all processes
    """

    def __init__(self):
        # Per-instance list; a class-level list would be shared by all instances
        self.processes = []

    def add_process(self, process_instance):
        self.processes.append(process_instance)

    def execute(self, frames):
        for frame in frames:
            processed_frame = []
            for p in self.processes:
                processed_frame.append(p.process(frame))
            yield processed_frame
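
One direction I have been considering, as a rough sketch only and not necessarily the right pattern, is to split the source generator with itertools.tee so that each process pulls from its own branch. The names generate_frames, scale and offset below are hypothetical stand-ins for DataGenerator.execute, Process1.execute and Process2.execute, and plain lists replace the numpy arrays to keep the sketch dependency-free.

```python
import itertools
import random


def generate_frames(size, count=5):
    """Hypothetical stand-in for DataGenerator.execute()."""
    for _ in range(count):
        yield [random.random() for _ in range(size)]


def scale(frames, factor):
    """Hypothetical stand-in for Process1.execute()."""
    for frame in frames:
        yield [x * factor for x in frame]


def offset(frames, delta):
    """Hypothetical stand-in for Process2.execute()."""
    for frame in frames:
        yield [x + delta for x in frame]


# tee() turns one generator into independent iterators over the same items.
branch1, branch2 = itertools.tee(generate_frames(size=4), 2)

scaled = scale(branch1, factor=2.0)
shifted = offset(branch2, delta=1.0)

# A downstream element that only needs the first process pulls from `scaled`
# alone; `offset` does no work until something iterates `shifted`.
first_scaled = next(scaled)
```

The caveat with this approach is that tee buffers every item one branch has consumed and the other has not, so memory use grows if the branches advance at very different rates.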
