Friday, May 10, 2019

Design Pattern for Customizable Pipelines with Nonstandard Properties

I am trying to make it easy to build pipelines in which each element passes a dictionary to the next one for data manipulation.

I can already build a standard pipeline: it starts with a "module" that returns a dictionary, which is passed to the next element for some processing or manipulation, which passes its result to the next element, and so on. The problem is that, for any given "pipe" in the pipeline, the keys in the dictionary it receives may not match the keys it is looking for.

For example:

def add_last_name(data, **kwargs):
  last_name = kwargs["last_name"]
  return {"full_name": "%s %s"%(data["first_name"], last_name)}

def add_middle_name_to_first_and_last(data, **kwargs):
  middle_name = kwargs["middle_name"]
  # looks for property named 'whole_name', instead of 'full_name'
  first_name, last_name = data["whole_name"].split()
  return {"first_middle_last": " ".join([first_name, middle_name, last_name])}

def add_age(data, **kwargs):
  data["age"] = kwargs["age"]
  return data

# data = {"first_name": "john"} ->
#   add_last_name(data, **{"last_name": "smith"})
#     returns {"full_name": "john smith"} ->
#   add_age(data, **{"age": 30})
#     returns {"full_name": "john smith", "age": 30} ->
#   add_middle_name_to_first_and_last(data, **{"middle_name": "doe"})
#     fails with a KeyError because the "whole_name" key does not exist

Basically, it all comes down to the fact that I can't control the key names used by all of these different modules, so I need a way to translate property names between links in a pipeline. I realize that I could enforce strict property names, but I want what I'm working on to be data/property agnostic for maximum flexibility, leaving the wiring between links up to the creator (user) of the pipeline. So I need some mechanism that links property names produced by a previous link to the names expected by the current link. Modules should be classes and should have their "required parameter names" defined internally.
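To make the requirement concrete, here is a minimal sketch of one possible shape for it, not a settled design. Every name in it (`Module`, `required_keys`, `Pipeline`, `key_map`) is hypothetical and invented for illustration. It assumes each module class declares the keys it reads, and that the pipeline's creator supplies a per-link mapping from the key a module expects to the key actually present in the incoming dictionary.

```python
class Module:
    """Hypothetical base class: subclasses declare the keys they read."""
    required_keys = ()

    def __init__(self, **params):
        self.params = params

    def run(self, data):
        raise NotImplementedError


class AddLastName(Module):
    required_keys = ("first_name",)

    def run(self, data):
        full_name = "%s %s" % (data["first_name"], self.params["last_name"])
        return {"full_name": full_name}


class AddAge(Module):
    def run(self, data):
        # Return a copy with the extra key instead of mutating the input.
        return dict(data, age=self.params["age"])


class AddMiddleName(Module):
    required_keys = ("whole_name",)

    def run(self, data):
        first_name, last_name = data["whole_name"].split()
        names = [first_name, self.params["middle_name"], last_name]
        return {"first_middle_last": " ".join(names)}


class Pipeline:
    def __init__(self):
        self.steps = []

    def add(self, module, key_map=None):
        # key_map: {key the module expects: key present in the incoming dict}
        self.steps.append((module, key_map or {}))
        return self

    def run(self, data):
        for module, key_map in self.steps:
            view = dict(data)
            for wanted, available in key_map.items():
                view[wanted] = data[available]
            # Fail early, naming the module, if a declared key is still absent.
            missing = [k for k in module.required_keys if k not in view]
            if missing:
                raise KeyError(
                    "%s is missing keys: %s" % (type(module).__name__, missing)
                )
            data = module.run(view)
        return data


pipeline = (
    Pipeline()
    .add(AddLastName(last_name="smith"))
    .add(AddAge(age=30))
    .add(AddMiddleName(middle_name="doe"), key_map={"whole_name": "full_name"})
)
result = pipeline.run({"first_name": "john"})
print(result)
```

With the mapping on the last link, the same three steps as in the example above run to completion. Keeping the translation in the pipeline rather than in the modules means a module never needs to know what its neighbours call their keys, and the `required_keys` check turns a mismatch into an error that names the offending module.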
