I am trying to make it easy to build pipelines in which each element passes a dictionary to the next for data manipulation.
I can build a standard pipeline: it starts with a "module" that returns a dictionary, which is passed to the next element for some processing or manipulation, which passes its result to the next element, and so on. The problem is that at each "pipe" in the pipeline, the keys in the incoming dictionary may not match the keys the pipe is looking for.
For example:
def add_last_name(data, **kwargs):
    # Builds "full_name" from the incoming "first_name" and the supplied last name.
    last_name = kwargs["last_name"]
    return {"full_name": "%s %s" % (data["first_name"], last_name)}

def add_middle_name_to_first_and_last(data, **kwargs):
    middle_name = kwargs["middle_name"]
    # looks for property named 'whole_name', instead of 'full_name'
    first_name, last_name = data["whole_name"].split()
    return {"first_middle_last": " ".join([first_name, middle_name, last_name])}

def add_age(data, **kwargs):
    # Adds "age" to the incoming dictionary and passes it on.
    data["age"] = kwargs["age"]
    return data
# data = {"first_name": "john"} ->
# add_last_name(data, **{"last_name": "smith"})
#   returns {"full_name": "john smith"} ->
# add_age(data, **{"age": 30})
#   returns {"full_name": "john smith", "age": 30} ->
# add_middle_name_to_first_and_last(data, **{"middle_name": "doe"})
#   This fails because the "whole_name" key does not exist
Basically, it all comes down to the fact that I can't control the key names used by all of these different modules, so I need a way to translate property names between links in a pipeline. I realize I could simply enforce strict property names, but I want what I'm working on to be data/property agnostic for maximum flexibility, and to leave these links up to the creator (user) of the pipeline. I therefore need some mechanism that links property names from the previous link to the current one. Modules should be classes and have their "required parameter names" defined internally.
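To make the requirement concrete, here is a minimal sketch of what such a mechanism might look like. It is only one possible shape, not a settled design. The names `Module`, `required_keys`, `key_map` and `run_pipeline` are hypothetical and chosen for illustration. Each module class declares the keys it needs, and each link in the pipeline may carry a mapping from upstream key names to the names the module expects.

```python
class Module:
    """Hypothetical base class for a pipeline step (an assumption, not an existing API)."""

    # Keys this module reads from the incoming dictionary.
    required_keys = ()

    def __init__(self, **params):
        self.params = params

    def process(self, data):
        raise NotImplementedError

    def __call__(self, data, key_map=None):
        # key_map translates upstream key names to the names this module expects.
        key_map = key_map or {}
        translated = {key_map.get(key, key): value for key, value in data.items()}
        missing = [key for key in self.required_keys if key not in translated]
        if missing:
            raise KeyError(f"{type(self).__name__} is missing keys: {missing}")
        return self.process(translated)


class AddLastName(Module):
    required_keys = ("first_name",)

    def process(self, data):
        return {"full_name": f"{data['first_name']} {self.params['last_name']}"}


class AddAge(Module):
    def process(self, data):
        # Return a copy with "age" added, leaving the input untouched.
        return {**data, "age": self.params["age"]}


class AddMiddleName(Module):
    required_keys = ("whole_name",)

    def process(self, data):
        first_name, last_name = data["whole_name"].split()
        middle_name = self.params["middle_name"]
        return {"first_middle_last": " ".join([first_name, middle_name, last_name])}


def run_pipeline(data, links):
    """Run each (module, key_map) link in order, feeding each output to the next link."""
    for module, key_map in links:
        data = module(data, key_map)
    return data


result = run_pipeline(
    {"first_name": "john"},
    [
        (AddLastName(last_name="smith"), None),
        (AddAge(age=30), None),
        # The pipeline author links the upstream "full_name" to "whole_name".
        (AddMiddleName(middle_name="doe"), {"full_name": "whole_name"}),
    ],
)
print(result)
```

In this sketch the mapping lives on the link rather than inside the module, so the modules stay unaware of each other's key names and the pipeline author decides how they connect. Checking `required_keys` before calling `process` also surfaces a missing link as a clear error instead of a failure deep inside the module.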