Thursday, April 20, 2017

How to ergonomically iterate over functions with different signatures (Python)

I have the following situation in Python:

I have a "genotype" of sorts which contains genes. What those genes represent is not important; they are just arbitrary objects which can all be referenced as "gene objects".

I need to mutate each gene via one of several methods, but not all of the function signatures match up. Given a starting gene, a new gene is created with a random chance of selecting one of these methods (or no method) for mutation.

For example, I have duplicate(gene), replace(gene, othergene), insert(gene, othergene), delete(gene), and othermutation(gene, genotype). All of these return a list of genes (even if the list contains only one element, or zero elements) in order to keep the return types of the functions homogeneous.

I want to generalize the situation to a list of these mutation functions, each with an associated percentage chance of being used. I already have a way of selecting among them via binary search over a cumulative distribution: I generate a random number R and retrieve the correct function based on the index the binary search returns for R. This roughly allows me to do the following:
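The selection step described above could be sketched roughly as follows. This is only an illustration, not the code from the question: the helper names build_cumulative and pick_index, the no_mutation function, the function bodies, and the example weights are all assumptions.

```python
import bisect
import itertools
import random

def build_cumulative(weights):
    """Return running totals, e.g. [0.5, 0.25, 0.25] -> [0.5, 0.75, 1.0]."""
    return list(itertools.accumulate(weights))

def pick_index(cumulative, r):
    """Binary-search for the first bucket whose upper bound exceeds r."""
    index = bisect.bisect_right(cumulative, r)
    # Guard against totals that fall slightly short of 1.0 due to rounding.
    return min(index, len(cumulative) - 1)

# Hypothetical single-argument mutations, each returning a list of genes.
def no_mutation(gene):
    return [gene]

def duplicate(gene):
    return [gene, gene]

def delete(gene):
    return []

mutation_list = [no_mutation, duplicate, delete]
cumulative_probabilities = build_cumulative([0.5, 0.25, 0.25])

r = random.random()
mutation = mutation_list[pick_index(cumulative_probabilities, r)]
```

With this layout the lookup costs O(log n) per gene, and mutate only ever sees an index into the list.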

def mutate(genotype, mutation_list, cumulative_probabilities):
    mutated_genotype = []
    for gene in genotype:
        r = random.random()
        mutation = mutation_list(cumulative_probabilities(r))
        mutated_genotype.extend(mutation(gene))
    return mutated_genotype

Ideally, mutate doesn't need to know which mutation it has picked at the point where it looks one up; it just needs a mutation list somewhere with associated probabilities. But what happens with replace(gene, othergene), which requires a second parameter? Or othermutation(gene, genotype), which requires a different extra parameter?

To solve this problem I've come up with several solutions. First, I could make all function signatures exactly the same. That is, even though duplicate(gene) doesn't need othergene, I would still put it in the function definition; the function just wouldn't use it, or would do something trivial with it. The downside is that every time I add a new function with new parameters, I'd need to change every function signature, which violates SRP in a class sort of way for all functions and would be annoying to deal with. Alternatively, I could wrap everything in a single parameter object on which I set the needed parameters on each iteration. If a new function with new types of arguments were needed, I could simply add that argument to the "mutation parameter" object and pass this object to every function, and each function would take only what it needs. For example:

for gene in genotype:
    # Note: othergene is created by some other function, or is itself a generator.
    mutation_parameter = MutationParameter(gene, genotype, othergene)
    r = random.random()
    mutation = mutation_list(cumulative_probabilities(r))
    mutated_genotype.extend(mutation(mutation_parameter))

The sticking point here is that the members of MutationParameter aren't necessarily all that related to one another, and we have to know what the mutation signatures are beforehand. You can't just add a new signature; multiple sections of code will need to be updated.
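A minimal sketch of the parameter-object idea, assuming MutationParameter is a plain dataclass with the three fields used above. The function bodies are placeholders invented for illustration; the question does not say what replace or insert actually do.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class MutationParameter:
    # Every value any mutation might need; each function reads only its own.
    gene: Any
    genotype: List[Any]
    othergene: Any

# Placeholder behaviour: the real semantics are not given in the question.
def duplicate(params):
    return [params.gene, params.gene]

def replace(params):
    return [params.othergene]

def insert(params):
    return [params.gene, params.othergene]

def delete(params):
    return []

genotype = ["a", "b"]
params = MutationParameter(gene="a", genotype=genotype, othergene="x")
results = [mutation(params) for mutation in (duplicate, replace, insert, delete)]
```

Every mutation now has the same one-argument signature, at the cost noted above: adding a mutation that needs a new kind of input means touching MutationParameter and the code that fills it in.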

Another way to deal with this would be to make the function parameters generic, so that all functions take *args or **kwargs. But then I would be forced to add an extra data structure that handles pulling the data into each function's signature. This could mean a tailored function for each signature, and decreased performance from having to linearize the lookup or associate the cumulative probabilities with a hashtable/dictionary.
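One possible variation on this idea, sketched under assumptions: rather than writing a tailored extractor per signature, a single helper can inspect each function and pass only the keyword arguments it names. The helper call_with_needed, the context dictionary, and the function bodies are hypothetical; inspect.signature is from the standard library.

```python
import inspect

def call_with_needed(func, context):
    """Call func with only the entries of context that its signature names."""
    names = inspect.signature(func).parameters
    return func(**{name: context[name] for name in names})

# Placeholder mutations that keep their natural, differing signatures.
def duplicate(gene):
    return [gene, gene]

def replace(gene, othergene):
    return [othergene]

def othermutation(gene, genotype):
    # Arbitrary stand-in logic that happens to need the whole genotype.
    return [gene] if len(genotype) > 1 else []

context = {"gene": "a", "othergene": "x", "genotype": ["a", "b"]}
results = [call_with_needed(f, context) for f in (duplicate, replace, othermutation)]
```

Inspecting the signature on every call adds overhead, so in a hot loop the parameter names would probably be computed once per function and stored alongside the mutation list.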

I could handle the functions by making functors which store some of the parameter data in the calling 'function' itself (such as the "other gene", as, say, a generator that randomly produces a gene). This would require me to create a new class for each function that requires unique parameters. Even then, I couldn't put, say, the current list of genes (the genotype) into othermutation without building the list of functions at call time, during the execution of mutate itself (so not all functors could be passed in as the mutation_list in mutate).

Still, I might just have to bite the bullet and separate the mutation types that require certain kinds of information from the others if I want a nice way of dealing with a generic mutation_list. I could then just pass in two types of mutation lists.

I could additionally do a similar thing by making, say, the othergene parameter a static function/variable of the functor, shared by all of its instances.
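As a lighter-weight stand-in for hand-written functor classes, the same binding can be sketched with functools.partial from the standard library. The othergene_source parameter, the random_gene helper, and the function bodies are assumptions made for the example.

```python
import functools
import random

def random_gene(pool=("x", "y", "z")):
    """Hypothetical source of replacement genes."""
    return random.choice(pool)

# Placeholder mutations; the extra argument is a callable that yields a gene.
def duplicate(gene):
    return [gene, gene]

def replace(gene, othergene_source):
    return [othergene_source()]

def insert(gene, othergene_source):
    return [gene, othergene_source()]

# Bind the extra argument up front so every entry is callable as f(gene).
mutation_list = [
    duplicate,
    functools.partial(replace, othergene_source=random_gene),
    functools.partial(insert, othergene_source=random_gene),
]

results = [mutation("a") for mutation in mutation_list]
```

This covers arguments that are known before mutate runs. It does not help with othermutation(gene, genotype), because the genotype only exists inside mutate, which is the limitation described above.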

So, to summarize the three methods:

1: pass all parameters in via parameter object, functions pick and choose which parameters they need,

2: use *args to do effectively the same thing, but you also need an associated function for each signature to pull out data,

3: or use functors (either with or without static variables) to hold the data, with another, separately passed-in mutation list for mutation functions whose extra parameters can't be determined before mutate is called.

I want to avoid having my mutate function care about what the underlying mutations are, so that it can be as generic as possible.
