Saturday, April 1, 2017

designing a complex data structure's dependencies

I'm in the process of designing a Python program that uses a complex data structure: a list whose index is a dimension (0, 1, 2 or 3) and whose items are dictionaries. These dictionaries have string keys (identifiers), and their values are numpy arrays in which each row represents an element. I need this layout because, within a given dimension, the numpy arrays have different shapes depending on the string identifier. So I decided to create a class that helps me work with the stored elements. This is the class:

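import numpy.ma as ma
from collections import OrderedDict


class ElementData(list):
    """List indexed by dimension (0-3); each item maps an identifier to a
    numpy (possibly masked) array with one row per element."""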

    def __init__(self, *args, **kwargs):

        self.reset()
        super(ElementData, self).__init__(*args, **kwargs)

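    def __iter__(self):
        # Yield (identifier, row index, row) for the dimension selected with
        # __call__; masked (deleted) rows are yielded as None.
        for k, v in self[self.idx].items():
            for i, e in enumerate(v):
                yield (k, i, e) if not ma.is_masked(e) else (k, i, None)
        # Only reached when the loop runs to completion
        self.reset()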


    def __call__(self, idx):
        self.idx = idx-1
        return self

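    def __getitem__(self, index):
        # Grow the list on demand; slices and negative indices are passed
        # through unchanged (comparing a slice with an int raises TypeError)
        if isinstance(index, int) and index >= len(self):
            self.expand(index)
        return super(ElementData, self).__getitem__(index)

    def __setitem__(self, index, value):
        if isinstance(index, int) and index >= len(self):
            self.expand(index)
        super(ElementData, self).__setitem__(index, value)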

    def __str__(self):
        return "Element dimensions present: {}\n".format([i for i in range(len(self)) if self[i]]) + super(ElementData, self).__str__()

    def keys(self):
        # All identifiers across every dimension, as one flat list
        # (replaces the undefined flatten() helper)
        return [k for i in range(len(self)) for k in self[i].keys()]

    def reset(self):
        self.idx = -1
        self.d = -1

    def expand(self, index):
        self.d = max(index, self.d)
        for i in range(index + 1 - len(self)):
            self.append(OrderedDict())

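    def strip(self, value=None):
        # Same layout, but each array is replaced by value(k, v) if value is
        # callable, otherwise by the constant value itself
        if not callable(value):
            saved_value, value = value, lambda k, v: saved_value
        # Build each OrderedDict from pairs so the key order is preserved
        # (a plain dict literal did not guarantee order before Python 3.7)
        return ElementData([OrderedDict((k, value(k, v)) for k, v in i.items())
                            for i in super(ElementData, self).__iter__()])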


    def numElements(self, d):

        def elementsOfDimension(d):
            # loop over etypes
            nelems = 0
            for v in self[d].values():
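                # Rows with any masked entry count as removed; getmaskarray()
                # also works when the mask is nomask (v.mask would be a scalar)
                nelems += v.shape[0] if not isinstance(v, ma.MaskedArray) else v.shape[0] - ma.getmaskarray(v).any(axis=1).sum()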
            return nelems

        # compute the number of all elements
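        if d == -1:
            nelems = 0
            # Use len(self) rather than self.d: reset() sets self.d back to -1
            # after every completed iteration, and strip() never sets it
            for i in range(len(self)):
                nelems += elementsOfDimension(i)
            return nelems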
        else: # of specific dimension only
            return elementsOfDimension(d)

The class works nicely, and it lets me loop seamlessly through all items of a particular dimension. However, there is other data associated with each row stored in the numpy arrays, for example the material of the element. Therefore, I decided to use the same data structure for these other properties. To that end I use the class's strip method, which returns the entire structure with each numpy array replaced by another value (either a constant or the result of a function of the identifier and the array).
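The strip approach can be reproduced in a few lines to show where the dependency comes from. This is a minimal sketch on a plain list of `OrderedDict`s rather than on `ElementData`; the helper `mirror` and the sample triangle/quad data are hypothetical. The derived materials structure is built from the arrays as they are at that moment, so later edits to the geometry do not reach it.

```python
from collections import OrderedDict
import numpy as np

# Geometry: elements[dim][identifier] -> array, one row per element
elements = [OrderedDict() for _ in range(4)]
elements[2]["triangle"] = np.array([[0, 1, 2], [1, 2, 3]])

def mirror(structure, value=None):
    """Hypothetical helper: same layout as `structure`, with each array
    replaced by value(key, array) if callable, else by `value` itself
    (mimics ElementData.strip)."""
    func = value if callable(value) else (lambda k, v: value)
    return [OrderedDict((k, func(k, v)) for k, v in d.items()) for d in structure]

# One material id per row, sized from the current shape of each array
materials = mirror(elements, lambda k, v: np.zeros(v.shape[0], dtype=int))

# Later the geometry changes: a new block and an extra triangle
elements[2]["quad"] = np.array([[0, 1, 4, 5]])
elements[2]["triangle"] = np.vstack([elements[2]["triangle"], [[3, 4, 5]]])

# The mirror was a snapshot, so it is now out of sync
print("quad" in materials[2])          # False
print(len(materials[2]["triangle"]))   # 2, while there are now 3 triangles
```

Every structure produced this way has to be rebuilt or patched by hand whenever the geometry changes.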

The problem I have is that the original data structure is dynamic: if I change it, I have to update every other structure that depends on it. I really think I went in the wrong direction when designing this class. Perhaps there's a simpler way to approach this problem? I thought about storing the extra information next to the numpy arrays (as tuples, for example), but I don't know whether that is a good idea. The choices made while designing software can really make our life miserable later, and I'm starting to realize this now.
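One way to pursue the idea of storing the extra information next to the arrays is a numpy structured array, sketched here under the assumption that every property is a single value per element. Each record holds the connectivity row and the material side by side, so appending, deleting or reordering rows keeps them in lockstep. This is an alternative technique, not what the class above does; the helper `element_dtype`, the field names `nodes` and `material`, and the sample data are hypothetical.

```python
from collections import OrderedDict
import numpy as np

def element_dtype(nodes_per_element):
    # Hypothetical helper: one record per element, connectivity and material together
    return np.dtype([("nodes", np.int64, (nodes_per_element,)),
                     ("material", np.int32)])

# elements[dim][identifier] -> structured array, one record per element
elements = [OrderedDict() for _ in range(4)]

tri = np.zeros(2, dtype=element_dtype(3))
tri["nodes"] = [[0, 1, 2], [1, 2, 3]]
tri["material"] = [7, 8]
elements[2]["triangle"] = tri

quad = np.zeros(1, dtype=element_dtype(4))
quad["nodes"] = [[0, 1, 4, 5]]
quad["material"] = [7]
elements[2]["quad"] = quad

# Appending a row adds its connectivity and its material in one step
new_row = np.array([([3, 4, 5], 9)], dtype=tri.dtype)
elements[2]["triangle"] = np.concatenate([elements[2]["triangle"], new_row])

# Deleting rows with a boolean index removes both fields at once
t = elements[2]["triangle"]
elements[2]["triangle"] = t[t["material"] != 8]

# Iterate a dimension as (identifier, row index, nodes, material)
for key, block in elements[2].items():
    for i, rec in enumerate(block):
        print(key, i, rec["nodes"].tolist(), int(rec["material"]))
```

Since each identifier keeps its own array, blocks with different numbers of nodes per element can still share a dimension. The trade-off is that the fields are fixed when the dtype is created, so adding a new property later means rebuilding the arrays (for example with `numpy.lib.recfunctions.append_fields`).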
