I'm designing a Python program that uses a complex data structure: a list whose index is a dimension (0, 1, 2 or 3) and whose items are dictionaries. These dictionaries have string keys (element-type identifiers), and their values are numpy arrays in which each row represents one element. I need this layout because, for a given dimension, the numpy arrays have different shapes depending on the string identifier. So I decided to write a class that helps me handle the stored elements. This is the class:
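import numpy.ma as ma
from collections import OrderedDict
from itertools import chain

class ElementData(list):
    def __init__(self, *args, **kwargs):
        super(ElementData, self).__init__(*args, **kwargs)
        self.reset()
        # highest dimension stored so far (also correct when built from existing data)
        self.d = len(self) - 1

    def __iter__(self):
        # yield (etype, row index, row) for the dimension selected with __call__;
        # rows containing masked entries are yielded as None
        for k, v in self[self.idx].items():
            for i, e in enumerate(v):
                yield (k, i, e) if not ma.is_masked(e) else (k, i, None)
        self.reset()

    def __call__(self, idx):
        # select the dimension to iterate over (stored at list index idx - 1)
        self.idx = idx - 1
        return self

    def __getitem__(self, index):
        # grow the list on demand so that any dimension can be accessed
        if index >= len(self):
            self.expand(index)
        return super(ElementData, self).__getitem__(index)

    def __setitem__(self, index, value):
        if index >= len(self):
            self.expand(index)
        super(ElementData, self).__setitem__(index, value)

    def __str__(self):
        present = [i for i in range(len(self)) if self[i]]
        return "Element dimensions present: {}\n".format(present) + super(ElementData, self).__str__()

    def keys(self):
        # all element-type identifiers over every dimension, as one flat list
        return list(chain.from_iterable(self[i].keys() for i in range(len(self))))

    def reset(self):
        # only reset the iteration index; self.d must survive an iteration
        self.idx = -1

    def expand(self, index):
        self.d = max(index, self.d)
        for i in range(index + 1 - len(self)):
            self.append(OrderedDict())

    def strip(self, value=None):
        # copy the layout (dimensions and keys), replacing each array with a
        # constant value or with value(key, array) if value is callable
        if not callable(value):
            saved_value, value = value, lambda k, v: saved_value
        return ElementData([OrderedDict((k, value(k, v)) for k, v in i.items())
                            for i in super(ElementData, self).__iter__()])

    def numElements(self, d):
        def elementsOfDimension(d):
            # loop over element types, skipping rows that contain masked entries
            nelems = 0
            for v in self[d].values():
                if isinstance(v, ma.MaskedArray):
                    nelems += v.shape[0] - ma.getmaskarray(v).any(axis=1).sum()
                else:
                    nelems += v.shape[0]
            return nelems
        # d == -1: count the elements of all dimensions
        if d == -1:
            return sum(elementsOfDimension(i) for i in range(len(self)))
        # otherwise count the elements of dimension d only
        return elementsOfDimension(d)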
The class works nicely and lets me loop seamlessly over all items of a particular dimension. However, each row stored in the numpy arrays has other data associated with it, for example the material of the element. I therefore decided to use the same data structure for these other properties. To that end I use the strip method of the class, which returns the entire structure without the numpy arrays (each array is replaced by a given value, or by the result of a function of the key and the array).
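A minimal sketch of that pattern, using plain lists of OrderedDicts instead of ElementData so that it runs on its own (the helper `strip_like` and the element-type names are hypothetical): a material structure with the same layout is derived from the connectivity, and a later change to the connectivity is not reflected in it.

```python
from collections import OrderedDict
import numpy as np

# Connectivity per dimension (list index = dimension); keys are hypothetical
# element-type identifiers and each array row is one element.
connectivity = [
    OrderedDict(),                                         # dimension 0
    OrderedDict([("line2", np.array([[0, 1], [1, 2]]))]),  # dimension 1
    OrderedDict([("tri3", np.array([[0, 1, 2]])),
                 ("quad4", np.array([[0, 1, 2, 3], [1, 2, 3, 4]]))]),  # dimension 2
]

def strip_like(data, value):
    """Hypothetical helper: copy the layout of data, replacing each array with value(key, array)."""
    return [OrderedDict((k, value(k, v)) for k, v in d.items()) for d in data]

# Parallel structure: one material id per element row, defaulting to 0.
materials = strip_like(connectivity, lambda k, v: np.zeros(v.shape[0], dtype=int))

# Adding an element to the connectivity does not touch the material structure...
connectivity[2]["tri3"] = np.vstack([connectivity[2]["tri3"], [2, 3, 4]])

# ...so the two structures are now out of sync and must be updated by hand.
print(connectivity[2]["tri3"].shape[0], materials[2]["tri3"].shape[0])  # 2 1
```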
The problem I have is that the original data structure is dynamic: whenever I change it, I have to update every other structure that depends on it. I really think I went in the wrong direction when designing this class. Is there a simpler way to approach this problem? I thought about storing the extra information next to the numpy arrays (as tuples, for example), but I don't know whether that is a good idea. Design choices can really make life miserable later, and I'm only now starting to realize it.
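One way to keep the extra information next to the numpy arrays without resorting to tuples is a NumPy structured array: each row holds both its connectivity and its properties, so adding or removing rows changes both at once. This is a minimal sketch assuming integer node ids and a single integer material id per element; the helper `element_dtype`, the field names and the element types are hypothetical.

```python
import numpy as np
from collections import OrderedDict

def element_dtype(nnodes):
    """Hypothetical record layout: node ids plus per-element properties."""
    return np.dtype([("nodes", np.int64, (nnodes,)), ("material", np.int64)])

# Dimension 2 only, for brevity.
dim2 = OrderedDict()
dim2["tri3"] = np.array([([0, 1, 2], 1)], dtype=element_dtype(3))
dim2["quad4"] = np.array([([0, 1, 2, 3], 2), ([1, 2, 3, 4], 2)], dtype=element_dtype(4))

# Appending a row adds the connectivity and the material together.
new_row = np.array([([2, 3, 4], 5)], dtype=element_dtype(3))
dim2["tri3"] = np.concatenate([dim2["tri3"], new_row])

# Deleting rows (here: all elements of material 2) keeps them aligned as well.
q = dim2["quad4"]
dim2["quad4"] = q[q["material"] != 2]

print(dim2["tri3"]["nodes"])     # connectivity as a (2, 3) integer array
print(dim2["tri3"]["material"])  # [1 5]
print(len(dim2["quad4"]))        # 0
```

Field access such as `dim2["tri3"]["nodes"]` still returns an ordinary integer array, so vectorized code on the connectivity keeps working, and new properties become new fields rather than new parallel structures.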