Tuesday, May 16, 2017

How to correctly define a complex data structure?

I would like to define a data structure in Python for which I can easily write loaders and dumpers (to and from various serialized formats). It should offer:

  • Easy access to the content through direct attributes
  • The ability to validate the content against a defined schema
  • The ability to write representers and loaders

In order to illustrate the question, consider this piece of data:

data = {
    'location_a': [{
        'name': 'Library A',
        'city': 'London',
        'books': [
            {'title': 'foo', 'date': '2010-04', 'pages': 400},
        ],
    }],
}

This is the least restrictive solution, because there is no type or content validation.
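With plain dicts, any validation has to be written by hand. The sketch below shows what that looks like for the structure above (Python 3.6+ for f-strings). The schema format and the `validate` function are hypothetical and made up for illustration; they are not an existing library. In this format, a schema is a type, a dict of sub-schemas, or a one-element list meaning "a list whose items all match this schema".

```python
# Hypothetical schema format: a type, a dict of sub-schemas, or a
# one-element list meaning "a list whose items all match this schema".
BOOK_SCHEMA = {'title': str, 'date': str, 'pages': int}
LIBRARY_SCHEMA = {'name': str, 'city': str, 'books': [BOOK_SCHEMA]}


def validate(value, schema, path='data'):
    """Return a list of error messages; an empty list means `value` is valid."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f'{path}: expected dict, got {type(value).__name__}']
        errors = []
        for key, sub_schema in schema.items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f'{path}[{key!r}]'))
            else:
                errors.append(f'{path}: missing key {key!r}')
        for key in sorted(value.keys() - schema.keys()):
            errors.append(f'{path}: unexpected key {key!r}')
        return errors
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f'{path}: expected list, got {type(value).__name__}']
        errors = []
        for i, item in enumerate(value):
            errors.extend(validate(item, schema[0], f'{path}[{i}]'))
        return errors
    if not isinstance(value, schema):
        return [f'{path}: expected {schema.__name__}, got {type(value).__name__}']
    return []


# Same shape as the data above, but with 'pages' given as a string.
data = {
    'location_a': [{
        'name': 'Library A',
        'city': 'London',
        'books': [{'title': 'foo', 'date': '2010-04', 'pages': '400'}],
    }],
}
# Top-level keys are free-form location names, so validate each value.
for location, libraries in data.items():
    for error in validate(libraries, [LIBRARY_SCHEMA], location):
        print(error)  # location_a[0]['books'][0]['pages']: expected int, got str
```

This works, but every new format needs yet another hand-written walker over the same nested dicts. That duplication is what a typed container would avoid.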

Another solution would be to define classes such as these:

# DataContainer, ListOf and Data are the classes I am looking for:
# this shows the API I would like, not working code.
class Book(DataContainer):
    _fields_ = [('title', str), ('date', str), ('pages', int)]

class Library(DataContainer):
    _fields_ = [('name', str), ('city', str), ('books', ListOf(Book))]

# A Place maps a location name to a list of libraries.
class Place(DataContainer):
    _fields_ = [{('location', str): ListOf(Library)}]

b = Book(title='foo', date='2010-04', pages=400)

l = Library()
l.name = 'Library A'
l.city = 'London'
l.books.append(b)

p = Place(location='location_a')
p.append(l)

data = Data()
data.append(p)

print(data.to_xml())

data['location_a'][0].books.append(Book(title='bar'))

dictionary = data.to_dict()

This is of course a pretty ugly example, but I am looking for a clean and modular solution in which I can easily create representers, loaders or even validators, for example:

Book.title.add_constraint(Match(r'[a-z0-9_]+'))
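Here is one way the hypothetical `DataContainer`, `ListOf` and `Match` from the question could be made to work. It is a sketch only (Python 3.6+, using `__init_subclass__` and f-strings) and not an existing library. Each `(name, type)` pair in `_fields_` becomes a descriptor, so `Book.title` accessed on the class returns a field object on which `add_constraint` can be called. `Field`, `to_dict` and `from_dict` are illustrative names I chose. `Place` and `Data` are left out, and this `Match` checks the whole string with `re.fullmatch`.

```python
import re


class Match:
    """Hypothetical constraint: the whole string must match a regex."""
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __call__(self, value):
        if value is not None and self.regex.fullmatch(value) is None:
            raise ValueError(f'{value!r} does not match {self.regex.pattern!r}')


class ListOf:
    """Hypothetical field type: a list of item_type instances."""
    def __init__(self, item_type):
        self.item_type = item_type


class Field:
    """Descriptor that type-checks a value and runs extra constraints on it."""
    def __init__(self, name, ftype):
        self.name, self.ftype, self.constraints = name, ftype, []

    def add_constraint(self, constraint):
        self.constraints.append(constraint)
        return self

    def __get__(self, obj, objtype=None):
        # Accessed on the class (Book.title): return the Field itself.
        return self if obj is None else obj.__dict__[self.name]

    def __set__(self, obj, value):
        if isinstance(self.ftype, ListOf):
            value = list(value)  # note: items appended later are not checked
        elif value is not None and not isinstance(value, self.ftype):
            raise TypeError(f'{self.name}: expected {self.ftype.__name__}, '
                            f'got {type(value).__name__}')
        for constraint in self.constraints:
            constraint(value)
        obj.__dict__[self.name] = value


class DataContainer:
    """Base class: each (name, type) pair in _fields_ becomes a Field."""
    _fields_ = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, ftype in cls._fields_:
            setattr(cls, name, Field(name, ftype))

    def __init__(self, **kwargs):
        for name, ftype in self._fields_:
            # Unset fields are "empty": None for scalars, [] for lists.
            default = [] if isinstance(ftype, ListOf) else None
            setattr(self, name, kwargs.pop(name, default))
        if kwargs:
            raise TypeError(f'unknown fields: {sorted(kwargs)}')

    def to_dict(self):
        """Dumper: plain nested dicts/lists (e.g. for json.dumps)."""
        result = {}
        for name, ftype in self._fields_:
            value = getattr(self, name)
            if isinstance(ftype, ListOf):
                value = [item.to_dict() for item in value]
            result[name] = value
        return result

    @classmethod
    def from_dict(cls, data):
        """Loader: the inverse of to_dict."""
        kwargs = dict(data)
        for name, ftype in cls._fields_:
            if isinstance(ftype, ListOf) and name in kwargs:
                kwargs[name] = [ftype.item_type.from_dict(d) for d in kwargs[name]]
        return cls(**kwargs)


class Book(DataContainer):
    _fields_ = [('title', str), ('date', str), ('pages', int)]


class Library(DataContainer):
    _fields_ = [('name', str), ('city', str), ('books', ListOf(Book))]


Book.title.add_constraint(Match(r'[a-z0-9_]+'))
lib = Library(name='Library A', city='London')
lib.books.append(Book(title='foo', date='2010-04', pages=400))
lib.books.append(Book(title='bar'))  # date and pages stay None
print(lib.to_dict())
try:
    Book(title='Not Valid')
except ValueError as exc:
    print(exc)  # 'Not Valid' does not match '[a-z0-9_]+'
```

Validation happens on assignment, so an invalid `Book` cannot be constructed. Items appended to a list field afterwards are not re-checked, though. Catching those would need a separate recursive check over the whole tree. Other serialized formats (XML, YAML) could then be built on top of `to_dict`/`from_dict` rather than on the classes themselves.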

Is there a common pattern, or are there known modules, that offer such features?

For example, I have tried Pandas, but it is more focused on data analysis and has no support for empty fields: only `NaN` is allowed, and only for `float`. Voluptuous is a nice validation module, but it has to be used as part of a more complete solution.
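For the empty-field problem specifically, a standard-library alternative (Python 3.7+) is `dataclasses` with `Optional` fields that default to `None`. Here they are paired with the `json` module as one example serialized format. This is a sketch under those assumptions. Dataclasses do not check types at runtime, so a validation layer (Voluptuous, for instance) would still have to sit on top. `Library.from_dict`, `dump_json` and `load_json` are illustrative names, not part of any library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Book:
    title: str
    date: Optional[str] = None   # empty field: None rather than NaN
    pages: Optional[int] = None


@dataclass
class Library:
    name: str
    city: Optional[str] = None
    books: List[Book] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        """Loader: rebuild nested Book objects from plain dicts."""
        books = [Book(**book) for book in data.get('books', [])]
        return cls(name=data['name'], city=data.get('city'), books=books)


def dump_json(library):
    """Dumper: asdict() recurses into nested dataclasses and lists."""
    return json.dumps(asdict(library))


def load_json(text):
    return Library.from_dict(json.loads(text))


lib = Library('Library A', 'London', [Book('foo', '2010-04', 400), Book('bar')])
text = dump_json(lib)
print(text)
```

The second book's missing `date` and `pages` are written as JSON `null` and come back as `None` after `load_json`.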

Any ideas?
