Thursday, November 12, 2015

Object-oriented code and database storage

I'm trying to figure out how to link object-oriented code and database storage, and I think I'm missing a design pattern.

I'm using Python / MongoDB but I suppose the question also applies to other languages and possibly other databases.

The model (simplified example)

Building

  • name: string
  • construction_date: date
  • apartments: list of Apartment entities

Apartment

  • reference: string
  • area: float
  • owner: Person entity

Person

  • name: string
  • favorite_color: string

In Python, I would represent this with three classes.

In MongoDB, I'd have two collections: Buildings and Persons. An apartment would be an embedded document in a Building, and an owner would be an id referring to a document in the Persons collection.
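To make the storage layout concrete, here is a sketch of what one document of each collection could look like once the driver has turned it into a Python dictionary. The values, the integer ids and the key names (for instance `favorite_color`) are made up for illustration; a real collection would typically use ObjectId values for `_id`.

```python
from datetime import datetime

# Hypothetical document from the Persons collection.
person_doc = {
    "_id": 1,
    "name": "Alice",
    "favorite_color": "blue",
}

# Hypothetical document from the Buildings collection.
# Apartments are embedded; each owner is only an id pointing to a person.
building_doc = {
    "_id": 10,
    "name": "Tower A",
    "construction_date": datetime(1975, 6, 1),
    "apartments": [
        {"reference": "A-101", "area": 54.5, "owner": 1},
        {"reference": "A-102", "area": 72.0, "owner": 1},
    ],
}
```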

The code (example use case)

In the Python code, I would have a manager class for each collection, with methods to return buildings, persons, etc., hiding the MongoDB queries from the rest of the code.

The code responds to user actions (most probably from a REST API in my case, but it could be a GUI or a CLI). Objects are generated on demand, not persistent.

Say I receive a user action that asks for the total apartment area of all buildings built after 1969.

  • In the DB manager class, I would have a get_buildings_by_age() method that would return a list of buildings built after a given date.

  • In the code responding to the user action, the sum_apartment_areas_by_age() method would call get_buildings_by_age(), then compute and return the cumulated area of all apartments.
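The two methods above can be sketched as follows, using plain dictionaries only. This is a minimal sketch, not a definitive implementation: the `BuildingManager` name is made up, and the `collection` argument is assumed to behave like a pymongo `Collection`, that is, to expose a `find(filter)` method returning an iterable of dictionaries.

```python
from datetime import datetime


class BuildingManager:
    """Hides the storage queries from the rest of the code (hypothetical name)."""

    def __init__(self, collection):
        # Assumption: `collection` exposes find(filter) like a pymongo Collection.
        self._collection = collection

    def get_buildings_by_age(self, built_after: datetime):
        # MongoDB filter: construction_date strictly greater than built_after.
        query = {"construction_date": {"$gt": built_after}}
        return list(self._collection.find(query))


def sum_apartment_areas_by_age(manager, built_after: datetime):
    """Add up the area of every apartment in the matching buildings."""
    buildings = manager.get_buildings_by_age(built_after)
    return sum(
        apartment["area"]
        for building in buildings
        for apartment in building.get("apartments", [])
    )
```

The caller never sees the query itself, only the manager method. What it gets back is still raw dictionaries, which is exactly where the question of the missing pattern comes in.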

I'm missing the pattern to go from the bson/json/dictionary world to the Python classes world and back.

Layer separation: from the DB driver to the objects

I can think of several ways, but none seems obviously better to me.

  1. No classes / all dictionaries

    The manager class get_buildings_by_age() method returns a building list as dictionaries. The sum_apartment_areas_by_age() method calls get_buildings_by_age(), then does the computations from the values in the dictionaries.

    No Building or Apartment class involved.

    I think this misses the point of object-oriented programming. I wouldn't do that.

  2. Classes in main code

    The manager class get_buildings_by_age() method returns a building list as dictionaries.

    The sum_apartment_areas_by_age() method calls get_buildings_by_age() and instantiates Building and Apartment, then does the computations using the methods in Building and Apartment classes.

  3. Classes in DB management layer

    The manager class get_buildings_by_age() method instantiates Building and Apartment and returns a building list as Building / Apartment instances.

    The sum_apartment_areas_by_age() method calls get_buildings_by_age(), then does the computations using the methods in Building and Apartment classes.

  4. DB layer in classes

    The calls to the DB manager classes are in the Building and Apartment classes themselves, hidden from the sum_apartment_areas_by_age() method.

    The Building class has class methods that return Building instances. For instance, it has a get_buildings_by_age() class method that calls the get_buildings_by_age() method of the DB manager layer, creates the required number of Building instances, and returns them as a list.

    sum_apartment_areas_by_age() calls Building.get_buildings_by_age(), then works with Building and Apartment instances.
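As an illustration, the variant in which the manager itself instantiates Building and Apartment could be sketched like this. The `from_dict()` and `total_area()` helpers and the `BuildingManager` name are hypothetical, and `collection` is again assumed to expose `find(filter)` the way a pymongo `Collection` does.

```python
from datetime import datetime


class Apartment:
    def __init__(self, reference, area, owner_id):
        self.reference = reference
        self.area = area
        self.owner_id = owner_id  # id into the Persons collection, not a Person


class Building:
    def __init__(self, name, construction_date, apartments):
        self.name = name
        self.construction_date = construction_date
        self.apartments = apartments

    @classmethod
    def from_dict(cls, doc):
        """Build an instance from a driver dictionary (hypothetical helper)."""
        apartments = [
            Apartment(a["reference"], a["area"], a.get("owner"))
            for a in doc.get("apartments", [])
        ]
        return cls(doc["name"], doc["construction_date"], apartments)

    def total_area(self):
        return sum(apartment.area for apartment in self.apartments)


class BuildingManager:
    def __init__(self, collection):
        # Assumption: `collection` exposes find(filter) like a pymongo Collection.
        self._collection = collection

    def get_buildings_by_age(self, built_after: datetime):
        query = {"construction_date": {"$gt": built_after}}
        return [Building.from_dict(doc) for doc in self._collection.find(query)]


def sum_apartment_areas_by_age(manager, built_after: datetime):
    # The calling code only ever sees Building instances, never dictionaries.
    return sum(b.total_area() for b in manager.get_buildings_by_age(built_after))
```

In this sketch the dictionary-to-object conversion lives in one place, `Building.from_dict()`. Moving the call to `from_dict()` out of the manager and into the calling code, or wrapping the manager call in a class method of Building, gives the other variants with the same building blocks.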

Object representation from DB representation

The Building constructor should be able to take as input the dictionary returned by the MongoDB driver. How would it store the building attributes?

  1. Keep the dictionary as an attribute

    self.dict = building_dict
    
    

    Then each property of the building would be reachable through building.dict["property"].

    A better attribute name could be used. Maybe a one-letter attribute. This doesn't look so elegant.

  2. De-serialize dictionary into attributes

    self.name = building_dict['name']
    self.construction_date = building_dict['construction_date']
    
    

    Seems tedious and not worth the pain.

  3. Keep the dictionary as an attribute and define getters/setters

    self.dict = building_dict
    
    @property
    def name(self):
        return self.dict['name']
    
    

    Can be tedious as well.

    This approach makes it possible to catch missing attributes in the getter and raise a dedicated exception.

    I could create getters only for values that are accessible from the outside, but it is tempting to create more, so that even own methods can use the getters for better readability, among other advantages.

  4. Let Building inherit from dict to be its own dictionary

    I have the feeling this is not recommended.
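To illustrate the getter variant with a dedicated exception, here is a minimal sketch. The `MissingAttributeError` name and the `_get()` helper are made up for the example; routing every property through one helper is only one way to keep the per-property code short.

```python
class MissingAttributeError(Exception):
    """Raised when a value is absent from the stored dictionary (hypothetical name)."""


class Building:
    def __init__(self, building_dict):
        # Keep the driver dictionary as-is; properties read from it lazily.
        self._doc = building_dict

    def _get(self, key):
        # Single place where a missing key becomes a dedicated exception.
        try:
            return self._doc[key]
        except KeyError:
            raise MissingAttributeError(
                f"'{key}' is missing from the building document"
            ) from None

    @property
    def name(self):
        return self._get("name")

    @property
    def construction_date(self):
        return self._get("construction_date")
```

With this, `Building({"name": "Tower A"}).name` returns the stored name, while reading `construction_date` on the same instance raises `MissingAttributeError` instead of a bare `KeyError`.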

Embedded documents and references

In any case, the constructor may receive a dictionary representing the document values. In fact, a subset of values, as projection may have stripped away values useless to the computation to improve query performance. So the constructor must assume that any value can be missing. There can be exceptions to that with mandatory values.

Some values in the dictionary refer to other classes: embedded documents, references.

My guess is that the constructor should go through the whole dictionary to seek embedded documents that have their specific class in the code and instantiate those classes right away. However, references should be kept as foreign keys, otherwise other DB queries would be triggered while data is probably not needed.
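That guess could look like the sketch below: embedded apartments are turned into Apartment instances right away, the owner stays a plain id, and every value is allowed to be missing. The attribute name `owner_id` and the choice of `None` to mean "not loaded" are assumptions made for the example.

```python
class Apartment:
    def __init__(self, doc):
        # Any value may have been stripped by a projection, hence .get().
        self.reference = doc.get("reference")
        self.area = doc.get("area")
        # Reference to the Persons collection: keep the id, trigger no query.
        self.owner_id = doc.get("owner")


class Building:
    def __init__(self, doc):
        self.name = doc.get("name")
        self.construction_date = doc.get("construction_date")
        # Embedded documents are instantiated immediately. None means the
        # "apartments" field was not loaded, which differs from an empty list.
        raw_apartments = doc.get("apartments")
        if raw_apartments is None:
            self.apartments = None
        else:
            self.apartments = [Apartment(a) for a in raw_apartments]
```

Resolving `owner_id` into a Person would then be an explicit step, performed only by the code that actually needs the owner.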

Any comment on this?
