Monday, 18 January 2016

SQLAlchemy Declarative: How to merge models and existing business logic classes

I would like to know what the best practices are for using SQLAlchemy declarative models within business-logic code. Perhaps stackexchange.codereview would have been a better place to ask this, but I'm not sure.

Here's some background.

Let's say I have a bunch of classes doing various things. Most of them have little or nothing to do with each other. Each such class has between a hundred and a thousand lines of code doing things that have precious little to do with the database. In fact, most of the classes aren't even database-aware so far. I've gotten away with storing the actual information in flat files (csv, yaml, and so on), and only maintaining a serial number table and a document-path-to-serial-number mapping in the database. Each object retrieves the files it needs by getting the correct paths from the database (by serial number) and reconstructs itself from there. This has been exceedingly convenient so far, since my 'models' have been (and admittedly continue to be) rather fluid.
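
For concreteness, the current arrangement looks roughly like this (an illustrative sketch only; SerialNoMap, get_path_for_serialno, SomeBusinessObject, and the YAML layout are made-up names, not the real ones):

import yaml
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SerialNoMap(Base):
    # Hypothetical mapping table: serial number -> document path.
    __tablename__ = 'serialno_map'
    serialno = Column(Integer, primary_key=True)
    path = Column(String)

def get_path_for_serialno(serialno, session):
    # The database holds nothing but this serial number -> path mapping.
    return session.query(SerialNoMap).filter_by(serialno=serialno).one().path

class SomeBusinessObject(object):
    def __init__(self, serialno, session):
        self._serialno = serialno
        # The object reconstructs itself entirely from its flat file;
        # the database is only consulted for the file's location.
        with open(get_path_for_serialno(serialno, session)) as f:
            self._data = yaml.safe_load(f)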

As I expand the involvement of the database in the codebase I currently have, I seem to have settled on the following approach, separating the database bits and the business logic into two completely separate parts and joining them through specific function calls rather than inheritance or even composition. Here is a basic example of the kind of code I have now (roughly pseudocode quality):

module/db/models.py:

class Example(Base):
    id = Column(...)
    some_var = Column(...)

module/db/controller.py:

from sqlalchemy.orm.exc import NoResultFound

from .models import Example

def get_example_by_id(id, session):
    return session.query(Example).filter_by(id=id).one()

def upsert_example(session, id=None, some_var=None):
    if id is not None:
        try:
            example_obj = get_example_by_id(id, session)
            example_obj.some_var = some_var
            return example_obj
        except NoResultFound:
            pass
    # Preserve the caller-supplied id (if any) on insert.
    example_obj = Example(id=id, some_var=some_var)
    session.add(example_obj)
    session.flush()
    return example_obj

module/example.py:

from sqlalchemy.orm.exc import NoResultFound

from .db import controller

class Example(object):
    def __init__(self, id, session):
        self._id = id
        self._some_var = None
        try:
            self._load_from_db(session)
            self._defined = True
        except NoResultFound:
            self._defined = False

    def _load_from_db(self, session):
        db_obj = controller.get_example_by_id(self._id, session)
        self._some_var = db_obj.some_var

    def create(self, some_var, session):
        if self._defined:
            raise Exception
        self._some_var = some_var
        self._sync_to_db(session)

    def _sync_to_db(self, session):
        controller.upsert_example(session, id=self._id,
                                  some_var=self._some_var)

    @property
    def some_var(self):
        return self._some_var

    ... 
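
For completeness, the way these pieces get used at the moment is roughly the following (the id 17 and the bare demo(session) wrapper are just for illustration; session setup lives elsewhere):

from module.example import Example

def demo(session):
    # Reconstruct an existing instance by id (loads from the DB):
    existing = Example(17, session)
    print(existing.some_var)

    # Create a new instance and push it into the DB:
    new = Example(None, session)
    new.create(3, session)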

I'm not convinced this is the way to go.

I have a few models following this pattern, and many more that I should implement in time. The database is currently only used for persistence and archiving. Once something is in the database, it's more or less read-only from then on. However, querying on it is becoming important.

The reason I'm inclined to migrate from the flat files to the database is largely to improve scalability.

Thus far, if I wanted to find all instances (rows) of Example with some_var = 3, I'd have to construct all of the instances from the flat files and iterate through them. This seems like a waste of both processor time and memory. In many cases, some_var is actually a calculated property, computed by a fairly expensive process from source data contained in the flat file.

With the structure above, what I would do is query on Example, obtain a list of ids that satisfy my criterion, and then reconstruct just those business-object instances.
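
Roughly like this (a sketch of that idea; examples_where_some_var_is is a made-up helper, and the imports just mirror the layout above):

from module.db.models import Example as ExampleModel
from module.example import Example

def examples_where_some_var_is(value, session):
    # Ask the database only for the matching ids...
    ids = [row.id for row in
           session.query(ExampleModel.id)
                  .filter(ExampleModel.some_var == value)]
    # ...and then reconstruct only those business-logic instances.
    return [Example(id, session) for id in ids]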

The ORM approach, however, as I understand it, would use thick models, where the objects returned by the query are themselves the objects I would need. I'm wondering whether it makes sense to try to move to that kind of a structure.
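
If I understand the thick-model idea correctly, the same lookup would collapse to something like this (again just a sketch, assuming the business logic lives on the declarative class itself):

from module.db.models import Example

def examples_where_some_var_is(value, session):
    # The objects coming back from the query are the business objects
    # themselves; there is no separate reconstruction step.
    return session.query(Example).filter(Example.some_var == value).all()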

To that end, I have the following 'questions' / thoughts:

  • My instinct is that the code snippets above are anti-patterns more than they are useful patterns. I can't put my finger on why, exactly, but I'm not very happy with them. Is there a real, tangible disadvantage to the structure as listed above? Would moving to a more ORM-ish design provide advantages in functionality / performance / maintainability over this approach?

  • I'm paranoid about tying myself down to a database schema. I'm also paranoid about regular DB migrations. The approach listed above gives me a certain peace of mind: if I do need to do a migration, it'll be limited to the _load_from_db and _sync_to_db functions, letting me mess around willy-nilly with all the rest of the code.

    • Am I wrong about the cost of migrations in the thick-model approach being high?
    • Is my sense of security from restricting my code's DB involvement more a false sense of security than a useful separation?
  • If I wanted to integrate Example from module/db/models.py with Example from module/example.py in the example above, what would be the cleanest way to go about it? Alternatively, what is an accepted pattern for handling business-logic-heavy models with SQLAlchemy?

  • In the code above, note that the business-logic class keeps all of its information in 'private' instance variables, while the model class keeps all of its information in class-level Column attributes. How would integrating these two approaches actually work? Theoretically, they should still 'just work' even if put together in a single class definition. In practice, does it? (A rough sketch of what I'm imagining follows this list.)
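
To make those last two questions concrete, here is the kind of merged class I'm imagining (only a sketch; the non-persisted _cache attribute is invented for illustration, and engine/session setup is elided):

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import reconstructor

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'

    # Class-level Column attributes; SQLAlchemy instruments these so that
    # on an instance they behave like ordinary per-instance attributes.
    id = Column(Integer, primary_key=True)
    _some_var = Column('some_var', Integer)

    def __init__(self, some_var=None):
        self._some_var = some_var
        self._init_non_persisted_state()

    @reconstructor
    def _init_non_persisted_state(self):
        # Called by the ORM when an instance is loaded from the database,
        # since __init__ is bypassed on load; non-column setup goes here.
        self._cache = {}

    @property
    def some_var(self):
        return self._some_var

One wrinkle I can already see: a plain @property like this only works on instances, so class-level querying would have to go through Example._some_var (or a hybrid_property) rather than some_var.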

(The actual codebase is on GitHub, though it's not likely to be very readable.)
