Thursday, 23 February 2023

What is the right design for a single ORM shared between multiple repositories?

I have two Python repositories (a data pipeline and an API) that access the same database. Both use SQLAlchemy as an ORM to read and write the tables, with models in this format:

from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class MyData(Base):
    # Declarative models need a table name; "my_data" is illustrative.
    __tablename__ = "my_data"

    id: int = Column(Integer, primary_key=True)
    ...

And then read/write in the usual way:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine(connection_string)  # connection_string is defined elsewhere
Base.metadata.create_all(engine)  # create any tables that do not exist yet
with Session(engine) as session:
    row = session.query(MyData).filter(MyData.id == 0).one_or_none()
    session.add(MyData(id=1))
    ...

I don't want to merge the two repos into a monolith just to avoid this problem, which leaves four options, none of which is good:

  1. Duplicate the ORM code across both repos. This is an obvious code smell, and keeping the two copies in sync becomes labour-intensive.
  2. Avoid using the ORM in one of the repos, e.g. use pandas read_sql/to_sql in the pipeline repo. This is a poor compromise: it sacrifices functionality and takes a serious hit to write speed.
  3. Create a separate ORM repo and add it as a git submodule to both repositories. This seems better, but having an __init__.py at the top level of that repo strikes me as a code smell.
  4. Bundle the ORM repo as a pip package and add it as a dependency of the two repositories. This is what I have currently:
db_orm
  __init__.py
  models.py
setup.py

setup.py:

from setuptools import setup

setup(
    name="db_orm",
    version="0.0.1",
    packages=["db_orm"],
    package_dir={"db_orm": "db_orm"},
    install_requires=["SQLAlchemy==1.4.44"],
    classifiers=[
        "Programming Language :: Python :: 3",
    ],
)

This approach is not great either: both repositories need credentials to install the ORM package via pip, and I also have to make sure the version each of them installs is kept up to date.
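To make the "kept up to date" concern concrete, here is a minimal sketch of a startup check each consuming repo could run against the installed package. It uses only the standard library's importlib.metadata. The helper names (parse_version, installed_version, is_up_to_date) are hypothetical, and the version parsing assumes plain numeric versions such as "0.0.1"; pre-release or local version strings would need a real version parser.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison.

    Assumption: versions are purely numeric, as in the setup.py above.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_up_to_date(package: str, minimum: str) -> bool:
    """True only if `package` is installed at `minimum` or newer."""
    found = installed_version(package)
    return found is not None and parse_version(found) >= parse_version(minimum)
```

Each repo could then call something like is_up_to_date("db_orm", "0.0.1") when it starts and fail fast if the result is False. This does not remove the need to bump versions, but it would surface drift between the two repositories early.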

Is there a better way to solve this problem?
