I have two Python repositories (a data pipeline and an API) that access the same database. They use SQLAlchemy as an ORM to read/write the tables, with models in this format:
```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyData(Base):
    __tablename__ = "my_data"  # table name assumed for this example

    id: int = Column(Integer, primary_key=True)
    ...
```
And then read/write in the usual way:
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine(connection_string)
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.query(MyData).filter(MyData.id == 0).one_or_none()
    session.add(MyData(id=1))
    session.commit()  # persist the new row
    ...
```
I don't want to merge the two repos into a monolith just to avoid this problem, so I see four options, none of which seems good:
- Duplicate the ORM code across both repos - this is an obvious code smell, and maintenance becomes labour-intensive.
- Avoid using the ORM in one of the repos, e.g. use pandas read_sql/to_sql in the pipeline repo (a rough sketch of this is included near the end of this post) - this is a poor compromise, sacrificing functionality and taking a serious hit to write speed.
- Create a separate ORM repo and add it as a git submodule to both repositories - this seems better, but it strikes me as a bad code smell to have an `__init__.py` at the top level of that repo.
- Bundle the ORM code as a pip package and add it as a dependency to the two repositories. This is what I have currently:
```
db_orm/
    __init__.py
    models.py
setup.py
```
setup.py:
```python
from setuptools import setup

setup(
    name="db_orm",
    version="0.0.1",
    packages=["db_orm"],
    package_dir={"db_orm": "db_orm"},
    install_requires=["SQLAlchemy==1.4.44"],
    classifiers=[
        "Programming Language :: Python :: 3",
    ],
)
```
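Each consuming repo then pins this package as a dependency; assuming the package lives in a private git repository, that looks roughly like this (the URL and tag below are placeholders):

```
# requirements.txt in the pipeline and API repos (URL and tag are illustrative)
db_orm @ git+ssh://git@github.com/my-org/db_orm.git@v0.0.1
```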
This approach again is not great: I need to maintain credentials in both repositories so that they can install the ORM package via pip, and I also need to make sure the version they pin is kept up to date.
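For reference, here is roughly what the no-ORM option for the pipeline repo (the second option above) would look like; the table name and query are illustrative:

```python
# Sketch of the pandas-only alternative (option 2); table/column names are illustrative.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(connection_string)

# Read rows into a DataFrame instead of querying mapped objects.
df = pd.read_sql("SELECT * FROM my_data WHERE id = 0", engine)

# Write rows back with to_sql; this generic INSERT path is the write-speed hit mentioned above.
df.to_sql("my_data", engine, if_exists="append", index=False)
```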
Is there a better way to solve this problem?