I've got a design/architecture question.
I'm developing an ETL application in Java (a language I'm relatively inexperienced with). I'm using Spring Boot as the base framework for component autowiring, dependency injection, etc. The application is intended to be run on a schedule, in batch fashion. (FYI, I've looked at Spring Batch, but it falls short of what I need.)
To maximize flexibility, when the application is executed it reads a user/developer-defined JSON configuration file at runtime, called the "job configuration." The job configuration contains an array of one or more datasources from which to extract data, and each datasource configuration defines a specific query for what data to extract (along with various other parameters). Each datasource configuration also defines an array of transforms, which describe how to convert the extracted data into the desired data structure. Finally, the configuration defines an array of output datasources, which is where the data gets loaded after it has been joined (if needed) and merged into its final output structure(s). Pretty straightforward, right?
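For illustration, a job configuration along those lines might look something like this (every field name here is hypothetical, invented for this sketch; the actual schema is whatever I define):

```json
{
  "datasources": [
    {
      "name": "orders_db",
      "query": "SELECT id, customer_id, total FROM orders WHERE updated_at > :lastRun",
      "transforms": [
        { "type": "rename", "from": "customer_id", "to": "customerId" },
        { "type": "convert", "field": "total", "targetType": "decimal" }
      ]
    }
  ],
  "outputs": [
    { "name": "warehouse_db", "table": "fact_orders", "mode": "insert" }
  ]
}
```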
The part of the challenge I need help with is that, as the developer, I don't know what the final output structure will be, since it's configured rather than hard-coded. So I can't just slap an ORM framework like Hibernate around some predefined POJOs/DAOs to avoid writing a bunch of my own complex persistence logic. Does anyone know of an existing design pattern, or perhaps a framework, that addresses challenges like this?
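One common way around the no-predefined-POJO problem is to represent each record as a `Map<String, Object>` and generate the DML dynamically from the record's shape at runtime. A minimal sketch (the class and method names are my own invention, not from any framework):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds a parameterized INSERT statement from a row whose columns
// are only known at runtime (e.g. taken from the job configuration).
public class DynamicInsertBuilder {

    // Produces SQL with named parameters matching the row's keys,
    // e.g. INSERT INTO fact_orders (id, total) VALUES (:id, :total)
    public static String buildInsert(String table, Map<String, Object> row) {
        String columns = String.join(", ", row.keySet());
        String params = row.keySet().stream()
                .map(key -> ":" + key)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves column order for predictable SQL.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42);
        row.put("total", "19.99");
        System.out.println(buildInsert("fact_orders", row));
    }
}
```

Since the project already uses Spring, SQL generated this way pairs naturally with Spring's `NamedParameterJdbcTemplate`, whose `update` method accepts the statement plus a `Map` of parameter values, so the row map serves as both the schema source and the bind values. It's a sketch, not a hardened implementation: table and column names coming from user configuration would need validation/quoting before being concatenated into SQL.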
Any help is much appreciated. Thanks!