design-patterns: Foreign Key usage in Low-Moderate Complexity DB System Design

mardi 27 juin 2023

Foreign Key usage in Low-Moderate Complexity DB System Design

I'm trying to (re-)design a low-moderate complexity database schema and I'm stuck on trying to decide how many foreign keys to include or not include in reference to other tables.

Is it generally better to have a great many foreign keys densely connecting entities/tables to each other, or is it preferable to have as few foreign keys connecting entities/tables together as possible? Where do you draw the line? Many of the current foreign key columns are optional which means I can't rely upon them to actually return all relevant data so I originally erred towards a dense graph of foreign key relationships to prevent an intermediate entity's lack of a valid foreign key relation causing problems retrieving data on the 'other side' of that intermediate entry, but that's caused its own issues with difficulty maintaining the foreign keys in all of these different entities.

About The System

This is a pharmacy benefit related system. The simple way to think about the system is that there is a bunch of standards-based data and tables and that data is being collected and presented to a staff member. The staff member examines this data and creates an Analysis; and then based on that analysis they create a Suggestion.

From a CRUD perspective, there are few requirements. I see no reason why I can't have as many Foreign Keys as I want from a performance perspective.

From a Query perspective, we want to query to gather all pertinent information for an Analysis all at once. This is the problematic part because so much of the data being queried is related. I've included a list of some of the tables I want to pull data from in this query below.

Table Name	Number of (potential) foreign keys to other tables in this list
Claim	9
Prescription	6
Pharmacy	4
Prescriber	2
Plan	4
Plan Member	8
Formulary	3
Formulary Row	4
Analysis	5
Suggestion	5

The issue I'm encountering is that so many of these tables are related to each other. For example, Prescription is related to Claim and Pharmacy and Analysis and Suggestion. Another example: Plan is related to Plan Member, Formulary, Claim, Analysis.

Original Code Smell Problem

As I have built out the system, the number of tables - and therefore the number of table joins needed to perform the get all data related to this analysis query - has grown. I now have a large Postgres View which encompasses many tables and the view is only going to get bigger as I improve the system. I don't like having to write and maintain this single extremely large and long query. Some pieces of data I need to pull are easy to get in the query and require only a single join, but others require joins on joins on joins. That seems like code smell to me for a system with this few tables.

Thoughts

If I try to set up and maintain all of these foreign key relationships then I can get faster queries, but when the system receives new data that data is not always complete and when it gets updated it seems like I'd have to re-perform the foreign key linking for all of that entities related entities, and all of their relations, etc. It's certainly possible to do that, but it would require some careful coding and inevitably something will go wrong; so I'll have to write code to allow re-calculating all entities' relationships when an update occurs. But if I do that, it would perhaps invalidate the Analysis and Suggestion table's entries based on the changing data upon which those entries were based which would cause other business problems where the Analysis was created based on data from time T0 and then at T1 the underlying data is updated, and when someone views the data and analysis at T2 they make a Suggestion based on the analysis created at T0 which was made invalid by the update at T1. And the problem would cascade out into the non-digital world. One way around that would be to preserve all data used to make an Analysis at T0, or at least do a time-based query and highlight that the underlying data upon which the analysis at T1 was made has changed... but that now requires me to keep track of such data as well. It's doable, but a pain.

Potential Solutions

Input-driven Approach

Try to segment the data by the data source and write services and queries to pull data from those data sources separately.

The problem with this approach is that although the data is coming from different sources, the data itself is inter-related across sources and there is a direct foreign key relationship between data from those sources.

Domain Driven Approach

This doesn't solve the code smell. All of this data is in the same domain.

Hub and Spoke

I tried to divide the data where I put one table containing primarily foreign keys in the middle (a hub), and referenced some other tables containing primarily foreign keys (a hub), and attached these tables to the hubs and the hubs to each other. This was done to try to reduce the number of foreign keys present in each table.

This didn't work because the relationships between entities in different hubs were stronger than the relationships between entities within the same hubs. Claim-Prescription-Pharmacy-Prescriber-Plan-PlanMember-Formulary-FormularyRow-Analysis-Suggestion are all inherently extremely related to each other so I would need to build a single hub and put all of these entities as spokes on that hub. That's dumb.

Messy Graph

I currently have a messy graph of entity relationships. I think this is producing code smell in my queries, but maybe the actually-correct way to deal with this is to, well, just accept it?!? Maybe this system is just more complicated than I would like and there's no way around that.

design-patterns