Thursday, July 28, 2016

DB FK & Referential Integrity, how to handle this data?

I have a situation that I am not sure how to handle. Let me start by saying that I am mainly a software programmer: fairly comfortable with databases and design, but not an expert. My current task is to build a system which calls out to various social media APIs and downloads data for numerous accounts which I have credentials for, across the various platforms. I have built C# applications using Entity Framework to call the APIs, translate the results into entities, and insert them into a DB.

I have designed a database which follows the structure of the API objects being returned to me. The details of that structure are not really important; what matters is that I am building a fairly normalized DB, using foreign keys where appropriate.

Consider the example of Twitter, where we have

CREATE TABLE [User]
(
  UserID INT NOT NULL PRIMARY KEY,
  --etc
)

and

CREATE TABLE [Tweet]
(
  TweetID INT NOT NULL PRIMARY KEY,
  TweetingUserID INT FOREIGN KEY REFERENCES [User](UserID),
  --etc
)

The problem we've seen has popped up, I believe, mostly because our API apps are still in "sandbox mode" and we are not getting a full set of data from the API calls. For example, when asking for "followers" we might get only the first 10 users, regardless of the total number who follow me, but when asking for timeline tweets I just get back the first 20 tweets, regardless of who posted them.

So, when I try to insert all the tweets I was given, after inserting all the users that were returned, I will sometimes be given a tweet created by a user who was never returned from the API call. There is a user ID present on the Tweet side, but that ID doesn't exist as a PK in the User table. This causes a foreign key violation error, and the whole save operation (all entities for a given account) is thrown out.
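A minimal reproduction of this failure can be sketched as follows. It is written in Python against an in-memory SQLite database purely so that it is self-contained; it is not the C#/Entity Framework code described above. The tables mirror [User] and [Tweet], while `make_db`, `save_batch`, and the sample IDs are hypothetical names and values chosen for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database mirroring the [User] and [Tweet] tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE User (UserID INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Tweet ("
        " TweetID INTEGER PRIMARY KEY,"
        " TweetingUserID INTEGER REFERENCES User(UserID))"
    )
    return conn


def save_batch(conn, user_ids, tweets):
    """Insert users, then tweets, in one transaction. Return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO User (UserID) VALUES (?)",
                [(user_id,) for user_id in user_ids],
            )
            conn.executemany(
                "INSERT INTO Tweet (TweetID, TweetingUserID) VALUES (?, ?)",
                tweets,
            )
        return True
    except sqlite3.IntegrityError:
        # Raised here when a tweet references a user ID that is not in User.
        return False


conn = make_db()
# Tweet 101 references user 99, which the "followers" call never returned.
saved = save_batch(conn, [1, 2], [(100, 1), (101, 99)])
print(saved)  # False
```

Because both inserts share one transaction, the valid users are rolled back along with the offending tweet, which matches the all-or-nothing behaviour described above.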

What is the correct way to deal with a situation like this? My options, as I see them, are

A: Scan the local DbSet and the database for each downloaded entity that has a FK, and check that the referenced entity exists before setting the FK property. If it doesn't exist, set the FK to null or point it at a fake "missing entity" record on the PK side. This seems really inefficient, and would require me to write a whole bunch more code for every single API call type.

B: Remove all the FKs from the database so there is nothing to be violated, and leave it up to the DBAs to use outer joins and detect missing join IDs when they aggregate/warehouse the data. This seems wrong from a DB design perspective, and would also break all of my reference properties in the EF model.
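One way option A could avoid a per-entity lookup is to let the database skip users it already has: insert a placeholder row for every user ID a tweet references, ignoring IDs that already exist, and only then insert the tweets. The sketch below assumes that approach. It again uses Python with in-memory SQLite rather than Entity Framework, and the `IsPlaceholder` column and all function names are assumptions, not part of the schema above.

```python
import sqlite3


def make_db_with_flag():
    """Same tables as above, plus a hypothetical IsPlaceholder flag on User."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE User ("
        " UserID INTEGER PRIMARY KEY,"
        " IsPlaceholder INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE Tweet ("
        " TweetID INTEGER PRIMARY KEY,"
        " TweetingUserID INTEGER REFERENCES User(UserID))"
    )
    return conn


def save_with_placeholders(conn, user_ids, tweets):
    """Save a batch, creating placeholder users for unknown tweet authors."""
    real_users = [(user_id,) for user_id in user_ids]
    referenced = sorted({uid for _, uid in tweets if uid is not None})
    with conn:  # one transaction for the whole batch
        # Real users: insert if new, then clear any earlier placeholder flag.
        conn.executemany(
            "INSERT OR IGNORE INTO User (UserID) VALUES (?)", real_users
        )
        conn.executemany(
            "UPDATE User SET IsPlaceholder = 0 WHERE UserID = ?", real_users
        )
        # Referenced users: existing rows are left alone, missing ones
        # are created as placeholders so the FK is satisfied.
        conn.executemany(
            "INSERT OR IGNORE INTO User (UserID, IsPlaceholder) VALUES (?, 1)",
            [(uid,) for uid in referenced],
        )
        conn.executemany(
            "INSERT INTO Tweet (TweetID, TweetingUserID) VALUES (?, ?)", tweets
        )


def placeholder_ids(conn):
    """Return the IDs of users known only through a tweet reference."""
    rows = conn.execute(
        "SELECT UserID FROM User WHERE IsPlaceholder = 1 ORDER BY UserID"
    )
    return [row[0] for row in rows]


conn = make_db_with_flag()
save_with_placeholders(conn, [1, 2], [(100, 1), (101, 99)])
print(placeholder_ids(conn))  # [99]
save_with_placeholders(conn, [99], [])  # the real user 99 arrives later
print(placeholder_ids(conn))  # []
```

This keeps the FKs in place, and the flag gives whoever aggregates the data a way to tell real users from stand-ins. Whether the same pattern maps cleanly onto EF reference properties is a separate question that the sketch does not answer.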

Is there some technique I don't know of that solves this problem? Are my hands just tied on using FKs because of the quality of the data I'm getting? Have I missed something obvious?

Thanks!
