I'm trying to find the best data-structure/data store solution (highest performance) for the following request:
I have a list of attributes that I need to store for all individual in the US, for example:
+------------+-------+-------------+
| Attribute | Value | SSN |
+------------+-------+-------------+
| hair color | black | 123-45-6789 |
| eye color | brown | 123-45-6789 |
| height | 175 | 123-45-6789 |
| sex | M | 123-45-6789 |
| shoe size | 42 | 123-45-6789 |
As you can guess, with the general population, there are nothing unique and identifiable from those attributes. However, let's assume that if we were to fetch from a combination of 3 or 4 attributes, then I would be able to uniquely identify a person (find their SSN).
Now here's the difficulties, the set of combinations that can uniquely identify a person will evolve over time and be adjusted.
What would be my best bet for storing and querying the data with the scenario mentioned above, that will remain highly performant (<100ms) at scale?
Current attempt:
SELECT * FROM (SELECT * FROM people WHERE hair='black') p1
JOIN (SELECT * FROM people WHERE height=175) p2
JOIN (SELECT * FROM people WHERE eye='brown') p2
ON p1.SSN = p2.SSN
But with a database with millions of rows, as you can guess.. NOT performant.
Thank you!
Aucun commentaire:
Enregistrer un commentaire