mercredi 2 décembre 2020

Database Design - Need help scaling query

I'm trying to find the best data-structure/data store solution (highest performance) for the following request:

I have a list of attributes that I need to store for all individual in the US, for example:

+------------+-------+-------------+
| Attribute  | Value |     SSN     |
+------------+-------+-------------+
| hair color | black | 123-45-6789 |
| eye color  | brown | 123-45-6789 |
| height     | 175   | 123-45-6789 |
| sex        | M     | 123-45-6789 |
| shoe size  | 42    | 123-45-6789 |

As you can guess, with the general population, there are nothing unique and identifiable from those attributes. However, let's assume that if we were to fetch from a combination of 3 or 4 attributes, then I would be able to uniquely identify a person (find their SSN).

Now here's the difficulties, the set of combinations that can uniquely identify a person will evolve over time and be adjusted.

What would be my best bet for storing and querying the data with the scenario mentioned above, that will remain highly performant (<100ms) at scale?

Current attempt:

SELECT * FROM (SELECT * FROM people WHERE hair='black') p1 
JOIN (SELECT * FROM people WHERE height=175) p2 
JOIN (SELECT * FROM people WHERE eye='brown') p2 
ON p1.SSN = p2.SSN

But with a database with millions of rows, as you can guess.. NOT performant.

Thank you!

Aucun commentaire:

Enregistrer un commentaire