I have to design the database for customers having prices for millions of materials they acquire through multiple suppliers for the next 24 months. So the database will store prices on a daily basis for every material supplied by a specific supplier for the next 24 months. So we keep past data. Now lookups will happen on: 1. Find the price for a material by the supplier as of the specific date by a customer. 2. Find the price for a material by the supplier for a time period by a customer.
I can think for the primary key as: 1. Partition key: (customer Id, material Id, supplier Id, date) -- will this end up in perf issues as it will make so many partitions in long run? 2. Partition Key: (customer Id, material Id, supplier Id, monthbucket), clustering key: date -- monthbucket will store data for material on monthly basis on same partition and will have value like '202002' for Feb 2020 dates.
Another question is how can I make sure my data distributes evenly across nodes.
Note: 1. combination of customer, material, supplier and date is unique. 2. two customer can have similar material ids.
do you have any suggestions or questions?
Thanks.
Aucun commentaire:
Enregistrer un commentaire