My problem is this: I am using an API that returns the top posts of a website. I query that API every half hour to get the top 10 posts on the website at that time. However, most of the time the results contain posts I have already seen in earlier queries (for instance, a post that is 2 hours old can still be in the top 10, and sometimes posts that are 2 days old are still there).
What would be a good way to remove the duplicates from the results of the query?
So far, my thought was to keep a file of the unique posts seen so far and check each new query against it. However, this is problematic because I don't know how long I should keep track of old posts, so the file might get really big!
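One way to keep that file bounded is to store, for each post, the time it was last seen in a query, and to drop entries that have not appeared for some retention window. The sketch below is only an illustration under assumptions: it supposes each post is a dict with a unique `id` field (the real API may call it something else), and the 7-day `RETENTION_SECONDS` value and the helper names `load_seen`, `save_seen`, and `filter_new_posts` are made up for this example.

```python
import json
import time
from pathlib import Path

# Assumption: a post that has not shown up in the top 10 for a week
# is unlikely to come back. Tune this to the site's behaviour.
RETENTION_SECONDS = 7 * 24 * 3600


def load_seen(path):
    """Load the {post_id: last_seen_timestamp} mapping, or start empty."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text())


def save_seen(path, seen):
    """Write the mapping back to disk as JSON."""
    Path(path).write_text(json.dumps(seen))


def filter_new_posts(posts, seen, now=None, retention=RETENTION_SECONDS):
    """Return the posts not seen before, updating and pruning `seen` in place."""
    now = time.time() if now is None else now
    new_posts = []
    for post in posts:
        key = str(post["id"])  # JSON object keys are strings
        if key not in seen:
            new_posts.append(post)
        # Refresh the timestamp so posts still in the top 10 are never pruned.
        seen[key] = now
    # Forget posts that have been absent for longer than the retention window.
    for key in [k for k, ts in seen.items() if now - ts > retention]:
        del seen[key]
    return new_posts
```

Each half-hourly run would then call `load_seen`, pass the query result through `filter_new_posts`, and call `save_seen`. Because the timestamp is refreshed every time a post appears, the file only holds posts that were in the top 10 at some point during the retention window, so its size stays roughly proportional to that window rather than growing forever.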