samedi 16 avril 2016

Questions on scalability of getting most recent posts from users someone is following: two different implementations

I have two different implementations for retrieving posts from the users someone is following and I want to know which one lends itself to a database that will be more scalable. First I'll show you the tables, and then the two implementations.

post table

id (PK)| poster_id (FK) | value | created_on

repost table - necessary for both implementations

reposter_id (FK) | post_id (FK) | created_on

following table

follower_id (FK) | followee_id (FK)

user_feed table - necessary only for second implementation

user_id (FK) | post_id (FK) | created_on

First implementation: Requires the repost table, but no user_feed table. Whenever a user posts something I don't have to add a row for each of their followers in the user_feed table, but the query seems much more computationally expensive than the query in implementation two.

SELECT * FROM (
    SELECT p.id, p.poster_id, p.value, p.created_at
    FROM post AS p
    WHERE p.'user_id'=23 OR (p.'user_id' IN SELECT 'followee' FROM following WHERE 'follower'=23)
    UNION
    SELECT p.id, p.poster_id, p.value, rp.reposter_id, rp.created_at
    FROM post AS p
    INNER JOIN repost AS rp
    ON rp.post_id = p.id
    WHERE p.'user_id'=23 OR (p.'user_id' IN SELECT 'followee' FROM following WHERE 'follower'=23)))
ORDER BY created_at DESC
LIMIT 80;

Second implementation: Requires the user_feed table and the repost table. I add a record to the user_feed table every time someone posts/reposts something. The record is added for each one of the poster's followers. I only keep, say, 120 records for any particular user in the user_feed table. If a post is made and a user already has 120 records in the user_feed table, the oldest record for that user is removed and added to the repost table; and the new one takes its place. If a user requests more records than there are present in the user_feed table for them, then the first implementation is used to retrieve the excess.

SELECT * FROM (
    SELECT p.id, p.poster, p.value, p.created_at
    FROM post As p
    WHERE p.'id' IN (SELECT 'post_id' FROM user_feed WHERE 'user_id'=23)
ORDER BY created_at DESC
LIMIT 80;

Aucun commentaire:

Enregistrer un commentaire