Friday, December 29, 2017

Syncing file system changes: Diffing vs replaying file operations

Looking for general input from an architectural point of view.

Problem:

  • I have a virtual file system (think Dropbox) in the cloud.
  • I need a logical representation of it on the client.
  • My clients may need to sync up their local representation when the file system changes.
  • I don't expect the repositories to be very large, although there may be bigger ones in some cases.

Common Solutions:

  • Brute force: Just fetch the whole model of the file system on every refresh and discard the old one. That's OK for a minimum viable product, but not a long-term solution.
  • Diffing: Compare two representations and compute the changes. As far as I can tell from searching the web, that's the typical approach. However, I think it's rather complex to cover all the cases, and I didn't find a simple, ready-made solution I could adopt (C# server-side, JavaScript clients), which may also indicate that this is non-trivial. Happy about any input or links, though!
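
To make the diffing problem a bit more concrete, here is a minimal TypeScript sketch (matching the JavaScript clients) of a path-keyed diff between two full snapshots. The snapshot shape, the `version` field and all names are hypothetical; a real server would expose whatever change marker it has, such as a modification timestamp or a content hash. It also shows one of the cases that makes diffing tricky: a move only shows up as a removal plus an addition.

```typescript
// Hypothetical snapshot shape: a plain path -> entry map, as it might arrive as JSON.
// "version" stands in for whatever change marker the server exposes
// (e.g. a last-modified timestamp or a content hash).
type FsEntry = { kind: "file" | "dir"; version: number };
type FsSnapshot = { [path: string]: FsEntry };
type FsChange = { type: "added" | "removed" | "changed"; path: string };

// Path-keyed diff between two full snapshots of the tree.
function diff(oldSnap: FsSnapshot, newSnap: FsSnapshot): FsChange[] {
  const changes: FsChange[] = [];
  for (const path of Object.keys(newSnap)) {
    const prev = oldSnap[path];
    const next = newSnap[path];
    if (prev === undefined) {
      changes.push({ type: "added", path: path });
    } else if (prev.kind !== next.kind || prev.version !== next.version) {
      changes.push({ type: "changed", path: path });
    }
  }
  for (const path of Object.keys(oldSnap)) {
    if (newSnap[path] === undefined) {
      changes.push({ type: "removed", path: path });
    }
  }
  return changes;
}

// Moving F1 from D1 into D1/D2.
const snapBefore: FsSnapshot = {
  "/D1": { kind: "dir", version: 1 },
  "/D1/F1": { kind: "file", version: 1 },
  "/D1/D2": { kind: "dir", version: 1 },
};
const snapAfter: FsSnapshot = {
  "/D1": { kind: "dir", version: 1 },
  "/D1/D2": { kind: "dir", version: 1 },
  "/D1/D2/F1": { kind: "file", version: 1 },
};

// Logs an "added" change for /D1/D2/F1 and a "removed" change for /D1/F1;
// nothing in the output says that F1 was moved.
console.log(diff(snapBefore, snapAfter));
```

Recovering the move would require stable IDs that survive renames and moves, which is exactly the kind of extra case that makes a diff-based approach grow in complexity.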

Alternative solution: replay log

I'm toying with the idea of simply keeping a list of commands that represent file operations, each with a timestamp. If a client needs to sync up, it just fetches the commands issued since its last sync and executes them on its existing local representation, e.g.:

  1. Create directory D1 in root
  2. Add file F1 to directory D1
  3. Add directory D2 to directory D1
  4. Move file F1 to directory D2
  5. Delete directory D1
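
The replay idea can be sketched the same way. This is a minimal TypeScript sketch, not a finished design: the command shapes (`mkdir`, `addFile`, `move`, `delete`), the `ReplayLog` class and `syncClient` are hypothetical names, the local representation is a flat path-to-kind map, and it uses an increasing sequence number as the sync cursor instead of the timestamp (an assumption; it avoids depending on clocks, and a server-assigned timestamp would play the same role). It replays the five example operations in two incremental syncs.

```typescript
// Hypothetical local representation: a flat map from path to kind.
type FsTree = { [path: string]: "file" | "dir" };

// Hypothetical command shapes for the operations listed above.
type FsOp =
  | { op: "mkdir"; path: string }
  | { op: "addFile"; path: string }
  | { op: "move"; from: string; to: string }
  | { op: "delete"; path: string };

type LogEntry = { seq: number; cmd: FsOp };

// True if `path` is `root` itself or lies underneath it.
function isWithin(path: string, root: string): boolean {
  return path === root || path.indexOf(root + "/") === 0;
}

// The root "/" is implicit; every other parent must be an existing directory.
function requireParentDir(tree: FsTree, path: string): void {
  const i = path.lastIndexOf("/");
  const parent = i <= 0 ? "/" : path.slice(0, i);
  if (parent !== "/" && tree[parent] !== "dir") {
    throw new Error("missing parent directory for " + path);
  }
}

function applyOp(tree: FsTree, cmd: FsOp): void {
  switch (cmd.op) {
    case "mkdir":
    case "addFile":
      requireParentDir(tree, cmd.path);
      tree[cmd.path] = cmd.op === "mkdir" ? "dir" : "file";
      break;
    case "move":
      if (tree[cmd.from] === undefined) throw new Error("nothing to move at " + cmd.from);
      requireParentDir(tree, cmd.to);
      // Re-key the entry and, for a directory, everything beneath it.
      for (const path of Object.keys(tree)) {
        if (isWithin(path, cmd.from)) {
          const kind = tree[path];
          delete tree[path];
          tree[cmd.to + path.slice(cmd.from.length)] = kind;
        }
      }
      break;
    case "delete":
      // Deleting a directory removes its whole subtree.
      for (const path of Object.keys(tree)) {
        if (isWithin(path, cmd.path)) delete tree[path];
      }
      break;
  }
}

// Hypothetical server-side log: commands get increasing sequence numbers.
class ReplayLog {
  private entries: LogEntry[] = [];

  append(cmd: FsOp): number {
    const seq = this.entries.length + 1;
    this.entries.push({ seq: seq, cmd: cmd });
    return seq;
  }

  // Everything after the client's last-seen sequence number, oldest first.
  since(cursor: number): LogEntry[] {
    return this.entries.filter((entry) => entry.seq > cursor);
  }
}

// Client side: replay the missing commands in order and return the new cursor.
function syncClient(tree: FsTree, cursor: number, log: ReplayLog): number {
  for (const entry of log.since(cursor)) {
    applyOp(tree, entry.cmd);
    cursor = entry.seq;
  }
  return cursor;
}

// Steps 1-3 happen, then the client syncs for the first time.
const serverLog = new ReplayLog();
serverLog.append({ op: "mkdir", path: "/D1" });
serverLog.append({ op: "addFile", path: "/D1/F1" });
serverLog.append({ op: "mkdir", path: "/D1/D2" });

const clientTree: FsTree = {};
let clientCursor = syncClient(clientTree, 0, serverLog);
console.log(clientCursor, Object.keys(clientTree)); // cursor 3; /D1, /D1/F1, /D1/D2

// Steps 4 and 5 happen; the next sync only fetches those two commands.
serverLog.append({ op: "move", from: "/D1/F1", to: "/D1/D2/F1" });
serverLog.append({ op: "delete", path: "/D1" });
clientCursor = syncClient(clientTree, clientCursor, serverLog);
console.log(clientCursor, Object.keys(clientTree)); // cursor 5; empty tree
```

Replaying strictly in order matters here: step 5 removes D1 together with D2 and the moved F1, and a client that skipped or reordered entries would either fail on a missing path or end up in a different state.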

Pros:

  • Greatly reduced complexity
  • Stable
  • Works both ways (offline operations on the client)
  • No need to keep two models and compare them

Cons:

  • Need to keep the replay log
  • An unnoticed syncing bug may cause clients to get out of sync
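
One possible way to make an unnoticed syncing bug noticeable (an extra safeguard, not something the replay log provides by itself): the server publishes a fingerprint of its tree at each log position, and the client compares it with a fingerprint of its own representation after replaying. The sketch below assumes a 32-bit FNV-1a hash over a sorted, canonical listing of paths; the names and the canonical format are hypothetical, and the C# server would have to build exactly the same string, with ordinal (UTF-16 code unit) sorting.

```typescript
// Hypothetical local representation: path -> kind.
type FsTree = { [path: string]: "file" | "dir" };

// 32-bit FNV-1a over UTF-16 code units. Not cryptographic; it only needs to
// make accidental divergence very likely to be detected.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 0x93) using shifts; the sum
    // is exact in a double, and the next ^= (or the final >>> 0) wraps it to 32 bits.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0;
}

// Canonical listing: paths sorted by UTF-16 code units, one "kind:path" line each.
function fingerprint(tree: FsTree): number {
  const lines = Object.keys(tree)
    .sort()
    .map((path) => tree[path] + ":" + path);
  return fnv1a(lines.join("\n"));
}

// Hypothetical check after a sync: serverFingerprint is what the server
// reports for the log position the client has just reached.
function checkInSync(localTree: FsTree, serverFingerprint: number): boolean {
  return fingerprint(localTree) === serverFingerprint;
}

const serverTree: FsTree = { "/D1": "dir", "/D1/D2": "dir", "/D1/D2/F1": "file" };
// Same entries, inserted in a different order.
const goodClient: FsTree = { "/D1/D2/F1": "file", "/D1": "dir", "/D1/D2": "dir" };
// A client whose replay "forgot" the move of F1.
const buggyClient: FsTree = { "/D1": "dir", "/D1/F1": "file", "/D1/D2": "dir" };

const expected = fingerprint(serverTree);
console.log(checkInSync(goodClient, expected)); // true
console.log(checkInSync(buggyClient, expected)); // expected false (a 32-bit hash can collide, but very rarely)
```

On a mismatch, the client could fall back to the brute-force approach and reload the whole model, so a replay bug costs one full refresh instead of leaving a silently diverged client.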

I'm currently leaning towards the replay log, but would be happy to hear your thoughts about the pattern and potential alternatives. Thanks!
