Tuesday, 20 March 2018

Better design pattern for recycling documents in a content management system using MongoDB 3.6

I'm currently working on a learning project: building a content management system (Node.js/Express.js) on MongoDB 3.6 (with Mongoose v5). I would like a content recycling feature that lets the user put a document (e.g. a post) into a special 'recycled' state instead of deleting it immediately. After a delay (e.g. 14 days), the system automatically cleans up that document (triggered by a mongod TTL index).

Now I have two design patterns in mind, and I don't know which one is preferable in a typical production environment.

For example, I have two collections in MongoDB called 'posts' and 'media', and both are similar in terms of their schema:

const PostsSchema           = new mongoose.Schema({
      (... some fields ...)
      status                : { type: String, default: 'published',
                                enum: ['drafted','published', 'recycled'] }
});

(1) Embed a TTL index directly on the collection.

  • a. Add a field path called time.recycle.
  • b. Create a TTL index on time.recycle, set to expire after 14 days.

In my app, whenever the user wants to delete a document, they trigger an update event that sets { 'time.recycle': Date.now(), status: 'recycled' } on the targeted document(s). Because time.recycle now holds a valid Date, the TTL timer starts. If the user does not restore the document by triggering a reversal event, which removes the time.recycle field, mongod will automatically delete the document after 14 days.
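
Approach (1) can be sketched as two update documents plus the TTL index definition. This is only a minimal sketch: it assumes `Model` is a Mongoose model (such as one built from PostsSchema) and that `time.recycle` is declared as a Date path in the schema. The helper names (`recycleUpdate`, `restoreUpdate`, `recycle`, `restore`) and the choice to restore to 'published' are hypothetical.

```javascript
// TTL window: 14 days in seconds, the unit MongoDB expects for expireAfterSeconds.
const RECYCLE_TTL_SECONDS = 14 * 24 * 3600; // 1209600

// Assumed index definition; with Mongoose it could be applied as
// PostsSchema.index(TTL_INDEX.keys, TTL_INDEX.options).
const TTL_INDEX = {
  keys: { 'time.recycle': 1 },
  options: { expireAfterSeconds: RECYCLE_TTL_SECONDS },
};

// Update that starts the TTL timer. A real Date is passed because the TTL
// monitor only expires documents whose indexed field holds a date value.
function recycleUpdate(now = new Date()) {
  return { $set: { 'time.recycle': now, status: 'recycled' } };
}

// Update that cancels the timer by removing the indexed field.
// Restoring to 'published' is an assumption; the previous status is not stored.
function restoreUpdate() {
  return { $set: { status: 'published' }, $unset: { 'time.recycle': '' } };
}

// Model is any object exposing Mongoose's Model.updateOne(filter, update).
function recycle(Model, id) {
  return Model.updateOne({ _id: id }, recycleUpdate());
}

function restore(Model, id) {
  return Model.updateOne({ _id: id }, restoreUpdate());
}
```

Both operations stay single writes, which is where the "1 query" advantage of this pattern comes from.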

(2) Add another collection called 'recycles'. This 'recycles' collection uses the following schema:

const BinSchema             = new mongoose.Schema({
    _id                     : { type: mongoose.Schema.Types.ObjectId },
    bin                     : { type: mongoose.Schema.Types.Mixed },
    restore                 : { type: String, lowercase: true, required: true },
}, {
    timestamps              : { createdAt: 'time.recycled', updatedAt: false },
    versionKey              : false,
})
    // TTL index on the createdAt path; expireAfterSeconds is given in seconds
    .index({ 'time.recycled' : 1 }, { expireAfterSeconds: 14 * 24 * 3600 });

Now, whenever the user wants to delete a document, the app moves it from the 'posts' or 'media' collection into the 'recycles' collection. Anything that stays in this collection longer than 14 days is then removed.
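
The move in approach (2) could look roughly like the sketch below. It assumes `Source` and `Bin` expose Mongoose's `Model.deleteOne` and `Model.create`, that `doc` is a plain object already loaded from the source collection, and that the `restore` field holds the name of the originating collection (my reading of the schema, not something stated above). The names `toBinEntry` and `moveToBin` are hypothetical.

```javascript
// Build the 'recycles' entry for a document; field names follow BinSchema.
function toBinEntry(doc, sourceName) {
  return { _id: doc._id, bin: doc, restore: sourceName };
}

// Copy first, delete second: the two writes are not atomic, so if the
// process fails in between, the result is a duplicate rather than a loss.
async function moveToBin(Source, Bin, doc, sourceName) {
  await Bin.create(toBinEntry(doc, sourceName));
  await Source.deleteOne({ _id: doc._id });
}
```

A call such as `moveToBin(Posts, Recycles, post, 'posts')` would then issue the two writes, with `Posts` and `Recycles` standing in for the compiled models.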

====

Just a short comparison:

In (1) :

  • Only 1 query is needed.
  • The targeted document stays in the same collection (further filtering is needed to hide it from the user).
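
The extra filtering in (1) could be centralised in a small helper, sketched here under the assumption that every user-facing read goes through it. The name `visibleFilter` is hypothetical.

```javascript
// Wrap a caller's filter so that recycled documents never match.
// $and keeps the caller's own conditions intact, including any on `status`.
function visibleFilter(filter = {}) {
  return { $and: [filter, { status: { $ne: 'recycled' } }] };
}
```

For example, `Posts.find(visibleFilter({ status: 'published' }))` would return published posts only, where `Posts` stands in for the compiled model.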

In (2) :

  • 2 queries are needed (one to create a copy in 'recycles', the other to delete the original from 'posts' or 'media').
  • All recycled documents live in one collection.

    (It doesn't matter whether the document came from 'posts' or 'media'.)

    (No filtering is needed to hide it from the user.)

  • The TTL event cannot be triggered accidentally in the background.

I think both patterns have their pros and cons, but I'm not sure which one is better. I feel (1) is more efficient because it needs fewer queries (although with a larger set of documents, maybe (2) wins, because it leaves a smaller collection for future queries?). However, (2) looks safer: content can never be lost through a mistaken operation on the document, because there is no TTL index on the 'posts' or 'media' collections.

Any opinions?!
