Thursday, November 29, 2018

Service B relies on data in Service A: Duplicate data or retrieve on-demand?

This is a microservice design question, simplified from a real-life problem I would like to solve.

Service A has entities which can be active or inactive.

[
    {
       id: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       name: "foo",
       active: true
    },
    {
       id: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       name: "bar",
       active: false
    },
    {
       id: "ef332044-9e66-4a0b-91ed-c16a2537e848",
       name: "baz",
       active: true
    }
]

Service B has jobs that are related to Service A's entities and should run only if their entity is active (according to a business rule).

Option 1: Service B does not store whether the jobs should run.

[
    {
       id: "39cf3321-34d1-4557-b1c4-ca628c191b92",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       start: "Thu Nov 29 2018 08:40:27 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "77296d22-564f-4289-8327-f23bceb1d400",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       start: "Tue Nov 27 2018 15:56:01 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "2916a920-13a3-46f6-9ffd-d7629163924a",
       entityId: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       start: "Sun Apr 01 2018 00:00:00 GMT-0700 (Pacific Daylight Time)",
       ended: "Mon Apr 01 2019 00:00:00 GMT-0700 (Pacific Daylight Time)",
       recurrence: "daily"
    },
]

When a job j is scheduled to run, it asks Service A's API whether the job's entity is active:

if Service A reports j.entityId as active
   run j
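A minimal Python sketch of Option 1 might look like the following. It assumes a hypothetical `EntityClient` with an `is_active(entity_id)` method standing in for Service A's API; in practice that would be a network call, and nothing about Service A's real endpoints is given in the question. The fake client counts calls to show that every due job costs one round trip.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class EntityClient(Protocol):
    """Hypothetical interface standing in for Service A's API."""

    def is_active(self, entity_id: str) -> bool: ...


@dataclass
class Job:
    id: str
    entity_id: str
    recurrence: str  # Option 1: no 'active' field is stored in Service B


class InMemoryServiceA:
    """Fake Service A for illustration; a real client would make a network call."""

    def __init__(self, entities: Dict[str, bool]) -> None:
        self.entities = entities
        self.calls = 0  # counts round trips to show how chatty Option 1 is

    def is_active(self, entity_id: str) -> bool:
        self.calls += 1
        return self.entities.get(entity_id, False)  # assumption: unknown entity = inactive


def on_job_due(job: Job, service_a: EntityClient, run: Callable[[Job], None]) -> bool:
    """Option 1: ask Service A at run time and run the job only if its entity is active."""
    if service_a.is_active(job.entity_id):
        run(job)
        return True
    return False


# Entities and jobs from the question (foo and baz active, bar inactive).
service_a = InMemoryServiceA({
    "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5": True,   # foo
    "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5": False,  # bar
    "ef332044-9e66-4a0b-91ed-c16a2537e848": True,   # baz
})
jobs = [
    Job("39cf3321-34d1-4557-b1c4-ca628c191b92", "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5", "hourly"),
    Job("77296d22-564f-4289-8327-f23bceb1d400", "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5", "hourly"),
    Job("2916a920-13a3-46f6-9ffd-d7629163924a", "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5", "daily"),
]

ran: List[str] = []
for job in jobs:
    on_job_due(job, service_a, lambda j: ran.append(j.id))

print(ran)              # the two jobs for "foo"
print(service_a.calls)  # 3: one call to Service A per due job
```

The job for "bar" is skipped because Service A reports it inactive at the moment the job comes due.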

Option 2: Service B stores whether the job should run.

[
    {
       id: "39cf3321-34d1-4557-b1c4-ca628c191b92",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       active: true,
       start: "Thu Nov 29 2018 08:40:27 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "77296d22-564f-4289-8327-f23bceb1d400",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       active: true,
       start: "Tue Nov 27 2018 15:56:01 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "2916a920-13a3-46f6-9ffd-d7629163924a",
       entityId: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       active: false,
       start: "Sun Apr 01 2018 00:00:00 GMT-0700 (Pacific Daylight Time)",
       ended: "Mon Apr 01 2019 00:00:00 GMT-0700 (Pacific Daylight Time)",
       recurrence: "daily"
    },
]

Service B's copy is kept up to date by notifications from Service A:

Entity e changes => publish e => Service B updates accordingly
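The notification flow can be sketched in Python as an event handler inside Service B. The `EntityChanged` event and `JobStore` class are hypothetical names, and the message broker that would deliver the events is left out. The handler copies the entity's new state onto every job that references it, so the run-time decision needs no call to Service A.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    id: str
    entity_id: str
    active: bool  # Option 2: B's own copy of the entity's state
    recurrence: str


@dataclass
class EntityChanged:
    """Hypothetical event Service A publishes whenever an entity changes."""
    entity_id: str
    active: bool


class JobStore:
    """Service B's storage, kept up to date from Service A's notifications."""

    def __init__(self, jobs: List[Job]) -> None:
        self.jobs: Dict[str, Job] = {j.id: j for j in jobs}

    def handle(self, event: EntityChanged) -> int:
        """Copy the entity's new state onto every job that references it.

        Returns the number of jobs changed; replaying the same event changes nothing.
        """
        updated = 0
        for job in self.jobs.values():
            if job.entity_id == event.entity_id and job.active != event.active:
                job.active = event.active
                updated += 1
        return updated

    def runnable(self) -> List[str]:
        """Decide locally which jobs may run; no call to Service A is needed."""
        return [j.id for j in self.jobs.values() if j.active]


FOO = "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5"
BAR = "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5"
store = JobStore([
    Job("39cf3321-34d1-4557-b1c4-ca628c191b92", FOO, True, "hourly"),
    Job("77296d22-564f-4289-8327-f23bceb1d400", FOO, True, "hourly"),
    Job("2916a920-13a3-46f6-9ffd-d7629163924a", BAR, False, "daily"),
])

first = store.handle(EntityChanged(BAR, True))     # bar activated in Service A
replayed = store.handle(EntityChanged(BAR, True))  # duplicate delivery is harmless
second = store.handle(EntityChanged(FOO, False))   # foo deactivated in Service A
print(first, replayed, second)  # 1 0 2
print(store.runnable())         # only the "bar" job remains runnable
```

Duplicate events are harmless here, but this sketch does not handle events arriving out of order; if the broker allows that, each event would likely need a version number or timestamp so that an older state cannot overwrite a newer one.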

Here are the arguments I see in favor of each option.

Option 1 arguments:

  • Less storage cost since data is not duplicated
  • When a job is scheduled to run, it always has the most recent information about whether its entity is active (more "consistency"?)
  • No need to deal with the complexity of syncing data across services. In this example only Service B relies on data from A, but imagine the complexity if services X0, ..., X1000 all needed to know whether an entity is active.

Option 2 arguments:

  • The services are truly independent: if A is not running, B can still run
  • Less chatty services (less network transfer cost)
  • Although it may be more complex, duplicating and propagating data forces the services to share little or nothing
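The two lists trade availability against freshness. A small Python sketch with hypothetical names makes this concrete: when Service A is down, Option 1 cannot decide (returning `None` here, where a real scheduler would have to pick skip, retry, or fail), while Option 2 keeps deciding from its local copy, which can be stale until the next notification arrives.

```python
from typing import Dict, Optional


class ServiceAUnavailable(Exception):
    """Raised by the hypothetical client when Service A cannot be reached."""


class FakeServiceA:
    """Illustrative stand-in for Service A that can be switched off."""

    def __init__(self, entities: Dict[str, bool]) -> None:
        self.entities = entities
        self.up = True

    def is_active(self, entity_id: str) -> bool:
        if not self.up:
            raise ServiceAUnavailable(entity_id)
        return self.entities.get(entity_id, False)


def option1_should_run(service_a: FakeServiceA, entity_id: str) -> Optional[bool]:
    """Option 1: the answer comes from Service A; None means 'cannot decide now'."""
    try:
        return service_a.is_active(entity_id)
    except ServiceAUnavailable:
        return None  # a real scheduler would have to skip, retry, or fail here


def option2_should_run(local_copy: Dict[str, bool], entity_id: str) -> bool:
    """Option 2: the answer comes from B's replicated copy, which may lag behind A."""
    return local_copy.get(entity_id, False)


FOO = "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5"
service_a = FakeServiceA({FOO: True})
local_copy = {FOO: True}  # B's copy, last synced while foo was active

# Scenario 1: Service A is down.
service_a.up = False
down = (option1_should_run(service_a, FOO), option2_should_run(local_copy, FOO))

# Scenario 2: A is back and foo was just deactivated, but B has not seen the event yet.
service_a.up = True
service_a.entities[FOO] = False
stale = (option1_should_run(service_a, FOO), option2_should_run(local_copy, FOO))

print(down)   # (None, True): Option 1 cannot decide, Option 2 carries on
print(stale)  # (False, True): Option 2 acts on stale data until the event arrives
```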
