design-patterns: Service B relies on data in Service A: Duplicate data or retrieve on-demand?

Thursday, November 29, 2018

This is a microservice design question. It is a simplification of a real-life problem I would like to solve.

Service A has entities which can be active or inactive.

[
    {
       id: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       name: "foo",
       active: true
    },
    {
       id: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       name: "bar",
       active: false
    },
    {
       id: "ef332044-9e66-4a0b-91ed-c16a2537e848",
       name: "baz",
       active: true
    }
]

Service B has jobs that are related to Service A's entities. According to a business rule, a job should only run if its entity is active.

Option 1: Service B does not store whether the jobs should run.

[
    {
       id: "39cf3321-34d1-4557-b1c4-ca628c191b92",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       start: "Thu Nov 29 2018 08:40:27 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "77296d22-564f-4289-8327-f23bceb1d400",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       start: "Tue Nov 27 2018 15:56:01 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "2916a920-13a3-46f6-9ffd-d7629163924a",
       entityId: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       start: "Wed April 01 2018 00:00:00 GMT-0800 (Pacific Standard Time)",
       ended: "Thu April 01 2019 00:00:00 GMT-0800 (Pacific Standard Time)",
       recurrence: "daily"
    },
]

When a job j is scheduled to run, Service B first asks Service A's API whether the job's entity is active:

if Service A reports entity j.entityId as active
   run j
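The check above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the question does not specify Service A's API, so `is_entity_active` is a hypothetical wrapper around it, and `Job`, `run_if_active`, and `fake_service_a` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    id: str
    entity_id: str
    recurrence: str


def run_if_active(job: Job,
                  is_entity_active: Callable[[str], bool],
                  run: Callable[[Job], None]) -> bool:
    """Run the job only if Service A currently reports its entity as active.

    `is_entity_active` is a hypothetical wrapper around Service A's API.
    Returns True if the job ran, False if it was skipped.
    """
    if is_entity_active(job.entity_id):
        run(job)
        return True
    return False


# Stand-in for Service A's data, using entity ids from the question.
SERVICE_A_ENTITIES = {
    "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5": True,   # foo
    "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5": False,  # bar
}


def fake_service_a(entity_id: str) -> bool:
    # In a real system this would be a network call to Service A.
    return SERVICE_A_ENTITIES.get(entity_id, False)
```

Every scheduled run costs one call to Service A, which is the "chatty" side of this option.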

Option 2: Service B stores whether the jobs should run.

[
    {
       id: "39cf3321-34d1-4557-b1c4-ca628c191b92",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       active: true,
       start: "Thu Nov 29 2018 08:40:27 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "77296d22-564f-4289-8327-f23bceb1d400",
       entityId: "a46e6cc7-97ca-4570-b3f3-2be00ca9dab5",
       active: true,
       start: "Tue Nov 27 2018 15:56:01 GMT-0800 (Pacific Standard Time)",
       ended: null,
       recurrence: "hourly"
    },
    {
       id: "2916a920-13a3-46f6-9ffd-d7629163924a",
       entityId: "eb1ced31-eccc-4ad6-a695-5c6c76cab7a5",
       active: false,
       start: "Wed April 01 2018 00:00:00 GMT-0800 (Pacific Standard Time)",
       ended: "Thu April 01 2019 00:00:00 GMT-0800 (Pacific Standard Time)",
       recurrence: "daily"
    },
]

Service B's storage is kept up to date through notifications from Service A:

Entity e changes => publish e => Service B updates accordingly
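The publish/update flow above could look roughly like this on Service B's side. It is a sketch, not a definitive implementation: the event shape (`{"id": ..., "active": ...}`), the `JobStore` class, and its method names are all assumptions, and the message transport between the two services is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    id: str
    entity_id: str
    active: bool


class JobStore:
    """Service B's local storage, holding a copy of each entity's active flag."""

    def __init__(self, jobs: List[Job]) -> None:
        self._jobs = jobs

    def on_entity_changed(self, event: Dict[str, object]) -> int:
        """Apply an 'entity changed' message published by Service A.

        The event shape ({"id": ..., "active": ...}) is an assumption.
        Returns the number of jobs that were updated.
        """
        updated = 0
        for job in self._jobs:
            if job.entity_id == event["id"]:
                job.active = bool(event["active"])
                updated += 1
        return updated

    def runnable_jobs(self) -> List[Job]:
        # No call to Service A is needed at run time.
        return [job for job in self._jobs if job.active]
```

Note that one entity change may touch several jobs, since more than one job can point at the same entityId.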

Here are the arguments I see in favor of each option.

Option 1 arguments:

Lower storage cost, since data is not duplicated.
When a job is scheduled to run, it always has the most recent information about whether its entity is active (more "consistency"?).
No need to deal with the complexity of syncing data across services. In this example only Service B relies on data from A, but imagine the complexity if there were services X0, ..., X1000 that all needed to know whether an entity is active.

Option 2 arguments:

The services are truly independent: If A is not running, B can still run
Less chatty services (less network transfer cost)
Although duplicating and propagating data is perhaps more complex, it forces the services to share little or nothing.
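To make the availability argument concrete, here is a small sketch of how each option behaves when Service A cannot be reached. All names (`ServiceAUnavailable`, `should_run_on_demand`, `should_run_from_copy`, `service_a_down`) are hypothetical, and skipping the run on failure is just one possible policy, picked for illustration.

```python
from typing import Callable, Dict


class ServiceAUnavailable(Exception):
    """Raised by the hypothetical client when Service A cannot be reached."""


def should_run_on_demand(entity_id: str,
                         fetch_active: Callable[[str], bool]) -> bool:
    # Option 1: the answer is always fresh, but there is no answer when A is down.
    # Skipping the run here is one possible policy, chosen only for illustration.
    try:
        return fetch_active(entity_id)
    except ServiceAUnavailable:
        return False


def should_run_from_copy(entity_id: str, local_flags: Dict[str, bool]) -> bool:
    # Option 2: B answers from its own copy, which may be stale if a
    # notification from A has not arrived yet.
    return local_flags.get(entity_id, False)


def service_a_down(entity_id: str) -> bool:
    # Simulates an outage of Service A.
    raise ServiceAUnavailable(entity_id)
```

With Option 1, B has to pick a failure policy (skip, retry, or run anyway); with Option 2, B keeps running but may act on an outdated flag.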
