Sunday, December 31, 2017

How to store attribute-matching rule objects that match entities as they flow through a system?

I need help figuring out a good way of storing (and later retrieving / matching) dynamic rules that can be used to match objects as they flow through a workflow system.

In this scenario I am building a system that will have a series of steps operate on objects that are posted to it before the objects are finally ready to be stored for later consumption in another API. Those steps need to be configurable by an administrator; the system should allow them to say "if an object passes through with attributes ABC, then perform operation XYZ on it". Here's a very basic idea of the flow of the system:

+-------+  +-----------+ +-----------+ +-----------+ +-----------+ +-------+
|       |  |           | |           | |           | |           | |       |
|       +--> Ruleset 1 +-> Ruleset 2 +-> Ruleset 3 +->   Object  +->       |
|       |  |           | |           | |           | |  Storage  | |       |
| Input |  +-----+-----+ +------+----+ +-----+-----+ +-----------+ | Final |
|  API  |        ^              ^            ^                     |  API  |
|       |  +-----+--------------+------------+-----+               |       |
|       |  |                 Rule                  |               |       |
|       |  |              Management               |               |       |
+-------+  +---------------------------------------+               +-------+
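The diagram can be read as a chain of stages. Each stage takes an entity, returns a possibly modified copy, and reads its rules from a shared rule store. Below is a minimal Python sketch built on a few assumptions: a dictionary stands in for Rule Management, a list stands in for Object Storage, only `$set` is supported, and `make_stage` and `run_pipeline` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Rule = Dict[str, Any]  # {"priority": ..., "$match": {...}, "$set": {...}}
Stage = Callable[[Entity], Entity]

# Hypothetical stand-in for the "Rule Management" box: ruleset name -> its rules.
rule_management: Dict[str, List[Rule]] = {"ruleset1": [], "ruleset2": [], "ruleset3": []}

def make_stage(ruleset_name: str) -> Stage:
    """Build one "Ruleset N" step; it reads its rules at call time, so admin edits apply immediately."""
    def stage(entity: Entity) -> Entity:
        result = dict(entity)  # never mutate the incoming object
        for rule in rule_management[ruleset_name]:  # list order; priority handling omitted
            if all(k in entity and entity[k] == v for k, v in rule["$match"].items()):
                result.update(rule["$set"])
        return result
    return stage

def run_pipeline(entity: Entity, stages: List[Stage], storage: List[Entity]) -> Entity:
    """Input API -> Ruleset 1 -> Ruleset 2 -> Ruleset 3 -> Object Storage."""
    for stage in stages:
        entity = stage(entity)
    storage.append(entity)  # a list stands in for "Object Storage"
    return entity

stages = [make_stage("ruleset1"), make_stage("ruleset2"), make_stage("ruleset3")]
storage: List[Entity] = []

# An administrator adds a rule to Ruleset 2; the next entity through the pipeline picks it up.
rule_management["ruleset2"].append(
    {"priority": 10, "$match": {"type": "widget"}, "$set": {"priceIncrease": 1.1}})
out = run_pipeline({"type": "widget", "price": 5}, stages, storage)
print(out)  # {'type': 'widget', 'price': 5, 'priceIncrease': 1.1}
```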

The matching of objects to configuration rules needs to be attribute-based (as shown in the example below). That is, the person administering the rules needs to be able to give one or more attribute names, each with a value, and the system will need to run that rule any time an object with all of those attribute name/value pairs passes through it. Also, any time a rule is created or edited, we will need to be able to find all already-persisted objects that match the new or edited rule.

For example, consider that the following rules have been configured:

Rule 1 {
   priority: 10,
   $match: {
      type: "gadget",
   },
   $set: {
      priceIncrease: 1.2,
   }
}

Rule 2 {
   priority: 10,
   $match: {
      type: "widget",
   },
   $set: {
      priceIncrease: 1.1,
   }
}

Rule 3 {
   priority: 100,
   $match: {
      type: "widget",
      category: "kitchen",
   },
   $set: {
      priceIncrease: 1.15,
   }
}

Then, the following three entities pass through the system:

Entity 1 {
   type: "widget",
   category: "toys",
   manufacturer: "Acme",
   price: 5,
}

Entity 2 {
   type: "widget",
   category: "kitchen",
   manufacturer: "Acme",
   price: 6,
}

Entity 3 {
   type: "gadget",
   category: "tools",
   manufacturer: "XYZ Co",
   price: 10,
}

As each entity flows through the system, here are the matches:

  1. Entity one (a toy widget) matches only rule two because it is a widget.
  2. Entity two (a kitchen widget) matches rules two and three because it is a kitchen widget.
  3. Entity three (a tool gadget) matches only rule one because it is a gadget.
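In this example, a rule applies when every `$match` attribute name/value pair is present in the entity, and any extra entity attributes are ignored. A short Python sketch can check those semantics. It is a brute-force scan over all rules, so it reproduces the three results above but is not the indexed lookup the question is asking for. The function names are illustrative only.

```python
from typing import Any, Dict, List

# The three rules from the question, keyed by rule number.
RULES: Dict[int, Dict[str, Any]] = {
    1: {"priority": 10, "$match": {"type": "gadget"}, "$set": {"priceIncrease": 1.2}},
    2: {"priority": 10, "$match": {"type": "widget"}, "$set": {"priceIncrease": 1.1}},
    3: {"priority": 100, "$match": {"type": "widget", "category": "kitchen"},
        "$set": {"priceIncrease": 1.15}},
}

# The three entities from the question, keyed by entity number.
ENTITIES: Dict[int, Dict[str, Any]] = {
    1: {"type": "widget", "category": "toys", "manufacturer": "Acme", "price": 5},
    2: {"type": "widget", "category": "kitchen", "manufacturer": "Acme", "price": 6},
    3: {"type": "gadget", "category": "tools", "manufacturer": "XYZ Co", "price": 10},
}

def rule_matches(match: Dict[str, Any], entity: Dict[str, Any]) -> bool:
    """True when every attribute name/value pair in $match is present in the entity."""
    return all(name in entity and entity[name] == value for name, value in match.items())

def matching_rules(entity: Dict[str, Any]) -> List[int]:
    """Naive linear scan over all rules: fine for checking semantics, not for scale."""
    return [rid for rid, rule in RULES.items() if rule_matches(rule["$match"], entity)]

for eid, entity in ENTITIES.items():
    print(eid, matching_rules(entity))  # prints: 1 [2], then 2 [2, 3], then 3 [1]
```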

I'm not too concerned about how to store the actual operations, so you can ignore that part: once the rules are matched, the operations stored in the data store will be handled by code that knows how to interpret their data structure.
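One way to tie stored operations to code is a dispatch table keyed by operation name. The question does not say how `priority` is used, so this sketch assumes that matched rules are applied in ascending `priority` order and that a higher-priority rule's `$set` therefore overwrites a lower one. Under that assumption, Entity 2 ends with `priceIncrease` 1.15 from Rule 3. `OPERATIONS`, `apply_set` and `apply_rules` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]

def apply_set(entity: Entity, payload: Dict[str, Any]) -> Entity:
    """Handler for the "$set" operation: overwrite the given attributes on a copy."""
    return {**entity, **payload}

# Hypothetical registry mapping an operation key stored with a rule to the code that runs it.
OPERATIONS: Dict[str, Callable[[Entity, Dict[str, Any]], Entity]] = {"$set": apply_set}

def apply_rules(entity: Entity, matched_rules: List[Dict[str, Any]]) -> Entity:
    """Apply already-matched rules, lowest priority first (an assumed ordering)."""
    result = dict(entity)
    for rule in sorted(matched_rules, key=lambda r: r["priority"]):
        for key, payload in rule.items():
            if key in OPERATIONS:  # skip non-operation keys such as "priority" and "$match"
                result = OPERATIONS[key](result, payload)
    return result

rule2 = {"priority": 10, "$match": {"type": "widget"}, "$set": {"priceIncrease": 1.1}}
rule3 = {"priority": 100, "$match": {"type": "widget", "category": "kitchen"},
         "$set": {"priceIncrease": 1.15}}
entity2 = {"type": "widget", "category": "kitchen", "manufacturer": "Acme", "price": 6}
print(apply_rules(entity2, [rule3, rule2])["priceIncrease"])  # 1.15
```

With this design, adding a new operation type means registering another handler, and the rule storage itself does not change.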

The main question is what kind of data store, and what data model, to use for the rules and entities so that I can:

  1. Find the rules that match an entity as it passes through the system, and
  2. Find the entities (among those already persisted) that match a rule when the rule is added or changed.
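Both operations can be answered without scanning every rule or every entity. The approach is to index on (attribute name, value) pairs, the way an inverted index does. The Python sketch below is a minimal in-memory version with three assumptions: conditions are equality-only, attribute values are hashable, and `AttributeIndex` and its methods are hypothetical names rather than an existing library.

- **Finding rules for an entity:** for each candidate rule, count how many of the entity's pairs hit it. A rule matches when that count equals its number of conditions.
- **Finding entities for a rule:** intersect the sets of entity ids stored under each of the rule's pairs.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple

Pair = Tuple[str, Any]  # (attribute name, attribute value); values must be hashable

class AttributeIndex:
    """Hypothetical in-memory index for rules whose $match holds only equality conditions."""

    def __init__(self) -> None:
        self.rule_conditions: Dict[str, Dict[str, Any]] = {}
        self.rules_by_pair: Dict[Pair, Set[str]] = defaultdict(set)
        self.entities_by_pair: Dict[Pair, Set[str]] = defaultdict(set)

    def put_entity(self, entity_id: str, entity: Dict[str, Any]) -> None:
        """Index a persisted entity under each of its attribute pairs (updates/deletes omitted)."""
        for pair in entity.items():
            self.entities_by_pair[pair].add(entity_id)

    def put_rule(self, rule_id: str, match: Dict[str, Any]) -> Set[str]:
        """Add or replace a rule; return ids of persisted entities that now match it (query 2)."""
        for pair in self.rule_conditions.get(rule_id, {}).items():
            self.rules_by_pair[pair].discard(rule_id)  # forget the old conditions on edit
        self.rule_conditions[rule_id] = dict(match)
        for pair in match.items():
            self.rules_by_pair[pair].add(rule_id)
        # Intersect one posting set per condition, smallest first; an empty $match matches nothing here.
        postings = sorted((self.entities_by_pair.get(p, set()) for p in match.items()), key=len)
        if not postings:
            return set()
        result = set(postings[0])  # copy so the index itself is not modified
        for other in postings[1:]:
            result &= other
        return result

    def rules_for(self, entity: Dict[str, Any]) -> Set[str]:
        """Query 1: count condition hits per rule; a rule matches when every condition is hit."""
        hits: Dict[str, int] = defaultdict(int)
        for pair in entity.items():
            for rule_id in self.rules_by_pair.get(pair, ()):
                hits[rule_id] += 1
        return {r for r, n in hits.items() if n == len(self.rule_conditions[r])}

idx = AttributeIndex()
idx.put_entity("e1", {"type": "widget", "category": "toys", "manufacturer": "Acme", "price": 5})
idx.put_entity("e2", {"type": "widget", "category": "kitchen", "manufacturer": "Acme", "price": 6})
idx.put_entity("e3", {"type": "gadget", "category": "tools", "manufacturer": "XYZ Co", "price": 10})
print(idx.put_rule("r1", {"type": "gadget"}))  # {'e3'}
idx.put_rule("r2", {"type": "widget"})
idx.put_rule("r3", {"type": "widget", "category": "kitchen"})
print(sorted(idx.rules_for({"type": "widget", "category": "kitchen", "price": 6})))  # ['r2', 'r3']
```

In a managed store, the two mappings would become secondary indexes or key patterns. The counting step keeps the rule lookup roughly proportional to the posting sets the entity's attributes touch, rather than to the total number of rules.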

Since we use AWS for everything we do and try to go serverless wherever possible, it would be a great benefit if the data store were something AWS offers as a managed service. For example, are there existing patterns for doing this with:

  • Relational databases (Aurora)
  • NoSQL key/value (DynamoDB / Redis)
  • Graph databases (Neptune)
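For the relational option, one way to model this uses two narrow name/value tables, each indexed on (name, value), plus a per-rule condition count. One table holds rule conditions and the other holds entity attributes. The sketch below uses Python's built-in `sqlite3` only as a runnable stand-in for Aurora, with these assumptions:

- Table and column names are hypothetical.
- Values are stored as text for simplicity.
- The SQL would need adapting to Aurora's MySQL or PostgreSQL dialect.

```python
import sqlite3
from typing import Any, Dict, List

conn = sqlite3.connect(":memory:")  # stand-in for Aurora in this sketch
conn.executescript("""
CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, priority INTEGER, n_conditions INTEGER);
CREATE TABLE rule_conditions (
    rule_id INTEGER, attr_name TEXT, attr_value TEXT, PRIMARY KEY (rule_id, attr_name));
CREATE INDEX rule_conditions_nv ON rule_conditions (attr_name, attr_value);
CREATE TABLE entity_attributes (
    entity_id INTEGER, attr_name TEXT, attr_value TEXT, PRIMARY KEY (entity_id, attr_name));
CREATE INDEX entity_attributes_nv ON entity_attributes (attr_name, attr_value);
""")

def add_rule(rule_id: int, priority: int, match: Dict[str, Any]) -> None:
    # Store the condition count so queries need not recount conditions per rule.
    conn.execute("INSERT INTO rules VALUES (?, ?, ?)", (rule_id, priority, len(match)))
    conn.executemany("INSERT INTO rule_conditions VALUES (?, ?, ?)",
                     [(rule_id, k, str(v)) for k, v in match.items()])

def add_entity(entity_id: int, entity: Dict[str, Any]) -> None:
    conn.executemany("INSERT INTO entity_attributes VALUES (?, ?, ?)",
                     [(entity_id, k, str(v)) for k, v in entity.items()])

def rules_for_entity(entity: Dict[str, Any]) -> List[int]:
    """Query 1: a rule matches when all of its conditions are hit by the entity's pairs."""
    if not entity:
        return []
    rows = ", ".join("(?, ?)" for _ in entity)
    params = [x for k, v in entity.items() for x in (k, str(v))]
    sql = f"""
        WITH incoming(attr_name, attr_value) AS (VALUES {rows})
        SELECT rc.rule_id
        FROM incoming i
        JOIN rule_conditions rc
          ON rc.attr_name = i.attr_name AND rc.attr_value = i.attr_value
        JOIN rules r ON r.rule_id = rc.rule_id
        GROUP BY rc.rule_id
        HAVING COUNT(*) = MAX(r.n_conditions)
        ORDER BY rc.rule_id"""
    return [row[0] for row in conn.execute(sql, params)]

def entities_for_rule(rule_id: int) -> List[int]:
    """Query 2: stored entities that satisfy every condition of one rule."""
    sql = """
        SELECT ea.entity_id
        FROM rule_conditions rc
        JOIN entity_attributes ea
          ON ea.attr_name = rc.attr_name AND ea.attr_value = rc.attr_value
        WHERE rc.rule_id = ?
        GROUP BY ea.entity_id
        HAVING COUNT(*) = (SELECT n_conditions FROM rules WHERE rule_id = ?)
        ORDER BY ea.entity_id"""
    return [row[0] for row in conn.execute(sql, (rule_id, rule_id))]

add_rule(1, 10, {"type": "gadget"})
add_rule(2, 10, {"type": "widget"})
add_rule(3, 100, {"type": "widget", "category": "kitchen"})
add_entity(1, {"type": "widget", "category": "toys", "manufacturer": "Acme", "price": 5})
add_entity(2, {"type": "widget", "category": "kitchen", "manufacturer": "Acme", "price": 6})
add_entity(3, {"type": "gadget", "category": "tools", "manufacturer": "XYZ Co", "price": 10})

print(rules_for_entity({"type": "widget", "category": "kitchen", "price": 6}))  # [2, 3]
print(entities_for_rule(3))  # [2]
```

Both queries are intended to be driven by the (attr_name, attr_value) indexes rather than by scanning whole tables. The composite primary keys keep one value per attribute, so the hit counts cannot be inflated by duplicates.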

The dataset won't be huge, but it won't be trivial either: millions of entities will pass through the system and there will be hundreds of rules, so neither type of operation should require a full table scan.

Also, what's the correct terminology for a system like the one described above? Surely there are already design patterns or terms for this, but I'm struggling to find the right ones.

Thanks!
