Wednesday 28 January 2015

Performant querying of ACL / Business rules against large database datasets

I am updating a web application which requires more dynamic control of the permissions/access that users have against the domain resources of the application. These rules also need to be easily configurable by the administrator users of the application.


-


THE BACKGROUND, WHAT I HAVE DONE SO FAR:


My first step was to write an ACL Module. I wanted something simple and type-safe, which was surprisingly hard to find, so I ended up writing my own implementation.


The ACL Module provides us now with the following:



  • Groups

  • Privileges (e.g. View/Add/Edit)

  • Resources (e.g. Product, Page, File etc)

  • Rules (i.e. Allow/Deny a Privilege for a Group on a Resource)


An example of their usage:



// Setting the ACL rule:
acl.Allow<View>(adminGroup, aProductResource);

// Asserting the ACL rule:
if (acl.IsAllowed<View>(adminGroup, aProductResource))
{
    // allowed! :)
}


The Group structure is represented as a Directed Acyclic Graph (i.e. multiple inheritance, like a family tree), and the rules inherit down the group relationships: a child group inherits the ACL rules from its parent groups.


Resources themselves can also have rules directly configured for them (for a group, or for all groups), and they also have a relationship to each other, meaning a child resource inherits the rules from the parent.


Pretty standard sort of ACL stuff I think, but fairly complex when asserting access to a resource as you have to first "merge" all the rules from the parent groups/resources based on your context and then calculate the result.
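To make the "merge" step concrete, here is a minimal sketch in Python (not the language of the module above). The post does not say how conflicting rules are resolved, so this assumes that any Deny beats any Allow and that having no matching rule means "not allowed". `Node`, `ancestors` and `Acl` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A group or a resource; `parents` holds the nodes it inherits from."""
    name: str
    parents: tuple = ()


def ancestors(node):
    """Return the node plus everything it inherits from (safe for a DAG)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(current.parents)
    return seen


class Acl:
    def __init__(self):
        # (group, resource, privilege) -> True (allow) / False (deny)
        self.rules = {}

    def allow(self, privilege, group, resource):
        self.rules[(group, resource, privilege)] = True

    def deny(self, privilege, group, resource):
        self.rules[(group, resource, privilege)] = False

    def is_allowed(self, privilege, group, resource):
        # Collect every rule set on the group or any ancestor group,
        # for the resource or any ancestor resource.
        verdicts = [
            self.rules[(g, r, privilege)]
            for g in ancestors(group)
            for r in ancestors(resource)
            if (g, r, privilege) in self.rules
        ]
        # ASSUMED policy: any deny wins; no matching rule means deny.
        return bool(verdicts) and all(verdicts)


staff = Node("staff")
admins = Node("admins", parents=(staff,))
catalogue = Node("catalogue")
product = Node("product-42", parents=(catalogue,))

acl = Acl()
acl.allow("view", staff, catalogue)   # inherited by admins, applies to product-42
acl.deny("edit", staff, product)      # inherited deny on the specific product
acl.allow("edit", admins, catalogue)  # overridden by the deny above

print(acl.is_allowed("view", admins, product))
print(acl.is_allowed("edit", admins, product))
```

Every check walks both hierarchies, which is cheap for one resource but adds up quickly when repeated for each row of a large result set.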


I keep all the rules loaded in memory for performance reasons, and they are serialisable, allowing for quick saving/loading of them. Mapping the rules to database tables in a normalised fashion seemed nigh impossible.


-


THE PROBLEMS, WHERE I AM AT RIGHT NOW:


Asserting the ACL rules against a single domain resource works perfectly; however, problems arise when I need to check large amounts of data.


The applications I am updating are both "database driven" with fairly large amounts of data in them.


-


PROBLEM 1:


There are instances where users need to be able to browse/page through all the "products" that they have "View" access to. This is problematic for me, as currently to do this there are some stored procedures which do the filtering and returning of the data. To replace this component with my ACL system I will need to load all the data from the database into the application and then parse over the records using the ACL Module to check if users have access to them. This is obviously a very intensive task, and one that I am struggling to find a solution for.


One possible solution for this I have thought of is to actually pre-calculate and cache the ACL checks against each product into a database table which contains a simple Yes/No result for each ACL query. I could then do joins on this table to provide massive dataset querying capabilities that still take the ACL rules into account. This is a lot of background work every time a "product" is added, though. It would also require re-calculation every time the rules change, and the table could become considerably large in itself as more and more groups are added to the ACL system: a record would be required PER group PER resource PER privilege. Ouch.
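As a rough illustration of the cache-table idea, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`product`, `acl_cache`, `allowed`) and the sample rows are made up for the example; in practice the rows would be written by the ACL Module whenever a product or a rule changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One pre-calculated Yes/No per group, per resource, per privilege.
    CREATE TABLE acl_cache (
        group_id    INTEGER NOT NULL,
        resource_id INTEGER NOT NULL,
        privilege   TEXT    NOT NULL,
        allowed     INTEGER NOT NULL,   -- 1 = yes, 0 = no
        PRIMARY KEY (group_id, resource_id, privilege)
    );
""")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "Anvil"), (2, "Bellows"), (3, "Chisel"), (4, "Drill")],
)
# Hypothetical pre-calculated results for two groups (ids 10 and 20).
conn.executemany(
    "INSERT INTO acl_cache VALUES (?, ?, ?, ?)",
    [
        (10, 1, "view", 1), (10, 2, "view", 0),
        (10, 3, "view", 1), (10, 4, "view", 1),
        (20, 2, "view", 1),
    ],
)


def visible_products(conn, group_id, page, page_size):
    """Page through the products a group may view, filtering in the database."""
    sql = """
        SELECT p.id, p.name
        FROM product AS p
        JOIN acl_cache AS c ON c.resource_id = p.id
        WHERE c.group_id = ? AND c.privilege = 'view' AND c.allowed = 1
        ORDER BY p.id
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (group_id, page_size, page * page_size)).fetchall()


print(visible_products(conn, 10, page=0, page_size=2))
print(visible_products(conn, 10, page=1, page_size=2))
```

The paging and filtering stay inside the database, at the cost of keeping `acl_cache` in sync with the in-memory rules.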


I hate the idea of having the permissions/rules of the system represented both within the application itself and within stored procedures for large dataset querying. This instantly makes the system quite rigid and brittle for me, nullifying much of my exercise.


QUESTION 1:


Is there any known strategy to allow for this type of ACL Module implementation to have performant querying across large data sets from a database?


I am hoping I am just completely ignorant of a clever solution to this type of problem. :)


-


PROBLEM 2:


The systems also have an additional layer of business rules which should be used in addition to the ACL rules for restricting access to the data.


For example, a "product" can be created in a collaborative fashion where individual users of the system can be invited/assigned to a "product". Once assigned, these users should be able to access the "product". There is no rhyme or reason to this: it could be any user across the system, belonging to any Group within the system. So I unfortunately can't switch this out to be based on the ACL Groups without creating an individual Group per user. But there is also a host of other conditions like this which exist around "products". Business rule type stuff.


These rules need to be taken into account again when displaying data back to a user. And again this logic lives in those magical stored procedures scattered throughout the system.


I would like to stick with the ACL Module strategy, and I am thinking of building in a Business Rules module that will allow me to specify any other "specific conditions" around the accessibility of a domain resource.


I have a couple of options on how to integrate this type of Business Rules module into the frame:


Option 1 - When querying data, the data is first passed through the ACL Module, and then the result is passed through the Business Rules module. Kind of like a map-reduce scenario. But again, this relies on pretty heavy per-record processing every time a dataset is queried from the database.
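A small Python sketch of what Option 1 amounts to: two filters applied to every record in turn. The `acl_allows` check and the `not_archived` business rule are hypothetical stand-ins invented for the example, not rules from the real system.

```python
def visible_records(records, user, acl_allows, business_rules):
    """Yield only records that pass the ACL check and every business rule."""
    for record in records:
        if not acl_allows(user, record):
            continue  # stage 1: ACL Module
        if all(rule(user, record) for rule in business_rules):
            yield record  # stage 2: Business Rules module


# Hypothetical stand-in for the ACL Module's per-record check.
def acl_allows(user, record):
    return record["category"] in user["viewable_categories"]


# Hypothetical business rule: archived products are hidden.
def not_archived(user, record):
    return not record["archived"]


products = [
    {"id": 1, "category": "tools", "archived": False},
    {"id": 2, "category": "tools", "archived": True},
    {"id": 3, "category": "toys", "archived": False},
]
user = {"viewable_categories": {"tools"}}

visible_ids = [
    p["id"] for p in visible_records(products, user, acl_allows, [not_archived])
]
print(visible_ids)
```

Both stages run in the application for every row loaded, which is exactly the per-record cost described above.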


Option 2 - An alternative I have thought of is to introduce Users as a known entity (perhaps via an interface) within the ACL Module, allowing rules to be configured directly for a user. I could then use the Business Rules module to assign the appropriate ACL rules to users based on the state of a product. So whenever a product is updated, it is passed through the Business Rules, which then work out any ACL rule assignments on a per-user basis and grant View access accordingly.
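A minimal Python sketch of Option 2, assuming a per-user rule store alongside the group rules. `UserAcl`, `sync_product_grants` and the `assigned_users` field are hypothetical names; the sketch also assumes this business rule is the only writer of per-user View grants, so it is free to revoke the ones it no longer wants.

```python
class UserAcl:
    """Holds ACL rules configured directly for individual users."""

    def __init__(self):
        self.grants = set()  # (user_id, resource_id, privilege)

    def is_allowed(self, privilege, user_id, resource_id):
        return (user_id, resource_id, privilege) in self.grants


def sync_product_grants(acl, product):
    """Business rule: users assigned to a product may view it.

    Intended to run whenever a product is created or updated.
    """
    wanted = {(u, product["id"], "view") for u in product["assigned_users"]}
    current = {
        g for g in acl.grants if g[1] == product["id"] and g[2] == "view"
    }
    acl.grants -= current - wanted  # revoke users who were unassigned
    acl.grants |= wanted            # grant users who are assigned


acl = UserAcl()
product = {"id": 42, "assigned_users": [7, 9]}
sync_product_grants(acl, product)
print(acl.is_allowed("view", 7, 42))

product["assigned_users"] = [9]     # user 7 is removed from the product
sync_product_grants(acl, product)
print(acl.is_allowed("view", 7, 42))
```

The business rules only run on writes, so reads can go through the same ACL lookup (or the same cached table) as the group rules.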


I have a feeling that my Option 2 is probably the better one to go for, because once I sort the performance issues of my first problem then Option 2 shouldn't have much of a performance overhead on the system.


QUESTION 2:


My question for this is, are there any other known solutions to this too, or does Option 2 sound like a plausible and acceptable strategy for this problem?


-


These may be very broad/vague problem descriptions and questions; I apologise if so. Please let me know if this is the wrong forum for these types of questions, or if you require any more detail.

