Wednesday, June 3, 2020

Deciding on a pattern for Bulk Writing

I have an in-memory state tree backing a custom Store I've written, so I can query the state tree like a database. I've added Find, Add, Update, etc. On top of the store I have built a series of repositories that abstract away data access concerns for my Services.

During initialization I seed a lot of data (millions of entries). The operations my repository performs are expensive when adding entities one at a time, but my underlying provider easily supports bulk operations.

Here's an example of a service using the repository:

        foreach (var issue in issues)
        {
            foreach (var state in states)
            {
                // One transfer object is shared by every worker in this issue/state pair.
                TransferObject transferObject = new TransferObject();

                transferObject.high = ...

                var stateworker = WorkerRepository.GetworkersByState(state);

                foreach (var worker in stateworker)
                {
                    var issueDetail = GenerateIssueDetails(issue, worker, transferObject, random);

                    // One repository call (and five index updates) per generated entity.
                    IssueDetailsRepository.Add(issueDetail, worker, state, issue);
                }
            }
        }

And here's the repository's Add:

    public void Add(IssueDetail issueDetail, Worker worker, State state, Issue issue)
    {
        Store.Engine.Entities.Add(issueDetail);

        // Single-key indexes.
        Store.Engine.AddToIndex(issue, issueDetail);
        Store.Engine.AddToIndex(state, issueDetail);
        Store.Engine.AddToIndex(worker, issueDetail);

        // Compound indexes: (state, issue) and (worker, issue).
        var compoundIndex = new List<IHasUUID>() { state, issue };
        Store.Engine.AddToIndex(compoundIndex, issueDetail);

        compoundIndex = new List<IHasUUID>() { worker, issue };
        Store.Engine.AddToIndex(compoundIndex, issueDetail);
    }

Specifically, the cost of updating my indexes on every loop iteration during initialization is painful. I don't need this data until later, so I'd love to build each index from the full range of data in one call, which AddToIndex already supports.

Adding bulk operations to my repository feels like a mistake, because I'd have to make it stateful (accumulating entries one by one as I loop). I also don't want to create an inline stateful holder and just pass it into the repository. Is there a good pattern for aggregating the data before it reaches the repository? I imagine the idea is to use a separate stateful object that calls the repository behind the scenes.

I looked at Decorator, but there the interface contract should stay the same, and I imagine I'd need a .Commit() or something similar. I looked into Unit of Work, but someone advised me that it's mostly for guaranteeing transactions. It feels closer to what I want, but I'm not certain.
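
To make the question concrete, here is a minimal sketch of the kind of separate stateful object with a .Commit() described above. It is only an illustration under stated assumptions, not an established pattern implementation: the names `IndexedBatch`, `PendingCount`, and the two callbacks are hypothetical, and the callbacks stand in for whatever bulk calls the store actually exposes, since the real signatures of `Store.Engine` are not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical write-only batch: buffers entities per index key and hands
// everything to the bulk callbacks once, when Commit() is called.
public sealed class IndexedBatch<TKey, TEntity>
{
    // Assumed stand-ins for the store's bulk entry points.
    private readonly Action<IReadOnlyList<TEntity>> _addEntities;
    private readonly Action<TKey, IReadOnlyList<TEntity>> _addRangeToIndex;

    private List<TEntity> _entities = new List<TEntity>();
    private Dictionary<TKey, List<TEntity>> _byKey = new Dictionary<TKey, List<TEntity>>();

    public IndexedBatch(
        Action<IReadOnlyList<TEntity>> addEntities,
        Action<TKey, IReadOnlyList<TEntity>> addRangeToIndex)
    {
        _addEntities = addEntities ?? throw new ArgumentNullException(nameof(addEntities));
        _addRangeToIndex = addRangeToIndex ?? throw new ArgumentNullException(nameof(addRangeToIndex));
    }

    // Number of entities buffered since the last Commit().
    public int PendingCount => _entities.Count;

    // Buffers one entity under every index key it belongs to.
    // Nothing reaches the store here.
    public void Add(TEntity entity, params TKey[] indexKeys)
    {
        _entities.Add(entity);
        foreach (var key in indexKeys)
        {
            if (!_byKey.TryGetValue(key, out var bucket))
            {
                bucket = new List<TEntity>();
                _byKey.Add(key, bucket);
            }
            bucket.Add(entity);
        }
    }

    // Flushes the buffers: one bulk entity insert, then one range insert per index key.
    public void Commit()
    {
        if (_entities.Count == 0)
        {
            return;
        }

        // Swap in fresh buffers first so the callbacks own the lists they receive.
        var entities = _entities;
        var byKey = _byKey;
        _entities = new List<TEntity>();
        _byKey = new Dictionary<TKey, List<TEntity>>();

        _addEntities(entities);
        foreach (var pair in byKey)
        {
            _addRangeToIndex(pair.Key, pair.Value);
        }
    }
}
```

In the seeding loop, the service would call `Add` on the batch instead of on the repository and call `Commit()` once after the outer loop, so the repository itself stays stateless. The compound indexes would need a key type with value equality (a tuple, for example), because a `List<IHasUUID>` used as a dictionary key is compared by reference.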

Any pattern advice? This is a performance critical application.
