UnitOfWork save optimization, is it ok?

kievBug
User
Posts: 105
Joined: 09-Jan-2009
# Posted on: 13-Nov-2015 15:52:26   

Hey guys,

I've been looking into the performance issues our application has and found that UnitOfWork2.Commit takes more time than is actually spent in the database. It seems most of the time is spent in determining the sequence of actions.

In our case there are a lot of entities which are connected to each other with different relation types: 1:M, M:1 and M:N. Only part of those entities are dirty, and an even smaller number are new. For example: the total number of entities is 2000, only 100 of those are dirty, and only 5-10 are new.

We save changes with the UoW multiple times, i.e. we process data in small batches, so the sequence determination is executed multiple times, and every time it runs on the same set of records, i.e. all 2000.

Question: we need to determine the sequence only when there is at least one entity insert, because only in that case can a save fail with an FK violation. In all other cases it is not necessary, right? Is there any way to use this fact to improve the speed of UoW.Commit?

It seems to me that we can use this fact and determine the sequence only for the cases where an entity depends on a new entity. This would greatly reduce the number of entities to process and thus speed up determining the sequence of actions.

Thanks, Anton

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 14-Nov-2015 05:23:39   

Hi Anton,

David Elizondo | LLBLGen Support Team
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 14-Nov-2015 10:11:34   

kievBug wrote:

Question: we need to determine the sequence only when there is at least one entity insert, because only in that case can a save fail with an FK violation. In all other cases it is not necessary, right? Is there any way to use this fact to improve the speed of UoW.Commit?

No, that's not true: you need to order them into inserts first and then updates, and order within those groups as well. To determine whether a dirty entity is an insert or an update you have to traverse the graph anyway (insert a new customer, then update an order with the new customer's PK). So we traverse the graph once and determine both in one go. It does traverse all reachable entities though, so the number of entities visited is larger than the number of entities attached to the UoW through Save methods.
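
A rough sketch of that idea, doing discovery, insert/update classification, and ordering in one depth-first traversal (hypothetical names; this is not LLBLGen's actual code, only an illustration of the single-pass approach described above):

```python
class Entity:
    def __init__(self, name, is_new=False):
        self.name = name
        self.is_new = is_new      # corresponds to the IsNew flag discussed here
        self.depends_on = []      # entities that must be saved first (FK targets)

def plan_save(roots):
    """One DFS pass: finds all reachable entities and a valid save order."""
    finished, visited = [], set()

    def dfs(entity):
        if id(entity) in visited:
            return
        visited.add(id(entity))
        for dep in entity.depends_on:   # visit FK targets first
            dfs(dep)
        finished.append(entity)         # appended only after its dependencies

    for root in roots:
        dfs(root)
    # The finish order already places dependencies before dependents,
    # so the save order falls out of the traversal for free.
    inserts = [e for e in finished if e.is_new]
    updates = [e for e in finished if not e.is_new]
    return inserts, updates

# Insert a new customer, then update an order with the new customer's PK:
customer = Entity("customer", is_new=True)
order = Entity("order")
order.depends_on.append(customer)
inserts, updates = plan_save([order])
# inserts == [customer], updates == [order]
```

The point of the single pass is that discovery and ordering are the same work: you cannot know the queues without visiting every reachable node, and once you have visited them, the order is already determined.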

Though I find it surprising that traversing the graph is that slow, as enumerating a graph isn't slow in general in our profiling. It would be interesting to see where the bottleneck is according to your profiling sessions simple_smile

Frans Bouma | Lead developer LLBLGen Pro
kievBug
User
Posts: 105
Joined: 09-Jan-2009
# Posted on: 18-Nov-2015 15:43:42   

Whether it is an insert or an update we can find out by looking at the IsNew flag of the entity; we do not need a graph for that.

Can you please explain why we need to do a topological sort when we only have updates of entities?

I agree that in the generic case we do need to build the list of all entities which have to be saved when we want to do a recursive save. But once we have this list of entities, we only need to figure out the sequence of saves when an entity (insert or update, it doesn't really matter) depends on a new entity (insert). So we only need to perform the topological sort on a smaller set of entities, for the relations where a new entity is at the end of the relation.

In our specific case, we keep the list of all entities and we know which of those are dirty and add them to the UoW, so the UoW already contains all entities which have to be saved and there is no need to build the list of entities. I've also made some changes to the generated code so that the topological sort is given only the relations which end in a new entity.
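
A rough sketch of that filtering (all names hypothetical; this is illustrative only, not a generally proven replacement for a full topological sort):

```python
def split_for_sort(dirty_entities, relations, is_new):
    """relations: (dependent, dependency) pairs; is_new: entity -> bool.

    Only relations whose dependency is a new entity can cause an FK
    violation if saved in the wrong order, so only the entities touched
    by such relations are handed to the topological sort; the rest can
    be saved in any order.
    """
    risky = [(a, b) for (a, b) in relations if is_new(b)]
    involved = {e for pair in risky for e in pair}
    needs_sort = [e for e in dirty_entities if e in involved]
    order_free = [e for e in dirty_entities if e not in involved]
    return needs_sort, risky, order_free

new = {"customer"}
dirty = ["customer", "order", "invoice"]
rels = [("order", "customer"), ("invoice", "order")]
needs_sort, risky, rest = split_for_sort(dirty, rels, lambda e: e in new)
# risky == [("order", "customer")]: only customer and order need sorting,
# invoice can be updated in any order.
```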

And the results are amazing: features which process large amounts of data perform much faster.

I think you can use the same fact as well: skip the topological sort when it is not needed, or perform it on a smaller number of records.

Thanks, Anton

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Nov-2015 10:46:27   

kievBug wrote:

Whether it is an insert or an update we can find out by looking at the IsNew flag of the entity; we do not need a graph for that.

Can you please explain why we need to do a topological sort when we only have updates of entities?

Because you don't know which entities in the graph are dirty, nor whether they're going to be inserted or updated. To find out which entities are going to be processed, it has to traverse the graph. It then builds two queues and processes them. The topological sort simply discovers both things at once: which entities are to be processed and in which order. The order is actually 'free': it's simply the reverse of the order in which the nodes in the graph finish during the depth-first search.

I agree that in the generic case we do need to build the list of all entities which have to be saved when we want to do a recursive save. But once we have this list of entities, we only need to figure out the sequence of saves when an entity (insert or update, it doesn't really matter) depends on a new entity (insert). So we only need to perform the topological sort on a smaller set of entities, for the relations where a new entity is at the end of the relation.

I think you assume we're sorting a lot, but that's not the case wink Topological sorting is a graph algorithm that works like this: you visit every node in the graph using depth-first search and record the order in which the nodes finish. Then you simply reverse that list (or not, depending on whether A->B means A depends on B or B depends on A) and you're done.

So the list of all entities to save is only found after traversing the graph, and that traversal is already enough to determine the ordering as well.
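
The algorithm described above can be sketched in a few lines (a minimal illustration, not LLBLGen's actual implementation; here an edge a -> b means "a must be saved before b"):

```python
def topo_sort(graph):
    """graph: dict mapping node -> list of successor nodes."""
    finished, seen = [], set()

    def dfs(node):
        if node in seen:
            return
        seen.add(node)
        for succ in graph.get(node, []):
            dfs(succ)
        finished.append(node)         # post-order: successors finish first

    for node in graph:
        dfs(node)
    return list(reversed(finished))   # reversing puts predecessors first

# A new customer must be inserted before its orders:
graph = {"customer": ["order1", "order2"], "order1": [], "order2": []}
print(topo_sort(graph))               # ['customer', 'order2', 'order1']
```

Note there is no separate "sorting" pass: the order is a by-product of the single depth-first traversal that discovers the entities in the first place.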

In our specific case, we keep the list of all entities and we know which of those are dirty and add them to the UoW, so the UoW already contains all entities which have to be saved and there is no need to build the list of entities. I've also made some changes to the generated code so that the topological sort is given only the relations which end in a new entity.

And the results are amazing: features which process large amounts of data perform much faster.

I think you can use the same fact as well: skip the topological sort when it is not needed, or perform it on a smaller number of records.

Be aware that unless you use the topological sort algorithm exactly as we've implemented it, you're effectively taking shortcuts in that algorithm. We tried that too in the past, and there's always a graph out there that makes it fail. Topological sort is a proven-correct algorithm; I'd advise you not to mess with it, as you will get burned at some point. Cutting corners effectively means you're using a different algorithm, which in turn means you have to prove that it is correct in all the cases you use it in. Needless to say, that's very hard.

The matter at hand though: performance issues are first and foremost solved by profiling what is actually slow. I couldn't tell whether you profiled the code, but I'll assume you did and that you found the graph sorting was the issue here. If you didn't, it might very well be that the slowness is somewhere else.

Did you add all entities using AddForSave() and specify false for 'recurse'? That would make the graph traverser not visit related entities.

Frans Bouma | Lead developer LLBLGen Pro