I think it's a regression, and it's only visible when there are a lot of entities in the uow. With e.g. 10000 entities in the uow it's fast, but with 130K+ entities in the uow (10000 graphs of 7 nodes each), all marked insertable/new, it starts to show:
Done. Total amount of entities: 138367
Sorting them in queues for work using UoW
Done. Total time taken by sorter: 103749,8583ms
Insert queue contains 138367 elements
Update queue contains 0 elements
This is on 5.6, but it should be the same on 5.5.1. With 1000 graphs:
Done. Total amount of entities: 12151
Sorting them in queues for work using UoW
Done. Total time taken by sorter: 458,4592ms
Insert queue contains 12151 elements
Update queue contains 0 elements
So this clearly takes a nosedive when there are many more entities: going from 12151 to 138367 entities is roughly 11x the entity count but roughly 226x the sorting time, so the cost grows at least quadratically. Looking at the code, the reason is fairly obvious: the design is flawed in that it assumes it's run once, but in the case of a unit of work it's run as many times as there are entities in the uow.
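To make the flaw concrete, here's a minimal C# sketch of the pattern (illustrative only, not the actual LLBLGen Pro source): a sort step that was designed to run once over the full set ends up running once per added entity, which produces exactly the quadratic signature in the numbers above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Minimal sketch of the flawed pattern (illustrative, not the actual
// LLBLGen Pro source): a type-based ordering designed to run once over the
// whole set is executed once per entity instead, so total work becomes
// O(n^2 log n) rather than O(n log n).
class SorterSketch
{
    static void Main()
    {
        foreach (int n in new[] { 2000, 10000 })
        {
            var queue = new List<int>();      // entity type ids as stand-ins
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
            {
                queue.Add(i % 7);             // 7 node types per graph
                queue.Sort();                 // "sort on type" re-run per add
            }
            sw.Stop();
            Console.WriteLine($"{n} entities: {sw.Elapsed.TotalMilliseconds:N1}ms");
        }
        // 5x the entities costs roughly 25x the time here: the quadratic
        // signature matching the 11x entities / ~226x time jump above.
    }
}
```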
The sorting is done by calling uow.ConstructSaveProcessQueues(). The batching parameter doesn't help here, as the time is lost in the type-based sorting. If you run a profiler on your code, I think you'll see the same thing.
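The obvious direction is to hoist that type-based sort out of the per-entity path so it runs once per ConstructSaveProcessQueues() call. A hedged sketch of what that could look like (the names, the name-based ordering, and the structure are my illustration, not the actual fix or the real API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hedged sketch of the refactoring direction (illustrative names, not the
// real API): compute the per-type ordering once per queue construction and
// reuse it for every entity, instead of re-deriving it per entity.
static class HoistedSorterSketch
{
    // Stand-in for the real dependency-based type sort; runs once, and its
    // cost depends on the number of distinct types, not on the entity count.
    static Dictionary<Type, int> BuildTypeOrder(IEnumerable<object> entities) =>
        entities.Select(e => e.GetType())
                .Distinct()
                .OrderBy(t => t.Name)  // real code would sort on dependencies
                .Select((t, i) => new { Type = t, Index = i })
                .ToDictionary(x => x.Type, x => x.Index);

    public static (List<object> inserts, List<object> updates) ConstructQueues(
        IReadOnlyList<object> entities, Func<object, bool> isNew)
    {
        var typeOrder = BuildTypeOrder(entities);  // hoisted: runs once
        var inserts = entities.Where(isNew).ToList();
        var updates = entities.Where(e => !isNew(e)).ToList();
        // One O(n log n) sort per queue instead of a re-sort per entity.
        Comparison<object> byType = (a, b) =>
            typeOrder[a.GetType()].CompareTo(typeOrder[b.GetType()]);
        inserts.Sort(byType);
        updates.Sort(byType);
        return (inserts, updates);
    }
}
```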
Looking into fixing this a.s.a.p. It will take some time though, as it requires refactoring some code, so I'm not sure I'll have an updated runtime today.