Slow Update query with custom in-memory cache

Denniseu
User
Posts: 5
Joined: 15-Nov-2018
# Posted on: 15-Nov-2018 12:38:07   

I have a loop that currently queries the database a couple of times per iteration and may or may not issue an update query. Typically this loop iterates over thousands or tens of thousands of rows of data, and it executes rather slowly: it finishes in half an hour to an hour, depending on the amount of data and how much of it needs to be updated in the db.

In an effort to optimize this loop I have implemented an in-memory C# Dictionary cache for the LLBL entity objects. This cache reads all data required from the database before the loop, and is used inside the loop for lookups instead of making database queries. The loop still issues Update queries directly to the database.
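For reference, a minimal sketch of the kind of cache described above (entity and field names like PersonEntity and Id are illustrative, not taken from the actual code):

```csharp
// Fetch all PersonEntity rows once up front and index them by primary key,
// so lookups inside the loop become dictionary hits instead of db queries.
var adapter = new DataAccessAdapter();
var persons = new EntityCollection<PersonEntity>(new PersonEntityFactory());
adapter.FetchEntityCollection(persons, null); // no filter: fetch all rows

var personCache = new Dictionary<int, PersonEntity>();
foreach(PersonEntity person in persons)
{
    personCache[person.Id] = person;
}

// Inside the loop: look up in memory instead of querying the database.
PersonEntity cached;
if(personCache.TryGetValue(someId, out cached))
{
    // ... use 'cached', possibly mark it dirty and save it ...
}
```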

The problem is that Update queries are much slower now, doing lookups through the cache, than they were when continuously fetching data from the database. Before doing an Update for entity X, the code first refetches one particular type of related entity. When I analyzed this problem using SQL Server Profiler, I found that the duration of the login/connection used during the refetch was much longer than the actual query time. The query time is effectively rounded down to 0, but then it takes a very long time (several hundred milliseconds up to several seconds; more data in the table seems correlated with longer duration) until the connection is closed, reset, and reopened for the Update query. The Update query itself then finishes quite fast. So something time-consuming is happening between the refetch and the Update.

After SQL profiling, I added overrides in the LLBL entity for every function starting with On (for example OnRelatedEntitySet, OnPropertyChanged, etc.) and logged each call. My theory was that some kind of update of the quite big in-memory cache might be happening behind the scenes before the Update is issued. However, no On* function seems to be called an abnormally high number of times. What I did see is that the big time sink happens somewhere after the refetch and before the call to OnBeforeEntitySave.

To recap this is how it looks:

  1. Fill cache with thousands of rows from a table in the database.
  2. Loop starts.
  3. Do unrelated stuff.
  4. Refetch related entities. Finishes quite fast.
  5. SaveEntity is called.
  6. Something happens here, taking a long time.
  7. OnBeforeEntitySave is called, and some other On* functions.
  8. SaveEntity completes.

What could be the cause of this performance hit?

The LLBLGen Pro framework version is 3.5, targeting a SQL Server DB.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39612
Joined: 17-Aug-2003
# Posted on: 15-Nov-2018 14:20:07   

Please use a .NET profiler like dotTrace or ANTS (or even the built-in profiler of Visual Studio), check which parts of the code are slow, and then focus on those. It's impossible to say what causes the slowness without profiling the .NET code first.

It might be that things are terribly slow because of linear searches somewhere, but without a .NET profiler this is guesswork. The profiler can also give you an overview of how much time is spent in the DB (so whether optimizing queries is really beneficial or not), whether a lot of time is lost in latency (opening/closing connections etc.), or whether things are simply slow because the code iterates over a lot of constructs over and over again.

In general, if something is noticeably slow in .NET code, a profiler like dotTrace will give you insight into what the real bottleneck is within 5 minutes.

Frans Bouma | Lead developer LLBLGen Pro
Denniseu
User
Posts: 5
Joined: 15-Nov-2018
# Posted on: 19-Nov-2018 13:33:41   

Alright, so I used dotTrace and it confirms that extra work is being done before saving the entity when using a cache.

Before saving an entity with DataAccessAdapter.SaveEntity, LLBL is calling DetermineActionQueues -> ProduceTopologyOrderedList, which recursively calls ProduceAdjacencyLists many times, which adds up to a significant performance loss. Without using a cache, LLBL only calls ProduceAdjacencyLists once.

Is this behavior needed, and if not, is there any way to disable it?

I attached screenshots of the relevant parts of the two trace files.

Attachments
capture with cache.PNG (205,383 bytes, added 19-Nov-2018 13:37:58, Approved)
capture without cache.PNG (23,593 bytes, added 19-Nov-2018 13:38:03, Approved)
Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 19-Nov-2018 19:35:58   

The cache keeps entities in memory, and apparently the graph keeps growing. But all in all, the LLBLGen calls don't add up to 1% of the consumed time, as shown in your profiling screenshots. Could you find out what's causing the biggest delays and try to optimize that first?

Denniseu
User
Posts: 5
Joined: 15-Nov-2018
# Posted on: 20-Nov-2018 10:20:02   

The entire optimized function that does the looping only takes up about 0.1% of the total trace time (sorry, I didn't show this in the screenshots), so there isn't really much else being done. In the trace I only performed one iteration, but the performance loss is similar when performing multiple iterations. In this case it took about 50% more time (445 vs 291 ms) to execute SavePersonForIntegration when using the cache. The loss seems to grow with the size of the cache: in another database I have about 7 times more rows to cache, and the performance loss roughly follows that factor of 7 (over 1.5 seconds to perform a single Update).

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39612
Joined: 17-Aug-2003
# Posted on: 20-Nov-2018 13:16:54   

When you fetch 'related entities', they are added to the graph in memory. When you do that enough times, the in-memory graph grows larger and larger, and every time you save an entity in that graph, the save logic by default recurses over the entire graph. If you want to keep the framework from traversing the entire graph reachable from the entity you are saving, pass 'false' for 'recurse' in the SaveEntity() overload on the DataAccessAdapter instance (it's true by default, hence you see this behavior).

This saves only the entity you pass to it; it won't traverse the graph to see if there are other related entities that need to be saved. Say you have a customer which references some orders, and each order references some order lines, so a 3-level graph/tree. If you just changed the customer, so only the customer has to be persisted, it's enough to pass 'false' for recurse: it will then simply save the passed-in customer, if it's changed, and won't traverse the graph reachable from it (so it won't visit the orders and the order lines).
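A non-recursive save looks like this (entity name is illustrative; the overload with the 'recurse' flag is the one referred to above):

```csharp
// Persist only 'person' itself; the graph reachable from it is not traversed.
using(var adapter = new DataAccessAdapter())
{
    // arguments: entity, refetchAfterSave, recurse
    adapter.SaveEntity(person, true, false);
}
```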

Frans Bouma | Lead developer LLBLGen Pro
Denniseu
User
Posts: 5
Joined: 15-Nov-2018
# Posted on: 20-Nov-2018 14:47:49   

I am aware of the recurse option, but why is it traversing over unrelated entities (other PersonEntity objects in the cache)? Why are they sharing the same graph?

I'll look into setting the recurse option to false, but there may be other entities that should be saved that are actually related (like you said, Person/Customer with related Orders or somesuch). Ideally I'd like these to be traversed and saved, but not any other unrelated Persons.

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 20-Nov-2018 19:04:58   

Denniseu wrote:

This cache reads all data required from the database before the loop, and is used inside the loop for lookups instead of making database queries. The loop still issues Update queries directly to the database.

To keep the cache from growing while still using it for lookups, don't associate the entities you fetch with the lookup entities in the cache.

Instead of doing: customer.Country = CacheCountry;
Do: customer.CountryID = CacheCountry.ID;
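In other words (a sketch with hypothetical entity/field names), assigning the full entity wires up both sides of the relation and links the graphs together, while assigning just the FK field does not:

```csharp
// Entity assignment: synchronizes both relation ends, so the in-memory
// graphs of 'order' and 'cachedCustomer' become one graph.
order.Customer = cachedCustomer;   // also adds 'order' to cachedCustomer.Orders

// FK-only assignment: sets the foreign key value without associating
// the entity objects, so the graphs stay separate.
order.CustomerId = cachedCustomer.CustomerId;
```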

Denniseu
User
Posts: 5
Joined: 15-Nov-2018
# Posted on: 21-Nov-2018 09:07:33   

Walaa wrote:

In order not to grow the cache and keep it for the lookups, don't associate entities you fetch with lookups in the cache.

Instead of doing: customer.Country = CacheCountry; Do: customer.CountryID = CacheCountry.ID

Not sure what you mean here, no such assignments are done in the code. And of course the entities I fetch are associated with the cache, they ARE the cache.

I think I may have been unclear on exactly what is going on. The cache is filled with a couple of thousand PersonEntity objects. I've stripped the program down so that nothing else is cached (no other types of entities, except those prefetched inside the PersonEntity objects). I've also removed the refetch of related entities. Performing a SaveEntity on any of these cached PersonEntities is still extremely slow (at least when the entity is dirty).

I can't see how these entities can be related to each other in a graph. Some of them have foreign keys to other PersonEntities (managers/supervisors), but that should only result in a relation of at most a small handful of other PersonEntities, none of which are dirty... right?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39612
Joined: 17-Aug-2003
# Posted on: 21-Nov-2018 11:10:29   

Denniseu wrote:

Walaa wrote:

In order not to grow the cache and keep it for the lookups, don't associate entities you fetch with lookups in the cache.

Instead of doing: customer.Country = CacheCountry; Do: customer.CountryID = CacheCountry.ID

Not sure what you mean here, no such assignments are done in the code. And of course the entities I fetch are associated with the cache, they ARE the cache.

I think I may have been unclear on exactly what is going on. The cache is filled with a couple of thousand PersonEntity objects. I've stripped the program down so that nothing else is cached (no other types of entities, except those prefetched inside the PersonEntity objects). I've also removed the refetch of related entities. Performing a SaveEntity on any of these cached PersonEntities is still extremely slow (at least when the entity is dirty).

Please provide specifics; I can't work with 'extremely slow'. I can persist thousands of entities in under 200ms to a db on a network; granted, that's with a newer version (you're using a version that's at least 6 years old). Your screenshots don't show how much time is taken by each method: you call SaveEntity, so how much time is that taking? How much time is taken by the graph traversal? How many entities are really in this graph?

Performance is also about perspective: what to expect. Traversing a dense graph of 100,000 entities in memory will take time. The algorithm does a depth-first search over all reachable entities. While this is efficient, it will try to reach all entities. This means that if you just want to save 1 entity, simply don't do a recursive save, it's that simple. I didn't give you that option for nothing simple_smile

Anyway, say you have a cache of order entities. They reference Customer entities. This means both are true:

myOrder.Customer == myCustomer;
myCustomer.Orders.Contains(myOrder);

Which means: if the customer instance myCustomer has more than 1 order, and I persist an order referencing this customer, the graph traverser by default will reach all orders in the myCustomer.Orders collection, as it traverses: myOrder.Customer -> myCustomer.Orders.

If another entity related to Order ties other orders together, e.g. myOrder.Employee == myEmployee, the traverser might end up reaching most orders in the cache.
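A small sketch of that reachability effect (names are illustrative):

```csharp
// Two orders sharing a customer become reachable from each other, because
// entity assignment synchronizes both relation ends.
orderA.Customer = sharedCustomer;  // sharedCustomer.Orders now contains orderA
orderB.Customer = sharedCustomer;  // ...and now orderB as well

// A recursive save of orderA therefore traverses:
//   orderA -> sharedCustomer -> { orderA, orderB }
// With many shared lookup entities (customers, employees, ...), the set
// reachable from any single entity approaches the whole cache.
```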

The time you see in the trace is the time it takes to traverse all entities in the graph. What Walaa suggested is to avoid associating entities with other entities in the graph, so the graph doesn't grow (if you don't want it to, of course).

I can't see how these entities can be related to each other in a graph. Some of them have foreign keys to other PersonEntities (managers/supervisors), but that should only result in a relation of at most a small handful of other PersonEntities, none of which are dirty... right?

It's the traversal: it will reach all entities in the graph that way, simply because it follows relationships, so it can sort the entities in the correct dependency order and persist each entity that is indeed dirty.

As you know which entity you want to save (just the 1), not doing a recursive save simply avoids the graph traversal, which makes it much faster in your case: it saves the entity and skips the rest.

Question of course is why you need to cache 1000s of entities, if that's only making things slower to begin with...

Frans Bouma | Lead developer LLBLGen Pro