Faster prefetching possible?

Posts   
 
    
uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 00:57:39   

Hi,

We're using LLBL v2.6 using Adapter and Sql Server 2005. I wanted to know if anyone has some suggestions on increasing prefetch performance. But first I wanted to explain where I'm at...

I have an object graph that is moderately complex. All tables have indexes. What I noticed is that LLBL will do something like this when prefetching:

select <columns>
from <child>
where <foreignkey> in (select <primarykey> from <parent> where <parentpredicate>)

and you may see:

select <columns>
from <grandchild>
where <foreignkey> in
(
    select <primarykey> from <child> where <foreignkey> in
    ( 
        select <primarykey> from <parent> where <parentpredicate>
    )
)

This will continue as deep as your prefetch goes. Most of the time this works fine, but I noticed if it goes deep enough (I think somewhere around 5 nestings in my case), performance takes a huge dive. Has anyone seen this?

My workaround is to process the prefetches myself and build the graph manually. I fetch the parent entities, then use them to find all the IDs used in my query for the child entities. The child entities are returned by doing an IN clause. Something like this:


select <columns> from <child> where <foreignkey> in (pk1, pk2, pk3...)

This resulted in a massive performance increase when dealing with larger entity collections.

I'm somewhat happy with what I have, but I would still like to tweak it a bit. My bottleneck now seems to be rebuilding the graph. For instance, I have the following kind of situation:

foreach (var child in childEntities)
    parent.ChildEntityCollection.Add(child)

Is there any faster way to do something like this?

Maybe there's not much I can do other than throw hardware at it. I can live with that I suppose. Although I am curious, is there a reason LLBL nests the subqueries to do the prefetch versus doing what I did?

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 30-Oct-2012 06:33:20   

Hi there and welcome to the forums wink The answer to all your questions is **ParameterisedPrefetchPathThreshold **and it's explained at Optimizing PrefetchPaths documentation section.

David Elizondo | LLBLGen Support Team
uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 06:46:34   

Thanks for the suggestion. I remember messing around with this parameter and I don't remember it helping.

I read the information at the link you posted and I would like to make sure I understand what it says. It sounds like if there aren't very many parent entities (the number is defined by ParameterisedPrefetchPathThreshold), then it uses an IN clause, else if there are many, it uses a subquery. For example, if ParameterizedPrefetchPathThreshold is set to 10, then should I see this:

For 10 or less parents:

select <columns> from <child> where <foreignkey> in (p1, p2, ... p10)

For more than 10:

select <columns> from <child> where <foreignkey> in
(select <primarykey> from <parent> where <predicate>)

If that's the case, I'm seeing the opposite in terms of performance, but it only manifests itself when the subqueries get nested deep enough. Would that mean I should set ParameterizedPrefetchPathThreshold to an insanely large number?

Walaa avatar
Walaa
Support Team
Posts: 14993
Joined: 21-Aug-2005
# Posted on: 30-Oct-2012 18:06:37   

I read the information at the link you posted and I would like to make sure I understand what it says. It sounds like if there aren't very many parent entities (the number is defined by ParameterisedPrefetchPathThreshold), then it uses an IN clause, else if there are many, it uses a subquery. For example, if ParameterizedPrefetchPathThreshold is set to 10, then should I see this:

For 10 or less parents: Code: select <columns> from <child> where <foreignkey> in (p1, p2, ... p10)

For more than 10: Code: select <columns> from <child> where <foreignkey> in (select <primarykey> from <parent> where <predicate>)

Correct.

If that's the case, I'm seeing the opposite in terms of performance, but it only manifests itself when the subqueries get nested deep enough. Would that mean I should set ParameterizedPrefetchPathThreshold to an insanely large number?

It depends on the target database, normally a number between 50 and 200 would be fine.

uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 18:09:52   

Ok, so with what I'm seeing, I really need to force an IN with ids instead of subquery, but I think I have 10000+ parent ids. Does that mean a ParameterizedPrefetchPathThreshold of over 10000?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 30-Oct-2012 18:11:59   

uler3161 wrote:

Ok, so with what I'm seeing, I really need to force an IN with ids instead of subquery, but I think I have 10000+ parent ids. Does that mean a ParameterizedPrefetchPathThreshold of over 10000?

no, as that won't work (db's have a maximum # of parameters per query, e.g. sqlserver's maximum is 2100). So limit the parent set a bit, or consider paging.

Frans Bouma | Lead developer LLBLGen Pro
uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 18:28:02   

I could limit the parent set and then just do more parent fetching, but I think that's also going to prove to be poor performing. Given the data I'm working with, if I were to set the threshold to 2100, I would have only be able to pull in a handful of parents at a time if I wanted every prefetch to use a list of IDs. That doesn't work so well when I have > 12000 parents.

My workaround seems to make all this problem go away. It's just that I'd like to be able to build the entities on the client side faster. Any suggestions?

Walaa avatar
Walaa
Support Team
Posts: 14993
Joined: 21-Aug-2005
# Posted on: 30-Oct-2012 18:34:29   

Fetching 10000+ entities with their related entities (i.e. a huge graph) is going to be constly, performance wise.

In general entities should be fetched when you need to perform actions on them (updates/deletes). Otherwise if you want these data for viewing only, you'd better fetch a flat list of fields (TpedList or a Dynamic List).

uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 18:37:33   

I am potentially doing a large number of updates/deletes nightly. I can also see having to process a large batch of entities and need the entire graph.

Walaa avatar
Walaa
Support Team
Posts: 14993
Joined: 21-Aug-2005
# Posted on: 30-Oct-2012 19:06:22   

I believe the sub-query approach should be fast enough. You might consult a DBA, in case you need some indexes, on your tables.

Another approach is to fetch all 10000+ parents, then separately fetch batches of the children into another collection and hook them together at the client side.

I'd say you collect batches of parent IDs (500 in a batch for example), and fetch children filtered on an IN predicate for those 500 ...and so on till you finish with the parents.

uler3161
User
Posts: 8
Joined: 30-Oct-2012
# Posted on: 30-Oct-2012 19:13:36   

It's not really a matter of indexes. They all have proper indexes. What I did to test was to run the SQL Profiler to get the query that is taking a long time. I manually executed the subquery to return the list of parent ids. Then I replaced the subquery with this list and it was much faster. I'm not sure why SQL Server isn't doing the same behind the scenes.

The approach you specify is what I'm actually doing, although I'm doing 10,000 parent ID batches instead of 500 (I'm not using parameters, so I get past the 2100 limit). I was just trying to find the most optimal way to "hook them together". I didn't know if there were any tricks that would make it faster. As you can guess by the number of entities I'm working with, I do an awful lot of Add calls on entity collections and I was hoping to make those faster.