Prefecth Performance

Posts   
 
    
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 27-Jan-2005 08:57:02   

Frans,

There's an area of my app that I really need to tighten up performance on. Can you comment on the performance implications of the prefetch implementation?

1) I am seeing (via SQL Profiler) that prefeth uses sub-queries for related tables. SELECT * FROM related_table WHERE id IN (SELECT id FROM table). Is this faster than a JOIN?

2) For fetching single entities (not collections) I have come to the conclusion that using prefetch is not a good idea. Essentially for a single entity I end up:

a) SELECT * FROM table b) SELECT * FROM related_table1 WHERE id IN (SELECT id FROM table) c) SELECT * FROM related_table2 WHERE id IN (SELECT id FROM table)

Since I will know the PK ID of the main table after (a) I'd be much better off doing:

a) SELECT * FROM table b) SELECT * FROM related_table1 WHERE id = @ID c) SELECT * FROM related_table2 WHERE id = @ID

by using this method:

adapter.FetchEntityCollection(folderEntity.ChildFolderLinkCollection, folderEntity.GetRelationInfoChildFolderLinkCollection());

Or are you relying on some internal SQL Server caching to cache the results of the SELECT id FROM table query (above)

The other thing is that prefetch doesn't keep the connection open for the prefeches by default... is this a bug? (I'd like it to! simple_smile )

Marcus

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39625
Joined: 17-Aug-2003
# Posted on: 27-Jan-2005 09:44:16   

Marcus wrote:

Frans,

There's an area of my app that I really need to tighten up performance on. Can you comment on the performance implications of the prefetch implementation?

1) I am seeing (via SQL Profiler) that prefeth uses sub-queries for related tables. SELECT * FROM related_table WHERE id IN (SELECT id FROM table). Is this faster than a JOIN?

In most cases it is. The problem with joins is that you get a lot of redundant data. There are cases where joins can be faster, but this has to be specified by the user, which is planned, however these cases is mostly just 2 related entities, i.e. simple graphs.

The redundant data from joins really causes a lot of problems, for example when DISTINCT can't be used due to a blob / text/image field in the resultset. The core then has to filter out duplicates on the client, which is a bit slower than when SqlServer would do it.

2) For fetching single entities (not collections) I have come to the conclusion that using prefetch is not a good idea. Essentially for a single entity I end up: a) SELECT * FROM table b) SELECT * FROM related_table1 WHERE id IN (SELECT id FROM table) c) SELECT * FROM related_table2 WHERE id IN (SELECT id FROM table)

Since I will know the PK ID of the main table after (a) I'd be much better off doing:

a) SELECT * FROM table b) SELECT * FROM related_table1 WHERE id = @ID c) SELECT * FROM related_table2 WHERE id = @ID

by using this method:

adapter.FetchEntityCollection(folderEntity.ChildFolderLinkCollection, folderEntity.GetRelationInfoChildFolderLinkCollection());

Or are you relying on some internal SQL Server caching to cache the results of the SELECT id FROM table query (above)

The subquery can use cached results inside the database, but as the query is bigger in b) than it is in a) it needs to recalculate the execution plan anyway. Prefetch paths aren't always the best way to deal with things. They can help when you have to load a lot of data in a graph, as it saves you from a lot of queries (10 customers, each 10 orders with 5 order lines, gives normally 10 + 1010 + 1010*5 queries, with prefetch paths just 3). However when it's not worth the overhead, you can also execute the queries separately, if the amount of queries would be the same anyway (and with a single entity that's the case).

The other thing is that prefetch doesn't keep the connection open for the prefeches by default... is this a bug? (I'd like it to! simple_smile ) Marcus

No this is not a bug. It has to do some processing in between queries. It therefore closes the connection each time, which is just giving back the connection to the connection pool. For heavy load applications this is best, as every connection has to be fully utilized. If you want it to keep open the connection, set the KeepConnectionOpen flag to true before the prefetchpath fetch. Though getting a connection from the pool takes 20ms tops.

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 27-Jan-2005 09:54:56   

That's plenty for me to work with. Thanks.

Posts: 134
Joined: 04-Mar-2005
# Posted on: 19-Apr-2005 20:15:48   

I know this a old-ish thread but my issue is pretty much the same. I've read the other posts and understand why prefetch paths are implemented using sub queries, but wouldn't it be better to detect the special case of a prefetch on a single entity and generate different SQL?

To continue Marcus' example from above, the following queries are what I'm seeing:

a) SELECT * FROM table where id = @ID b) SELECT * FROM related_table1 WHERE id IN (SELECT id FROM table where id = @ID) c) SELECT * FROM related_table2 WHERE id IN (SELECT id FROM table where id = @ID)

Correct me if I'm wrong, but I don't think there would be any case where this wouldn't be more efficiently generated as:

a) SELECT * FROM table b) SELECT * FROM related_table1 WHERE id = @ID c) SELECT * FROM related_table2 WHERE id = @ID

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39625
Joined: 17-Aug-2003
# Posted on: 20-Apr-2005 10:19:59   

your 1) query isn't the same.

Only in a 2 level, 1:1 relations using prefetch path this can be done. If you have 3 levels, the 3rd level uses the query used in the second level, and IMHO this can't be done as you optimized away the query in the second level.

There are a few things which can be done to optimize some querying in the prefetch paths. For example, if you have a single entity and you just want to load that entity's related entities, there is not much to win with a prefetch path, only some typing.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 134
Joined: 04-Mar-2005
# Posted on: 20-Apr-2005 14:58:50   

flushed typo!

My two sets of queries should have read:

a) SELECT * FROM table where id = @ID b) SELECT * FROM related_table1 WHERE id IN (SELECT id FROM table where id = @ID) c) SELECT * FROM related_table2 WHERE id IN (SELECT id FROM table where id = @ID)

and

a) SELECT * FROM table where id = @ID b) SELECT * FROM related_table1 WHERE id = @ID c) SELECT * FROM related_table2 WHERE id = @ID

I was concentrating on the b and c queries...

Using the prefetch paths is convenient and, as you say, less typing. It would be nice if they were the "best" way to go all (or at least most of) the time rather than making me think about what I'm doing... wink

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39625
Joined: 17-Aug-2003
# Posted on: 20-Apr-2005 15:20:48   

heh simple_smile

Well, with every element that's solved behind the scenes comes a generalization that is not always wanted. A prefetch path can be A way to fetch data efficiently, but it also fetches data in bulk, and if you fetch 100 customers and you only need to view for one customer the orders, it's nto that wise to fetch the customers and the orders, as that will produce more data to be fetched. Just an example how realizing what you're doing can make a big difference wink

This is also true for example with multi-entity filters: sometimes a join (relationcollection) is faster, but there are also situations in which a FieldCompareSetPredicate is faster.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 134
Joined: 04-Mar-2005
# Posted on: 20-Apr-2005 15:53:41   

I guess, reading between the lines, that you're saying there's no way in hell you're going to change the way the prefetch formulates the query so I'm just going to have to think while coding. I'm sure I read somewhere on your site that using LLBL Gen meant I didn't have to think... wink

90% of my queries will be returning one entity and various child and grandchild records for that entity. I will restructure the way I'm doing this to use relations instead of prefetches.

Thanks as always for your replies.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39625
Joined: 17-Aug-2003
# Posted on: 20-Apr-2005 16:19:19   

ChicagoKiwi wrote:

I guess, reading between the lines, that you're saying there's no way in hell you're going to change the way the prefetch formulates the query so I'm just going to have to think while coding. I'm sure I read somewhere on your site that using LLBL Gen meant I didn't have to think... wink

heh simple_smile well, no thinking in areas where thinking shouldn't be applied. And you can proceed and use prefetch paths, why not. simple_smile

Though when it comes to performance, you should consider the fact that when you want performance you probably need to do some extra work, as the 'easy way out' is probably too generic to be the top performer wink

90% of my queries will be returning one entity and various child and grandchild records for that entity. I will restructure the way I'm doing this to use relations instead of prefetches.

For grandchild records, prefetch paths ARE more efficient if you're fetching 3 level graphs.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 134
Joined: 04-Mar-2005
# Posted on: 20-Apr-2005 16:56:29   

Otis wrote:

For grandchild records, prefetch paths ARE more efficient if you're fetching 3 level graphs.

But if I'm fetching the entire grandchild collection (e.g. all the OrderDetails for the Customer) then

SELECT * FROM OrderDetails WHERE OrderID IN (SELECT OrderID FROM Orders WHERE CustomerID = @customerID)

is materially the same as

SELECT OrderDetails.* FROM OrderDetails JOIN Orders on Orders.OrderID = OrderDetails.OrderID WHERE Orders.CustomerID = @customerID

If I went the prefetch path route and I wanted to display the Customer's OrderDetails in a grid then I'd have to collect each Orders' Details into another EntityCollection after the retrieval, right?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39625
Joined: 17-Aug-2003
# Posted on: 21-Apr-2005 09:45:52   

ChicagoKiwi wrote:

Otis wrote:

For grandchild records, prefetch paths ARE more efficient if you're fetching 3 level graphs.

But if I'm fetching the entire grandchild collection (e.g. all the OrderDetails for the Customer) then

SELECT * FROM OrderDetails WHERE OrderID IN (SELECT OrderID FROM Orders WHERE CustomerID = @customerID)

is materially the same as

SELECT OrderDetails.* FROM OrderDetails JOIN Orders on Orders.OrderID = OrderDetails.OrderID WHERE Orders.CustomerID = @customerID

If I went the prefetch path route and I wanted to display the Customer's OrderDetails in a grid then I'd have to collect each Orders' Details into another EntityCollection after the retrieval, right?

Yes. You then use either one of the queries you specified.

Frans Bouma | Lead developer LLBLGen Pro